Ch6 Test Bank Intermediate Statistical Investigations Test - Intermediate Statistical Investigations 1st Ed - Exam Bank by Nathan Tintle.
Chapter 6
Intermediate Statistical Investigations Test Bank
Question types:
FIB = Fill in the blank
Calc = Calculation
Ma = Matching
MS = Multiple select
MC = Multiple choice
TF = True-false
CHAPTER 6 TERMINAL LEARNING OUTCOMES
TLO6-1: Review descriptive and inferential methods for comparing groups with a categorical response variable, including comparing and contrasting different statistics for evaluating group differences.
TLO6-2: Motivate and utilize a logistic regression model using a categorical or quantitative explanatory variable.
TLO6-3: Utilize a logistic regression model using multiple categorical and/or quantitative explanatory variables.
Section 6.1: Comparing Proportions
LO6.1-1: Review descriptive and inferential methods for comparing groups with a categorical response variable.
LO6.1-2: Compare and contrast different statistics for evaluating group differences on a binary response variable.
Questions 1 through 4: An experiment was conducted with beagles ages 7-11 to determine whether diet and exercise can help keep old dogs mentally sharp. 20 dogs were randomly assigned to two groups: 12 were assigned to the treatment group, which was fed a special diet and given opportunities for extra exercise and social play; 8 were assigned to the control group, which received standard care and diet. After six weeks, all dogs attempted a new task. All 12 dogs in the treatment group were able to solve the task, but only 2 of the 8 dogs in the control group could do so. You plan to use a simulation-based test to decide if the difference between the groups is statistically significant.
- Which of the following statistics could be used to summarize the sample? Select all that apply.
- Difference in means
- Relative risk
- Chi-squared statistic
- F-statistic
- A simulation was conducted by repeatedly shuffling cards, dealing them into groups of size 12 and 8, and calculating a summary statistic. Describe the cards that would be used in this physical simulation.
- 20 cards, each marked with the ID number of the dog attempting to solve the task
- 20 cards, each marked with the time it took the dog to solve the task
- 12 blue cards to represent the treatment group, 8 green cards to represent the control group
- 14 blue cards to represent the dogs who solved the task, 6 green cards to represent the dogs who failed to solve the task
- The histogram below shows 10,000 simulated differences in proportions.
Based on the histogram of shuffled differences, which of the following p-values is reasonable?
- p-value < 0.001
- p-value = 0.019
- p-value = 0.211
- p-value = 0.458
- Is it reasonable to use a Normal distribution to calculate a p-value in this scenario?
- Yes, because a sample size of 20 is large enough, as long as the data are reasonably symmetric.
- No, because the number of successes and failures in each group is not large enough.
- Yes, because the dogs were randomly assigned to treatment groups.
- No, because the dogs were not randomly selected from the population.
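The card-shuffling simulation described in the beagle questions above can be sketched in a few lines. This is a minimal illustration, not the test bank's own solution: the function names, the seed, and the choice of a one-sided comparison are assumptions made for the example. It also computes the exact probability that all 12 treatment cards are successes, which the simulation should approximate.

```python
import random
from math import comb

# Observed data from the beagle experiment described above.
N_TREATMENT, N_CONTROL = 12, 8
SUCCESSES, FAILURES = 14, 6          # 12 + 2 dogs solved the task, 6 did not
OBSERVED_DIFF = 12 / 12 - 2 / 8      # treatment minus control = 0.75


def shuffled_difference(rng):
    """Shuffle the 20 outcome cards and deal them into groups of 12 and 8."""
    cards = [1] * SUCCESSES + [0] * FAILURES
    rng.shuffle(cards)
    treatment, control = cards[:N_TREATMENT], cards[N_TREATMENT:]
    return sum(treatment) / N_TREATMENT - sum(control) / N_CONTROL


def simulated_p_value(n_shuffles=10_000, seed=1):
    """Share of shuffles with a difference at least as large as observed."""
    rng = random.Random(seed)
    extreme = sum(
        shuffled_difference(rng) >= OBSERVED_DIFF - 1e-12
        for _ in range(n_shuffles)
    )
    return extreme / n_shuffles


# Exact check: a shuffle matches the observed result only when all 12
# treatment cards are successes (a hypergeometric probability).
exact_p = comb(SUCCESSES, N_TREATMENT) / comb(N_TREATMENT + N_CONTROL, N_TREATMENT)

if __name__ == "__main__":
    print(f"simulated p-value: {simulated_p_value():.4f}")
    print(f"exact p-value:     {exact_p:.6f}")
```

The cards carry the response (solved or not), and the dealing step re-creates the random assignment to groups, which is why the shuffle breaks any link between diet and task performance.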
Questions 5 through 9: An observational study was conducted to explore the relationship between development of myopia (nearsightedness) and the use of night-lights with infants. For each of the 404 children in the study, the researchers recorded whether they slept in darkness or with a night-light, and whether the child was nearsighted.
| Darkness | Night-light | Total |
Nearsighted | 18 | 78 | 96 |
Not Nearsighted | 154 | 154 | 308 |
Total | 172 | 232 | 404 |
- Based on the segmented bar graph below, is there an association between use of night-lights and nearsightedness in this sample?
- Yes, because in both groups, less than half of the children were nearsighted.
- No, because in both groups, less than half of the children were nearsighted.
- Yes, because the proportion of children who are nearsighted is lower in the darkness group than in the night-light group.
- No, because the proportion of children who are nearsighted is lower in the darkness group than in the night-light group.
- Calculate the relative risk of nearsightedness for the night-light group compared to the darkness group. Keep three decimal places in your answer.
Sol:
- Using symbols, state the null and alternative hypotheses for testing the association between use of night-lights and nearsightedness.
- Which theory-based inference method(s) could be used to test the association between use of night-lights and nearsightedness? Select all that apply.
- Two proportion z-test
- Two sample t-test
- Chi-square test
- ANOVA F-test
- The p-value for testing the association between use of night-lights and nearsightedness is less than 0.001. Which of the following conclusions is appropriate? You may assume that the sample is representative of some population. Select all that apply.
- This study provides strong evidence to suggest that sleeping with a night-light causes a higher risk of nearsightedness for infants.
- This study provides strong evidence to suggest that there is an association between night-lights and nearsightedness in the population.
- This study suggests that the difference in the rate of nearsightedness observed in this study would be unlikely to occur by chance alone.
- This study suggests that the difference in the rate of nearsightedness observed in this study would be likely to occur by chance alone.
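As a numerical check on the night-light questions above, the relative risk and a chi-square statistic can be computed directly from the two-way table. This is a sketch under stated assumptions: the helper `chi_square_2x2` is a name introduced here, and it uses the shortcut formula for a 2x2 table without a continuity correction.

```python
# Counts from the night-light table above.
nearsighted = {"darkness": 18, "night_light": 78}
totals = {"darkness": 172, "night_light": 232}

# Conditional proportions of nearsightedness in each group.
p_dark = nearsighted["darkness"] / totals["darkness"]
p_light = nearsighted["night_light"] / totals["night_light"]

# Relative risk: night-light group compared to darkness group.
relative_risk = p_light / p_dark


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: nearsighted, not nearsighted; columns: darkness, night-light.
chi_square = chi_square_2x2(18, 78, 154, 154)

print(round(relative_risk, 3))  # 3.213
print(round(chi_square, 2))
```

A chi-square statistic this far above 1 on one degree of freedom is consistent with the very small p-value quoted in the last question of this group.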
Questions 10 and 11: 48 cocaine abusers were randomly assigned to receive one of two treatments for six weeks: 24 subjects received the antidepressant drug desipramine hydrochloride and 24 subjects received lithium carbonate (the usual treatment). In the desipramine group, 10 had a relapse during the six-week period, compared to 18 in the lithium group.
- Calculate the odds ratio to compare the odds of relapse for someone treated with lithium compared to the odds of relapse for someone treated with desipramine.
Sol:
- Suppose that the sample sizes were doubled but the proportion who relapsed stayed the same in each group.
The odds ratio would _______ (increase/decrease/stay the same), and the chi-square statistic would _______ (increase/decrease/stay the same), because the odds ratio measures the strength of the _________ (association/evidence), and the chi-square statistic measures the strength of the _________ (association/evidence).
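The two relapse questions above can be explored with a short calculation. The sketch below assumes the 2x2 layout shown in the comments (rows are treatments, columns are relapse and no relapse); the helper names are illustrative. It computes the odds ratio for lithium versus desipramine and then repeats the calculation with every count doubled.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]: (a / b) / (c / d)."""
    return (a / b) / (c / d)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: lithium, desipramine; columns: relapse, no relapse.
original = (18, 6, 10, 14)
doubled = tuple(2 * count for count in original)

for label, table in (("original", original), ("doubled", doubled)):
    print(label,
          "odds ratio:", round(odds_ratio(*table), 3),
          "chi-square:", round(chi_square_2x2(*table), 3))
```

Doubling every count leaves each conditional proportion unchanged, so the odds ratio is the same in both rows of output, while the chi-square statistic scales with the sample size.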
Questions 12 and 13: Suppose we want to test whether there is an association between gender and whether or not a person supports a particular bill currently being debated in Congress. Two possible contingency tables are shown below.
Table 1
Gender | Support Bill: Yes | Support Bill: No | Total |
Male | 31 | 19 | 50 |
Female | 29 | 21 | 50 |
Total | 60 | 40 | 100 |
Table 2
Gender | Support Bill: Yes | Support Bill: No | Total |
Male | 42 | 8 | 50 |
Female | 18 | 32 | 50 |
Total | 60 | 40 | 100 |
- If we assume that gender and support for the bill are independent, the expected count of males who support the bill is _____. The expected counts are the same for the two tables.
- Which of the tables would result in a larger chi-squared statistic? Hint: You should be able to answer without calculating the chi-square statistic.
- Table 1
- Table 2
- Both tables would yield the same chi-squared statistic.
- We do not have enough information to determine which chi-squared statistic would be larger.
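The expected counts and chi-square statistics for the two gender-by-support tables above can be worked out as follows. This is a minimal sketch; `expected_counts` and `chi_square` are helper names introduced for the example, and the expected count in each cell is taken as (row total x column total) / grand total.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Rows: male, female; columns: supports the bill, does not support it.
table_1 = [[31, 19], [29, 21]]
table_2 = [[42, 8], [18, 32]]

print(expected_counts(table_1))   # [[30.0, 20.0], [30.0, 20.0]]
print(expected_counts(table_2))   # identical, because the margins match
print(round(chi_square(table_1), 3), round(chi_square(table_2), 3))
```

Because both tables share the same margins, only the distance between observed and expected counts differs, and that distance is what drives the chi-square statistic.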
- Researchers interviewed 159 patients with a type of cancer called mesothelioma. They also interviewed 159 controls who were similar to the mesothelioma patients in terms of age, sex, lifestyle, etc. but did not have the disease. Both groups were asked whether they had been exposed to asbestos in their home or workplace. To describe the association between asbestos and mesothelioma in this case-control study, the researchers should use the ______________ (relative risk/odds ratio).
- Describe how relative risk measures strength of association.
Relative risk values that are far from ________ (0, 1) indicate a ________ (strong/weak) association. However, the numerical value depends on which proportion is in the numerator; a relative risk of 2 indicates the same strength as a relative risk of ______ (-2, 0, 0.5).
If you change which category is considered a “success,” the relative risk value ________ (changes/stays the same).
- If the probability of success is the same for two treatments, we expect the odds ratio to have a value of _____.
- Which of the following is a benefit of using an odds ratio rather than a relative risk? Select all that apply.
- The odds ratio does not change based on which proportion is in the numerator.
- The odds ratio does not change based on which category is considered a “success.”
- The odds ratio does not change based on which variable is treated as the “response”.
- The odds ratio is easier to understand, because it can be directly interpreted in terms of “likelihood” or “chance of success.”
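To see how relative risk and the odds ratio respond to relabeling, the sketch below reuses the night-light counts from Section 6.1 (an illustrative choice, not part of the question above). It recomputes each statistic after switching the "success" category and after switching which variable is treated as the response. The function and variable names are assumptions made for the example.

```python
def relative_risk(s1, f1, s2, f2):
    """Ratio of success proportions, group 1 over group 2."""
    return (s1 / (s1 + f1)) / (s2 / (s2 + f2))


def odds_ratio(s1, f1, s2, f2):
    """Ratio of success odds, group 1 over group 2."""
    return (s1 / f1) / (s2 / f2)


# Night-light group: 78 nearsighted, 154 not. Darkness group: 18 and 154.
a, b = 78, 154
c, d = 18, 154

# "Success" = nearsighted.
rr_success = relative_risk(a, b, c, d)
or_success = odds_ratio(a, b, c, d)

# "Success" = not nearsighted (swap successes and failures in each group).
rr_failure = relative_risk(b, a, d, c)
or_failure = odds_ratio(b, a, d, c)

# Swap roles: group by nearsightedness, treat night-light use as the response.
rr_swapped = relative_risk(a, c, b, d)
or_swapped = odds_ratio(a, c, b, d)

print(round(rr_success, 3), round(rr_failure, 3), round(rr_swapped, 3))
print(round(or_success, 3), round(or_failure, 3), round(or_swapped, 3))
```

In this example the odds ratio for the other success category is the exact reciprocal of the original, and the role-swapped odds ratio equals the original, whereas the relative risk takes three unrelated values.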
- True or False: The simulated two-sided p-value will be the same regardless of whether you use a difference of proportion, relative risk, odds ratio, z-statistic, or chi-squared statistic to summarize the data.
Section 6.2: Introduction to Logistic Regression
LO6.2-1: Explain the motivation and need for logistic regression.
LO6.2-2: Utilize a logistic regression model using categorical or quantitative explanatory variables.
Questions 1 through 5: The manager of a baseball team used a logistic regression model to predict the likelihood of winning a game based on the number of runs scored by the opponent. Note: log() in the formula below represents the natural log.
Predicted log-odds of winning
- Calculate the odds of winning when the opponent scores 5 runs.
Sol:
- Calculate the probability of winning when the opponent scores 5 runs.
Sol:
- In this dataset, the largest number of runs scored by an opponent was 11.
True or False: This model should not be used to extrapolate when the number of runs scored by an opponent is very large, because the predicted probability of winning may be less than 0.
- Interpret the slope: As the number of runs scored by the opponent increases by 1,
- the predicted odds of winning decrease by 0.598.
- the predicted odds of winning are multiplied by -0.598.
- the predicted odds of winning decrease by e^(-0.598).
- the predicted odds of winning are multiplied by e^(-0.598).
- The p-value for the slope in this logistic regression model is 0.0004. Interpret the p-value.
The data provides ________ (strong/weak) evidence that there is ___________ (no association/an association) between runs scored by the opponent and winning.
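The chain from log-odds to odds to probability used in the baseball questions above can be sketched as follows. Two assumptions are made loudly here: the slope of -0.598 is inferred from the answer choices, and the intercept of 2.0 is purely hypothetical because the fitted equation is not reproduced in this text. The numbers printed are therefore illustrations of the method, not answers to the questions.

```python
from math import exp

SLOPE = -0.598     # inferred from the answer choices above
INTERCEPT = 2.0    # HYPOTHETICAL: the fitted intercept is not shown in this text


def log_odds(runs):
    """Predicted log-odds of winning for a given number of opponent runs."""
    return INTERCEPT + SLOPE * runs


def odds(runs):
    """Exponentiate the log-odds to get predicted odds."""
    return exp(log_odds(runs))


def probability(runs):
    """Convert odds to a probability: odds / (1 + odds)."""
    o = odds(runs)
    return o / (1 + o)


# Each extra opponent run multiplies the predicted odds by exp(slope).
odds_multiplier = exp(SLOPE)

print(round(odds_multiplier, 4))
print(round(odds(5), 4), round(probability(5), 4))
print(probability(100) > 0)   # the probability stays above 0 even far outside the data
```

The multiplier `exp(SLOPE)` does not depend on the intercept, so the slope interpretation holds whatever the true intercept is.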
Questions 6 through 10: An experiment is done to test the effect of two toxic substances on insects. 250 insects are exposed to each substance and the number of insects that die is counted. Logistic regression is used to describe the connection between type of substance and the likelihood of death. SubstanceB is an indicator variable (1=Substance B, 0=Substance A). A partially filled-in table of coefficients is given below.
Term | Coeff. | SE | Chi-square | p-value | 95% CI |
Intercept | -1.9924 | 0.1946 | | | (-2.3740, -1.6109) |
SubstanceB | 0.5030 | 0.2540 | | | (0.0050, 1.0009) |
- Calculate the predicted odds of dying for an insect exposed to Substance A.
Sol:
- Interpret the slope.
- The probability of dying when exposed to Substance B is 0.5030 higher than the probability of dying when exposed to Substance A.
- The odds of dying when exposed to Substance B are 0.5030 times as high as the odds of dying when exposed to Substance A.
- The probability of dying when exposed to Substance B is 1.6537 higher than the probability of dying when exposed to Substance A.
- The odds of dying when exposed to Substance B are 1.6537 times as high as the odds of dying when exposed to Substance A.
- At the 0.05 significance level, is the association between death and type of substance statistically significant?
- Yes, the association is significant, because the confidence interval for the slope does not include 0.
- No, the association is not significant, because the confidence interval for the slope does not include 0.
- Yes, the association is significant, because the confidence interval for the slope includes 1.
- No, the association is not significant, because the confidence interval for the slope includes 1.
- Calculate a 95% confidence interval for the population odds ratio.
The odds of death when exposed to Substance B are between _____ and ______ times as high as the odds of death when exposed to Substance A.
- Suppose the indicator variable for substance had been defined differently:
Calculate the intercept and slope for the logistic regression model.
Sol:
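The calculations behind the insect questions above can be checked with the coefficients from the table. This is a hedged sketch: the variable names are introduced here, the Wald chi-square is taken as (coefficient / SE) squared, and the last two lines assume the redefined indicator is 1 for Substance A and 0 for Substance B, which is an assumption because the redefinition is not reproduced in this text.

```python
from math import exp

# Values from the table of coefficients above.
INTERCEPT, SLOPE = -1.9924, 0.5030
SLOPE_SE = 0.2540
SLOPE_CI = (0.0050, 1.0009)

# Predicted odds of dying: exponentiate the predicted log-odds.
odds_a = exp(INTERCEPT)            # SubstanceB = 0
odds_b = exp(INTERCEPT + SLOPE)    # SubstanceB = 1

# Odds ratio (B versus A) and its confidence interval:
# exponentiate the slope and the endpoints of the slope's interval.
odds_ratio = exp(SLOPE)
odds_ratio_ci = tuple(exp(endpoint) for endpoint in SLOPE_CI)

# Wald chi-square statistic for the slope.
wald_chi_square = (SLOPE / SLOPE_SE) ** 2

# ASSUMPTION: indicator redefined as 1 = Substance A, 0 = Substance B.
# The baseline becomes Substance B and the slope changes sign.
reversed_intercept = INTERCEPT + SLOPE
reversed_slope = -SLOPE

print(round(odds_a, 4), round(odds_ratio, 4))
print(tuple(round(x, 4) for x in odds_ratio_ci))
print(round(wald_chi_square, 3))
print(round(reversed_intercept, 4), round(reversed_slope, 4))
```

Note that the interval for the slope is on the log-odds scale, so its reference value for "no association" is 0, while the exponentiated interval for the odds ratio is compared against 1.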
- Use answer choices (A) – (C) to fill in the blanks. Choices may be used more than once or not at all.
In a chi-squared test, the response variable ____.
In a chi-squared test, the explanatory variable ____.
In logistic regression, the response variable ____.
In logistic regression, the explanatory variable ____.
A. Must be categorical
B. Must be quantitative
C. May be categorical or quantitative
- Which of the following models is used in logistic regression?
- Based on logistic regression, which of the following formulas could be used to predict the probability of success for a particular x value?
- True or False: When using a logistic regression model, the predicted probability of success is guaranteed to fall between 0 and 1.
- True or False: Given the confidence interval for the slope coefficient in logistic regression, taking the natural log of the endpoints gives you a confidence interval for the population odds ratio.
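The relationship between the linear predictor and the predicted probability in the questions above can be illustrated with a small sketch. The coefficients `b0` and `b1` are hypothetical values chosen for the example, and `inverse_logit` is a helper name introduced here. The code contrasts the unbounded log-odds with the probability obtained by applying the inverse logit.

```python
from math import exp


def inverse_logit(log_odds):
    """Convert log-odds to a probability, avoiding overflow for large inputs."""
    if log_odds >= 0:
        return 1 / (1 + exp(-log_odds))
    z = exp(log_odds)
    return z / (1 + z)


def predicted_probability(b0, b1, x):
    """Predicted probability of success when log-odds = b0 + b1 * x."""
    return inverse_logit(b0 + b1 * x)


# HYPOTHETICAL coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.8

for x in (-10, -5, 0, 5, 10):
    linear_predictor = b0 + b1 * x          # can be any real number
    p = predicted_probability(b0, b1, x)    # always lands between 0 and 1
    print(x, round(linear_predictor, 2), round(p, 4))
```

The same transformation explains the direction of the conversion for intervals: coefficients live on the log-odds scale, so moving to the odds scale requires exponentiating, not taking logs.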
Section 6.3: Multiple Logistic Regression Models
LO6.3-1: Utilize a logistic regression model using multiple categorical and/or quantitative explanatory variables.
Questions 1 through 7: In schools that employ “tracking,” students are assigned to classes based on their perceived ability/achievement with separate classes for high- and low-performing students. Some schools are experimenting with “detracking,” putting more heterogeneous groups of students in class together.
Burris et al. (2008) explored the effect of detracking (1 = detracked classes, 0 = tracked classes) on the likelihood of a student receiving an International Baccalaureate (IB) diploma. They controlled for aptitude (a standardized quantitative variable) and whether or not the student was in special education (1 = yes, 0 = no). The researchers considered several models, finally choosing the model below.
- Initially, the researchers included a Detracking × Aptitude interaction term in the model but removed it because it was not statistically significant. Explain what the assumption of “no interaction” means in this context.
- The likelihood of a student studying in detracked classes is the same, regardless of student aptitude level.
- The likelihood of a student studying in detracked classes changes based on student aptitude level.
- The odds ratio comparing the odds of earning an IB diploma for students in tracked and detracked classes is the same, regardless of student aptitude level.
- The odds ratio comparing the odds of earning an IB diploma for students in tracked and detracked classes changes based on student aptitude level.
- Estimate the probability that a student with an aptitude score of 1.5 who is not in special education and studies in detracked classes will receive an IB diploma.
Sol:
- Interpret the slope corresponding to the Detracking variable.
The __________ (chances/odds) of earning an IB diploma for students in detracked classes are _______ times as high as the __________ (chances/odds) for someone in tracked classes, after adjusting for aptitude and special education.
- Interpret the slope corresponding to the Aptitude variable.
As aptitude increases by 1 ______ (unit/SD), the __________ (chances/odds) of earning an IB diploma are multiplied by _______, after adjusting for special education and detracking.
- The standard error for the Detracking coefficient was 0.158. Calculate the chi-square statistic for this variable.
Sol:
- What if the researchers decided to use a simple logistic regression model with only Detracking as a predictor?
- The coefficient and p-value for the Detracking variable could both change.
- The coefficient for the Detracking variable could change, but the p-value would stay the same.
- The p-value for the Detracking variable could change, but the coefficient would stay the same.
- The coefficient and p-value for the Detracking variable would both stay the same.
- These data were collected at a single school before and after they detracked all their classes. Does this study support causal conclusions about the relationship between detracking and earning IB diplomas at this school?
- Yes, because the analysis adjusted for possible confounders, aptitude and special education.
- Yes, because students were assigned to either tracked or detracked classes. They were not allowed to choose for themselves.
- No, because students were not randomly selected. All students in the study attended the same school.
- No, because other changes may have occurred at this school at the same time as detracking.
Questions 8 through 11: A multiple logistic regression model can be used to predict which passengers survive the sinking of the Titanic based on sex (1=female, 0=male) and age (in years).
- Interpret the odds ratio for Sex.
- The predicted odds of survival for female passengers were 2.4660 times as high as the predicted odds of survival for male passengers of the same age.
- Overall, the predicted odds of survival for female passengers were 2.4660 times as high as the predicted odds of survival for male passengers.
- The predicted odds of survival for female passengers were e^(2.4660) times as high as the predicted odds of survival for male passengers of the same age.
- Overall, the predicted odds of survival for female passengers were e^(2.4660) times as high as the predicted odds of survival for male passengers.
- True or False: Based on the logistic regression model above, a change of one year has the same impact on the predicted probability of survival at any point on the Age scale.
- The parallel boxplots below show the relationship between Age and Sex for passengers on the Titanic.
In the model shown above, the Age term is not statistically significant. If Age were removed from the logistic regression model, how would you expect the coefficient for Sex to change?
- The association between Sex and Age is weak, so the coefficient for Sex would change dramatically.
- The association between Sex and Age is weak, so the coefficient for Sex would not change dramatically.
- The association between Sex and Age is strong, so the coefficient for Sex would change dramatically.
- The association between Sex and Age is strong, so the coefficient for Sex would not change dramatically.
- The multiple logistic regression model above does not allow for interaction. When an Age × Sex interaction term was added to the model, the term had a very small p-value. How do you interpret the p-value?
- There is strong evidence that the relationship between Age and Survival is the same, regardless of Sex.
- There is strong evidence that the relationship between Age and Survival changes depending on Sex.
- There is weak evidence that the relationship between Age and Survival is the same, regardless of Sex.
- There is weak evidence that the relationship between Age and Survival changes depending on Sex.
- Suppose researchers want to use a multiple logistic regression model to estimate the probability that a person has a gym membership. What kind of explanatory variables may be used in this model?
- All explanatory variables must be categorical.
- All explanatory variables must be quantitative.
- Explanatory variables may be categorical and/or quantitative.
- Regardless of which explanatory variables are chosen, a multiple logistic regression model is not appropriate in this scenario.
- A multiple logistic regression model was used to predict whether individuals would be admitted into medical school. The prediction table for the model is shown below.
| Predicted to be Admitted | Predicted to be Rejected | Total |
Actually Admitted | 80 | 22 | 102 |
Actually Rejected | 70 | 826 | 896 |
Total | 150 | 848 | 998 |
Calculate the correct classification rate.
Sol:
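The correct classification rate for the prediction table above is the share of cases where the predicted and actual outcomes agree. A minimal sketch, with the dictionary layout chosen here for illustration:

```python
# Keys are (actual outcome, predicted outcome); values are counts from the table.
prediction_table = {
    ("admitted", "admitted"): 80,
    ("admitted", "rejected"): 22,
    ("rejected", "admitted"): 70,
    ("rejected", "rejected"): 826,
}

# Correct predictions sit on the diagonal: actual matches predicted.
correct = sum(
    count
    for (actual, predicted), count in prediction_table.items()
    if actual == predicted
)
total = sum(prediction_table.values())

correct_classification_rate = correct / total
print(correct, total, round(correct_classification_rate, 4))  # 906 998 0.9078
```

A high overall rate can hide uneven performance: here 22 of the 102 admitted applicants were misclassified, compared with 70 of the 896 rejected applicants.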
- In logistic regression, what numerical summary is used to measure the unexplained variability?
- Coefficient of covariation
- Deviance statistic
- Standard error of the residuals
- Sum of squared errors
- Suppose that two explanatory variables – one categorical and one quantitative – are being used to predict a categorical outcome. Which of the following conditions is necessary for Simpson’s paradox to occur?
- A substantial association between the two explanatory variables
- A very weak association between the two explanatory variables
- A substantial interaction between the two explanatory variables
- A very weak interaction between the two explanatory variables