Chapter 5 Exam Prep: Intermediate Statistical Investigations, 1st Ed., Exam Bank by Nathan Tintle
Chapter 5
Intermediate Statistical Investigations Test Bank
Question types: FIB = Fill in the blank; Calc = Calculation; Ma = Matching; MS = Multiple select; MC = Multiple choice; TF = True-false
CHAPTER 5 TERMINAL LEARNING OUTCOMES
TLO5-1: Visualize and interpret relationships among three or more quantitative variables using response surfaces.
TLO5-2: Visualize adjusted associations when adjusting for a quantitative variable, and explore potential problems when using explanatory variables that are linearly related.
TLO5-3: Fit a polynomial model for nonlinear associations, and assess the appropriateness of the model.
TLO5-4: Assess transformations of the response and/or explanatory variable(s) to meet model conditions of linearity, symmetry, and equal variation.
Section 5.1: Experiments with Multiple Quantitative Explanatory Variables
LO5.1-1: Consider design issues with quantitative explanatory variables.
LO5.1-2: Visualize relationships among three or more quantitative variables.
LO5.1-3: Interpret a “response surface” with quantitative explanatory variables.
LO5.1-4: Describe interactions between quantitative variables.
Questions 1 through 3: A computer science student is using an evolutionary algorithm to solve the “traveling salesman problem,” which aims to find the shortest route for visiting a list of cities and returning to the original city. A number of parameters must be set in evolutionary algorithms, including tournament size and mutation rate, and the student wants to find the ideal set of algorithm parameters for this task.
The student uses a balanced, full-factorial design with three values of each parameter, tournament sizes of 8, 16, and 32, and mutation rates of 1%, 5%, and 10%. The response variable is the distance of the best (shortest) route that the algorithm finds in a fixed number of iterations. The student used a linear regression model to predict best distance:
- True or False: The student’s model assumes that the relationship between tournament size and best distance is the same, regardless of the mutation rate.
- Do the graphs below indicate a violation of the validity conditions for a multiple linear regression model?
- Yes, the linearity condition is severely violated.
- Yes, the independence condition is severely violated.
- Yes, the equal variance condition is severely violated.
- No, these plots do not reveal any severe violations of the validity conditions.
- Based on this model, what algorithmic parameters should the student choose? Remember, the goal is to find the shortest route.
The student should use a tournament size that is relatively ______ (large/small) and a mutation rate that is relatively ______ (large/small).
More research may be necessary before setting parameters outside the explanatory variable region since that would be considered ___________ (extrapolation/interpolation).
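The balanced, full-factorial design described in Questions 1 through 3 can be enumerated directly. The following is a minimal sketch; the number of runs per treatment (`runs_per_treatment`) is a hypothetical value, since the scenario does not state it. The sketch lists every treatment and checks whether the two explanatory variables are associated in the design itself.

```python
from itertools import product

# Factor levels taken from the scenario.
tournament_sizes = [8, 16, 32]
mutation_rates = [1, 5, 10]  # percent
runs_per_treatment = 4  # hypothetical; the source does not give this number

# Full-factorial: every tournament size is paired with every mutation rate.
treatments = list(product(tournament_sizes, mutation_rates))

# Balanced: each treatment is run the same number of times.
runs = [t for t in treatments for _ in range(runs_per_treatment)]

sizes = [size for size, _ in runs]
rates = [rate for _, rate in runs]
mean_size = sum(sizes) / len(sizes)
mean_rate = sum(rates) / len(rates)

# Sum of cross-products of deviations; a value of zero means the two
# explanatory variables are uncorrelated in the design.
cross_products = sum(
    (s - mean_size) * (r - mean_rate) for s, r in zip(sizes, rates)
)

print(len(treatments), "treatments,", len(runs), "runs")
print("cross-product sum is numerically zero:", abs(cross_products) < 1e-9)
```

Changing `runs_per_treatment` for only some treatments, or dropping a treatment, is an easy way to see how an unbalanced or incomplete design can reintroduce an association.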
Questions 4 through 6: A balanced, full-factorial design was used to investigate the impact of water temperature (25°C, 35°C) and salinity (10%, 25%, 40%) on the weight gain of shrimp (in mg). A 3D scatterplot of the data is shown below. The plane shows the predicted weight gains based on a linear regression model with no interaction.
- Using the 3D scatterplot, describe the relationships among the variables.
After adjusting for salinity, there is a _________ (positive/negative) association between temperature and weight gain.
After adjusting for temperature, there is a _________ (positive/negative) association between salinity and weight gain.
- Does this model provide accurate predictions?
For the shrimp raised in water that was 25°C with 10% salinity, the residuals are all _________ (positive/negative), which suggests that the predicted value based on a linear regression model tends to be too _________ (high/low).
- Based on the scatterplot matrix below, are you concerned about the linearity condition being met?
- Yes, because there is no linear relationship between temperature and salinity. The slope of the regression line would be 0.
- Yes, because the spread of the weight gain values is larger for low temperatures than for high temperatures.
- Yes, because changing from 10% to 25% salinity leads to an increase in weight gain, but changing from 25% to 40% salinity leads to a decrease in weight gain.
- No, this scatterplot matrix does not reveal any issues with the linearity condition.
Questions 7 through 11: Two drugs, labeled Drug A and Drug B, are often prescribed together as a treatment for a certain disease. In a randomized experiment, 120 patients were prescribed one of three dosages of Drug A (10, 20, or 30 mg) and one of four dosages of Drug B (200, 400, 600, or 800 mg). The response variable is a quantitative indicator of disease activity (DAI). The partially filled in table of coefficients is given below.
Term | Coefficient | SE | t-stat. | p-value |
Intercept | 100.398 | 5.395 | | |
Drug A dosage | -0.969 | 0.250 | | |
Drug B dosage | -0.097 | 0.010 | | |
Drug A dosage × Drug B dosage | | | | <0.0001 |
- Fill in the degrees of freedom for this linear regression model.
Source | DF |
Model | |
Error | |
Total |
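The degrees-of-freedom bookkeeping for a model like this one can be sketched as follows. The sketch assumes that both dosages enter the model as quantitative variables, so that each of the three non-intercept rows in the coefficient table (Drug A, Drug B, and their interaction) uses one degree of freedom.

```python
n_patients = 120

# Terms other than the intercept: Drug A dosage, Drug B dosage,
# and the Drug A x Drug B interaction (one coefficient each).
n_terms = 3

df_model = n_terms
df_total = n_patients - 1          # one df is used by the overall mean
df_error = df_total - df_model     # what remains for the residuals

print("Model:", df_model, "Error:", df_error, "Total:", df_total)
```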
- Describe the interaction shown in the graph below.
In the explanatory variable region of this study, higher dosages of Drug B are associated with _________ (higher/lower) DAI values; however, the slope relating DAI to Drug B dosage is _________ (flatter/steeper) when the Drug A dosage is high.
In the table above, the missing coefficient for the interaction would be ____________ (positive/negative).
- True or False: In terms of predicted DAI, the choice of dosage for Drug A is more important than the choice of dosage for Drug B.
- True or False: If the explanatory variables are standardized, the p-value corresponding to the interaction may change.
- The partially filled-in table of coefficients below corresponds to a model where the explanatory variables have been standardized.
Term | Coefficient | SE | t-stat. | p-value |
Intercept | 50.59 | 0.83 | | |
Std. Drug A dosage | -0.41 | 0.84 | | |
Std. Drug B dosage | -13.67 | 0.84 | | |
Std. Drug A dosage × Std. Drug B dosage | | | | |
Interpret the intercept and the slope for Drug B.
The predicted DAI for a person who receives _______ mg of Drug A and _______ mg of Drug B is 50.59.
As the dosage of Drug B increases by 1 _______ (mg / SD), DAI is predicted to _______ (increase/decrease) by _______, for a person receiving the average dosage of Drug A.
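To see what a standardized explanatory variable represents, it can help to standardize the dosages by hand. The sketch below assumes a balanced design with 10 patients per treatment (12 treatments × 10 = 120 patients); the scenario does not state this, so the standard deviations it produces are illustrative only.

```python
from statistics import mean, stdev

# Assumption: balanced design, 10 patients at each of the 12 dosage combinations.
per_treatment = 10
design = [
    (a, b)
    for a in (10, 20, 30)
    for b in (200, 400, 600, 800)
    for _ in range(per_treatment)
]
drug_a = [a for a, _ in design]
drug_b = [b for _, b in design]

mean_a, sd_a = mean(drug_a), stdev(drug_a)
mean_b, sd_b = mean(drug_b), stdev(drug_b)


def standardize(value, center, spread):
    """Express a raw dosage as a number of SDs above or below the mean."""
    return (value - center) / spread


# A standardized value of 0 corresponds to the mean dosage.
print("Drug A mean:", mean_a, "-> z =", standardize(mean_a, mean_a, sd_a))
print("Drug B mean:", mean_b, "-> z =", standardize(mean_b, mean_b, sd_b))

# One SD of Drug B, in mg, under the balanced-design assumption.
print("1 SD of Drug B is about", round(sd_b, 1), "mg")
```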
- Match each term to its definition. One of the definitions will not be used.
Full-factorial design: ______
Balanced design: ______
A. Each treatment has the same number of observations.
B. Each factor has the same number of levels.
C. Each level of a factor is combined with each level of another factor.
- Which of the following statements is true? Select all that apply.
- Balanced designs always eliminate the association between two explanatory variables.
- Full-factorial designs always eliminate the association between two explanatory variables.
- Balanced, full-factorial designs always eliminate the association between two explanatory variables.
- It is possible to have an association between the explanatory variables in any of the designs mentioned above.
- True or False: In a balanced, full-factorial design, the adjusted slope coefficients in the two-predictor model are identical to the unadjusted slope coefficients in the one-predictor models.
- How could you represent the graph of a regression model that uses two quantitative explanatory variables and an interaction term to predict a quantitative response?
- A line
- A plane
- A surface (which may be curved)
- A sphere
- If you standardize the explanatory variables in a multiple linear regression model with no interaction, which numerical values do you expect to change?
- The intercept and slopes
- The p-values corresponding to the intercept and slopes
- The percentage of variability explained by the model
- The standard error of the residuals
- Which of the following are advantages of using standardized explanatory variables? Select all that apply.
- The intercept is more likely to have a meaningful interpretation.
- It is easier to compare the impact of explanatory variables with different scales.
- The validity conditions are more likely to be satisfied.
- The interaction product variable will be independent of the explanatory variables involved in the interaction.
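The link between standardizing and the interaction product variable can be checked numerically. The sketch below reuses the Drug A and Drug B dosage levels from Questions 7 through 11 and assumes a balanced design with 10 patients per treatment, which the scenario does not state. It compares the correlation between Drug A and the product term before and after standardizing.

```python
from itertools import product


def pearson_r(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def standardize(values):
    """Subtract the mean and divide by the sample SD."""
    n = len(values)
    center = sum(values) / n
    spread = (sum((v - center) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - center) / spread for v in values]


per_treatment = 10  # assumption; not given in the scenario
design = [
    pair
    for pair in product((10, 20, 30), (200, 400, 600, 800))
    for _ in range(per_treatment)
]
drug_a = [a for a, _ in design]
drug_b = [b for _, b in design]

# Product term built from the raw dosages.
raw_product = [a * b for a, b in zip(drug_a, drug_b)]
r_raw = pearson_r(drug_a, raw_product)

# Product term built from the standardized dosages.
z_a, z_b = standardize(drug_a), standardize(drug_b)
std_product = [za * zb for za, zb in zip(z_a, z_b)]
r_std = pearson_r(z_a, std_product)

print("raw:", round(r_raw, 3), "standardized is near zero:", abs(r_std) < 1e-9)
```

The correlation drops to zero here because the assumed design is balanced and symmetric around the mean dosages. With observational data the reduction is typically substantial but need not be complete.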
Section 5.2: Observational Studies with Multiple Quantitative Explanatory Variables
LO5.2-1: Visualize adjusted associations when adjusting for a quantitative variable.
- Create and interpret added variable plots.
- Interpret model coefficients.
- Interpret adjusted sums of squares.
LO5.2-2: Explore potential problems when using explanatory variables that are linearly related.
Questions 1 and 2: Suppose you want to predict a student’s final grade in a course (scale of 0-100) based on their pre-test score (test on a scale of 0-100 taken before the semester began) and the number of optional study sessions they attended (0-15). In the sample, all three pairs of variables have positive correlations.
- In a simple regression model, the unadjusted slope relating final exam score to number of study sessions attended is positive. If you added pre-test scores to the model, would you expect the slope corresponding to study sessions to change?
The adjusted slope for study sessions in the multiple regression model is expected to be _________ (larger than/smaller than/equal to) the unadjusted slope for study sessions in the simple regression model.
- In a simple regression model that uses number of study sessions as the only predictor of final exam scores, the R^2 value is 30%. In a simple regression model that uses pre-test scores as the only predictor of final exam scores, the R^2 value is 40%. In the multiple regression model that uses both number of study sessions and pre-test scores as predictors, how large would you expect the R^2 value to be?
- You would expect R^2 to be 30% or lower.
- You would expect R^2 to be between 30% and 40%.
- You would expect R^2 to be between 40% and 70%.
- You cannot calculate R^2 for a multiple regression model.
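For a model with two explanatory variables, R^2 can be written in terms of the three pairwise correlations: R^2 = (r_y1^2 + r_y2^2 - 2·r_y1·r_y2·r_12) / (1 - r_12^2). The sketch below applies this standard identity to the two single-predictor values from the question (30% and 40%). The correlation between study sessions and pre-test score is not given in the scenario, so the values of `r_12` tried here are hypothetical.

```python
from math import sqrt


def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for a two-predictor linear model, from the pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Positive correlations implied by the single-predictor R^2 values.
r_sessions = sqrt(0.30)
r_pretest = sqrt(0.40)

# Hypothetical correlations between the two explanatory variables.
candidate_r12 = [0.1, 0.3, 0.5, 0.7]
results = {
    r_12: r_squared_two_predictors(r_sessions, r_pretest, r_12)
    for r_12 in candidate_r12
}

for r_12, r2 in results.items():
    print(f"r_12 = {r_12:.1f} -> R^2 = {r2:.3f}")
```

Setting `r_12` to 0 returns the sum of the two single-predictor values, which is the situation a balanced, full-factorial experiment would produce.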
Questions 3 through 7: One commonly cited metric in car commercials is time to accelerate from 0 to 60 mph. What car features might explain faster acceleration (shorter times)? The dataset includes weight (in pounds), horsepower, and acceleration (in seconds) for 392 different types of cars.
Suppose you fit a multiple regression model to predict acceleration. The equation and scatterplot are given below.
- Interpret the slope coefficient corresponding to weight.
As _________ (acceleration/weight) increases by 1 unit, _________ (acceleration/weight) is predicted to increase by 0.0023 units, _________ (ignoring/adjusting for) horsepower.
- If horsepower were removed from the model, how would the slope coefficient corresponding to weight change?
- The slope would be larger than 0.0023.
- The slope would be smaller than 0.0023, but it would still be positive.
- The slope would be negative.
- The slope would not change.
- The R^2 value for the multiple regression model is 0.602. If horsepower were removed from the model, how would the R^2 value change?
- The R^2 value would be larger than 0.602.
- The R^2 value would be smaller than 0.602, but it would still be positive.
- The R^2 value would be negative.
- The R^2 value would not change.
- Does the graph below indicate any problems with model fit?
- Yes, because the graph shows a curved pattern.
- No, because the pattern allows you to predict the residuals and reduce error.
- Yes, because most of the predicted values are higher than 15 (not symmetric).
- No, because roughly half of the residuals are below 0 (symmetric).
- Is there an interaction between weight and horsepower in this sample?
- Yes, because heavier cars tend to have higher horsepower and faster acceleration times.
- Yes, because the relationship between weight and acceleration differs slightly based on the level of horsepower.
- Yes, because if horsepower were removed from the model, the slope relating acceleration to weight would change substantially.
- No, because none of the regression lines cross within the explanatory variable region.
Questions 8 and 9: Suppose you want to use age and experience (both measured in years) to predict salary (in dollars) at a small company.
- The ANOVA table is shown below. You’ll notice that SSModel is much larger than the sum of SSExperience and SSAge. Further, the overall model is statistically significant, but neither of the individual predictors is statistically significant. Which of the following is the best explanation for these features of the ANOVA table?
Source | df | SS | MS | F | p-value |
Model | 2 | 715.11 | 357.55 | 11.89 | 0.0056 |
Error | 7 | 210.49 | 30.07 | ||
Total | 9 | 925.60 |
- There is substantial covariation between experience and age.
- There is substantial interaction between experience and age.
- There is another confounding variable not being accounted for by the model.
- The only reasonable explanation is a calculation error.
- The scatterplot below is the added variable plot predicting salary from age, before and after adjusting for experience. Note that the graph shows two regression lines. Which is which?
The ______ (gray/purple) line is the original regression line relating salary to age, and the ______ (gray/purple) line is the regression line relating salary to age after adjusting for experience.
Questions 10 through 13: Consider a model for predicting the weight of fish (in grams) based on their length and width. The length and width were originally measured in centimeters, but the variables have been standardized. The table of coefficients is given below.
Term | Estimate | SE | t-stat. | p-value |
Intercept | 297.01 | 8.95 | 33.19 | <0.0001 |
Std. Length | 203.52 | 27.42 | 7.42 | <0.0001 |
Std. Width | 107.55 | 27.92 | 3.85 | 0.0003 |
Std. Length × Std. Width | 88.99 | 7.01 | 12.69 | <0.0001 |
- Is the interaction between standardized length and standardized width significant?
- Yes, because the p-value for Std. Length × Std. Width is small.
- No, because the slope for Std. Length × Std. Width is small relative to the other slope and intercept coefficients.
- There is not enough information to answer, because the residual standard error is not given.
- There is not enough information to answer, because no information is given about the fit of the model that excludes the interaction term.
- Interpret the coefficient for Std. Length.
As length increases by 1 _______ (cm/SD), _________ (weight/width) is predicted to increase by 203.52 units, for a fish of average ___________ (weight/length/width).
- True or False: In this scenario, interpretation of the intercept involves extrapolation.
- Suppose the explanatory variables had not been standardized. Which of the following would you expect to change? Select all that apply.
- The slope coefficients for the two explanatory variables, length and width.
- The p-values for the two explanatory variables, length and width.
- The coefficient for the length × width interaction.
- The p-value for the length × width interaction.
- Using data from an observational study, a regression model predicts a response variable y based on two explanatory variables, x1 and x2. Which of the following is true?
- Adjusting for x2 will not change the strength of the association between x1 and y.
- After adjusting for x2, the association between x1 and y will be stronger.
- After adjusting for x2, the association between x1 and y will be weaker.
- After adjusting for x2, the association between x1 and y may be stronger or weaker.
- Describe how a large degree of covariation among explanatory variables can impact the model.
When explanatory variables (including interactions) have a strong linear association with other explanatory variables, this can lead to __________ (larger/smaller) standard errors on the slope coefficients and a __________ (larger/smaller) residual standard error for the model overall.
- True or False: If two explanatory variables are associated with each other, using standardized versions of those variables will reduce the linear association between them.
- True or False: Using standardized variables in an interaction will reduce the linear association between the interaction term and the variables involved in the interaction.
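One common way to quantify how covariation among explanatory variables affects the standard errors of slope coefficients is the variance inflation factor (VIF). This diagnostic is not named in the questions above; it is offered here only as an illustration. For a model with two explanatory variables, the VIF for either slope is 1 / (1 - r^2), where r is the correlation between the two explanatory variables, and the standard error is multiplied by the square root of the VIF relative to the uncorrelated case. The correlations below are hypothetical.

```python
def variance_inflation_factor(r_12):
    """VIF for either slope in a two-predictor model: 1 / (1 - r_12^2)."""
    return 1.0 / (1.0 - r_12 ** 2)


# Hypothetical correlations between the two explanatory variables.
for r_12 in (0.0, 0.5, 0.9, 0.99):
    vif = variance_inflation_factor(r_12)
    print(f"r = {r_12:.2f}: VIF = {vif:.2f}, SE multiplier = {vif ** 0.5:.2f}")
```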
Section 5.3: Modeling Nonlinear Associations Part I – Polynomial models
LO5.3-1: Fit a polynomial model to model a nonlinear association.
LO5.3-2: Assess when a polynomial model is appropriate.
Questions 1 through 4: Consider three different models that could be used to predict the weight of fish (in grams) based only on fish length (in cm).
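Model 1 (linear): predicted weight = b0 + b1(Length)
Model 2 (quadratic): predicted weight = b0 + b1(Length) + b2(Length^2)
Model 3 (cubic): predicted weight = b0 + b1(Length) + b2(Length^2) + b3(Length^3)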
- A line showing predicted values from Model 1 has been added to the scatterplot below.
This model tends to ________ (overestimate/underestimate) weight for the longest and shortest fish in the dataset, but it tends to ________ (overestimate/underestimate) weight for fish of average length.
- A smoother has been added to the scatterplot below. What is the purpose of adding a smoother?
- Smoothers are helpful for exploring the form of the association.
- Smoothers provide a simple mathematical model for making predictions.
- Smoothers serve both of the purposes mentioned above.
- Smoothers serve neither of the purposes mentioned above.
- The tables below describe Models 2 and 3.
Model 2
Term | Coeff. | SE | t-stat. | p-value |
Intercept | 128.35 | 78.78 | 1.63 | 0.1092 |
Length | -21.02 | 5.42 | -3.88 | 0.0003 |
Length^2 | 0.9086 | 0.0869 | 10.46 | <0.0001 |
Model 3
Term | Coeff. | SE | t-stat. | p-value |
Intercept | 304.98 | 160.82 | 1.90 | 0.0635 |
Length | -43.40 | 18.59 | -2.33 | 0.0235 |
Length^2 | 1.7699 | 0.6903 | 2.56 | 0.0133 |
Length^3 | -0.0101 | 0.0081 | -1.26 | 0.2141 |
Using significance of model terms as your criterion, which model would you choose?
- Model 1
- Model 2
- Model 3
- There is not enough information to choose, since the table for Model 1 is not given.
- Based on the R^2 values given below, which model would you choose?
Model 1: 0.9207
Model 2: 0.9741
Model 3: 0.9749
- Model 1, because you should always choose the model with the lowest R^2.
- Model 3, because you should always choose the model with the highest R^2.
- Model 2, because the accuracy of predictions is very similar to Model 3, but the model is simpler and easier to interpret.
- Model 3, because the accuracy of predictions is very similar to Model 2, but the model is more flexible and allows for more “turns” in the data.
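The coefficient tables for Models 2 and 3 can be turned into prediction functions to see how closely the two fitted curves agree. The sketch below evaluates both polynomials at a fish length of 30 cm; that length is a hypothetical value chosen for illustration, since the range of lengths in the dataset is not stated.

```python
def evaluate_polynomial(coefficients, x):
    """Evaluate b0 + b1*x + b2*x^2 + ... using Horner's method."""
    result = 0.0
    for coefficient in reversed(coefficients):
        result = result * x + coefficient
    return result


# Coefficients in increasing order of power, copied from the tables.
model_2 = [128.35, -21.02, 0.9086]
model_3 = [304.98, -43.40, 1.7699, -0.0101]

length_cm = 30  # hypothetical fish length
pred_2 = evaluate_polynomial(model_2, length_cm)
pred_3 = evaluate_polynomial(model_3, length_cm)

print(f"Model 2 predicted weight: {pred_2:.2f} g")
print(f"Model 3 predicted weight: {pred_3:.2f} g")
```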
Questions 5 through 7: A quadratic model was used to predict height of children (in inches) based on age (in months). The table of coefficients is shown below.
Term | Coeff. | SE | t-stat. | p-value |
Intercept | 7.6557 | 4.8070 | 1.59 | 0.1129 |
Age | 0.5372 | 0.0622 | 8.63 | <0.0001 |
Age^2 | -0.0012 | 0.0002 | -6.27 | <0.0001 |
- The children in the dataset are all between 99 and 221 months of age. Consider whether it is reasonable to extrapolate based on this model.
- This model assumes that height increases at a constant rate for all ages, so extrapolation will lead to unreasonable estimates.
- This model assumes that height increases up to a point and then starts decreasing, so extrapolation will lead to unreasonable estimates.
- This model assumes that height increases with age but that growth slows down until height “levels off” in adulthood, so extrapolation is reasonable.
- This model assumes that height takes three “turns” – increasing then decreasing then increasing again – so extrapolation will lead to unreasonable estimates.
- Does the table above suggest that the quadratic model is significantly better than a linear model of degree 1 in this context?
- The coefficient for the Age^2 term is very small, which suggests that the quadratic term is unnecessary. A linear model would be preferred.
- The t-statistic for the Age term is larger in absolute value than the t-statistic for the Age^2 term which suggests that the linear model is better than the quadratic model.
- The p-value for the Age^2 term is very small, which suggests that the quadratic model is significantly better than the linear model.
- The p-values for the Age and Age^2 terms are both very small, which suggests that both the linear model and the quadratic model provide accurate predictions.
- The coefficient for Age^2 is very small, so rounding decisions can have a big impact. What should you do to address this issue?
- Remove the Age^2 term. The coefficient is so small that the term is not significant.
- Use standardized versions of the variable, Std. Age and (Std. Age)^2.
- Control for confounding variables that are “masking” the effect of Age^2.
- Add an interaction term to model the changing slope between Age and Height.
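A fitted quadratic has a single turning point at age = -b1 / (2·b2), where b1 and b2 are the coefficients for Age and Age^2. The sketch below uses the rounded coefficients from the table to locate that point and compares it with the observed age range (99 to 221 months). Because the Age^2 coefficient is rounded to two significant digits, the result is approximate. The age of 300 months is a hypothetical value outside the data.

```python
# Rounded coefficients from the table of coefficients.
intercept, b_age, b_age_sq = 7.6557, 0.5372, -0.0012


def predicted_height(age_months):
    """Predicted height (inches) from the quadratic model."""
    return intercept + b_age * age_months + b_age_sq * age_months ** 2


# Age at which the fitted parabola stops increasing.
turning_point = -b_age / (2 * b_age_sq)

print(f"Turning point: {turning_point:.1f} months")
print(f"Predicted height at turning point: {predicted_height(turning_point):.1f} in")
print(f"Predicted height at 300 months: {predicted_height(300):.1f} in")
```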
- Suppose you are choosing between a linear model and a quadratic model based on the residual plots below.
Based on the residual plots, the __________ (linear/quadratic) model is preferred. However, __________ (independence/constant variance/normality) is still a concern in the chosen model.
- The scatterplot below shows the relationship between two quantitative variables, x and y, with a smoother added to the plot.
This relationship is best modeled by a polynomial of degree _____ (1, 2, or 3), also known as a ___________ (quadratic/cubic) model.
- What do you call models that include terms to help model nonlinear behavior in the scatterplot?
- Additive models
- Exponential models
- Multiplicative models
- Polynomial models
- True or False: A polynomial model can be considered a type of linear model, because it is of the form y = b0 + b1(x1) + b2(x2) + ..., where x1 = x, x2 = x^2, and so on.
- Which of the following are useful steps for comparing two polynomial models to decide which is most appropriate? Select all that apply.
- Fit the more complex model and find the p-value for the highest order term.
- Fit the simpler model and find the p-value for the highest order term.
- Compare the residual plots for the two models to assess model validity.
- Compare the R^2 values to see if the more complex model greatly improves the accuracy of predictions.
- Which of the following are advantages of standardizing the explanatory variable in a quadratic model? Select all that apply.
- It ensures that there is no association between x and x^2.
- It ensures that the association between x and x^2 is nonlinear.
- It puts explanatory variables on a more reasonable scale, so rounding is less impactful.
- It puts explanatory variables on a more reasonable scale, so the coefficient for x^2 is larger and more likely to be statistically significant.
- True or False: Extrapolation is less of a concern for polynomial models than it is for linear models of degree 1.
- True or False: When comparing two polynomial models, you should always choose the model with the higher R^2 value.
Section 5.4: Modeling Nonlinear Associations Part II - Transformations
LO5.4-1: Transform the response variable to meet model conditions.
LO5.4-2: Assess different model transformations.
Questions 1 through 4: You want to create a model that predicts a country’s income (GDP/capita in dollars) based on the country’s fertility rate (babies per woman). The prediction equations and residual plots for two models are shown below.
Model 1:
Model 2:
- Using Model 1, which of the validity conditions are violated? Select all that apply.
- The linearity condition is violated.
- The equal variance condition is violated.
- The normality condition is violated.
- None of the validity conditions are violated.
- Are the predicted values based on Model 1 accurate?
For countries with the highest fertility rates (say, 6 or more babies/woman), Model 1 tends to __________ (overestimate/underestimate/accurately predict) the country’s income.
- Using Model 2, predict the income of a country whose fertility rate is 6 babies per woman.
Note that log() denotes the natural log. Round to the nearest dollar.
Sol:
- Using Model 2, the 95% confidence interval for the slope is (-0.7555, -0.5981). Which of the following interpretations is correct? Note that log() denotes the natural log.
- As fertility rate increases by 1 unit, predicted income decreases by 0.5981 to 0.7555 units.
- As fertility rate increases by 1 unit, predicted income decreases by 0.1218 to 0.2232 units.
- As fertility rate increases by 1 unit, predicted income is multiplied by 1.8187 to 2.1287.
- As fertility rate increases by 1 unit, predicted income is multiplied by 0.4698 to 0.5499.
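When the response has been log-transformed, a slope on the log scale corresponds to a multiplicative change on the original scale, obtained by exponentiating. The sketch below back-transforms the endpoints of the confidence interval given in the question, taking log() to be the natural log as the question notes.

```python
from math import exp

# 95% confidence interval for the slope on the log(income) scale.
ci_log_scale = (-0.7555, -0.5981)

# Exponentiating each endpoint gives the factor by which predicted
# income is multiplied for each 1-unit increase in fertility rate.
multiplier_low, multiplier_high = (exp(bound) for bound in ci_log_scale)

print(f"Multiplier interval: ({multiplier_low:.4f}, {multiplier_high:.4f})")
```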
Questions 5 through 7: Suppose we want to use an animal’s body weight (in kg) to predict its brain weight (in kg). Consider three different models:
Model 1:
Model 2:
Model 3:
- You want to assess the fit of these three models. Is it appropriate to compare the models based on the standard error of the residuals?
- Yes, we can use the standard error of the residuals to compare models. The model with the highest standard error of the residuals is preferred.
- Yes, we can use the standard error of the residuals to compare models. The model with the lowest standard error of the residuals is preferred.
- No, it is not appropriate to compare models based on the standard error of the residuals, because the response variable has been transformed in Models 2 and 3.
- No, it is not appropriate to compare models based on the standard error of the residuals, because the explanatory variable has been transformed in Model 3.
- Using Model 3, predict the brain weight of an animal that weighs 45 kg. Note that log() denotes the natural log.
Model 3:
Sol:
- In the scatterplot below, the dotted line marks the 95% prediction interval for individual animals in this population based on Model 3. Is it appropriate to interpret this prediction interval?
- Yes, it is appropriate to interpret this interval as long as you back-transform the predictions using natural logarithms.
- Yes, it is appropriate to interpret this interval as long as you back-transform the predictions using exponents.
- No, it is not appropriate to interpret this interval, because the response variable has been transformed.
- No, it is not appropriate to interpret this interval, because the explanatory variable has been transformed.
Questions 8 and 9: You want to build a model that describes how the population of the United States (in millions) has changed between 1790 and 2000. The residual plots for two different models are shown below.
- Do these models provide accurate predictions?
If Model 1 were used to predict the U.S. population in 2020, we would expect it to _________ (overestimate/underestimate) the true population.
If Model 2 were used to predict the U.S. population in 2020, we would expect it to _________ (overestimate/underestimate) the true population.
- The best transformation is often found through systematic trial and error. Having tried these two models, which transformation would you try next?
- Consider a petri dish that contains bacteria. The growth of the bacteria over time can be modeled using an exponential relationship of the following form:
True or False: Taking the natural logarithm of the response variable will lead to a linear relationship between the transformed variable ln(bacteria) and time.
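The effect of a log transformation on exponential growth can be explored numerically. The sketch below assumes the relationship has the form count = a·e^(b·time), since the equation itself is not reproduced above; the starting count `initial_count` and growth rate `growth_rate` are hypothetical values.

```python
from math import exp, log

initial_count = 100.0  # hypothetical value of a
growth_rate = 0.3      # hypothetical value of b

times = list(range(6))
counts = [initial_count * exp(growth_rate * t) for t in times]
log_counts = [log(c) for c in counts]

# Change per unit of time on each scale.
raw_steps = [later - earlier for earlier, later in zip(counts, counts[1:])]
log_steps = [later - earlier for earlier, later in zip(log_counts, log_counts[1:])]

print("steps on original scale:", [round(s, 1) for s in raw_steps])
print("steps on log scale:     ", [round(s, 3) for s in log_steps])
```

Equal steps on the log scale indicate a straight-line relationship between ln(count) and time, whereas the steps on the original scale keep growing.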
- How can data transformations be used to address the violations of validity conditions?
- You can transform any (or all) of the explanatory variables and/or the response variable.
- You can transform any (or all) of the explanatory variables, but you should leave the response variable on the original scale.
- You can transform the response variable, but you should leave the explanatory variable(s) on the original scale.
- You can transform the residuals, but you should leave the explanatory and response variables on the original scale.
- You want to compare the fit of two regression models. In one of the models, the response variable has been transformed. Is it appropriate to compare the two models using R^2?
- Yes, we can always use R^2 to compare the performance of two models. The model with the higher value is preferred.
- Yes, we can always use R^2 to compare the performance of two models. The model with the lower value is preferred.
- Yes, we can use R^2 to compare the performance of two models as long as the explanatory variables have not been transformed.
- No, we can only use R^2 to compare the performance of two models with the same response variable.
- What is the purpose of using data transformations?
- Transformations can create a linear association between variables.
- Transformations can reduce skewness in conditional distributions of residuals.
- Transformations can adjust for measurement errors in the data collection process.
- Transformations can create more similar conditional variances of residuals.
- How do you decide which variable(s) to transform?
If linearity is the only issue, it is often better to transform the __________ (explanatory/response) variable(s); if multiple validity conditions are violated, it is often better to transform the __________ (explanatory/response) variable(s).
- Based on the scatterplot below, what kind of linearizing transformation would you try?
You should try a transformation that ____________ (increases/decreases) the power of the response variable or a transformation that ____________ (increases/decreases) the power of the explanatory variable.