Ch5 Exam Prep Intermediate Statistical Investigations Test - Intermediate Statistical Investigations 1st Ed - Exam Bank by Nathan Tintle.

Chapter 5

Intermediate Statistical Investigations Test Bank

Question types: FIB = Fill in the blank; Calc = Calculation; Ma = Matching; MS = Multiple select; MC = Multiple choice; TF = True-false

CHAPTER 5 TERMINAL LEARNING OUTCOMES

TLO5-1: Visualize and interpret relationships among three or more quantitative variables using response surfaces.

TLO5-2: Visualize adjusted associations when adjusting for a quantitative variable, and explore potential problems when using explanatory variables that are linearly related.

TLO5-3: Fit a polynomial model for nonlinear associations, and assess the appropriateness of the model.

TLO5-4: Assess transformations of the response and/or explanatory variable(s) to meet model conditions of linearity, symmetry, and equal variation.

Section 5.1: Experiments with Multiple Quantitative Explanatory Variables

LO5.1-1: Consider design issues with quantitative explanatory variables.

LO5.1-2: Visualize relationships among three or more quantitative variables.

LO5.1-3: Interpret a “response surface” with quantitative explanatory variables.

LO5.1-4: Describe interactions between quantitative variables.

Questions 1 through 3: A computer science student is using an evolutionary algorithm to solve the “traveling salesman problem,” which aims to find the shortest route for visiting a list of cities and returning to the original city. A number of parameters must be set in evolutionary algorithms, including tournament size and mutation rate, and the student wants to find the ideal set of algorithm parameters for this task.

The student uses a balanced, full-factorial design with three values of each parameter, tournament sizes of 8, 16, and 32, and mutation rates of 1%, 5%, and 10%. The response variable is the distance of the best (shortest) route that the algorithm finds in a fixed number of iterations. The student used a linear regression model to predict best distance:

  1. True or False: The student’s model assumes that the relationship between tournament size and best distance is the same, regardless of the mutation rate.
  2. Do the graphs below indicate a violation of the validity conditions for a multiple linear regression model?

A dotplot describes the residual plots for predicted values. The dotplot has the horizontal axis labeled Predicted Value and has markings 250 to 375 in increments of 25. The vertical axis is labeled Residual and has markings from negative 80 to 80 in increments of 40. For predicted value 255, the dots are closely plotted from negative 50 to 65 and 1 dot above 80 on the vertical axis. For predicted value 270, a series of dots are plotted from negative 45 to 50 and 1 dot above 55, 65, and 100. For predicted value 275, a series of dots are plotted from negative 45 to 40 and 1 dot above 50, 60, and 70. For predicted value 295, a series of dots are plotted from negative 70 to 40 and 1 dot above 50, 55, and 80. For predicted value 310, a series of dots are plotted from negative 70 to 60 and 1 dot above negative 75 and negative 85. For predicted value 315, a series of dots are plotted from negative 60 to 45 and 1 dot above negative 75, 55, and 90. For predicted value 345, a series of dots are plotted from negative 70 to 40 and 1 dot above negative 85, 50, 55, 65, 80, and 95. For predicted value 360, a series of dots are plotted from negative 60 to 60 and 1 dot above 95. For predicted value 365, a series of dots are plotted from negative 70 to 90. A horizontal line starts from 0 on the vertical axis and extends toward the right passing through the dots. All values are approximate.  A histogram describes the residual plot. The horizontal axis is labeled Residual and has markings from negative 120 to 120 in increments of 30. The distribution of the bars is approximately normal, it starts from negative 90 and ends at 105. The longest bar is at 0 on the horizontal axis and the bars progressively decrease in height to the left of negative 15 and to the right of 0. There are no bars at negative 120 and 120. All values are approximate.

    1. Yes, the linearity condition is severely violated.
    2. Yes, the independence condition is severely violated.
    3. Yes, the equal variance condition is severely violated.
    4. No, these plots do not reveal any severe violations of the validity conditions.
  3. Based on this model, what algorithmic parameters should the student choose? Remember, the goal is to find the shortest route.

The student should use a tournament size that is relatively ______ (large/small) and a mutation rate that is relatively ______ (large/small).

More research may be necessary before setting parameters outside the explanatory variable region since that would be considered ___________ (extrapolation/interpolation).
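
The balanced, full-factorial layout described for Questions 1 through 3 can be sketched in a few lines. The factor levels come from the scenario; the number of runs per treatment and the response values are hypothetical, since the scenario does not give them. The sketch only illustrates that such a design leaves the two explanatory variables uncorrelated, so the slope for one factor is the same with or without the other factor in a no-interaction model.

```python
from itertools import product

import numpy as np

# Factor levels taken from the scenario above.
tournament_sizes = [8, 16, 32]
mutation_rates = [1, 5, 10]  # percent
runs_per_treatment = 4  # hypothetical; the scenario does not state this

# Full factorial: every tournament size is paired with every mutation rate.
# Balanced: each of the 9 treatments gets the same number of runs.
design = np.array(
    [
        (size, rate)
        for size, rate in product(tournament_sizes, mutation_rates)
        for _ in range(runs_per_treatment)
    ],
    dtype=float,
)

size, rate = design[:, 0], design[:, 1]
correlation = np.corrcoef(size, rate)[0, 1]

# Synthetic response for illustration only (not the student's data).
rng = np.random.default_rng(1)
distance = 300 + 1.5 * size - 4.0 * rate + rng.normal(0, 20, len(size))

# Unadjusted slope: simple regression of distance on tournament size.
unadjusted_slope = np.polyfit(size, distance, 1)[0]

# Adjusted slope: multiple regression on both factors (no interaction).
X = np.column_stack([np.ones_like(size), size, rate])
adjusted_slope = np.linalg.lstsq(X, distance, rcond=None)[0][1]

print(f"runs in the design: {len(design)}")
print(f"correlation between factors: {abs(correlation):.3f}")
print(f"unadjusted slope: {unadjusted_slope:.4f}, adjusted slope: {adjusted_slope:.4f}")
```

The two slopes agree because the design, not the response data, makes the factors orthogonal; an unbalanced or incomplete layout would generally not have this property.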

Questions 4 through 6: A balanced, full-factorial design was used to investigate the impact of water temperature (25°C, 35°C) and salinity (10%, 25%, 40%) on the weight gain of shrimp (in mg). A 3D scatterplot of the data is shown below. The plane shows the predicted weight gains based on a linear regression model with no interaction.

A three-dimensional scatterplot plots the relationship between temperature, salinity, and weight gain. The horizontal axis is labeled Temperature and has markings from 24 to 36 in increments of 2. Another horizontal axis is labeled Salinity and has markings from 10 to 35 in increments of 5. The vertical axis is labeled Weight gain and ranges from 0 to 600 in increments of 100. A horizontal dotted plane is plotted at 230 on the vertical axis. For temperature of 25 and salinity of 10, the dots are plotted above 50, 75, and 90 on the vertical axis. For temperature of 35 and salinity of 10, the dots are plotted above 300, 310, 350, 360, and 430 on the vertical axis. For temperature of 25 and salinity of 25, the dots are plotted above 340, 500, 510, 600, and 650 on the vertical axis. For temperature of 35 and salinity of 25, the dots are plotted above 380, 400, 440, 460, and 490 on the vertical axis. For temperature of 25 and salinity of 40, the dots are plotted above 500, 510, 520, 560, 640, and 650 on the vertical axis. For temperature of 35 and salinity of 40, the dots are plotted above 450, 470, 490, 510, 530, and 550 on the vertical axis. All values are approximate.

  4. Using the 3D scatterplot, describe the relationships among the variables.

After adjusting for salinity, there is a _________ (positive/negative) association between temperature and weight gain.

After adjusting for temperature, there is a _________ (positive/negative) association between salinity and weight gain.

  5. Does this model provide accurate predictions?

For the shrimp raised in water that was 25°C with 10% salinity, the residuals are all _________ (positive/negative), which suggests that the predicted value based on a linear regression model tends to be too _________ (high/low).

  6. Based on the scatterplot matrix below, are you concerned about the linearity condition being met?

"Two side by side dotplots describe the relationship between temperature, salinity, and weight gain. In the first dotplot, two dot plots are placed one above the other. The horizontal axis is labeled Temperature and has markings from 20 to 35 in increments of 5. The vertical axis has two markings, salinity and weight gain, in the order from top to bottom. The vertical axis labeled weight gain, ranges from 0 to 500 in increments of 100 and the vertical axis labeled salinity, ranges from 0 to 40 in increments of 10. For weight gain, temperature 25, the dots are plotted as follows: 1 dot above 50, 70, 90, 210, 240, 260, 290, 370, 400, 480, and 550 on the vertical axis. For weight gain, temperature 35, the dots are plotted as follows: 1 dot above 180, 200, 220, 240, 260, 280, 300, 310, 330, 360, and 440 on the vertical axis. There are no dots above 20, and 30. For salinity, temperature 25, the dots are plotted as follows: 1 dot above 10, 25, and 40 on the vertical axis. For salinity, temperature 35, the dots are plotted as follows: 1 dot above 10, 25, and 40 on the vertical axis. There are no dots above 20, and 30. All values are approximate.
To its right, is the second dotplot, the horizontal axis is labeled salinity and ranges from 0 to 40 in increments of 10. The vertical axis is labeled weight gain and ranges from 0 to 500 in increments of 100. For salinity 10, the dots are plotted as follows: 1 dot above 50, 70, 90, 300, 320, 350, 360, and 440 on the vertical axis. For salinity 25, the dots are plotted as follows: 1 dot above 200, 240, 260, 300, 310, 330, 360, 380, 480, and 550 on the vertical axis. For salinity 40, the dots are plotted as follows: 1 dot above 180, 200, 220, 240, 260, 280, 390, and 400 on the vertical axis. There are no dots above 0, 20, and 30. All values are approximate."

    1. Yes, because there is no linear relationship between temperature and salinity. The slope of the regression line would be 0.
    2. Yes, because the spread of the weight gain values is larger for low temperatures than for high temperatures.
    3. Yes, because changing from 10% to 25% salinity leads to an increase in weight gain, but changing from 25% to 40% salinity leads to a decrease in weight gain.
    4. No, this scatterplot matrix does not reveal any issues with the linearity condition.

Questions 7 through 11: Two drugs, labeled Drug A and Drug B, are often prescribed together as a treatment for a certain disease. In a randomized experiment, 120 patients were prescribed one of three dosages of Drug A (10, 20, or 30 mg) and one of four dosages of Drug B (200, 400, 600, or 800 mg). The response variable is a quantitative indicator of disease activity (DAI). The partially filled in table of coefficients is given below.

| Term | Coefficient | SE | t-stat. | p-value |
|---|---|---|---|---|
| Intercept | 100.398 | 5.395 | | |
| Drug A dosage | -0.969 | 0.250 | | |
| Drug B dosage | -0.097 | 0.010 | | |
| Drug A dosage × Drug B dosage | | | | <0.0001 |

  7. Fill in the degrees of freedom for this linear regression model.

| Source | DF |
|---|---|
| Model | |
| Error | |
| Total | |
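
The bookkeeping behind a degrees-of-freedom table can be sketched as a small helper. It assumes a model with an intercept in which every quantitative term (each slope and each interaction product) uses one degree of freedom; the function name and signature are illustrative, not taken from the text.

```python
def regression_degrees_of_freedom(n_observations: int, n_terms: int) -> dict[str, int]:
    """Degrees of freedom for a linear model with an intercept.

    n_terms counts the non-intercept terms (slopes and interaction products),
    assuming each one is quantitative and uses a single degree of freedom.
    """
    model_df = n_terms
    total_df = n_observations - 1
    error_df = total_df - model_df
    return {"Model": model_df, "Error": error_df, "Total": total_df}


# 120 patients; three terms: Drug A dosage, Drug B dosage, and their product.
print(regression_degrees_of_freedom(n_observations=120, n_terms=3))
```

The same helper applies to any of the regression models in this chapter, as long as the count of non-intercept terms is supplied correctly.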

  8. Describe the interaction shown in the graph below.

"A scatterplot describes the relationship between drug B dosage and D A I. The horizontal axis is labeled Drug B Dosage and has markings from 200 to 800 in increments of 200. The vertical axis is labeled D A I and has markings from 30 to 90 in increments of 20. A red line and red dots denote drug A dosage 10, a green line and green dots denote drug A dosage 20, and a blue line and blue dots denote drug A dosage 30. 
The red line starts from (200, 75), slopes downward and ends at (800, 27) such that some of the red dots lie above the line, and some of the red dots lie below the line. The red dots are plotted above 200, 400, 600, and 800 on the horizontal axis and from 15 to 87 on the vertical axis. All values are approximate.
The green line starts from (200, 70), slopes downward and ends at (800, 32) such that some of the green dots lie above the line, and some of the green dots lie below the line. The green dots are plotted above 200, 400, 600, and 800 on the horizontal axis and from 20 to 84 on the vertical axis. All values are approximate.
The blue line starts from (200, 63), slopes downward and ends at (800, 38) such that some of the blue dots lie above the line, and some of the blue dots lie below the line. The blue dots are plotted above 200, 400, 600, and 800 on the horizontal axis and from 18 to 80 on the vertical axis. All values are approximate."

In the explanatory variable region of this study, higher dosages of Drug B are associated with _________ (higher/lower) DAI values; however, the slope relating DAI to Drug B dosage is _________ (flatter/steeper) when the Drug A dosage is high.

In the table above, the missing coefficient for the interaction would be ____________ (positive/negative).

  9. True or False: In terms of predicted DAI, the choice of dosage for Drug A is more important than the choice of dosage for Drug B.
  10. True or False: If the explanatory variables are standardized, the p-value corresponding to the interaction may change.
  11. The partially filled-in table of coefficients corresponds to a model where the explanatory variables have been standardized.

| Term | Coefficient | SE | t-stat. | p-value |
|---|---|---|---|---|
| Intercept | 50.59 | 0.83 | | |
| Std. Drug A dosage | -0.41 | 0.84 | | |
| Std. Drug B dosage | -13.67 | 0.84 | | |
| Std. Drug A dosage × Std. Drug B dosage | | | | |

Interpret the intercept and the slope for Drug B.

The predicted DAI for a person who receives _______ mg of Drug A and _______ mg of Drug B is 50.59.

As the dosage of Drug B increases by 1 _______ (mg / SD), DAI is predicted to _______ (increase/decrease) by _______, for a person receiving the average dosage of Drug A.

  12. Match each term to its definition. One of the definitions will not be used.

Terms: Full-factorial design; Balanced design

Definitions:

A. Each treatment has the same number of observations.

B. Each factor has the same number of levels.

C. Each level of a factor is combined with each level of another factor.

  13. Which of the following statements are true? Select all that apply.
    1. Balanced designs always eliminate the association between two explanatory variables.
    2. Full-factorial designs always eliminate the association between two explanatory variables.
    3. Balanced, full-factorial designs always eliminate the association between two explanatory variables.
    4. It is possible to have an association between the explanatory variables in any of the designs mentioned above.
  14. True or False: In a balanced, full-factorial design, the adjusted slope coefficients in the two-predictor model are identical to the unadjusted slope coefficients in the one-predictor models.
  15. How could you represent the graph of a regression model that uses two quantitative explanatory variables and an interaction term to predict a quantitative response?
    1. A line
    2. A plane
    3. A surface (which may be curved)
    4. A sphere
  16. If you standardize the explanatory variables in a multiple linear regression model with no interaction, which numerical values do you expect to change?
    1. The intercept and slopes
    2. The p-values corresponding to the intercept and slopes
    3. The percentage of variability explained by the model
    4. The standard error of the residuals
  17. Which of the following are advantages of using standardized explanatory variables? Select all that apply.
    1. The intercept is more likely to have a meaningful interpretation.
    2. It is easier to compare the impact of explanatory variables with different scales.
    3. The validity conditions are more likely to be satisfied.
    4. The interaction product variable will be independent of the explanatory variables involved in the interaction.
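
The effect of standardizing on an interaction model can be explored numerically. The sketch below uses the dosage levels and sample size from Questions 7 through 11, but the equal allocation of patients to treatments and all DAI values are assumptions made for illustration (the response is synthetic). The helper `fit_ols` is a hypothetical name; it computes least-squares coefficients and t-statistics from a QR decomposition.

```python
from itertools import product

import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients and their t-statistics."""
    q, r = np.linalg.qr(X)
    coef = np.linalg.solve(r, q.T @ y)
    residuals = y - X @ coef
    df_error = X.shape[0] - X.shape[1]
    sigma2 = residuals @ residuals / df_error
    # diag((X'X)^-1) equals the row sums of squares of R^-1.
    r_inv = np.linalg.inv(r)
    se = np.sqrt(sigma2 * (r_inv**2).sum(axis=1))
    return coef, coef / se


def standardize(x):
    return (x - x.mean()) / x.std(ddof=1)


# 3 x 4 dosage combinations, 10 patients each (equal allocation is assumed).
doses = np.array(list(product([10, 20, 30], [200, 400, 600, 800])) * 10, dtype=float)
a, b = doses[:, 0], doses[:, 1]

# Synthetic DAI values for illustration only.
rng = np.random.default_rng(0)
dai = 100 - 1.0 * a - 0.1 * b + 0.002 * a * b + rng.normal(0, 8, len(a))

ones = np.ones_like(a)
za, zb = standardize(a), standardize(b)

_, t_raw = fit_ols(np.column_stack([ones, a, b, a * b]), dai)
_, t_std = fit_ols(np.column_stack([ones, za, zb, za * zb]), dai)

corr_raw = np.corrcoef(a, a * b)[0, 1]
corr_std = np.corrcoef(za, za * zb)[0, 1]

print(f"corr(A, A*B) raw: {corr_raw:.3f}, standardized: {abs(corr_std):.3f}")
print(f"interaction t-stat raw: {t_raw[3]:.3f}, standardized: {t_std[3]:.3f}")
print(f"Drug A t-stat raw: {t_raw[1]:.3f}, standardized: {t_std[1]:.3f}")
```

In this balanced layout the product of the standardized variables is uncorrelated with each component, while the raw product is not. The interaction t-statistic is the same in both fits, whereas the t-statistics for the individual dosages can differ because their coefficients answer different questions in the two parameterizations.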

Section 5.2: Observational Studies with Multiple Quantitative Explanatory Variables

LO5.2-1: Visualize adjusted associations when adjusting for a quantitative variable.

  • Create and interpret added variable plots.
  • Interpret model coefficients.
  • Interpret adjusted sums of squares.

LO5.2-2: Explore potential problems when using explanatory variables that are linearly related.

Questions 1 and 2: Suppose you want to predict a student’s final grade in a course (scale of 0-100) based on their pre-test score (test on a scale of 0-100 taken before the semester began) and the number of optional study sessions they attended (0-15). In the sample, all three pairs of variables have positive correlations.

  1. In a simple regression model, the unadjusted slope relating final exam score to number of study sessions attended is a positive value, . If you added pre-test scores to the model, would you expect the slope corresponding to study sessions to change?

The adjusted slope for study sessions in the multiple regression model is expected to be _________ (larger than/smaller than/equal to) the unadjusted slope for study sessions in the simple regression model.

  2. In a simple regression model that uses number of study sessions as the only predictor of final exam scores, the R^2 value is 30%. In a simple regression model that uses pre-test scores as the only predictor of final exam scores, the R^2 value is 40%. In the multiple regression model that uses both number of study sessions and pre-test scores as predictors, how large would you expect the R^2 value to be?
    1. You would expect R^2 to be 30% or lower.
    2. You would expect R^2 to be between 30% and 40%.
    3. You would expect R^2 to be between 40% and 70%.
    4. You cannot calculate R^2 for a multiple regression model.

Questions 3 through 7: One commonly cited metric in car commercials is time to accelerate from 0 to 60 mph. What car features might explain faster acceleration (shorter times)? The dataset includes weight (in pounds), horsepower, and acceleration (in seconds) for 392 different types of cars.

Suppose you fit a multiple regression model to predict acceleration. The equation and scatterplot are given below.

"A scatterplot describes the relationship between weight and acceleration. The horizontal axis is labeled Weight and has markings from 1500 to 5500 in increments of 500. The vertical axis is labeled Acceleration and has markings from 8 to 24 in increments of 4. A blue line denotes horse power of 46 to 74, a red line denotes horse power of 74 to 89, a green line denotes horse power of 89 to 102, a purple line denotes horse power of 102 to 142, and a brown line denotes horse power of 142 to 230. 
The blue line starts from (1500, 16), increases toward right and ends at (3500, 22) such that some of the blue dots lie above the line, some of the blue dots lie below the line, and few blue dots lie on the line. The blue dots are plotted from 1550 to 3450 on the horizontal axis and from 12 to 25 on the vertical axis. All values are approximate.
The red line starts from (1750, 14), increases toward the right and ends at (3620, 19) such that some of the red dots lie above the line, some of the red dots lie below the line, and a few red dots lie on the line. The red dots are plotted from 1800 to 3600 on the horizontal axis and from 11 to 22 on the vertical axis. All values are approximate.
The green line starts from (2000, 14), increases toward the right and ends at (3750, 18) such that some of the green dots lie above the line, some of the green dots lie below the line, and a few green dots lie on the line. The green dots are plotted from 2050 to 3700 on the horizontal axis and from 12 to 22 on the vertical axis. All values are approximate.
The purple line starts from (2120, 13), increases toward the right and ends at (4750, 17) such that some of the purple dots lie above the line, some of the purple dots lie below the line, and a few purple dots lie on the line. The purple dots are plotted from 2200 to 4700 on the horizontal axis and from 10 to 21 on the vertical axis. All values are approximate.
The brown line starts from (3000, 11), increases toward the right and ends at (5250, 13.5) such that some of the brown dots lie above the line, some of the brown dots lie below the line, and a few brown dots lie on the line. The brown dots are plotted from 3050 to 5150 on the horizontal axis and from 8 to 18 on the vertical axis. All values are approximate."

  3. Interpret the slope coefficient corresponding to weight.

As _________ (acceleration/weight) increases by 1 unit, _________ (acceleration/weight) is predicted to increase by 0.0023 units, _________ (ignoring/adjusting for) horsepower.

  4. If horsepower were removed from the model, how would the slope coefficient corresponding to weight change?
    1. The slope would be larger than 0.0023.
    2. The slope would be smaller than 0.0023, but it would still be positive.
    3. The slope would be negative.
    4. The slope would not change.
  5. The R^2 value for the multiple regression model is 0.602. If horsepower were removed from the model, how would the R^2 value change?
    1. The R^2 value would be larger than 0.602.
    2. The R^2 value would be smaller than 0.602, but it would still be positive.
    3. The R^2 value would be negative.
    4. The R^2 value would not change.
  6. Does the graph below indicate any problems with model fit?

A scatterplot describes the residual plots for predicted values. The horizontal axis is labeled Predicted and has markings from 5 to 20 in increments of 5. The vertical axis is labeled Residual and has markings from negative 4 to 8 in increments of 4. Dots are randomly scattered throughout the graph. A regression horizontal line starts from 0 on the vertical axis and extends toward the right passing through the dots. The dots are plotted such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. The dots are plotted from 4 to 20 on the horizontal axis and from negative 4 to 7 on the vertical axis. The concentration of dots is more between 12.5 and 18 on the horizontal axis and between negative 3 and 3 on the vertical axis. All values are approximate.

    1. Yes, because the graph shows a curved pattern.
    2. No, because the pattern allows you to predict the residuals and reduce error.
    3. Yes, because most of the predicted values are higher than 15 (not symmetric).
    4. No, because roughly half of the residuals are below 0 (symmetric).
  7. Is there an interaction between weight and horsepower in this sample?
    1. Yes, because heavier cars tend to have higher horsepower and faster acceleration times.
    2. Yes, because the relationship between weight and acceleration differs slightly based on the level of horsepower.
    3. Yes, because if horsepower were removed from the model, the slope relating acceleration to weight would change substantially.
    4. No, because none of the regression lines cross within the explanatory variable region.

Questions 8 and 9: Suppose you want to use age and experience (both measured in years) to predict salary (in dollars) at a small company.

  8. The ANOVA table is shown below. You’ll notice that SSModel is much larger than the sum of SSExperience and SSAge. Further, the overall model is statistically significant, but neither of the individual predictors is statistically significant. Which of the following is the best explanation for these features of the ANOVA table?

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Model | 2 | 715.11 | 357.55 | 11.89 | 0.0056 |
| Experience | 1 | 43.41 | 43.41 | 1.44 | 0.2686 |
| Age | 1 | 0.01 | 0.01 | 0.00 | 0.9865 |
| Error | 7 | 210.49 | 30.07 | | |
| Total | 9 | 925.60 | | | |

    1. There is substantial covariation between experience and age.
    2. There is substantial interaction between experience and age.
    3. There is another confounding variable not being accounted for by the model.
    4. The only reasonable explanation is a calculation error.
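
The arithmetic linking the columns of the ANOVA table can be checked directly. This sketch uses only the df and SS values printed in the table and assumes the usual definitions (mean square = SS / df, and each F statistic = the row's mean square divided by the error mean square).

```python
# Values copied from the ANOVA table above: (df, sum of squares).
anova = {
    "Model": (2, 715.11),
    "Experience": (1, 43.41),
    "Age": (1, 0.01),
    "Error": (7, 210.49),
}

mean_squares = {source: ss / df for source, (df, ss) in anova.items()}
ms_error = mean_squares["Error"]

# Each F statistic is the row's mean square divided by the error mean square.
f_stats = {
    source: mean_squares[source] / ms_error
    for source in ("Model", "Experience", "Age")
}

ss_total = anova["Model"][1] + anova["Error"][1]
ss_individual = anova["Experience"][1] + anova["Age"][1]

for source, f_value in f_stats.items():
    print(f"{source}: MS = {mean_squares[source]:.2f}, F = {f_value:.2f}")
print(f"SSTotal = {ss_total:.2f}")
print(f"SSModel - (SSExperience + SSAge) = {anova['Model'][1] - ss_individual:.2f}")
```

The last line quantifies the gap the question points to: most of the model sum of squares cannot be attributed uniquely to either predictor.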
  9. The scatterplot below is the added variable plot predicting salary from age, before and after adjusting for experience. Note that the graph shows two regression lines. Which is which?

A scatterplot plots the experience-adjusted age against experience-adjusted salary. The horizontal axis is labeled Experience-adjusted Age and has markings from 30 to 50 in increments of 10. The vertical axis is labeled Experience-adjusted Salary and has markings from 35 to 60 in increments of 5. A color scale labeled experience ranges from dark red to blue, in which red denotes 10, yellow denotes 20, and blue denotes 30. The dots with mentioned color in experience are randomly scattered throughout the graph, and are plotted from 38 to 45 on the horizontal axis and from 42 to 59 on the vertical axis. A black line starts from (26, 36), increases toward right and ends at (55, 64). A purple line starts at 52 on the vertical axis and extends toward the right through the dots. All values are approximate.

The ______ (gray/purple) line is the original regression line relating salary to age, and the ______ (gray/purple) line is the regression line relating salary to age after adjusting for experience.
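
One common way to build an added variable plot is to regress both the response and the explanatory variable of interest on the adjusting variable and then plot the two sets of residuals against each other. The sketch below follows that residual-on-residual construction with synthetic data (the salary dataset is not reproduced in this document); the plot in the question may additionally shift the residuals back to the original scale, which does not change the slope.

```python
import numpy as np


def residuals_from(x, y):
    """Residuals of y after a simple least-squares regression on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)


# Synthetic data for illustration only: age and experience strongly related.
rng = np.random.default_rng(42)
experience = rng.uniform(0, 30, 50)
age = 22 + experience + rng.normal(0, 2, 50)
salary = 35 + 0.8 * experience + rng.normal(0, 3, 50)  # hypothetical units

# Unadjusted slope: salary regressed on age alone.
unadjusted_slope = np.polyfit(age, salary, 1)[0]

# Added variable plot coordinates: remove experience from both variables.
age_adj = residuals_from(experience, age)
salary_adj = residuals_from(experience, salary)
added_variable_slope = np.polyfit(age_adj, salary_adj, 1)[0]

# Adjusted slope for age from the two-predictor model, for comparison.
X = np.column_stack([np.ones_like(age), experience, age])
adjusted_slope = np.linalg.lstsq(X, salary, rcond=None)[0][2]

print(f"unadjusted slope for age: {unadjusted_slope:.3f}")
print(f"added-variable slope:     {added_variable_slope:.3f}")
print(f"adjusted slope for age:   {adjusted_slope:.3f}")
```

The slope through the adjusted points matches the coefficient for age in the two-predictor model, which is what makes the added variable plot a picture of the adjusted association.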

Questions 10 through 13: Consider a model for predicting the weight of fish (in grams) based on their length and width. The length and width were originally measured in centimeters, but the variables have been standardized. The table of coefficients is given below.

| Term | Estimate | SE | t-stat. | p-value |
|---|---|---|---|---|
| Intercept | 297.01 | 8.95 | 33.19 | <0.0001 |
| Std. Length | 203.52 | 27.42 | 7.42 | <0.0001 |
| Std. Width | 107.55 | 27.92 | 3.85 | 0.0003 |
| Std. Length × Std. Width | 88.99 | 7.01 | 12.69 | <0.0001 |
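
Each t-statistic in a coefficient table is the estimate divided by its standard error. The short check below recomputes that ratio from the estimates and standard errors printed in the fish table and compares it with the reported column; small differences are expected because the table entries are rounded.

```python
# (estimate, standard error, reported t-statistic) copied from the table above.
fish_terms = {
    "Intercept": (297.01, 8.95, 33.19),
    "Std. Length": (203.52, 27.42, 7.42),
    "Std. Width": (107.55, 27.92, 3.85),
    "Std. Length x Std. Width": (88.99, 7.01, 12.69),
}

t_stats = {term: est / se for term, (est, se, _) in fish_terms.items()}

for term, (_, _, reported) in fish_terms.items():
    print(f"{term}: computed t = {t_stats[term]:.2f}, reported t = {reported:.2f}")
```

The same ratio can be used to fill in missing t-statistics in a partially completed table when the estimate and standard error are both given.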

  10. Is the interaction between standardized length and standardized width significant?
    1. Yes, because the p-value for Std. Length × Std. Width is small.
    2. No, because the slope for Std. Length × Std. Width is small relative to the other slope and intercept coefficients.
    3. There is not enough information to answer, because the residual standard error is not given.
    4. There is not enough information to answer, because no information is given about the fit of the model that excludes the interaction term.
  11. Interpret the coefficient for Std. Length.

As length increases by 1 _______ (cm/SD), _________ (weight/width) is predicted to increase by 203.52 units, for a fish of average ___________ (weight/length/width).

  12. True or False: In this scenario, interpretation of the intercept involves extrapolation.
  13. Suppose the explanatory variables had not been standardized. Which of the following would you expect to change? Select all that apply.
    1. The slope coefficients for the two explanatory variables, length and width.
    2. The p-values for the two explanatory variables, length and width.
    3. The coefficient for the length × width interaction.
    4. The p-value for the length × width interaction.
  14. Using data from an observational study, a regression model predicts a response variable y based on two explanatory variables, x1 and x2. Which of the following is true?
    1. Adjusting for x2 will not change the strength of the association between x1 and y.
    2. After adjusting for x2, the association between x1 and y will be stronger.
    3. After adjusting for x2, the association between x1 and y will be weaker.
    4. After adjusting for x2, the association between x1 and y may be stronger or weaker.
  15. Describe how a large degree of covariation among explanatory variables can impact the model.

When explanatory variables (including interactions) have a strong linear association with other explanatory variables, this can lead to __________ (larger/smaller) standard errors on the slope coefficients and a __________ (larger/smaller) residual standard error for the model overall.

  16. True or False: If two explanatory variables are associated with each other, using standardized versions of those variables will reduce the linear association between them.
  17. True or False: Using standardized variables in an interaction will reduce the linear association between the interaction term and the variables involved in the interaction.
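
The two ideas in the last pair of questions can be examined numerically: what standardizing does to the correlation between two explanatory variables, and what it does to the correlation between an interaction product and its components. The data below are synthetic and the variable names are illustrative. With two predictors, the variance inflation factor reduces to 1 / (1 - r^2), where r is the correlation between them.

```python
import numpy as np


def standardize(x):
    """Subtract the mean and divide by the sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)


# Synthetic, positively related measurements for illustration only.
rng = np.random.default_rng(7)
length = rng.uniform(10, 45, 60)                # cm
width = 0.15 * length + rng.normal(0, 0.5, 60)  # cm

z_length, z_width = standardize(length), standardize(width)

# Correlation between the two explanatory variables, before and after.
r_raw = np.corrcoef(length, width)[0, 1]
r_std = np.corrcoef(z_length, z_width)[0, 1]

# Correlation between the interaction product and one of its components.
r_product_raw = np.corrcoef(length, length * width)[0, 1]
r_product_std = np.corrcoef(z_length, z_length * z_width)[0, 1]

# With two predictors, the variance inflation factor is 1 / (1 - r^2).
vif = 1 / (1 - r_raw**2)

print(f"corr(length, width): raw {r_raw:.3f}, standardized {r_std:.3f}")
print(f"corr(length, product): raw {r_product_raw:.3f}, standardized {r_product_std:.3f}")
print(f"variance inflation factor: {vif:.1f}")
```

Correlation is unchanged by shifting and rescaling, so standardizing leaves the association between the two variables as it was. The product term behaves differently because centering changes which observations receive large product values; in observational data the reduction is usually substantial but not necessarily complete.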

Section 5.3: Modeling Nonlinear Associations Part I – Polynomial models

LO5.3-1: Fit a polynomial model to model a nonlinear association.

LO5.3-2: Assess when a polynomial model is appropriate.

Questions 1 through 4: Consider three different models that could be used to predict the weight of fish (in grams) based only on fish length (in cm).

Model 1:

Model 2:

Model 3:
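
The model equations are not reproduced here. Assuming, as the coefficient tables later in this section suggest, that Models 1 through 3 are polynomials in length of degree 1, 2, and 3, fitting and comparing them might be sketched as follows. The data are synthetic and `r_squared` is a hypothetical helper.

```python
import numpy as np

# Synthetic fish-like data for illustration only.
rng = np.random.default_rng(3)
length = rng.uniform(8, 46, 55)                              # cm
weight = 0.012 * length**3 + rng.normal(0, 40, len(length))  # grams


def r_squared(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return R^2."""
    coefficients = np.polyfit(x, y, degree)
    fitted = np.polyval(coefficients, x)
    ss_error = np.sum((y - fitted) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1 - ss_error / ss_total


r2 = {degree: r_squared(length, weight, degree) for degree in (1, 2, 3)}
for degree, value in r2.items():
    print(f"degree {degree}: R^2 = {value:.4f}")
```

Because each model contains all the terms of the one before it, R^2 cannot decrease as the degree increases. A higher value alone is therefore weak evidence for the larger model, and the significance of the added term and the simplicity of the model also matter.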

  1. A line showing predicted values from Model 1 has been added to the scatterplot below.

A scatterplot plots length against weight. The horizontal axis is labeled Length and has markings from 5 to 50 in increments of 5. The vertical axis is labeled Weight and has markings from negative 500 to 1000 in increments of 500. Dots are plotted in an increasing trend from left to right in the graph. A regression line starts from (7, negative 400), increases to the right such that some of the dots lie above the line, some of the dots lie below the line, and a few dots lie on the line, and ends at (48, 1000). The dots are plotted from 9 to 46 on the horizontal axis and from 0 to 1100 on the vertical axis. An outlier is plotted at (8, 0). All values are approximate.

This model tends to ________ (overestimate/underestimate) weight for the longest and shortest fish in the dataset, but it tends to ________ (overestimate/underestimate) weight for fish of average length.

  2. A smoother has been added to the scatterplot below. What is the purpose of adding a smoother?

A scatterplot plots length against weight. The horizontal axis is labeled Length and has markings from 5 to 50 in increments of 5. The vertical axis is labeled Weight and ranges from 0 to 1200 in increments of 200. Dots are plotted in an increasing trend from left to right in the graph. A regression curve starts from (8, 0) increases to the right such that some of the dots lie above the line, some of the dots lie below the line, and a few dots lie on the line, and ends at (46, 1100). The dots are plotted from 8 to 46 on the horizontal axis and from 0 to 1100 on the vertical axis. All values are approximate.

    1. Smoothers are helpful for exploring the form of the association.
    2. Smoothers provide a simple mathematical model for making predictions.
    3. Smoothers serve both of the purposes mentioned above.
    4. Smoothers serve neither of the purposes mentioned above.
  3. The tables below describe Models 2 and 3.

Model 2

| Term | Coeff. | SE | t-stat. | p-value |
|---|---|---|---|---|
| Intercept | 128.35 | 78.78 | 1.63 | 0.1092 |
| Length | -21.02 | 5.42 | -3.88 | 0.0003 |
| Length^2 | 0.9086 | 0.0869 | 10.46 | <0.0001 |

Model 3

| Term | Coeff. | SE | t-stat. | p-value |
|---|---|---|---|---|
| Intercept | 304.98 | 160.82 | 1.90 | 0.0635 |
| Length | -43.40 | 18.59 | -2.33 | 0.0235 |
| Length^2 | 1.7699 | 0.6903 | 2.56 | 0.0133 |
| Length^3 | -0.0101 | 0.0081 | -1.26 | 0.2141 |

Using significance of model terms as your criterion, which model would you choose?

    1. Model 1
    2. Model 2
    3. Model 3
    4. There is not enough information to choose, since the table for Model 1 is not given.
  4. Based on the values given below, which model would you choose?

Model 1: 0.9207

Model 2: 0.9741

Model 3: 0.9749

    1. Model 1, because you should always choose the model with the lowest .
    2. Model 3, because you should always choose the model with the highest .
    3. Model 2, because the accuracy of predictions is very similar to Model 3, but the model is simpler and easier to interpret.
    4. Model 3, because the accuracy of predictions is very similar to Model 2, but the model is more flexible and allows for more “turns” in the data.

Questions 5 through 7: A quadratic model was used to predict height of children (in inches) based on age (in months). The table of coefficients is shown below.

| Term | Coeff. | SE | t-stat. | p-value |
|---|---|---|---|---|
| Intercept | 7.6557 | 4.8070 | 1.59 | 0.1129 |
| Age | 0.5372 | 0.0622 | 8.63 | <0.0001 |
| Age^2 | -0.0012 | 0.0002 | -6.27 | <0.0001 |

  5. The children in the dataset are all between 99 and 221 months of age. Consider whether it is reasonable to extrapolate based on this model.
    1. This model assumes that height increases at a constant rate for all ages, so extrapolation will lead to unreasonable estimates.
    2. This model assumes that height increases up to a point and then starts decreasing, so extrapolation will lead to unreasonable estimates.
    3. This model assumes that height increases with age but that growth slows down until height “levels off” in adulthood, so extrapolation is reasonable.
    4. This model assumes that height takes three “turns” – increasing then decreasing then increasing again – so extrapolation will lead to unreasonable estimates.
  6. Does the table above suggest that the quadratic model is significantly better than a linear model of degree 1 in this context?
    1. The coefficient for the Age^2 term is very small, which suggests that the quadratic term is unnecessary. A linear model would be preferred.
    2. The t-statistic for the Age term is larger in absolute value than the t-statistic for the Age^2 term which suggests that the linear model is better than the quadratic model.
    3. The p-value for the Age^2 term is very small, which suggests that the quadratic model is significantly better than the linear model.
    4. The p-values for the Age and Age^2 terms are both very small, which suggests that both the linear model and the quadratic model provide accurate predictions.
  3. The coefficient for Age^2 is very small, so rounding decisions can have a big impact. What should you do to address this issue?
    1. Remove the Age^2 term. The coefficient is so small that the term is not significant.
    2. Use standardized versions of the variable, Std. Age and (Std. Age)^2.
    3. Control for confounding variables that are “masking” the effect of Age^2.
    4. Add an interaction term to model the changing slope between Age and Height.
  4. Suppose you are choosing between a linear model and a quadratic model based on the residual plots below.

"Two side by side scatterplots describe the residual plots for predicted values. The first scatterplot is titled, Linear Model. The horizontal axis is labeled Predicted and has markings from negative 500 to 1000 in increments of 500. The vertical axis is labeled Residual and has markings from negative 200 to 400 in increments of 200. A horizontal line starts at 0 on the horizontal axis and extends toward the right such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. The dots are plotted from negative 300 to 1000 on the horizontal axis and from negative 150 to 350 on the vertical axis. The concentration of dots is more between negative 250 and 1000 on the horizontal axis and between negative 150 and 200 on the vertical axis. An outlier is plotted at (negative 320, 350). All values are approximate.
The second scatterplot is titled, Quadratic Model. The horizontal axis is labeled Predicted and ranges from 0 to 1000 in increments of 500. The vertical axis is labeled Residual and has markings from negative 200 to 200 in increments of 100. A horizontal line starts at 0 on the horizontal axis and extends toward the right such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. The dots are plotted from 0 to 1100 on the horizontal axis and from negative 180 to 200 on the vertical axis. The concentration of dots is more between 0 and 400 on the horizontal axis and between negative 100 and 50 on the vertical axis. All values are approximate."

Based on the residual plots, the __________ (linear/quadratic) model is preferred. However, __________ (independence/constant variance/normality) is still a concern in the chosen model.

  1. The scatterplot below shows the relationship between two quantitative variables, x and y, with a smoother added to the plot.

A scatterplot plots the relationship between a set of data. The horizontal axis is labeled x and has markings from negative 5.0 to 5.0 in increments of 2.5. The vertical axis is labeled y and has markings from negative 200 to 400 in increments of 200. Dots are plotted in a decreasing trend from left to right in the graph. A regression curve starts from (negative 5, 400) decreases to the right, slightly increases and decreases again, such that some of the dots lie above the curve, some of the dots lie below the curve, and a few dots lie on the curve, and ends at (5, negative 200). The dots are plotted from negative 5 to 5 on the horizontal axis and from negative 350 to 500 on the vertical axis. All values are approximate.

This relationship is best modeled by a polynomial of degree _____ (1, 2, or 3), also known as a ___________ (quadratic/cubic) model.

  1. What do you call models that include terms to help model nonlinear behavior in the scatterplot?
    1. Additive models
    2. Exponential models
    3. Multiplicative models
    4. Polynomial models
  2. True or False: A polynomial model can be considered a type of linear model, because it is of the form predicted y = b0 + b1(x1) + b2(x2) + ..., where x1 = x, x2 = x^2, and so on.
  3. Which of the following are useful steps for comparing two polynomial models to decide which is most appropriate? Select all that apply.
    1. Fit the more complex model and find the p-value for the highest order term.
    2. Fit the simpler model and find the p-value for the highest order term.
    3. Compare the residual plots for the two models to assess model validity.
    4. Compare the R^2 values to see if the more complex model greatly improves the accuracy of predictions.
  4. Which of the following are advantages of standardizing the explanatory variable in a quadratic model? Select all that apply.
    1. It ensures that there is no association between and .
    2. It ensures that the association between and is nonlinear.
    3. It puts explanatory variables on a more reasonable scale, so rounding is less impactful.
    4. It puts explanatory variables on a more reasonable scale, so the coefficient for is larger and more likely to be statistically significant.
  5. True or False: Extrapolation is less of a concern for polynomial models than it is for linear models of degree 1.
  6. True or False: When comparing two polynomial models, you should always choose the model with the higher R^2 value.
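The effect of standardizing an explanatory variable before squaring it can be checked numerically. The sketch below is an illustration only: the evenly spaced ages from 99 to 221 months are hypothetical data (chosen to match the age range mentioned earlier), and `pearson` is a small helper written for this example rather than a library function.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length lists (illustrative helper)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)


# Hypothetical data: one child at each age from 99 to 221 months.
ages = list(range(99, 222))

# Correlation between the raw variable and its square.
raw_corr = pearson(ages, [a ** 2 for a in ages])

# Standardize: subtract the mean, divide by the sample standard deviation.
mean_age = sum(ages) / len(ages)
sd_age = math.sqrt(sum((a - mean_age) ** 2 for a in ages) / (len(ages) - 1))
std_ages = [(a - mean_age) / sd_age for a in ages]

# Correlation between the standardized variable and its square.
std_corr = pearson(std_ages, [z ** 2 for z in std_ages])

print(f"corr(Age, Age^2)           = {raw_corr:.4f}")
print(f"corr(Std. Age, Std. Age^2) = {std_corr:.4f}")
```

With these symmetric, evenly spaced ages the second correlation is essentially zero. Real data are rarely perfectly symmetric, so standardizing usually reduces the correlation between a variable and its square without removing it entirely.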

Section 5.4: Modeling Nonlinear Associations Part II - Transformations

LO5.4-1: Transform the response variable to meet model conditions.

LO5.4-2: Assess different model transformations.

Questions 1 through 4: You want to create a model that predicts a country’s income (GDP/capita in dollars) based on the country’s fertility rate (babies per woman). The prediction equations and residual plots for two models are shown below.

Model 1:

A scatterplot describes the residual plots for predicted values. The horizontal axis is labeled Predicted and has markings from negative 20000 to 40000 in increments of 20000. The vertical axis is labeled Residual and has markings from negative 20000 to 100000 in increments of 20000. Dots are randomly scattered in the bottom half and right half of the graph. A regression horizontal line starts from 0 on the vertical axis and extends toward the right passing through the dots, such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. The dots are plotted from negative 18000 to 32000 on the horizontal axis and from negative 20000 to 90000 on the vertical axis. The concentration of dots is more between negative 7000 and 30000 on the horizontal axis and between negative 20000 and 10000 on the vertical axis. All values are approximate. A histogram describes the residual plot. The horizontal axis is labeled Residuals and has markings from negative 20000 to 100000 in increments of 20000. The distribution of the bars is approximately right-skewed, it starts from negative 30000 and ends at 100000. The longest bar is at 0 on the horizontal axis and the bars decrease progressively in height to the left of negative 10000 and to the right of 0. There are no bars at 80000 and 90000.

Model 2:

A scatterplot describes the residual plots for predicted values. The horizontal axis is labeled Predicted and has markings from 6 to 11 in increments of 1. The vertical axis is labeled Residuals and has markings from negative 3 to 2 in increments of 1. Dots are randomly scattered throughout the graph. A regression horizontal line starts from 0 on the vertical axis and extends toward the right passing through the dots, such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. The dots are plotted from 6 to 10.5 on the horizontal axis and from negative 2.5 to 2.2 on the vertical axis. The concentration of dots is more between 9 and 10.5 on the horizontal axis and between negative 1 and 1.5 on the vertical axis. The dots are widely spread between 6 and 9 on the horizontal axis and between negative 2 and 2 on the vertical axis. All values are approximate. A histogram describes the residual plot. The horizontal axis is labeled Residuals and has markings from negative 3 to 2 in increments of 1. The distribution of the bars is approximately bell-shaped, it starts from negative 3 and ends at 3. The longest bar is at 0.5 on the horizontal axis and the bars decrease in height to the left of 0 and to the right of 0.5. There is no bar at negative 2.

  1. Using Model 1, which of the validity conditions are violated? Select all that apply.
    1. The linearity condition is violated.
    2. The equal variance condition is violated.
    3. The normality condition is violated.
    4. None of the validity conditions are violated.
  2. Are the predicted values based on Model 1 accurate?

For countries with the highest fertility rates (say, 6 or more babies/woman), Model 1 tends to __________ (overestimate/underestimate/accurately predict) the country’s income.

  1. Using Model 2, predict the income of a country whose fertility rate is 6 babies per woman.

Note that log() denotes the natural log. Round to the nearest dollar.

Sol:

  1. Using Model 2, the 95% confidence interval for the slope is (-0.7555, -0.5981). Which of the following interpretations is correct? Note that log() denotes the natural log.
    1. As fertility rate increases by 1 unit, predicted income decreases by 0.5981 to 0.7555 units.
    2. As fertility rate increases by 1 unit, predicted income decreases by 0.1218 to 0.2232 units.
    3. As fertility rate increases by 1 unit, predicted income is multiplied by 1.8187 to 2.1287.
    4. As fertility rate increases by 1 unit, predicted income is multiplied by 0.4698 to 0.5499.
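When the response is on the natural-log scale, a slope describes an additive change in log(income), and exponentiating it gives a multiplicative change in income. The sketch below applies that back-transformation to the interval endpoints from the question. The variable names are illustrative assumptions.

```python
import math

# 95% confidence interval for the slope of the log(income) model, from the question.
slope_low, slope_high = -0.7555, -0.5981

# exp() undoes the natural log, turning an additive change in log(income)
# into a multiplier on predicted income per 1-unit increase in fertility rate.
mult_low = math.exp(slope_low)
mult_high = math.exp(slope_high)

print(f"multiplier between {mult_low:.4f} and {mult_high:.4f}")
```

A multiplier below 1 means predicted income shrinks as fertility rate increases. For example, a multiplier of 0.5 would mean predicted income is halved for each additional baby per woman.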

Questions 5 through 7: Suppose we want to use an animal’s body weight (in kg) to predict its brain weight (in kg). Consider three different models:

Model 1:

Model 2:

Model 3:

  1. You want to assess the fit of these three models. Is it appropriate to compare the models based on the standard error of the residuals?
    1. Yes, we can use the standard error of the residuals to compare models. The model with the highest standard error of the residuals is preferred.
    2. Yes, we can use the standard error of the residuals to compare models. The model with the lowest standard error of the residuals is preferred.
    3. No, it is not appropriate to compare models based on the standard error of the residuals, because the response variable has been transformed in Models 2 and 3.
    4. No, it is not appropriate to compare models based on the standard error of the residuals, because the explanatory variable has been transformed in Model 3.
  2. Using Model 3, predict the brain weight of an animal that weighs 45 kg. Note that log() denotes the natural log.

Model 3:

Sol:

  1. In the scatterplot below, the dotted line marks the 95% prediction interval for individual animals in this population based on Model 3. Is it appropriate to interpret this prediction interval?

A scatterplot plots the relationship between body weight and brain weight of animals. The horizontal axis is labeled Log (Body weight) and ranges from 0 to 12 in increments of 3. The vertical axis is labeled Log (Brain weight) and has markings from negative 9 to 3 in increments of 3. Dots are plotted in an increasing trend from left to right in the graph. A regression line starts from (negative 2, negative 6.5), increases to the right such that some of the dots lie above the line, some of the dots lie below the line, and a few dots lie on the line, and ends at (12, 3). The dots are plotted from negative 2 to 10.5 on the horizontal axis and from negative 9 to 2 on the vertical axis. A dotted line starts from (negative 2, negative 3), increases to the right, and ends at (7, 3). Another dotted line starts from (negative 1.5, negative 9.5), increases to the right, and ends at (12, 0). All values are approximate.

    1. Yes, it is appropriate to interpret this interval as long as you back-transform the predictions using natural logarithms: (.
    2. Yes, it is appropriate to interpret this interval as long as you back-transform the predictions using exponents: (.
    3. No, it is not appropriate to interpret this interval, because the response variable has been transformed.
    4. No, it is not appropriate to interpret this interval, because the explanatory variable has been transformed.

Questions 8 and 9: You want to build a model that describes how the population of the United States (in millions) has changed between 1790 and 2000. The residual plots for two different models are shown below.

"Two side by side scatterplots describe the residual plots for two different models. The first scatterplot is titled, Model 1, Predicted population equals b subscript 0 plus b subscript 1 times year. The horizontal axis is labeled Year and has markings from 1800 to 2050 in increments of 50. The vertical axis is labeled Residual and has markings from negative 30 to 50 in increments of 20. A dotted horizontal line starts at 0 on the horizontal axis and extends toward the right such that some of the dots lie above the line, and some of the dots lie below the line. The dots are plotted such that they start from (1790, 45), decrease to a minimum point at (1900, negative 25), then further increases to the right and ends at (2000, 50). All values are approximate.
The second scatterplot is titled, Model 2, Predicted log (population) equals b subscript 0 plus b subscript 1 times year. The horizontal axis is labeled Year and has markings from 1800 to 2050 in increments of 50. The vertical axis is labeled Residual and has markings from negative 0.4 to 0.4 in increments of 0.2. A dotted horizontal line starts at 0.0 on the horizontal axis and extends toward the right such that some of the dots lie above the line, and some of the dots lie below the line. The dots are plotted such that they start from (1790, negative 0.45), increase to a peak at (1888, 0.3), then further decreases to the right and ends at (2000, negative 0.4). All values are approximate."

  1. Do these models provide accurate predictions?

If Model 1 were used to predict the U.S. population in 2020, we would expect it to _________ (overestimate/underestimate) the true population.

If Model 2 were used to predict the U.S. population in 2020, we would expect it to _________ (overestimate/underestimate) the true population.

  1. The best transformation is often found through systematic trial and error. Having tried these two models, which transformation would you try next?
  2. Consider a petri dish that contains bacteria. The growth of the bacteria over time can be modeled using an exponential relationship of the following form:

True or False: Taking the natural logarithm of the response variable will lead to a linear relationship between the transformed variable ln(bacteria) and time.
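One way to explore this statement numerically is to generate values from an exponential growth curve and examine their natural logs. The sketch below assumes the common form count = N0 * e^(r * time). That form, the starting count `N0 = 100`, and the growth rate `r = 0.3` are assumptions made for illustration and are not taken from the question.

```python
import math

# Hypothetical parameters for an exponential growth curve: count = N0 * exp(r * t).
N0, r = 100.0, 0.3

times = [0, 1, 2, 3, 4, 5]
counts = [N0 * math.exp(r * t) for t in times]

# Natural log of the response: ln(count) = ln(N0) + r * t.
log_counts = [math.log(c) for c in counts]

# If ln(count) is linear in time, equal time steps give equal differences.
diffs = [b - a for a, b in zip(log_counts, log_counts[1:])]

for t, c, lc in zip(times, counts, log_counts):
    print(f"t={t}: count={c:8.2f}  ln(count)={lc:.4f}")
print("step-to-step change in ln(count):", [round(d, 4) for d in diffs])
```

Under this assumed form, the change in ln(count) per unit of time is constant and equal to `r`, which is what a straight line with slope `r` and intercept ln(N0) would produce.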

  1. How can data transformations be used to address the violations of validity conditions?
    1. You can transform any (or all) of the explanatory variables and/or the response variable.
    2. You can transform any (or all) of the explanatory variables, but you should leave the response variable on the original scale.
    3. You can transform the response variable, but you should leave the explanatory variable(s) on the original scale.
    4. You can transform the residuals, but you should leave the explanatory and response variables on the original scale.
  2. You want to compare the fit of two regression models. In one of the models, the response variable has been transformed. Is it appropriate to compare the two models using ?
    1. Yes, we can always use to compare the performance of two models. The model with the higher value is preferred.
    2. Yes, we can always use to compare the performance of two models. The model with the lower value is preferred.
    3. Yes, we can use to compare the performance of two models as long as the explanatory variables have not been transformed.
    4. No, we can only use to compare the performance of two models with the same response variable.
  3. What is the purpose of using data transformations?
    1. Transformations can create a linear association between variables.
    2. Transformations can reduce skewness in conditional distributions of residuals.
    3. Transformations can adjust for measurement errors in the data collection process.
    4. Transformations can create more similar conditional variances of residuals.
  4. How do you decide which variable(s) to transform?

If linearity is the only issue, it is often better to transform the __________ (explanatory/response) variable(s); if multiple validity conditions are violated, it is often better to transform the __________ (explanatory/response) variable(s).

  1. Based on the scatterplot below, what kind of linearizing transformation would you try?

A scatterplot plots the relationship between a set of data. The horizontal axis is labeled x and ranges from 0.0 to 10.0 in increments of 2.5. The vertical axis is labeled y and has markings from 3 to 8 in increments of 1. Dots are plotted in an increasing trend from left to right in the graph. A regression curve starts from (0, 3.5), increases to the right such that some of the dots lie above the line, some of the dots lie below the line, and a few dots lie on the line, and ends at (10, 7.4). The dots are plotted from 0 to 10 on the horizontal axis and from 2 to 7.8 on the vertical axis. Two outliers are plotted at (0, 2) and (0.1, 2.4). All values are approximate.

You should try a transformation that ____________ (increases/decreases) the power of the response variable or a transformation that ____________ (increases/decreases) the power of the explanatory variable.

Document Information

Document Type: DOCX
Chapter Number: 5
Created Date: Aug 21, 2025
Chapter Name: Chapter 5 Intermediate Statistical Investigations Test Bank
Author: Nathan Tintle
