Chapter 10 Two Quantitative Variables Exam Questions - Test Bank + Answers | Statistical Investigations 2e by Nathan Tintle

Introduction to Statistical Investigations Test Bank

Note: TE = Text entry; TE-N = Text entry - Numeric; Ma = Matching; MS = Multiple select; MC = Multiple choice; TF = True-False; DD = Drop-down

CHAPTER 10 LEARNING OBJECTIVES

10-1: Explore scatterplots and comment on the direction, form, and strength of the association between two quantitative variables.

10-2: Carry out a simulation-based analysis for a correlation coefficient.

10-3: Understand how to find the least squares regression line and what the slope of this line conveys.

10-4: Carry out a simulation-based analysis for a slope coefficient.

10-5: Carry out a theory-based analysis for a slope coefficient.

Section 10.1: Two Quantitative Variables: Scatterplots and Correlation

10.1-1: Recognize that a scatterplot is the appropriate graph for displaying the relationship between two quantitative variables, and create a scatterplot from raw data.

10.1-2: Summarize the characteristics of a scatterplot by describing its direction, form, and strength, as well as whether there are any unusual observations.

10.1-3: Recognize that a correlation coefficient of 0 means that there is no linear association between the two variables and that a correlation coefficient of –1 or 1 means that the scatterplot is exactly a straight line.

10.1-4: Estimate the value of the correlation coefficient within ± 0.3 by looking at a scatterplot.

10.1-5: Recognize that the correlation coefficient is appropriate only for summarizing the strength and direction of a scatterplot that has linear form.

10.1-6: Understand that the correlation coefficient is not resistant to extreme observations.

10.1-7: Recognize how the association between two variables may change when data are split into smaller groups.

Questions 1 through 6: Babies born with low birth weights (less than 2500 grams) are at an increased risk for many infant diseases. Researchers in North Carolina collected data to see what variables may influence the birth weight (in grams) of a child, including whether the mother drank alcohol during pregnancy, whether the mother smoked during pregnancy, the mother’s age (years), the gestation of the pregnancy (number of weeks from conception until birth), the mother’s race, the length of the birth (hours), and several others. The plot below summarizes some of the variables measured.

A scatterplot plots the relationship between birth weight of child and gestation of the pregnancy. The horizontal axis is labeled Gestation and has markings from 33 to 43 in increments of 1. The vertical axis is labeled Weight and has markings from 2400 to 3600 in increments of 200. Circles denote smoke: no, and triangles denote smoke: yes. The circles and the triangles are plotted in an increasing trend from left to right. The plotted circles start at (34, 2450), increase to the right, with a few ups and downs, and end at (42, 3550). The other circles are plotted as follows: (35, 2500), (35, 2600), (36, 2850), (37, 2750), (37, 3050), (38, 2950), (38, 3150), (39, 3100), (39, 3250), (39, 3300), (40, 3225), (40, 3400), (40, 3450), (41, 3550). The triangles start at (35, 2425), increase to the right, with a few ups and downs, and end at (42, 3500). The other triangles are plotted as follows: (36, 2400), (36, 2750), (38, 2550), (38, 2750), (38, 2950), (39, 2775), (39, 2900), (39, 2950), (39, 3125), (41, 3150), (41, 3200), (42, 3300), (42, 3350), (42, 3450). All values are approximate.

  1. For each of the three variables displayed in the plot, state whether they are categorical or quantitative.

Weight:

Gestation:

Smoke:

Weight: quantitative

Gestation: quantitative

Smoke: categorical

LO: 10.1-1; Difficulty: Easy; Type: TE

  1. Based on the plot, does there appear to be an association between gestation period and birth weight?
    1. Yes, because as one variable increases, the other tends to increase as well.
    2. No, because as one variable increases, the other tends to increase as well.
    3. Yes, because most of the circles on the plot appear to be higher than the triangles.
    4. No, because one variable does not appear to change the other variable.
  2. Is whether the mother smoked during pregnancy a confounding variable in describing the relationship between gestation period and birth weight?
    1. No, because smoking status is not associated with gestation period.
    2. No, because smoking status is not associated with birth weight.
    3. Yes, because mothers who smoke tend to have lighter babies, and mothers who smoke also tend to have shorter gestation periods.
    4. Yes, because mothers who smoke tend to have heavier babies, and mothers who smoke also tend to have longer gestation periods.
    5. We cannot determine whether smoking status is a confounding variable based on this plot.
  3. Does the association between gestation period and birthweight appear to depend on smoking status?
    1. Yes, since the regression line for non-smokers is higher than the regression line for smokers.
    2. Yes, since the regression line for non-smokers would have a higher y-intercept than the regression line for smokers.
    3. No, since the regression line for non-smokers and the regression line for smokers have similar slopes.
    4. No, since the sample sizes of smokers and non-smokers are similar.
  4. How would the correlation coefficient between birth weight and gestation period computed from all women in the sample compare to the correlation coefficient between birth weight and gestation period computed from only non-smoking women?
    1. The correlation coefficient for the entire sample would be closer to 1 than the correlation coefficient for the non-smoking group.
    2. The correlation coefficient for the entire sample would be closer to 0 than the correlation coefficient for the non-smoking group.
    3. The correlation coefficient for the entire sample would be the same as the correlation coefficient for the non-smoking group.
    4. We cannot determine how the two correlation coefficients compare based on this plot.
  5. What type of plot would be appropriate for examining the relationship between a mother’s age and her baby’s birth weight?
    1. Scatterplot
    2. Side-by-side boxplots
    3. Segmented bar graph
    4. Dotplot

Questions 7 through 10: The following scatterplot displays the finish time (in minutes) and age (in years) for the male racers at the 2018 Strawberry Stampede (a 10k race through Arroyo Grande).

A scatterplot plots the relationship between the finish time and the age of the male racers. The horizontal axis is labeled Age Male and has markings from 10 to 80 in increments of 10. The vertical axis is labeled Finish Time Male (in minutes) and has markings from 30 to 80 in increments of 10. Dots are randomly scattered throughout the graph. The dots are plotted from 10 to 75 on the horizontal axis and from 35 to 77 on the vertical axis. The concentration of dots is more between 25 and 50 on the horizontal axis and between 40 and 65 on the vertical axis. All values are approximate.

  1. What is the form of this scatterplot?
    1. Linear
    2. Non-linear
  2. What is the direction of the association between finish time and age?
    1. Positive
    2. Negative
  3. Approximate the value of the correlation coefficient for these data.
    1. 0
    2. 0.25
    3. 0.50
    4. 0.80
  4. If a 70-year-old male with a finishing time of 35 minutes were added to the data set, would the correlation coefficient increase, decrease, or remain the same?
    1. Increase
    2. Decrease
    3. Remain the same
    4. Unable to determine with the information provided
  5. Which of the following plots has the strongest correlation between the two variables plotted?

A scatterplot titled, Fuel Capacity and Page number depicts the relationship between a set of data. The horizontal axis has markings from 80 to 240 in increments of 80. The vertical axis has markings from 10 to 24 in increments of 2. Dots are randomly scattered throughout the graph. The dots are plotted from 60 to 250 on the horizontal axis and from 10 to 24 on the vertical axis. The concentration of dots is more between 60 and 240 on the horizontal axis and between 13 and 19 on the vertical axis. All values are approximate.

A scatterplot titled, City M P G and Weight depicts the relationship between a set of data. The horizontal axis has markings from 2000 to 4000 in increments of 1000. The vertical axis has markings from 16 to 30 in increments of 2. Dots are plotted horizontally in a decreasing trend from left to right. The dots are plotted from 1900 to 4100 on the horizontal axis and from 17 to 30 on the vertical axis. The concentration of dots is more between 2200 and 4000 on the horizontal axis and between 17 and 28 on the vertical axis. All values are approximate.

A scatterplot titled, Time for quarter mile and City M P G depicts the relationship between a set of data. The horizontal axis has markings from 20 to 30 in increments of 5. The vertical axis has markings from 14 to 19 in increments of 1. Dots are randomly scattered throughout the graph. The dots are plotted from 17 to 28 on the horizontal axis and from 14 to 19 on the vertical axis. The concentration of dots is more between 17 and 23 on the horizontal axis and between 16 and 19 on the vertical axis. All values are approximate.

A scatterplot titled, Time for quarter mile and Weight depicts the relationship between a set of data. The horizontal axis has markings from 2000 to 4000 in increments of 1000. The vertical axis has markings from 14 to 19 in increments of 1. Dots are randomly scattered throughout the graph. The dots are plotted from 2300 to 4000 on the horizontal axis and from 14 to 19 on the vertical axis. The concentration of dots is more between 2500 and 4000 on the horizontal axis and between 16.5 and 19 on the vertical axis. All values are approximate.

A. B. C. D.

  1. Estimate the value of the correlation coefficient between the two variables shown in the following scatterplot.

A scatterplot plots the relationship between two variables. The horizontal axis has three markings in the order from left to right as: negative 39, 23.5, and 85. The vertical axis has markings from negative 1.0 to 1.0 in increments of 0.5. Dots are plotted approximately as a bell-shaped curve. The series of dots start at (negative 39, negative 1.2), increase to reach a peak of 0.75 at 24, then decrease and end at (83, negative 1.5). Three outliers are plotted at the point, (70, 0.1), (75, 0), and (75, 0.3). All values are approximate.

    1. 0.90
    2. 0.80
    3. -0.80
    4. 0.09
  1. True or False: If the correlation coefficient between variables x and y is equal to zero, then we can say that x and y are not associated.
  2. The graph below shows a scatter plot of medical expenses in the past year by age for a sample of Americans.

A scatterplot depicts the medical expenses in the past year by age for a sample of Americans. The horizontal axis is labeled Age and ranges from 0 to 60 in increments of 10. The vertical axis is labeled Yearly Medical Expenses and ranges from 0 to 2000 in increments of 500. Dots are randomly scattered throughout the graph and approximately form a positive parabola. The dots are plotted from 0 to 60 on the horizontal axis and from 250 to 1900 on the vertical axis. All values are approximate.

Which one of the following is a true statement about the data shown in the graph?

  1. The correlation must be close to one because there is a strong relationship between age and medical expenses.
  2. Using correlation on the data shown in the graph above is not appropriate because the relationship shown in the graph is not linear.
  3. Both A and B are true statements.
  4. Neither A nor B is a true statement.
  5. Which of the following correlation coefficient values describes the weakest linear association between two variables?
    1. -0.99
    2. -0.23
    3. 0.12
    4. 0.38
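
A minimal Python sketch of two ideas tested in this section: a correlation coefficient of 0 rules out only a linear association, and the correlation coefficient is not resistant to an extreme observation. All data values and the helper name pearson_r are hypothetical and chosen only for illustration; none of them come from the data sets in the questions above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is completely determined by x, but not linearly.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curved = pearson_r(curve_x, curve_y)  # zero, despite a clear association

# Hypothetical data: points exactly on a line, then the same points
# plus one extreme observation far from the line.
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2 * x + 1 for x in line_x]
r_line = pearson_r(line_x, line_y)
r_with_outlier = pearson_r(line_x + [7], line_y + [0])

print(r_curved, r_line, r_with_outlier)
```

In this sketch the curved data give a correlation of zero even though the variables are clearly associated, and a single unusual point pulls the correlation for the straight-line data well below 1.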

Section 10.2: Inference for the Correlation Coefficient: Simulation-Based Approach

10.2-1: Apply the 3S strategy when evaluating the hypothesis of linear association using the correlation coefficient as the statistic.

10.2-2: Articulate how to conduct a tactile simulation to implement the 3S strategy for testing a correlation coefficient.

10.2-3: Define the p-value in the context of the 3S strategy using simulated correlation coefficients under the null hypothesis of no association.

Questions 16 through 20: Are people with bigger brains more intelligent? Forty college students volunteered to participate in a study which examined brain size (measured as 1000’s of pixels counted in a brain scan), and IQ scores (measured in points). A scatterplot of the data is shown below.

A scatterplot plots the relationship between the brain size and I Q score. The horizontal axis is labeled Brain Size in 1000 pixels and has markings from 750 to 1050 in increments of 50. The vertical axis is labeled I Q (points) and has markings from 70 to 160 in increments of 10. Dots are randomly scattered throughout the graph. The dots are plotted from 790 to 1075 on the horizontal axis and from 72 to 150 on the vertical axis. The concentration of dots is more between 800 and 970 on the horizontal axis and between 80 and 140 on the vertical axis. All values are approximate.

  1. Approximate the value of the correlation coefficient for these data.
    1. 0.10
    2. 0.40
    3. 0.80
    4. 0
  2. State the null and alternative hypotheses using proper notation.
    1. versus
    2. versus
    3. versus
    4. versus
  3. Select the best explanation for how one sample would be simulated in order to generate the null distribution.
    1. Flip a coin to decide whether to swap the values for brain size and IQ or not. Plot the correlation coefficient of the randomized points on the null distribution.
    2. Holding the order of brain size values constant, randomize the order of the IQs. Plot the correlation coefficient of the shuffled data on the null distribution.
    3. Put each pair of (brain size, IQ) on a piece of paper. Draw with replacement 40 times. Plot the correlation coefficient of the resampled data on the null distribution.
    4. Add or subtract the appropriate value from each brain size and IQ in order to force the null hypothesis to be true. Plot the correlation coefficient of the shifted data on the null distribution.
  4. Below is a picture of a simulated null distribution of correlation coefficients created using the Corr/Regression applet. How would you use this distribution to calculate the p-value?

A histogram describes the results of null distribution of correlation coefficients. The horizontal axis is labeled Shuffled Correlation and has markings from negative 0.600 to 0.600 in increments of 0.300. The vertical axis is labeled Count and ranges from 0 to 200 in increments of 50. The distribution of the bars is approximately normal. From negative 0.600 to 0.600, the bars extend up to counts 1, 0, 5, 30, 50, 90, 150, 180, 152, 145, 75, 48, 25, 5, 2, and 1. There are no bars to the left of negative 0.600 and to the right of 0.600. The longest bar is at 0 with the count, 175. The tip of the bar above negative 0.24 is highlighted. The mean is negative 0.003, the standard deviation is 0.160 and the total number of shuffles is 1000. All values are approximate.

    1. Find the proportion of simulated correlation coefficients greater than zero.
    2. Find the proportion of simulated correlation coefficients as far away from zero or further than the one observed.
    3. Find the proportion of simulated correlation coefficients as small or smaller than the one observed.
    4. Find the proportion of simulated correlation coefficients as large or larger than the one observed.
  1. The p-value for this test is 0.008. What can we conclude?
    1. We have strong evidence that an increase in brain size will increase IQ.
    2. We have strong evidence that an increase in brain size is associated with an increase in IQ.
    3. We have strong evidence that an increase in IQ will increase brain size.
    4. We have strong evidence that brain size and IQ are not associated.
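
The shuffling procedure described in the simulation question above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Corr/Regression applet's implementation: the ten (brain size, IQ) pairs are hypothetical, the function names are invented for this sketch, and the p-value is one-sided (shuffled correlations as large or larger than the observed one).

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def shuffle_p_value(xs, ys, num_shuffles=1000, seed=0):
    """Return (observed r, proportion of shuffled r values >= observed r)."""
    rng = random.Random(seed)
    observed_r = pearson_r(xs, ys)
    shuffled_ys = list(ys)
    at_least_as_large = 0
    for _ in range(num_shuffles):
        # Hold the x values fixed and randomize the order of the y values,
        # which breaks any association between the two variables.
        rng.shuffle(shuffled_ys)
        if pearson_r(xs, shuffled_ys) >= observed_r:
            at_least_as_large += 1
    return observed_r, at_least_as_large / num_shuffles

# Hypothetical data, not the 40 students from the study.
brain_size = [800, 820, 850, 870, 900, 910, 940, 960, 1000, 1050]
iq = [85, 110, 95, 120, 100, 130, 105, 125, 135, 140]

observed_r, p_value = shuffle_p_value(brain_size, iq)
print(observed_r, p_value)
```

Each pass through the loop corresponds to one simulated sample under the null hypothesis of no association; the collection of shuffled correlations plays the role of the null distribution.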

Questions 21 through 26: Data from gapminder.org on 184 countries was used to examine if there is an association between (average) female life expectancy (that is, the average lifespan of women in the country) and the average number of children women give birth to for the year 2019. A scatterplot of the data follows.

A scatterplot plots the relationship between average female life expectancy and the birth rate for the year 2019. The horizontal axis is labeled Babies per Woman and has markings from 2 to 6 in increments of 2. The vertical axis is labeled Female Life Expectancy and has markings from 60 to 80 in increments of 10. Dots are plotted in a decreasing trend from left to right. The dots are plotted from 1 to 7 on the horizontal axis and from 55 to 85 on the vertical axis. The concentration of dots is more between 1 and 2 on the horizontal axis and between 75 and 85 on the vertical axis. An outlier is plotted at (7, 64). All values are approximate.

  1. What are the observational units?
    1. Women
    2. Babies
    3. Countries
    4. Years
  2. Approximate the correlation coefficient for these data.
  3. State the null and alternative hypotheses using proper notation.
    1. versus
    2. versus
    3. versus
    4. versus
  4. Select the best explanation for how one sample would be simulated in order to generate the null distribution.
    1. Holding the average number of children constant, randomize the order of the female life expectancies. Plot the correlation coefficient of the shuffled data on the null distribution.
    2. Add or subtract the appropriate value from each average number of children and female life expectancy in order to force the null hypothesis to be true. Plot the correlation coefficient of the shifted data on the null distribution.
    3. Flip a coin to decide whether to swap the values for average number of children and female life expectancy or not. Plot the correlation coefficient of the randomized points on the null distribution.
    4. Put each pair of (average number of children, female life expectancy) on a piece of paper. Draw with replacement 40 times. Plot the correlation coefficient of the resampled data on the null distribution.
  5. Below is a picture of a simulated null distribution of correlation coefficients created using the Corr/Regression applet. How would you use this distribution to calculate the p-value?

A histogram describes the results of null distribution. The horizontal axis is labeled Shuffled correlation and has markings from negative 0.300 to 0.300 in increments of 0.100. The vertical axis is labeled count and ranges from 0 to 200 in increments of 50. The distribution of bars is approximately normal. From negative 0.300 to 0.300, the bars extend up to counts 1, 12, 30, 55, 100, 125, 170, 175, 150, 85, 50, 20, and 8. There are no bars to the left of negative 0.200 and to the right of 0.200. The longest bar is at 0 with the count, 175. The tip of the bar above negative 0.033 is highlighted. The mean is 0.001, the standard deviation is 0.074, and the total number of shuffles is 1000. All values are approximate.

    1. Find the proportion of simulated correlation coefficients greater than zero.
    2. Find the proportion of simulated correlation coefficients as far away from zero or further than the one observed.
    3. Find the proportion of simulated correlation coefficients as small or smaller than the one observed.
    4. Find the proportion of simulated correlation coefficients as large or larger than the one observed.
  1. The p-value for this test is less than 0.001. What can we conclude?
    1. We have strong evidence that average number of children per woman is associated with female life expectancy.
    2. We have strong evidence that an increase in average number of children per woman will decrease female life expectancy.
    3. We have strong evidence that an increase in female life expectancy will decrease the average number of children per woman.
    4. We have strong evidence that average number of children per woman is not associated with female life expectancy.

Questions 27 through 30: The following scatterplot displays the finish time (in minutes) and age (in years) for the male racers at the 2018 Strawberry Stampede (a 10k race through Arroyo Grande).

A scatterplot plots the relationship between the finish time and the age of the male racers. The horizontal axis is labeled Age Male and has markings from 10 to 80 in increments of 10. The vertical axis is labeled Finish Time Male (in minutes) and has markings from 30 to 80 in increments of 10. Dots are randomly scattered throughout the graph. The dots are plotted from 10 to 75 on the horizontal axis and from 35 to 77 on the vertical axis. The concentration of dots is more between 25 and 50 on the horizontal axis and between 40 and 65 on the vertical axis. All values are approximate.

Below are the same data for the female racers in this year’s race.

A scatterplot plots the relationship between the finish time and the age of the female racers. The horizontal axis is labeled Age Female and ranges from 0 to 80 in increments of 10. The vertical axis is labeled Finish Time Female (in minutes) and has markings from 30 to 100 in increments of 10. Dots are randomly scattered throughout the graph. The dots are plotted from 8 to 76 on the horizontal axis and from 35 to 97 on the vertical axis. The concentration of dots is more between 19 and 55 on the horizontal axis and between 45 and 70 on the vertical axis. All values are approximate.

  1. Do you think the correlation coefficient for the females will be larger than, smaller than, or the same as the correlation coefficient for the males?
    1. Larger
    2. Smaller
    3. Remain the same
  2. Select the best explanation for how one sample would be simulated in order to generate the null distribution for the females.
    1. Holding the ages constant, randomize the order of the race finish times. Plot the correlation coefficient of the shuffled data on the null distribution.
    2. Add or subtract the appropriate value from each age and race finish time in order to force the null hypothesis to be true. Plot the correlation coefficient of the shifted data on the null distribution.
    3. Flip a coin to decide whether to swap the values for age and race finish time or not. Plot the correlation coefficient of the randomized points on the null distribution.
    4. Put each pair of (age, race finish time) on a piece of paper. Draw with replacement 40 times. Plot the correlation coefficient of the resampled data on the null distribution.
  3. What is the null hypothesis for a simulation-based test of the correlation coefficient for the females?
    1. There is an association between female ages and female race finish times.
    2. There is no association between female ages and female race finish times.
  4. The p-value for a simulation-based test of the correlation coefficient for the females is 0.213. True or False: We have evidence that there is no association between female ages and female race finish times.

Section 10.3: Least Squares Regression

10.3-1: Understand that one way a scatterplot can be summarized is by fitting the best-fit (least squares regression) line and interpreting both the slope and intercept of the best-fit line in the context of the two variables on the scatterplot.

10.3-2: Find the predicted value of the response variable for a given value of the explanatory variable.

10.3-3: Understand that slope = 0 means no association, slope < 0 means negative association, and slope > 0 means positive association. Further, the sign of the slope will be the same as the sign of the correlation coefficient.

10.3-4: Understand that extrapolation is use of a regression line to predict values outside of the range of observed values for the explanatory variable, including the special case of y = 0 when applicable.

10.3-5: Understand the concept of residual and find and interpret the residual for an observational unit, given the raw data and the equation of the best-fit (regression) line.

10.3-6: Understand the relationship between residuals and strength of association and that the best-fit (regression) line minimizes the sum of the squared residuals.

10.3-7: Find and interpret the coefficient of determination (R²) as the squared correlation and as the proportion of total variation in the response variable that is accounted for by changes (variation) in the explanatory variable.

10.3-8: Understand that influential points can substantially change the equation of the best-fit line and that observations with extreme values of the explanatory variable may potentially be influential.

  1. True or False: If you fit a least squares line to two quantitative variables x and y, and the slope of the line differs from zero, then you know the correlation coefficient also differs from zero.

Questions 32 through 38: Annual measurements of the number of powerboat registrations (in thousands) and the number of manatees killed by powerboats in Florida were collected over the 14 years 1977–1990. A scatterplot of the data, least squares regression line, and correlation coefficient follow.

A scatterplot is titled Manatee Deaths versus Powerboat Registrations. The horizontal axis is labeled Powerboat Registrations (in thousands) and has markings from 450 to 700 in increments of 50. The vertical axis is labeled Manatee Deaths and has markings from 20 to 50 in increments of 10. Dots are plotted in an increasing trend from left to right. A regression line starts at (440, 15), increases to the right, and ends at (750, 48), such that some dots lie above the line, some dots lie below the line, and some dots lie on the line. The dots are plotted as follows: (449, 3), (460, 21), (480, 25), (499, 10), (510, 20), (511, 25), (525, 10), (560, 35), (590, 33), (620, 32), (645, 40), (675, 45), (720, 49), and (730, 46). All values are approximate.

Correlation:

r = 0.943

Regression line:

  1. How would you interpret the slope of the regression line in the context of the problem? Select all that apply.
    1. Every 8,000 powerboat registrations is associated with a predicted increase of one manatee death.
    2. We predict an additional 0.12 manatee death for each single powerboat registration.
    3. We predict an additional 0.12 manatee death for every 1,000 powerboats registered.
    4. We predict a decrease of 41.43 manatee deaths for every 1,000 powerboats registered.
  2. Fill in the blanks with the appropriate values to interpret the y-intercept:

We predict ___(1)____ manatee deaths when there are ____(2)____ powerboat registrations.

LO: 10.3-1; Difficulty: Medium; Type: TE-N

  1. The y-intercept is not a valid prediction of manatee deaths. Why?
    1. It is an example of extrapolation.
    2. We cannot observe a negative number of manatee deaths.
    3. Both A and B.
    4. Neither A nor B.
  2. What is the predicted number of manatee deaths for a year with 600,000 powerboat registrations?

LO: 10.3-2; Difficulty: Medium; Type: TE-N

  1. The year 1984 had 559,000 powerboat registrations and 34 manatee deaths. Calculate the residual for this observation.

LO: 10.3-5; Difficulty: Medium; Type: TE-N

  1. The year 1984 had 559,000 powerboat registrations and 34 manatee deaths. Did the least squares regression line underestimate, overestimate, or accurately estimate the number of manatee deaths for the year 1984?
    1. Underestimate
    2. Overestimate
    3. Accurately estimate
  2. Which of the following is a correct interpretation of the coefficient of determination?
    1. About 94.3% of the variation in manatee deaths can be explained by the number of powerboat registrations.
    2. About 88.9% of the variation in manatee deaths can be explained by the number of powerboat registrations.
    3. An increase of 1,000 powerboat registrations is associated with a predicted increase of 0.943 manatee deaths.
    4. An increase of 1,000 powerboat registrations is associated with a predicted increase of 0.889 manatee deaths.

Questions 39 through 41: Data from the World Bank for 25 Western Hemisphere countries was used to examine the association between (average) female life expectancy (that is, the average lifespan of women in the country) and the average number of children women give birth to. Given below is the scatterplot for the data.

A scatterplot titled, scatterplot of Life Expectancy versus Births per woman. The horizontal axis is labeled Births per Woman and has markings from 1.5 to 4.5 in increments of 0.5. The vertical axis is labeled Life Expectancy and has markings from 65.0 to 80.0 in increments of 2.5. Dots are plotted in a decreasing trend from left to right. The dots are plotted from 1.5 to 4.4 on the horizontal axis and from 65 to 90 on the vertical axis. All values are approximate.

The regression equation for this context is found to be:

ŷ = 84.5 − 4.4x

where ŷ is predicted female life expectancy in years, and x is the average number of births per woman.

  1. Interpret the slope in the context of the study.
    1. We expect to see an increase of 84.5 years in female life expectancy when the average number of births per woman in a country increases by one child.
    2. When a country has zero births per woman on average, we predict a female life expectancy of 84.5 years.
    3. We expect to see a decrease of 4.4 years in female life expectancy when the average number of births per woman in a country increases by one child.
    4. When a country has zero births per woman on average, we predict a female life expectancy of 4.4 years.
  2. Interpret the y-intercept in the context of the study.
    1. We expect to see an increase of 84.5 years in female life expectancy when the average number of births per woman in a country increases by one child.
    2. When a country has zero births per woman on average, we predict a female life expectancy of 84.5 years.
    3. We expect to see a decrease of 4.4 years in female life expectancy when the average number of births per woman in a country increases by one child.
    4. When a country has zero births per woman on average, we predict a female life expectancy of 4.4 years.
  3. Is the interpretation of the y-intercept meaningful in the context? Why?
    1. No, since it is an example of extrapolation.
    2. No, since the lowest value for average births per woman in the data set was 1.5.
    3. Both A and B.
    4. Neither A nor B.
  4. In the scatterplot shown, which labeled point has the largest residual?

A scatterplot plots the relationship between foot length and height. The horizontal axis is labeled Foot length and has markings from 20 to 36 in increments of 4. The vertical axis is labeled Height and has markings from 60 to 75 in increments of 5. Dots are plotted in an increasing trend from left to right in the graph. A regression line starts at (20, 59), increases to the right, and ends at (38, 77) such that some dots lie above the line, some dots lie below the line, and some dots lie on the line. The dots are plotted from 22 to 35 on the horizontal axis and from 58 to 77 on the vertical axis. The dot at the point, (24, 58) is marked as, A. The dot at the point, (26, 68) is marked as, B. The dot at the point, (30, 66) is marked as, C. The dot at the point, (29, 77) is marked as, D. The dot at the point, (35, 72.5) is marked as, E. All values are approximate.

    1. A
    2. B
    3. C
    4. D
    5. E

LO: 10.3-5; Difficulty: Easy; Type: MC

  1. True or False: Observations with values of the explanatory variable near the mean of the explanatory variable may potentially be influential.
  2. True or False: The least squares regression line minimizes the absolute value of the residuals.
  3. True or False: The correlation coefficient is the proportion of total variation in the response variable that is accounted for by changes in the explanatory variable.

Section 10.4: Inference for the Regression Slope: Simulation-Based Approach

10.4-1: Apply the 3S strategy when evaluating the hypothesis of association using the slope as the statistic.

10.4-2: Articulate how to conduct a tactile simulation to implement the 3S strategy for testing a slope.

10.4-3: Define the p-value in the context of the 3S strategy using simulated slopes under the null hypothesis of no association.

10.4-4: Know that a test of association based on slope is equivalent to a test of association based on a correlation coefficient.

  1. For a given dataset, a test of association based on a slope is equivalent to a test of association based on a correlation coefficient. Being equivalent means which of the following is true?
    1. The confidence intervals for the population correlation and population slope will have the same center.
    2. The p-value will be the same whether you use correlation as the statistic or the slope of the regression line as the statistic.
    3. The observed correlation will be the same as the observed slope of the regression line.
    4. The confidence intervals for the population correlation and population slope will have the same width.

Questions 47 through 52: Social warmth is a term referring to the feeling of being connected to others. A study published in PLoS One in 2016 looked at a potential relationship between physical warmth (body temperature) and social warmth among a group of 54 volunteers (Inagaki et al.). These volunteers had their oral temperature taken by a registered nurse and then assessed themselves using a scale of 1 to 5 on twelve items related to a feeling of social connection for which the average was recorded. Higher average scores indicated higher levels of social warmth. The theory was that the thermoregulatory system, which helps maintain a relatively warm internal body temperature, may also help people assess feelings of social connection. Below is a scatterplot and least-squares regression line of the data.

A scatterplot of social warmth score and body temperature. The horizontal axis is labeled Body Temperature (degrees Celsius) and has markings from 36.4 to 37.6 in increments of 0.4. The vertical axis is labeled Social Warmth Score and has markings from 3.5 to 4.5 in increments of 0.5. Dots are vertically plotted for certain markings on the horizontal axis. The dots are plotted from 36.0 to 37.6 on the horizontal axis and from 3.2 to 4.9 on the vertical axis. The concentration of dots is more between 36.4 and 37.2 on the horizontal axis and between 3.7 and 4.5 on the vertical axis. An outlier is plotted at (37.6, 4.7). All values are approximate.

  1. How would you interpret the slope of the regression line in the context of the study?
    1. The correlation coefficient between body temperature and social warmth score is 0.461.
    2. The predicted social warmth score when body temperature is zero degrees Celsius is -12.773.
    3. A one degree Celsius increase in body temperature is associated with a predicted 0.461 increase in social warmth score.
    4. About 46.1% of variability in social warmth scores can be explained by body temperature.
  2. Which of the following is the correlation coefficient for these data?
    1. 0.348
    2. 0.721
    3. -0.032
    4. -0.213
  3. State the null and alternative hypotheses for a simulation-based test of the slope using proper notation.
    1. versus
    2. versus
    3. versus
    4. versus
  4. Select the best explanation for how one sample would be simulated in order to generate the null distribution.
    1. Holding the body temperatures constant, randomize the order of the social warmth scores. Plot the slope of the regression line of the shuffled data on the null distribution.
    2. Add or subtract the appropriate value from each body temperature and social warmth score in order to force the null hypothesis to be true. Plot the slope of the regression line of the shifted data on the null distribution.
    3. Flip a coin to decide whether to swap the values for body temperature and social warmth score or not. Plot the slope of the regression line of the randomized points on the null distribution.
    4. Put each pair of (body temperature, social warmth score) on a piece of paper. Draw with replacement 40 times. Plot the slope of the regression line of the resampled data on the null distribution.
  5. Below is a picture of a simulated null distribution of slopes created using the Corr/Regression applet. How is this distribution used to calculate the p-value?

A histogram describes the results of null distribution on shuffled slopes against count. The horizontal axis is labeled Shuffled Slope and has markings from negative 0.600 to 0.600 in increments of 0.300. The vertical axis is labeled Count and ranges from 0 to 200 in increments of 50. The distribution of the bars is approximately normal. From negative 0.600 to 0.600, the bars extend up to counts 5, 20, 40, 55, 105, 155, 165, 160, 110, 75, 45, 15 and 14. There are no bars to the left of negative 0.600 and to the right of 0.600. The longest bar is at 0 with the count, 165. The tip of the bar above 0.06 is highlighted. The mean is negative 0.008, the standard deviation is 0.180, and the total number of shuffles is 1000. All values are approximate.

    1. Find the proportion of simulated slopes greater than zero.
    2. Find the proportion of simulated slopes as far away from zero or further than 0.461.
    3. Find the proportion of simulated slopes as small or smaller than 0.461.
    4. Find the proportion of simulated slopes as large or larger than 0.461.
  1. Based on the simulated null distribution in question 51, what is the strength of evidence against the null hypothesis?
    1. We have strong evidence against the null hypothesis.
    2. We have moderate evidence against the null hypothesis.
    3. We have weak evidence against the null hypothesis.
    4. We have no evidence against the null hypothesis.
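
The shuffle-based simulation of a null distribution of slopes can be sketched in a few lines of code. This is a minimal illustration, not the Corr/Regression applet's implementation: the data are made up (the study's raw values are not reproduced here), and the helper names `ls_slope` and `shuffled_slopes` are hypothetical.

```python
import random

def ls_slope(xs, ys):
    """Least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def shuffled_slopes(xs, ys, reps, rng):
    """Keep xs fixed, shuffle ys, and record the slope of each shuffle."""
    shuffled = list(ys)
    slopes = []
    for _ in range(reps):
        rng.shuffle(shuffled)
        slopes.append(ls_slope(xs, shuffled))
    return slopes

rng = random.Random(1)

# Hypothetical data: 54 made-up (temperature, score) pairs, not the study's data.
temps = [rng.uniform(36.0, 37.6) for _ in range(54)]
scores = [4.0 + 0.4 * (t - 36.8) + rng.gauss(0, 0.4) for t in temps]

observed = ls_slope(temps, scores)
null_slopes = shuffled_slopes(temps, scores, 1000, rng)

# One-sided: proportion of shuffled slopes at least as large as the observed slope.
p_one_sided = sum(s >= observed for s in null_slopes) / len(null_slopes)
# Two-sided: proportion at least as far from zero as the observed slope.
p_two_sided = sum(abs(s) >= abs(observed) for s in null_slopes) / len(null_slopes)

print(f"observed slope = {observed:.3f}")
print(f"one-sided p-value = {p_one_sided:.3f}, two-sided p-value = {p_two_sided:.3f}")
```

Which of the two proportions is appropriate depends on the direction of the alternative hypothesis.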

Questions 53 through 60: It is commonly expected that as a person ages, their muscle mass decreases. To further examine this relationship in women, a nutritionist randomly selected 60 female patients from her clinic, 15 women from each 10-year age group beginning with age 40 and ending with age 80. For each patient, her age and current muscle mass were recorded. A scatterplot, least squares regression line, and coefficient of determination are as follows.

A scatterplot plots the relationship between the age and the muscle mass. The horizontal axis is labeled Age (in years) and has markings from 40 to 70 in increments of 10. The vertical axis is labeled Muscle Mass (in pounds) and has markings from 50 to 120 in increments of 10. Dots are plotted in a decreasing trend from left to right in the graph. A regression line starts at (40, 110), decreases to the right, and ends at (80, 61) such that some dots lie above the line, some dots lie below the line, and some dots lie on the line. The dots are plotted from 41 to 79 on the horizontal axis and from 51 to 120 on the vertical axis. All values are approximate. To the top right of the scatterplot, the equation reads ŷ = 156.347 - 1.19x, with R² = 75.01%.

  1. Write a sentence interpreting the value of the slope in the context of the study.
    1. A one year increase in age is associated with a 1.19 lb increase in predicted muscle mass.
    2. A one year increase in age is associated with a 1.19 lb decrease in predicted muscle mass.
    3. A one year increase in muscle mass is associated with a 1.19 lb increase in predicted age.
    4. A one year increase in muscle mass is associated with a 1.19 lb decrease in predicted age.
  2. Which of the following is a correct interpretation of the coefficient of determination?
    1. When the age of a woman is equal to zero, her predicted muscle mass is 75.01 lbs.
    2. The correlation coefficient between age and muscle mass is equal to 0.7501.
    3. Each additional year in age is associated with a 75.01% decrease in predicted muscle mass.
    4. Approximately 75.01% of the variation in muscle mass can be explained by changes in age among these women.
  3. What is the value of the correlation coefficient between age and muscle mass for these data?
    1. 0.7501
    2. -0.7501
    3. 0.8661
    4. -0.8661
  4. Write the null and alternative hypotheses of interest for testing if there is a negative linear relationship between age and muscle mass using proper notation for a test of slope.
    1. versus
    2. versus
    3. versus
    4. versus
  5. How would you simulate one sample, assuming the null hypothesis is true?
    1. Label cards with muscle mass values from the original data. Mix cards together; shuffle into two new groups of age 40 and age 80.
    2. Label cards with muscle mass values from the original data. Mix cards together, and deal one muscle mass value to each of the age values in the data.
    3. Flip a coin for each woman in the sample; if heads, swap the difference between age and muscle mass.
    4. Label cards with age values from the original data. Mix cards together; shuffle into two new groups of low and high muscle mass.
  6. The p-value for these data was less than 0.0001. Write a conclusion of the test in the context of the study.
    1. We have strong evidence that an increase in age causes a decrease in muscle mass.
    2. We have strong evidence that age is negatively correlated with muscle mass.
    3. We do not have strong evidence that an increase in age causes a decrease in muscle mass.
    4. We do not have strong evidence that age is negatively correlated with muscle mass.
  7. If you were to conduct a simulation-based test using the correlation coefficient as your statistic, would the p-value be larger, smaller, or remain the same as the p-value reported in question 58?
    1. Larger
    2. Smaller
    3. Remain the same
    4. Not enough information provided
  8. Can these results be generalized to the population of all patients at this clinic?
    1. Yes, since it was a random sample.
    2. No, since the sample size was too small.
    3. No, since only female patients were selected.
    4. No, since age was not randomly assigned to patients.
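
Two relationships used in this question set can be checked numerically: the correlation coefficient is the square root of the coefficient of determination, carrying the sign of the slope, and a shuffle test gives the same p-value whether the statistic is the slope or the correlation. The sketch below uses the reported R² of 75.01% for the first point and a small made-up data set for the second; the helper `summaries` and the simulated values are assumptions for illustration only.

```python
import math
import random

# Correlation from R^2: take the square root and attach the sign of the slope.
r_squared = 0.7501
r = -math.sqrt(r_squared)  # negative because the fitted slope (-1.19) is negative
print(f"r = {r:.4f}")

def summaries(xs, ys):
    """Return (least squares slope, correlation coefficient) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

rng = random.Random(7)

# Hypothetical data with a modest negative association (not the clinic's data).
xs = [rng.uniform(40, 80) for _ in range(25)]
ys = [100.0 - 0.3 * x + rng.gauss(0, 12) for x in xs]
obs_slope, obs_r = summaries(xs, ys)

count_slope = 0
count_r = 0
shuffled = list(ys)
for _ in range(1000):
    rng.shuffle(shuffled)
    b, corr = summaries(xs, shuffled)
    count_slope += b <= obs_slope  # shuffled slope as small or smaller
    count_r += corr <= obs_r       # shuffled correlation as small or smaller

print(f"p-value using slope: {count_slope / 1000:.3f}")
print(f"p-value using correlation: {count_r / 1000:.3f}")
```

Shuffling the response leaves the standard deviations of both variables unchanged, so every shuffled slope is the shuffled correlation multiplied by the same positive constant. That is why the two counts agree.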

Section 10.5: Inference for the Regression Slope: Theory-Based Approach

10.5-1: Realize that both simulation-based approaches to testing correlation coefficients and slopes can, under certain conditions, be well predicted by the theory-based approach known as a t-test.

10.5-2: Evaluate a scatterplot for the two validity conditions for a theory-based test of correlation coefficients/slopes: symmetry and consistent variability around the regression line.

10.5-3: State hypotheses in terms of population slopes and correlations.

10.5-4: Interpret a confidence interval for the population slope.

Questions 61 through 68: Data from gapminder.org on 184 countries was used to examine if there is an association between (average) female life expectancy (that is, the average lifespan of women in the country) and the average number of children women give birth to for the year 2019. A scatterplot of the data and a regression table from the Corr/Regression applet follows.

A scatterplot plots the relationship between average female life expectancy and the average number of children women give birth to. The horizontal axis is labeled Average Number of Babies per Woman and has markings from 2 to 6 in increments of 2. The vertical axis is labeled Female Life Expectancy and has markings from 60 to 80 in increments of 10. Dots are densely plotted in a decreasing trend from left to right in the graph. A regression line starts at (1.5, 83), decreases to the right, and ends at (7, 52) such that some dots lie above the line, some dots lie below the line, and some dots lie on the line. The dots are plotted from 1.5 to 7 on the horizontal axis and from 55 to 88 on the vertical axis. The concentration of dots is more between 1.5 and 2.5 on the horizontal axis and between 75 and 86 on the vertical axis. An outlier is plotted at (7, 64). All values are approximate.

| Term | Coeff | SE | t-stat | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 88.91 | 0.72 | 123.78 | <0.0001 |
| Fertility | -5.20 | 0.24 | -21.60 | <0.0001 |

  1. Which of the following validity conditions does not need to be checked in order to conduct a theory-based test for a regression slope?
    1. The number of countries in the data set is larger than 20.
    2. The variability in female life expectancy around the regression line should be similar regardless of the value of average number of babies per woman.
    3. There is approximately the same distribution of points above the regression line as below the regression line.
    4. The general pattern of the points on the scatterplot has a linear trend.
  2. Which of the approaches to a test of the regression slope are valid?
    1. Simulation-based test
    2. Theory-based test
    3. Both A and B
    4. Neither A nor B
  3. State the null and alternative hypotheses to examine if there is an association between female life expectancy and the average number of children women give birth to for the year 2019.
    1. versus
    2. versus
    3. versus
    4. versus
    5. Both A and C
    6. Both B and D
  4. Using the regression table output, state the equation of the regression line.
  5. Using the regression table output, what is the standardized statistic for a test of the regression slope?
    1. 123.78
    2. -21.60
    3. 88.91
    4. -5.20
  6. How would you interpret the standardized statistic for a test of the regression slope?
    1. The observed sample slope of -5.20 is 21.6 standard errors below the hypothesized value of zero.
    2. The observed sample intercept of 88.91 is 21.6 standard errors below the hypothesized value of zero.
    3. Zero is 21.6 standard errors below the observed sample slope of -5.20.
    4. Zero is 21.6 standard errors below the observed sample intercept of 88.91.
  7. Is there significant evidence of an association between female life expectancy and the average number of children women give birth to for the year 2019?
    1. Yes, since the slope of the regression line is negative.
    2. Yes, since the intercept of the regression line is positive.
    3. Yes, since the correlation is negative.
    4. Yes, since the p-value for the intercept is less than 0.01.
    5. Yes, since the p-value for the slope is less than 0.01.
  8. A 95% confidence interval for the population slope is (-5.68, -4.73). How would you interpret this interval in the context of the study?
    1. If we repeated this study many times, 95% of the regression slopes would fall between -5.68 and -4.73.
    2. There is a 95% probability that the population slope is between -5.68 and -4.73.
    3. We are 95% confident that the population slope is between -5.68 and -4.73.
    4. A one baby increase in average number of babies per woman is associated with between a 4.73 and 5.68 year decrease in female life expectancy, with 95% confidence.
    5. Both A and B.
    6. Both B and C.
    7. Both C and D.
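
The t-statistic in a regression table is the coefficient divided by its standard error, and a rough 95% confidence interval is the coefficient plus or minus about 2 standard errors. The sketch below applies that arithmetic to the Fertility row. Because the table entries are rounded, the results differ slightly from the applet's reported values.

```python
# Values from the Fertility row of the regression table (rounded as displayed).
coeff = -5.20
se = 0.24

# Standardized statistic: how many standard errors the slope is from zero.
t_stat = coeff / se

# Rough 95% interval using the "2 standard errors" rule of thumb.
lower = coeff - 2 * se
upper = coeff + 2 * se

print(f"t-statistic = {t_stat:.2f}")            # table reports -21.60
print(f"2SE interval = ({lower:.2f}, {upper:.2f})")  # reported interval: (-5.68, -4.73)
```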

Questions 69 through 72: How is the number of pages in a textbook related to the price of the textbook? To find out, two Cal Poly freshmen (2006) randomly selected 30 textbooks at the campus bookstore and recorded the price ($) and number of pages for each book. Here's output from analyzing the data collected in the Corr/Regression applet. Assume, for now, that the normal approximation-based method is valid, and thus, the p-value from the Regression Table below is valid.

A table titled, Regression table with a selected checkbox at its right. The table has 2 rows and 5 columns with column headers as: Term, Coefficient, S E, t-stat, and p-value. Row 1: Term, Intercept; Coefficient, negative 3.4223; S E, 10.4637; t-stat, negative 0.33; p-value, 0.7461. Row 2: Term, Pages; Coefficient, 0.1473; S E, 0.0193; t-stat, 7.65; p-value, 0.0000.

  1. Using the information available, fill in the blanks to write the equation of the regression line as estimated from the data.

_____(1)_____ + ____(2)_____

LO: 10.5-1; Difficulty: Easy; Type: TE-N

  1. What is the most appropriate null hypothesis for testing the regression slope using theory-based methods?
    1. There is no association between number of pages in a textbook and price of the textbook.
    2. There is an association between number of pages in a textbook and price of the textbook.
    3. There is no linear association between number of pages in a textbook and price of the textbook.
    4. There is a linear association between number of pages in a textbook and price of the textbook.
  2. True or False: We can conclude that the price of a textbook will increase if we add more pages.

LO: 10.5-1; Difficulty: Hard; Type: TF

  1. A 95% confidence interval for the population slope is (0.108, 0.187). Which of the following statements are valid based on this interval?
    1. If we repeated this study many times, 95% of the regression slopes would fall between 0.108 and 0.187.
    2. There is a 95% probability that the population slope is between 0.108 and 0.187.
    3. We are 95% confident that the population slope is between 0.108 and 0.187.
    4. We have significant evidence that the population slope is greater than zero.
    5. Both A and B.
    6. Both B and C.
    7. Both C and D.
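
The reported interval can be reproduced from the regression table as slope ± t* × SE, with n − 2 = 28 degrees of freedom. The sketch below assumes the multiplier t* = 2.048, the standard t-table value for 95% confidence with 28 degrees of freedom; the applet's exact computation is not shown in the source.

```python
# Values from the Pages row of the regression table.
slope = 0.1473
se = 0.0193

n = 30          # textbooks sampled
df = n - 2      # degrees of freedom for a simple regression slope
t_star = 2.048  # t-table multiplier for 95% confidence with 28 df (assumed)

margin = t_star * se
lower = slope - margin
upper = slope + margin

# Standardized statistic from the rounded table entries.
t_stat = slope / se

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print(f"t-statistic = {t_stat:.2f}")  # table reports 7.65
```

The interval lies entirely above zero, which is consistent with the very small p-value reported for the slope.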

Questions 73 through 75: A student in an AP Statistics class decided to conduct a study to determine whether you could predict the number of followers a teen has on Instagram based on the number of people he or she is following. To do this, she randomly selected fifty students from her high school that had Instagram accounts and for each student recorded the number of people they were following and the number of followers they had. A scatterplot of the data is shown.

A scatterplot plots the relationship between the number of followers and the following of a teen on Instagram. The horizontal axis is labeled Following and ranges from 0 to 1500 in increments of 500. The vertical axis is labeled Followers and has markings from 200 to 1700 in increments of 100. Dots are plotted in an increasing trend from left to right. A regression line starts at (200, 200), increases to the right, and ends at (1500, 1700) such that some dots lie above the line, some dots lie below the line, and some dots lie on the line. The dots are plotted from 300 to 1500 on the horizontal axis and from 300 to 1600 on the vertical axis. A vertical line extends from the point, 0 on the horizontal axis and covers the entire range of the graph. All values are approximate.

The regression line is

.

  1. We want to test: H0: β = 0, Ha: β ≠ 0. Use results from the null distribution of simulated slopes shown to determine the standardized statistic.

A histogram describes the simulated null distribution of slopes. The horizontal axis is labeled Shuffled Slope and has markings from negative 0.200 to 0.200 in increments of 0.200. The distribution of the bars is approximately normal. The histogram shows 13 bars, which start at negative 0.200 and end at 0.200. The longest bar is at 0 on the horizontal axis, and the bars decrease in height to the left and right of 0. The tip of the bar above 0 is highlighted. The mean is 0.000, the standard deviation is 0.07, and the total number of shuffles is 4000. All values are approximate.

LO: 10.5-1; Difficulty: Medium; Type: TE-N

  1. Based on the standardized statistic, is there strong evidence of an association between the number of followers and the number of people following an Instagram account?
    1. No, since the mean of the null distribution is zero.
    2. Yes, since the regression slope is positive.
    3. Yes, since the standardized statistic is greater than 3.
    4. Yes, since the correlation is positive.
  2. A 95% confidence interval for the population slope is (1.07, 1.35). How would you interpret this in the context of the study?
    1. We are 95% confident that the population slope is between 1.07 and 1.35.
    2. If we repeated this study many times, 95% of the regression slopes would fall between 1.07 and 1.35.
    3. There is a 95% probability that the population slope is between 1.07 and 1.35.
    4. We are 95% confident that a one person increase in the number of people one is following on Instagram is associated with between a 1.07 to 1.35 person increase in the number of followers.
    5. Both A and D.
    6. Both A and B.
    7. Both B and C.
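
A standardized statistic from a simulated null distribution is the observed statistic minus the mean of the null distribution, divided by the null distribution's standard deviation. The sketch below illustrates the calculation. The function name is hypothetical, and because the Instagram regression equation is not reproduced in this text, the observed slope of 1.21 is a placeholder taken as the midpoint of the reported interval (1.07, 1.35), not the stated value.

```python
def standardized_statistic(observed, null_mean, null_sd):
    """Number of null standard deviations between the observed statistic and the null mean."""
    return (observed - null_mean) / null_sd

# Instagram study: null mean 0.000 and SD 0.07 come from the histogram description.
# The observed slope 1.21 is a placeholder (midpoint of the reported interval).
z_instagram = standardized_statistic(1.21, 0.000, 0.07)

# Social warmth study: slope 0.461 from the answer options, with null mean -0.008
# and SD 0.180 from the histogram description.
z_warmth = standardized_statistic(0.461, -0.008, 0.180)

print(f"Instagram (placeholder slope): {z_instagram:.2f}")
print(f"Social warmth: {z_warmth:.2f}")
```

A standardized statistic beyond about 2 in absolute value is generally read as strong evidence against the null hypothesis, and beyond about 3 as very strong evidence.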

Document Information

Document Type:
DOCX
Chapter Number:
10
Created Date:
Aug 21, 2025
Chapter Name:
Chapter 10 Two Quantitative Variables
Author:
Nathan Tintle
