Tintle Full Test Bank Preliminaries Test Bank 1e - Intermediate Statistical Investigations 1st Ed - Exam Bank by Nathan Tintle. DOCX document preview.

Preliminaries

Intermediate Statistical Investigations Test Bank

Question types: FIB = Fill in the blank; Calc = Calculation; Ma = Matching; MS = Multiple select; MC = Multiple choice; TF = True-false

PRELIMINARIES TERMINAL LEARNING OUTCOMES

TLOP-A: Explore how the relationship between two variables can be impacted by additional variables.

TLOP-B: Express the relationship between two variables using a statistical model, and calculate and report the standard error of the residuals, a measure of prediction error.

PRELIMINARIES ENABLING LEARNING OUTCOMES

LOP-1: Identify and apply basic terminology of statistical studies: observational units, response variable, explanatory variable, association, confounding variable.

LOP-2: Identify potential sources and measures of variation in a response variable.

LOP-3: Produce and describe some basic visualizations and numerical summaries to compare groups and explore relationships (e.g., bar graphs, dotplots/histograms/boxplots, scatterplots, means, medians, standard deviation).

LOP-4: Explore how those comparisons and relationships can be impacted by additional variables.

LOP-5: Calculate a residual and relate it to typical prediction error.

Section P.A

Questions 1 and 2: In order for students to participate in remote learning, they need access to a computer or tablet for school work. An advocacy group conducted a survey of U.S. families and analyzed the relationship between household income (less than $25,000, between $25,000 and $50,000, or greater than $50,000) and access to a computer or tablet (no access, sometimes, always).

  1. Access to a computer/tablet is the __________ (explanatory/response) variable.

We can classify this variable as __________ (categorical/quantitative).

Income is the __________ (explanatory/response) variable.

We can classify this variable as __________ (categorical/quantitative).

  2. Which type of graph is most appropriate for displaying the relationship between income and access to a computer/tablet?
    1. A mosaic plot
    2. A histogram
    3. Stacked boxplots (one boxplot for each group)
    4. A scatterplot
  3. Which of the hypothetical mosaic plots below shows an association between class (first, second, third, or crew) and survival on the Titanic? Select all that apply.

"Three side by side mosaic plots. The first mosaic plot is titled, Graph A and plots percentage of survived for the responses, yes and no. The horizontal axis is labeled Class and has the following markings, crew, first, second, and third, in the order from left to right. The vertical axis is labeled Survived and ranges from 0 to 100 in increments of 20 percent. The heights of the rectangular bars denoting each response are as follows: For crew: yes, 24 percent and no, 76 percent. For first: yes, 63 percent and no, 38 percent. For second: yes, 40 percent and no, 60 percent. For third: yes, 25 percent and no, 75 percent. The width of the rectangular bars for crew is 2.9 times more than that of the width of the rectangular bars for first, 3 times more than that of the width of the rectangular bars for second, and 0.3 times more than that of the width of the rectangular bars for third. All values are approximate. 

The second mosaic plot is titled, Graph B and plots percentage of survived for the responses, yes and no. The horizontal axis is labeled Class and has the following markings, crew, first, second, and third, in the order from left to right. The vertical axis is labeled Survived and ranges from 0 to 100 in increments of 20 percent. The heights of the rectangular bars denoting each response are as follows: For crew: yes, 50 percent and no, 50 percent. For first: yes, 50 percent and no, 50 percent. For second: yes, 50 percent and no, 50 percent. For third: yes, 50 percent and no, 50 percent. The width of the rectangular bars for crew is 2.9 times more than that of the width of the rectangular bars for first, 3 times more than that of the width of the rectangular bars for second, and 0.3 times more than that of the width of the rectangular bars for third. All values are approximate. 

The third mosaic plot is titled, Graph C and plots percentage of survival for the responses, yes and no. The horizontal axis is labeled Class and has the following markings, crew, first, second, and third, in the order from left to right. The vertical axis is labeled Survived and ranges from 0 to 100 in increments of 20 percent. The heights of the rectangular bars denoting each response are as follows: For crew: yes, 52 percent and no, 48 percent. For first: yes, 52 percent and no, 48 percent. For second: yes, 52 percent and no, 48 percent. For third: yes, 52 percent and no, 48 percent. The width of the rectangular bars for crew is 2.9 times more than that of the width of the rectangular bars for first, 3 times more than that of the width of the rectangular bars for second, and 0.3 times more than that of the width of the rectangular bars for third. All values are approximate. "

    1. Graph A
    2. Graph B
    3. Graph C
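
One way to check Question 3 numerically is to compare the conditional survival percentages across classes: a mosaic plot shows an association only when those percentages differ from class to class. The sketch below uses the approximate bar heights from the graph descriptions. The function name `shows_association` and the one-point tolerance are illustrative assumptions, not part of the test bank.

```python
# Approximate "yes" (survived) percentages read from each hypothetical graph.
survival_pct = {
    "Graph A": {"crew": 24, "first": 63, "second": 40, "third": 25},
    "Graph B": {"crew": 50, "first": 50, "second": 50, "third": 50},
    "Graph C": {"crew": 52, "first": 52, "second": 52, "third": 52},
}

def shows_association(pct_by_class, tolerance=1.0):
    """Return True when the conditional percentages differ by more than `tolerance` points."""
    values = list(pct_by_class.values())
    return max(values) - min(values) > tolerance

for graph, pct in survival_pct.items():
    print(graph, "association" if shows_association(pct) else "no association")
```

The widths of the bars (how many people are in each class) do not enter the check; only the conditional percentages within each class matter.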

Questions 4 through 7: Two hospitals, Memorial Hospital and Fairbanks Medical Center, both perform the same procedure to alleviate joint pain. Researchers surveyed a random sample of 100 patients from each hospital who had undergone this procedure and asked whether or not they had made a full recovery in a six-month period after surgery. The table below shows the results of the survey. The mosaic plot further breaks down the results based on socioeconomic status (SES).

             Recovery
Hospital     Full   No    Total
Fairbanks    60     40    100
Memorial     66     34    100
Total        126    74    200

"A three-way mosaic plot shows two recovery responses, Full and No. The horizontal axis is labeled Hospital/SES and has the following markings, Fairbanks and Memorial, in order from left to right. Both markings, Fairbanks and Memorial, have two subdivisions: Low and High. The vertical axis is labeled Recovery and ranges from 0 to 100 percent in increments of 20 percent.
For Fairbanks, the heights of the rectangular bars denoting each response are as follows: For Low: Full, 50 percent and No, 50 percent. For High: Full, 76 percent and No, 24 percent. The width of the rectangular bars for Low is 1.5 times the width of the rectangular bars for High. For Memorial, the heights of the rectangular bars denoting each response are as follows: For Low: Full, 38 percent and No, 62 percent. For High: Full, 73 percent and No, 27 percent. The width of the rectangular bars for High is 4 times the width of the rectangular bars for Low. All values are approximate."

  4. Based on the table, is there an association between hospital and recovery in this sample? Note: This question refers to the sample, so there is no need to consider whether this data reflects a genuine tendency in the population.
    1. There is no association. More than 50% of patients recover, regardless of hospital.
    2. There is no association. The recovery rate is not the same at the two hospitals.
    3. There is an association. More than 50% of patients recover, regardless of hospital.
    4. There is an association. The recovery rate is not the same at the two hospitals.
  5. Suppose you are primarily interested in the potential association between hospital and recovery. Based on the mosaic plot, does SES satisfy the definition of a confounding variable in this scenario?

_______ (Yes/No), because Memorial Hospital is more likely to serve _______ (high/low) SES patients, compared to Fairbanks, and _______ (high/low) SES patients are more likely to make a full recovery.

  6. Does this scenario satisfy the definition of Simpson’s paradox?

_______ (Yes/No), because overall, the recovery rates are higher at _______ (Memorial/Fairbanks), but after adjusting for SES, the conditional recovery rates are higher at _______ (Memorial/Fairbanks).
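
The comparison behind Questions 4 through 6 can be sketched as a short calculation: compute each hospital's overall recovery rate as a weighted average of its conditional (within-SES) rates, then compare the overall ordering with the within-SES ordering. The recovery rates below are the approximate bar heights from the mosaic plot description. The SES group sizes (60/40 for Fairbanks, 20/80 for Memorial) are assumptions inferred from the approximate bar-width ratios, and all variable names are illustrative.

```python
# Approximate values read from the table and the mosaic plot description.
# SES group sizes are assumptions inferred from the stated bar-width ratios.
data = {
    "Fairbanks": {"low": {"n": 60, "full_rate": 0.50}, "high": {"n": 40, "full_rate": 0.76}},
    "Memorial":  {"low": {"n": 20, "full_rate": 0.38}, "high": {"n": 80, "full_rate": 0.73}},
}

def overall_rate(groups):
    """Weighted average of the conditional recovery rates."""
    total = sum(g["n"] for g in groups.values())
    recovered = sum(g["n"] * g["full_rate"] for g in groups.values())
    return recovered / total

overall = {hospital: overall_rate(groups) for hospital, groups in data.items()}

# Overall comparison versus comparison within each SES group.
memorial_higher_overall = overall["Memorial"] > overall["Fairbanks"]
fairbanks_higher_within = all(
    data["Fairbanks"][ses]["full_rate"] > data["Memorial"][ses]["full_rate"]
    for ses in ("low", "high")
)

# Simpson's paradox: the direction of the comparison reverses after conditioning.
simpsons_paradox = memorial_higher_overall and fairbanks_higher_within
print(overall, simpsons_paradox)
```

Under these assumed group sizes, the weighted averages come out close to the 60 and 66 full recoveries per 100 patients reported in the table, which suggests the inferred sizes are at least consistent with the source.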

  7. Put the components of the study into the correct boxes in the Sources of Variation Diagram. Note: Some boxes will include more than one answer.

Observed variation in:

Sources of explained variation

Sources of unexplained variation

Inclusion criteria

  1. Hospital (Memorial or Fairbanks)
  2. Has undergone medical procedure for joint pain
  3. Lifestyle factors (physicality of occupation, opportunities for rest, etc.)
  4. Recovery (full or no)
  5. Socioeconomic status (low SES or high SES)
  6. Physical therapy (as recommended, less than recommended, none)

Questions 8 through 13: In 2018, a sample of academic faculty from universities across the country was surveyed about their salaries (in US dollars). The results were classified according to each faculty member’s academic rank (instructor, assistant professor, associate professor, and full professor) and gender (male, female).

  8. Identify the observational units and variables. Note: Some blanks will include more than one answer. One of the answer choices will not be used.

Observational units: __________

Explanatory variables: __________

Response variable: __________

A. Academic faculty members

B. Universities

C. Salary (in US dollars)

D. Academic rank

E. Gender

  9. Name the graph types being used to display the distribution of salaries.

A histogram describes the distribution of salaries. The horizontal axis is labeled Salary (in dollars) and has markings from 50,000 to 300,000 in increments of 50,000. The distribution of the bars is approximately normal and it starts from 50,000, and ends at 325,000. The longest bar is at 100,000 on the horizontal axis and the bars decrease in height to the left of 100,000 and to the right of 100,000. All values are approximate.

    1. Bar graph
    2. Mosaic plot
    3. Histogram
    4. Boxplot
    5. Scatterplot
  10. Describe the relationship between the mean salary and the median salary.

A horizontal box plot with dots depicts the distribution of salaries. The horizontal axis is labeled Salary (in dollars) and has markings from 50,000 to 300,000 in increments of 50,000. The box plot for salary has whiskers that range from 30,000 to 215,000 and its box ranges from 80,000 to 130,000 with median at 100,000. A series of dots is plotted beyond the upper whisker, at certain markings on the horizontal axis. The dots are plotted as follows: 1 dot above 215,000; 7 dots above 218,000; 2 dots above 220,000, 227,000, 229,000, 231,000, 233,000, 246,000; 3 dots above 248,000, 252,000; 1 dot above 254,000, 256,000; 3 dots above 260,000; 1 dot above 280,000, 282,000; 2 dots above 286,000; 1 dot above 288,000; 2 dots above 300,000; 1 dot above 305,000. All values are approximate.

The mean salary is ______ (greater/less) than the median salary, because the distribution of salaries is skewed ______ (right/left).
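
The relationship between skewness and the mean/median ordering can be illustrated with a small sketch. The salaries below are hypothetical values chosen to mimic a long right tail; they are not the survey data.

```python
from statistics import mean, median

# Hypothetical salaries (in dollars): most values cluster near 100,000,
# with a few large values forming a long right tail.
salaries = [60_000, 80_000, 90_000, 100_000, 100_000, 110_000, 130_000, 250_000, 300_000]

# The mean is pulled toward the tail; the median is not.
print(mean(salaries), median(salaries))
```

A few very large values raise the mean without moving the middle observation, so in a right-skewed distribution the mean tends to sit above the median.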

  11. True or False: There is an association between gender and salary in this sample.

"Two side by side horizontal box plots with dots describe the association between gender and salary. The horizontal axis is labeled Salary (in dollars) and has markings from 50,000 to 300,000 in increments of 50,000. The vertical axis is labeled gender and has markings as, male and female in the order from top to bottom. 
For Male, the whiskers range from 30,000 to 235,000 and its box ranges from 85,000 to 149,000 with median at 110,000. A series of dots is plotted beyond the upper whisker, at certain markings on the horizontal axis. The dots are plotted as follows: 1 dot above 240,000; 4 dots above 245,000, 250,000; 1 dot above 255,000, 260,000, 265,000; 2 dots above 270,000; 1 dot above 280,000, 285,000; 2 dots above 290,000; 1 dot above 295,000; 2 dots above 300,000; and 1 dot above 305,000. All values are approximate.
For Female, the whiskers range from 32,000 to 170,000 and its box ranges from 70,000 to 115,000 with median at 87,500. A series of dots is plotted beyond the upper whisker, at certain markings on the horizontal axis. The dots are plotted as follows: 2 dots above 170,000; 1 dot above 180,000; 2 dots above 185,000; 1 dot above 190,000; 2 dots above 195,000; 1 dot above 210,000; and 2 dots above 220,000. All values are approximate."

  12. Use the mosaic plot to match the percentages below to the appropriate statement describing the data.

Statement 1: _____% of the faculty in this sample are male.

Statement 2: _____% of the faculty who have reached the rank of professor are male.

Statement 3: _____% of female faculty members are at the rank of instructor.

Statement 4: _____% of male faculty members are at the rank of instructor.

    1. 7%
    2. 22%
    3. 68%
    4. 81%
  13. Using the tables below, compare the distribution of salaries for male and female faculty members, before and after taking rank into account.

salary, by gender

gender    Mean         Median       Std Dev     IQR
female     95670.78     89004.99    36198.22    40259.92
male      123953.16    107470.67    55281.05    60642.28

salary, by rank and gender

rank         gender    Mean         Median       Std Dev     IQR
instructor   female     61462.22     60900.00    12438.11    18512.42
instructor   male       66067.21     66684.00    14774.89    18922.34
assistant    female     88363.27     88482.00    14487.41    24970.73
assistant    male       93989.75     92847.00    12921.20    19881.49
associate    female    102989.39     99855.00    20006.99    31617.07
associate    male      106261.66    103363.00    26676.66    35168.99
professor    female    132471.09    130432.00    49073.93    78319.08
professor    male      159612.19    152824.00    62664.90    83208.55

Which of the following statements are true? Select all that apply.

    1. Before taking rank into account, there is an association between gender and salary in this sample.
    2. After taking rank into account, there is an association between gender and salary in this sample.
    3. The nature of the relationship between gender and salary changes after adjusting for rank; thus rank satisfies the definition of a confounding variable in this scenario.
    4. The direction of the relationship between gender and salary changes after adjusting for rank; thus this scenario satisfies the definition of Simpson’s paradox.
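
The comparison asked for in Question 13 can be sketched by computing the male-minus-female gap in mean salary overall and again within each rank. The means are copied from the two summary tables in Question 13; the variable names and the sign-based reversal check are illustrative assumptions, not the test bank's own method.

```python
# Mean salaries (in dollars) copied from the two summary tables in Question 13.
overall_mean = {"female": 95670.78, "male": 123953.16}
mean_by_rank = {
    "instructor": {"female": 61462.22, "male": 66067.21},
    "assistant":  {"female": 88363.27, "male": 93989.75},
    "associate":  {"female": 102989.39, "male": 106261.66},
    "professor":  {"female": 132471.09, "male": 159612.19},
}

# Male-minus-female gap, before and after conditioning on rank.
overall_gap = overall_mean["male"] - overall_mean["female"]
gap_by_rank = {rank: m["male"] - m["female"] for rank, m in mean_by_rank.items()}

print(f"Overall gap: {overall_gap:,.2f}")
for rank, gap in gap_by_rank.items():
    print(f"{rank:>10}: {gap:,.2f}")

# The direction reverses only if some within-rank gap has the opposite sign.
direction_reverses = any(gap * overall_gap < 0 for gap in gap_by_rank.values())
print("Direction reverses:", direction_reverses)
```

A gap that changes in size after conditioning, but keeps the same sign in every rank, points toward confounding without a Simpson's paradox reversal.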

Section P.B

Questions 14 through 16: A sample of caregivers was selected to be representative of the U.S. population. If there was more than one child in the caregiver’s household, the child asked about was determined randomly. Each caregiver was asked to estimate how much sleep their child typically gets at night. To predict sleep times, we can use the following statistical model:

Model 1: Predicted sleep = 8.26 hours, standard deviation = 1.33 hours

  14. Interpret the standard deviation.
    1. The range of the middle 50% of the data is 1.33 hours.
    2. A typical sleep time lies about 1.33 hours from the mean sleep time.
    3. 95% of the observations in this sample are in the range 8.26 ± 1.33.
    4. 95% of the observations in this sample are higher than 1.33 hours.
  15. One caregiver estimates that their child gets 7 hours of sleep per night. Calculate the residual for this observation.

Solution: residual = observed – predicted = 7 – 8.26 = -1.26 hours
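
The same calculation as a minimal sketch, using the convention residual = observed minus predicted. The function name `residual` is illustrative.

```python
def residual(observed, predicted):
    """Residual = observed value minus the model's predicted value."""
    return observed - predicted

predicted_sleep = 8.26  # hours, from Model 1
observed_sleep = 7.0    # hours, the caregiver's estimate

print(round(residual(observed_sleep, predicted_sleep), 2))  # -1.26
```

A negative residual means the observation falls below the model's prediction.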

  16. Suppose we add a new variable, Age, which explains a substantial amount of the variability in sleep times.

Model 2:

Predicted sleep = 7.63 hours for older children (age 12-17), 8.89 hours for younger children (age 6-11)

SE of the residuals = ?

Since Age explains a substantial amount of the variability in sleep times, we would expect the standard error of the residuals in Model 2 to be _______ (greater than/less than/equal to) the standard deviation in Model 1.
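
The idea in Question 16 can be illustrated with a small sketch that compares the typical prediction error of a single-mean model with that of a separate-means model. The sleep times below are hypothetical, not the survey data, and the helper `se_residuals` (square root of the sum of squared residuals divided by n minus the number of estimated means) is one common convention, assumed here for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical sleep times (hours); these are NOT the survey data.
sleep = {
    "older (12-17)": [7.0, 7.5, 8.0, 8.0],
    "younger (6-11)": [8.5, 9.0, 9.0, 9.5],
}

def se_residuals(residuals, n_means):
    """Square root of SSE divided by (n minus the number of estimated means)."""
    sse = sum(r ** 2 for r in residuals)
    return sqrt(sse / (len(residuals) - n_means))

all_times = [t for times in sleep.values() for t in times]

# Model 1: one overall mean predicts every child.
overall = mean(all_times)
sd_model1 = se_residuals([t - overall for t in all_times], n_means=1)

# Model 2: each age group gets its own mean.
group_residuals = [t - mean(times) for times in sleep.values() for t in times]
se_model2 = se_residuals(group_residuals, n_means=2)

print(round(sd_model1, 3), round(se_model2, 3))
```

When the group means differ substantially, as they do in this made-up data, predicting with the group means leaves smaller residuals than predicting with the overall mean.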

Questions 17 and 18: For a sample of 100 colleges, four different models were used to predict the average salary in the year after graduating. The graphs below show the residuals from these models.

"Four side by side histograms describe the results of average salary after graduation. The first histogram is titled, Graph 1, S D of residuals equals 6329. The horizontal axis has markings from negative 30000 to 30000 in increments of 10000. The distribution of the bars is approximately normal and it starts from negative 20000, and ends at 25000. There is no bar in between 15000 and 20000. The longest bar is at 5000 on the horizontal axis and the bars decrease in height to the left of 5000 and to the right of 5000. All values are approximate.

The second histogram is titled, Graph 2, S D of residuals equals 5768. The horizontal axis has markings from negative 30000 to 30000 in increments of 10000. The distribution of the bars is approximately normal and it starts from negative 15000, and ends at 25000. The longest bar is at 5000 on the horizontal axis and the bars decrease in height to the left of 5000 and to the right of 5000. All values are approximate.

The third histogram is titled, Graph 3, S D of residuals equals 5270. The horizontal axis has markings from negative 30000 to 30000 in increments of 10000. The distribution of the bars is approximately normal and it starts from negative 15000, and ends at 25000. There is no bar in between 15000 and 20000. The longest bar is at negative 5000 on the horizontal axis and the bars decrease in height to the left of negative 5000 and to the right of negative 5000. All values are approximate.

The fourth histogram is titled, Graph 4, S D of residuals equals 7169. The horizontal axis has markings from negative 30000 to 30000 in increments of 10000. The distribution of the bars is approximately normal and it starts from negative 20000, and ends at 30000. There is no bar in between 15000 and 25000. The longest bar is at negative 5000 on the horizontal axis and the bars decrease in height to the left of negative 5000 and to the right of negative 5000. All values are approximate."

  17. Graph 1 shows the residuals from a model that uses a school’s admittance rate to predict average salary. Graph 2 shows the residuals from a model that uses a school’s total cost per year (out-of-state) to predict average salary. Which explanatory variable do you prefer?
    1. Admittance rate (Model 1), because more of the variability in average salaries is explained by the model.
    2. Total cost (Model 2), because there is less unexplained variability in average salaries after applying the model.
    3. Both models are equally good, because both have a distribution of residuals that is roughly bell-shaped.
    4. Neither of these models is useful for predicting average salary, because both have a distribution of residuals that is centered at 0.
  18. Suppose we used a model with two explanatory variables. What would the graph of the residuals look like if both admittance rate and cost were used to predict average salary?
    1. Graph 3 is reasonable, because the SD of the residuals is less than 5768.
    2. Graph 4 is reasonable, because the SD of the residuals is greater than 6329.
    3. Neither Graph 3 nor Graph 4 is reasonable, because the SD of the residuals should be between 5768 and 6329.
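
The reasoning in Questions 17 and 18 can be sketched as a comparison of the residual SDs reported in the graph titles. The rule applied for Question 18 (a model that adds a second explanatory variable should not leave more unexplained variability than the better single-variable model) is stated here as a rule of thumb, and the variable names are illustrative.

```python
# Residual SDs reported in the graph titles.
sd_residuals = {"Graph 1": 6329, "Graph 2": 5768, "Graph 3": 5270, "Graph 4": 7169}

# Question 17: the single-predictor model with the smaller residual SD
# leaves less unexplained variability.
single_predictor = {
    "admittance rate": sd_residuals["Graph 1"],
    "total cost": sd_residuals["Graph 2"],
}
preferred = min(single_predictor, key=single_predictor.get)

# Question 18: adding a second predictor should not leave more unexplained
# variability than the better single-predictor model (a rule of thumb).
best_single_sd = min(single_predictor.values())
plausible = [g for g in ("Graph 3", "Graph 4") if sd_residuals[g] <= best_single_sd]

print(preferred, plausible)
```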

Questions 19 and 20: A botanist collected data to measure the variation of Iris flowers of two related species. The data set consists of 50 flowers from two species of Iris – Iris setosa and Iris versicolor. Various features were measured including the length and width of the sepals, in centimeters. (Sepals are a part of the flower.)

The graphs below show the relationship between sepal length and sepal width. Graph A shows the line of best fit for the full sample. Graph B shows a separate line of fit for each species: red represents setosa and blue represents versicolor.

"Two side-by-side scatterplots show the relationship between sepal length and sepal width. The first scatterplot is titled, Graph A. The horizontal axis is labeled Sepal.Length and has markings from 4.5 to 7 in increments of 0.5. The vertical axis is labeled Sepal.Width and has markings from 2 to 4.5 in increments of 0.5. Dots are plotted vertically for certain markings on the horizontal axis and some dots are randomly scattered throughout the graph. A regression line starts from 3.30 on the vertical axis, decreases to the right such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. The dots are plotted from 4.26 to 7 on the horizontal axis and from 2 to 4.4 on the vertical axis. The concentration of dots is more between 5 and 6.75 on the horizontal axis and between 2.25 and 3.80 on the vertical axis. All values are approximate.

The second scatterplot is titled, Graph B. The horizontal axis is labeled Sepal.Length and has markings from 4.5 to 7 in increments of 0.5. The vertical axis is labeled Sepal.Width and has markings from 2 to 4.5 in increments of 0.5. Dots are plotted vertically in an increasing trend for certain markings on the horizontal axis and few dots are randomly scattered throughout the graph. A red regression line starts from 3.10 on the vertical axis, increases upward to the right such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. A blue regression line starts from 2 on the vertical axis, increases upward to the right such that some of the dots lie above the line, some of the dots lie below the line, and few dots lie on the line. The dots are plotted from 4.26 to 7 on the horizontal axis and from 2 to 4.4 on the vertical axis. The concentration of dots is more between 5 and 6.75 on the horizontal axis and between 2.25 and 3.80 on the vertical axis. All values are approximate."

Consider two models for predicting sepal width. Note that Model 2 accounts for Species. Some values are intentionally missing in Model 2.

Model 1: Predicted sepal width = 3.94 – 0.15(Sepal length), SE of residuals = 0.47

Model 2: Predicted sepal width = , SE of residuals =

  19. The letter B is a placeholder for a missing value in Model 2. Would you expect this value to be positive, negative, or close to 0?
    1. Positive
    2. Negative
    3. Close to 0
  20. We would expect the SE of the residuals for Model 2 to be _______ (>, <, =) the SE of the residuals for Model 1.
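
The pattern described in Graphs A and B, a pooled line that slopes downward while each species' own line slopes upward, can be reproduced with a small sketch. The measurements below are hypothetical values chosen to show the pattern; they are not the iris data, and `ls_slope` is an illustrative helper for the ordinary least-squares slope.

```python
from statistics import mean

def ls_slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical (sepal length, sepal width) measurements in cm; NOT the real iris data.
groups = {
    "setosa":     ([4.6, 5.0, 5.4], [3.1, 3.4, 3.7]),
    "versicolor": ([5.6, 6.0, 6.4], [2.5, 2.8, 3.1]),
}

pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]

# Slope ignoring species versus slope within each species.
pooled_slope = ls_slope(pooled_x, pooled_y)
group_slopes = {name: ls_slope(xs, ys) for name, (xs, ys) in groups.items()}

print(round(pooled_slope, 2), {k: round(v, 2) for k, v in group_slopes.items()})
```

In this made-up data the species with shorter sepals has wider sepals overall, which is enough to tilt the pooled line downward even though width increases with length inside each species.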

Document Information

Document Type: DOCX
Chapter Number: All in one
Created Date: Aug 21, 2025
Chapter Name: Preliminaries Test Bank 1e
Author: Nathan Tintle