Test Bank, Preliminaries: Intermediate Statistical Investigations, 1st Ed., by Nathan Tintle
Preliminaries
Intermediate Statistical Investigations Test Bank
Question types:
FIB = Fill in the blank | Calc = Calculation
Ma = Matching | MS = Multiple select
MC = Multiple choice | TF = True-false
PRELIMINARIES TERMINAL LEARNING OUTCOMES
TLOP-A: Explore how the relationship between two variables can be impacted by additional variables.
TLOP-B: Express the relationship between two variables using a statistical model, and calculate and report the standard error of the residuals, a measure of prediction error.
PRELIMINARIES ENABLING LEARNING OUTCOMES
LOP-1: Identify and apply basic terminology of statistical studies: observational units, response variable, explanatory variable, association, confounding variable.
LOP-2: Identify potential sources and measures of variation in a response variable.
LOP-3: Produce and describe some basic visualizations and numerical summaries to compare groups and explore relationships (e.g., bar graphs, dotplots/histograms/boxplots, scatterplots, means, medians, standard deviation).
LOP-4: Explore how those comparisons and relationships can be impacted by additional variables.
LOP-5: Calculate a residual and relate it to typical prediction error.
Section P.A
Questions 1 and 2: In order for students to participate in remote learning, they need access to a computer or tablet for school work. An advocacy group conducted a survey of U.S. families and analyzed the relationship between household income (less than $25,000, between $25,000 and $50,000, or greater than $50,000) and access to a computer or tablet (no access, sometimes, always).
- Access to a computer/tablet is the __________ (explanatory/response) variable.
We can classify this variable as __________ (categorical/quantitative).
Income is the __________ (explanatory/response) variable.
We can classify this variable as __________ (categorical/quantitative).
- Which type of graph is most appropriate for displaying the relationship between income and access to a computer/tablet?
- A mosaic plot
- A histogram
- Stacked boxplots (one boxplot for each group)
- A scatterplot
- Which of the hypothetical mosaic plots below shows an association between class (first, second, third, or crew) and survival on the Titanic? Select all that apply.
- Graph A
- Graph B
- Graph C
Questions 4 through 7: Two hospitals, Memorial Hospital and Fairbanks Medical Center, both perform the same procedure to alleviate joint pain. Researchers surveyed a random sample of 100 patients from each hospital who had undergone this procedure and asked whether or not they had made a full recovery in the six-month period after surgery. The table below shows the results of the survey. The mosaic plot further breaks down the results based on socioeconomic status (SES).
Hospital | Recovery: Full | Recovery: No | Total |
Fairbanks | 60 | 40 | 100 |
Memorial | 66 | 34 | 100 |
Total | 126 | 74 | 200 |
- Based on the table, is there an association between hospital and recovery in this sample? Note: This question refers to the sample, so there is no need to consider whether this data reflects a genuine tendency in the population.
- There is no association. More than 50% of patients recover, regardless of hospital.
- There is no association. The recovery rate is not the same at the two hospitals.
- There is an association. More than 50% of patients recover, regardless of hospital.
- There is an association. The recovery rate is not the same at the two hospitals.
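The comparison this question asks for can be sketched in a few lines of Python. The counts are copied from the table above; the dictionary layout and the helper name `recovery_rate` are illustrative choices only. The sketch describes the sample and makes no claim about the population.

```python
# Counts copied from the hospital-by-recovery table above.
counts = {
    "Fairbanks": {"full": 60, "no": 40},
    "Memorial": {"full": 66, "no": 34},
}


def recovery_rate(row):
    """Proportion of one hospital's patients who made a full recovery."""
    return row["full"] / (row["full"] + row["no"])


# Conditional proportion of full recovery, given hospital.
rates = {hospital: recovery_rate(row) for hospital, row in counts.items()}
for hospital, rate in rates.items():
    print(f"{hospital}: {rate:.2f}")

# Two variables are associated in the sample when these conditional
# proportions differ from one group to the other.
associated_in_sample = rates["Fairbanks"] != rates["Memorial"]
print("Association in sample:", associated_in_sample)
```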
- Suppose you are primarily interested in the potential association between hospital and recovery. Based on the mosaic plot, does SES satisfy the definition of a confounding variable in this scenario?
_______ (Yes/No), because Memorial Hospital is more likely to serve _______ (high/low) SES patients, compared to Fairbanks, and _______ (high/low) SES patients are more likely to make a full recovery.
- Does this scenario satisfy the definition of Simpson’s paradox?
_______ (Yes/No), because overall, the recovery rates are higher at _______ (Memorial/Fairbanks), but after adjusting for SES, the conditional recovery rates are higher at _______ (Memorial/Fairbanks).
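As a rough illustration of how a check for Simpson's paradox works, the sketch below compares overall and conditional recovery rates. The SES breakdown is entirely hypothetical: the mosaic plot is not reproduced here, so the per-group counts were invented to add up to the table's totals (60 of 100 and 66 of 100) and should not be read as the actual data.

```python
# HYPOTHETICAL (full recoveries, patients) per hospital and SES group.
# Invented to match the table's row totals; NOT read from the mosaic plot.
data = {
    "Fairbanks": {"high": (18, 20), "low": (42, 80)},
    "Memorial": {"high": (56, 70), "low": (10, 30)},
}


def overall_rate(groups):
    """Recovery rate for one hospital, ignoring SES."""
    recovered = sum(r for r, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return recovered / total


overall = {h: overall_rate(g) for h, g in data.items()}
conditional = {
    h: {ses: r / n for ses, (r, n) in g.items()} for h, g in data.items()
}

# Hospital with the higher rate overall, and within each SES group.
overall_winner = max(overall, key=overall.get)
stratum_winners = {
    ses: max(conditional, key=lambda h: conditional[h][ses])
    for ses in ("high", "low")
}

# Simpson's paradox: the direction of the comparison within every SES
# group is the reverse of the direction in the combined data.
paradox = all(w != overall_winner for w in stratum_winners.values())
print(overall_winner, stratum_winners, paradox)
```

With these invented counts, one hospital has the higher rate overall while the other has the higher rate within each SES group, which is the reversal the definition requires.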
- Put the components of the study into the correct boxes in the Sources of Variation Diagram. Note: Some boxes will include more than one answer.
Observed variation in: | Sources of explained variation | Sources of unexplained variation |
Inclusion criteria |
- Hospital (Memorial or Fairbanks)
- Has undergone medical procedure for joint pain
- Lifestyle factors (physicality of occupation, opportunities for rest, etc.)
- Recovery (full or no)
- Socioeconomic status (low SES or high SES)
- Physical therapy (as recommended, less than recommended, none)
Questions 8 through 13: In 2018, a sample of academic faculty from universities across the country were surveyed about their salaries (in US dollars). The results were classified according to each faculty member’s academic rank (instructor, assistant professor, associate professor, and full professor) and gender (male, female).
- Identify the observational units and variables. Note: Some blanks will include more than one answer. One of the answer choices will not be used.
Observational units: A. Academic faculty members
Explanatory variables: B. Universities
Response variable: C. Salary (in US dollars)
D. Academic rank
E. Gender
- Name the graph types being used to display the distribution of salaries.
- Bar graph
- Mosaic plot
- Histogram
- Boxplot
- Scatterplot
- Describe the relationship between the mean salary and the median salary.
The mean salary is ______ (greater/less) than the median salary, because the distribution of salaries is skewed ______ (right/left).
- True or False: There is an association between gender and salary in this sample.
- Use the mosaic plot to match the percentages below to the appropriate statement describing the data.
Statement 1: _____% of the faculty in this sample are male.
Statement 2: _____% of the faculty who have reached the rank of professor are male.
Statement 3: _____% of female faculty members are at the rank of instructor.
Statement 4: _____% of male faculty members are at the rank of instructor.
- 7%
- 22%
- 68%
- 81%
- Using the tables below, compare the distribution of salaries for male and female faculty members, before and after taking rank into account.
Which of the following statements are true? Select all that apply.
- Before taking rank into account, there is an association between gender and salary in this sample.
- After taking rank into account, there is an association between gender and salary in this sample.
- The nature of the relationship between gender and salary changes after adjusting for rank; thus rank satisfies the definition of a confounding variable in this scenario.
- The direction of the relationship between gender and salary changes after adjusting for rank; thus this scenario satisfies the definition of Simpson’s paradox.
Section P.B
Questions 14 through 16: A sample of caregivers was selected to be representative of the U.S. population. If there was more than one child in the caregiver’s household, the child asked about was determined randomly. Each caregiver was asked to estimate how much sleep their child typically gets at night. To predict sleep times, we can use the following statistical model:
Model 1: Predicted sleep = 8.26 hours, standard deviation = 1.33 hours
- Interpret the standard deviation.
- The range of the middle 50% of the data is 1.33 hours.
- A typical sleep time lies about 1.33 hours from the mean sleep time.
- 95% of the observations in this sample are in the range 8.26 ± 1.33.
- 95% of the observations in this sample are higher than 1.33 hours.
- One caregiver estimates that their child gets 7 hours of sleep per night. Calculate the residual for this observation.
Solution: residual = observed – predicted = 7 – 8.26 = –1.26 hours
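The same calculation as a minimal Python sketch. The values come from Model 1 and the question; the function name `residual` is an illustrative choice.

```python
def residual(observed, predicted):
    """Residual = observed value minus the model's predicted value."""
    return observed - predicted


predicted_sleep = 8.26  # Model 1 prediction, in hours
observed_sleep = 7.0    # caregiver's estimate, in hours

r = residual(observed_sleep, predicted_sleep)
print(f"Residual: {r:.2f} hours")
```

A negative residual means the observed sleep time falls below the model's prediction.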
- Suppose we add a new variable, Age, which explains a substantial amount of the variability in sleep times.
Model 2:
Predicted sleep = 7.63 hours for older children (age 12-17), 8.89 hours for younger children (age 6-11)
SE of the residuals = ?
Since Age explains a substantial amount of the variability in sleep times, we would expect the standard error of the residuals in Model 2 to be _______ (greater than/less than/equal to) the standard deviation in Model 1.
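To see how a grouping variable can change the typical prediction error, the sketch below compares the two kinds of model on a small made-up data set. The eight sleep times are hypothetical and are not the survey data, so the printed numbers will not match Models 1 and 2. The sketch also assumes the usual separate-means convention of dividing the sum of squared residuals by n minus the number of groups.

```python
from math import sqrt
from statistics import mean

# HYPOTHETICAL sleep times (hours); NOT the survey data.
sleep = {
    "older": [7.0, 7.5, 8.0, 8.0],
    "younger": [8.5, 9.0, 9.0, 9.5],
}
all_times = [t for times in sleep.values() for t in times]
n = len(all_times)

# Single-mean model: every child gets the overall mean as the prediction.
overall_mean = mean(all_times)
sse_single = sum((t - overall_mean) ** 2 for t in all_times)
sd_single = sqrt(sse_single / (n - 1))

# Separate-means model: each child gets their age group's mean.
sse_groups = sum(
    (t - mean(times)) ** 2 for times in sleep.values() for t in times
)
se_groups = sqrt(sse_groups / (n - len(sleep)))

print(f"SD, single-mean model: {sd_single:.3f}")
print(f"SE of residuals, separate-means model: {se_groups:.3f}")
```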
Questions 17 and 18: For a sample of 100 colleges, four different models were used to predict the average salary in the year after graduating. The graphs below show the residuals from these models.
- Graph 1 shows the residuals from a model that uses a school’s admittance rate to predict average salary. Graph 2 shows the residuals from a model that uses a school’s total cost per year (out-of-state) to predict average salary. Which explanatory variable do you prefer?
- Admittance rate (Model 1), because more of the variability in average salaries is explained by the model.
- Total cost (Model 2), because there is less unexplained variability in average salaries after applying the model.
- Both models are equally good, because both have a distribution of residuals that is roughly bell-shaped.
- Neither of these models is useful for predicting average salary, because both have a distribution of residuals that is centered at 0.
- Suppose we used a model with two explanatory variables. What would the graph of the residuals look like if both admittance rate and cost were used to predict average salary?
- Graph 3 is reasonable, because the SD of the residuals is less than 5768.
- Graph 4 is reasonable, because the SD of the residuals is greater than 6329.
- Neither Graph 3 nor Graph 4 is reasonable, because the SD of the residuals should be between 5768 and 6329.
Questions 19 and 20: A botanist collected data to measure the variation of Iris flowers of two related species. The data set consists of 50 flowers from two species of Iris – Iris setosa and Iris versicolor. Various features were measured including the length and width of the sepals, in centimeters. (Sepals are a part of the flower.)
The graphs below show the relationship between sepal length and sepal width. Graph A shows the line of best fit for the full sample. Graph B shows a separate line of fit for each species: red represents setosa and blue represents versicolor.
Consider two models for predicting sepal width. Note that Model 2 accounts for Species. Some values are intentionally missing in Model 2.
Model 1: Predicted sepal width = 3.94 – 0.15(Sepal length), SE of residuals = 0.47
Model 2: Predicted sepal width = , SE of residuals =
- The letter B is a placeholder for a missing value in Model 2. Would you expect this value to be positive, negative, or close to 0?
- Positive
- Negative
- Close to 0
- We would expect the SE of the residuals for Model 2 to be _______ (>, <, =) the SE of the residuals for Model 1.
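For reference, Model 1 can be applied to a single flower as in the sketch below. The coefficients are those given for Model 1; the flower's measurements are hypothetical and the function name is an illustrative choice. Model 2 is not sketched because its values are intentionally missing.

```python
def predict_sepal_width(sepal_length):
    """Model 1: predicted sepal width (cm) from sepal length (cm)."""
    return 3.94 - 0.15 * sepal_length


# HYPOTHETICAL flower, not taken from the data set.
sepal_length = 5.0
observed_width = 3.4

predicted_width = predict_sepal_width(sepal_length)
flower_residual = observed_width - predicted_width

print(f"Predicted width: {predicted_width:.2f} cm")
print(f"Residual: {flower_residual:.2f} cm")
```

The SE of the residuals reported for Model 1 (0.47) gives a sense of scale: it describes the typical size of residuals like this one across the sample.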