Chapter 1 Test Questions & Answers - Intermediate Statistical Investigations, 1st Ed. - Exam Bank by Nathan Tintle
Chapter 1
Intermediate Statistical Investigations Test Bank
Question types: FIB = Fill in the blank | Calc = Calculation
Ma = Matching | MS = Multiple select
MC = Multiple choice | TF = True-false
CHAPTER 1 TERMINAL LEARNING OUTCOMES
TLO1-1: Apply the six-step investigative process in the context of a well-designed experiment.
TLO1-2: Partition variation in the response variable into variation explained by the model and unexplained variation, and measure and report the percentage of variation explained
TLO1-3: Assess the statistical significance of the difference between two groups on a quantitative response variable using both simulation and theory-based approaches
TLO1-4: Compare more than two treatments on a quantitative response using both simulation and theory-based approaches
TLO1-5: Apply post-hoc analysis after a significant F-test (pairwise differences, as well as confidence and prediction intervals for single means)
TLO1-6: Understand statistical power and how it is impacted by sample size, variability within groups, number of groups, and significance level
Section 1.1: Sources of Variation in an Experiment
LO1.1-1: Apply the six-step investigative process.
LO1.1-2: Distinguish experiments and observational studies.
LO1.1-3: Review basic study design principles such as inclusion criteria and random assignment.
LO1.1-4: Define terminology specific to an experimental study (e.g., treatments).
LO1.1-5: Produce a Sources of Variation diagram for an experiment.
Questions 1 through 3: A study published in Psychological Science in 2007 examined a possible link between mindset and health. The following is an excerpt from the abstract of the article: “84 female room attendants working in seven different hotels were measured on physiological health variables affected by exercise. Those in the informed condition were told that the work they do (cleaning hotel rooms) is good exercise and satisfies the Surgeon General's recommendations for an active lifestyle. Examples of how their work was exercise were provided. Subjects in the control group were not given this information.”
- Identify the experimental units in this study.
- The eighty-four room attendants
- The seven different hotels
- The physiological health variables
- The two groups (informed and control)
- The researchers chose to include room attendants from seven different hotels (as opposed to using stricter inclusion criteria that would limit the study to room attendants at one particular hotel). Describe the consequences of this decision.
Using broader inclusion criteria may ______ (increase/decrease) the amount of variation in the observed health outcomes. However, this decision also __________ (supports/limits) generalizability to a larger population of room attendants.
- Room attendants were randomly assigned to either the informed condition or the control group. What is the most important reason for the random assignment?
- Random assignment ensures that the study is double-blind.
- Random assignment reduces the impact of outliers.
- Random assignment creates two groups of room attendants that are as similar as possible, which supports cause-and-effect conclusions.
- Random assignment makes it possible to generalize the results to the population.
Questions 4 through 6: An online retailer is using an experiment to decide whether to modify their website. When visitors type in the web address or click a link to the site, they are randomly re-directed to one of two versions of the website: the version that has been in use for the last year (version A) or an updated version (version B). The retailer’s goal is to maximize the amount of time (in minutes) visitors stay on the site.
- Identify the experimental units and variables. Note: One of the answer choices will not be used.
Experimental units: ______ | A. Version of the website (A and B)
Explanatory variable: ______ | B. Online retailers
Response variable: ______ | C. Visitors to the website
| D. Time spent on the website (in minutes)
- Consider two possible models for analyzing time spent on this retailer’s website.
Single-mean model:
Separate-means model:
Does the version of the website appear to explain any of the variation in time spent on the site? Note: If more than one of these justifications is appropriate, select multiple answers.
- Yes, because the mean time spent on the site is higher for Version B than for Version A.
- Yes, because the SE of the residuals is smaller for the separate-means model than for the single-mean model.
- No, because the mean time spent on the site is not the same for Version A and for Version B.
- No, because the SE of the residuals is smaller for the separate-means model than for the single-mean model.
- The researcher decides that the difference between Version A and Version B in this study is meaningful. Is it reasonable to generalize these results to all customers of this retailer?
- Yes, because visitors to the website were randomly assigned to either Version A or Version B.
- Yes, because the study’s inclusion criteria would exclude potential subjects who are not customers.
- It depends whether visitors to the website knew about the research question being investigated. The study may not be double-blind.
- It depends who visited the website during the study period. The sample may not be representative.
Questions 7 through 8: Researchers at a university were interested in the effectiveness of a calculus workshop program for students who fail Calculus I and need to retake the course. As part of the study, students who were retaking Calculus I were allowed to enroll in a calculus workshop at their own discretion. At the end of the grading term, all students (even those with different instructors) took the same final exam. The researchers then compared the scores for those who enrolled in the workshop while re-taking calculus to those who re-took calculus without enrolling in the workshop.
- Is this an experiment? Justify your answer.
- Yes, this is an experiment, because there was a treatment group who enrolled in the calculus workshop and a control group that did not.
- Yes, this is an experiment, because the study was double-blind (as long as the calculus teachers did not know which students enrolled in the workshop).
- No, this is an observational study, because it does not take place in a laboratory or other tightly controlled research environment.
- No, this is an observational study, because the choice of whether to participate in the workshop was made by the students, not the researchers.
- Which of the following are sources of unexplained variation in this study? Select all that apply.
- Whether or not students enrolled in the workshop
- Whether or not students had failed a calculus class in the past
- Student attendance in class (number of absences)
- Student motivation to study calculus
- Calculus instructor
- Difficulty of the final exam
- A study published in Athletic Training examined the effects of three different types of knee stabilizing braces on agility test speed. College football players from all different positions (running back, wide receiver, linebacker, lineman, etc.) were recruited to participate in the study. All players in the study had torn their ACL (anterior cruciate ligament) in the past, and needed to wear a knee brace to play football. Agility tests were administered in an outdoor football stadium, and the time to complete the test was recorded by a Lafayette photoelectric Cell and Light Time Unit (in seconds).
Put the components of the study into the correct boxes in the Sources of Variation Diagram. Note: Some boxes will include more than one answer.
Observed variation in: | Sources of explained variation | Sources of unexplained variation |
Inclusion criteria: | Design: |
- College football players from all different positions
- Type of knee brace
- Players’ current condition (health, mood, motivation, etc.)
- Time to complete agility test (in seconds)
- Measurement error
- History of torn ACL and need for a knee brace
- Details of the agility test
- Players’ natural speed and agility
- Environmental factors (weather, wind, etc.)
- In a separate-means model, the standard error of the residuals can be thought of as the typical deviation of an observed response from:
- The residuals (prediction errors)
- The response predicted by the model (group mean)
- The overall mean of the response variable
- The overall mean of the explanatory variable
- A study published in the Journal of Sports Science & Medicine tested the effectiveness of the Power Balance © bracelet, which has been marketed as a way to improve balance, flexibility, strength, and power through the use of hologram technology. Subjects, who were all college athletes, completed tests of their athletic performance while wearing either a Power Balance © bracelet or a plain rubber placebo bracelet. The bracelets were covered with a wristband, so the athletes and those measuring their performance could not see which bracelet was being worn. Only the researcher who analyzed the data knew which measurements corresponded to the Power Balance © bracelet and which to the placebo bracelet. Classify this study.
- This study is not blinded.
- This is a single-blind study.
- This is a double-blind study.
- There is not enough information to classify this study.
- Which of the following is an experiment? Select all that apply.
- Executives at a large department store chain selected 100 stores and randomly assigned 50 of them to reduce their hours, opening an hour later than before; hours for the other 50 stores were not changed. After six months, the executives compared revenue for the two groups of stores.
- A researcher recruited a group of American adults whose demographics were similar to the American population. The researcher measured each subject’s forced expiratory volume, an indicator of lung function. Then each subject was asked whether or not they smoke cigarettes.
- A survey was administered to a large group of high school students. The survey asked whether the students were employed outside of school (in a paying job) and how much sleep they got the night before (in hours).
- A university professor teaches two sections of introductory statistics: one section meets at 8:00 am and the other meets at 11:00 am. She wants to evaluate the effectiveness of a new method of teaching statistics compared to the standard method she has used in the past. She flips a coin to assign her sections to teaching methods and determines that she will use the standard method at 8:00 am and the new method at 11:00 am. At the end of the term, she compares the final exam scores for the two sections. Which of the following best describes the potential for confounding in this scenario?
- Different students have different levels of talent and motivation, so it is impossible to attribute differences in final exam scores to the teaching method.
- The two sections may not be exactly the same size, which would lead to inappropriate comparisons between the treatment and control groups.
- Students may find it difficult to pay attention at 8:00 am, which may negatively impact the exam scores of those taught with the standard method.
- Confounding is not a concern in this scenario, because the professor used random assignment as part of the study design.
- True or False: The standard error of the residuals is a way to measure the amount of variation in the response variable that remains unexplained after applying the model.
- True or False: Because of the possibility of confounding, you should always avoid using causal language (action verbs like “affect” and “lead to”) in your conclusions.
Section 1.2: Quantifying Sources of Variation
LO1.2-1: Partitioning variation in the response variable into variation explained by the model and unexplained variation.
LO1.2-2: Measuring percentage of variation explained.
LO1.2-3: Understanding effect size and practical significance.
Questions 1 through 3: Dog agility is a sport where trainers guide their dogs through an obstacle course as quickly as possible. Two trainers, Abby and Lauren, have dogs participating in agility competitions. Each dog completes the same agility course, and they measure the time it takes each dog to complete the course (in seconds). The times for Abby’s three dogs were 30, 40, and 50. The times for Lauren’s three dogs were 50, 60, and 70.
- Calculate the Sum of Squares Total (SSTotal).
Solution:
- Calculate the Sums of Squared Errors (SSError).
Solution:
- Calculate the Sum of Squares for the Model (SSModel).
Solution:
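The three sums of squares asked for in Questions 1 through 3 can be checked with a short script. The sketch below is one possible way to do this in Python. The function name `sums_of_squares` is illustrative only, and the data are the six times given in the question.

```python
# Times (in seconds) taken from the question text.
abby = [30, 40, 50]
lauren = [50, 60, 70]


def mean(values):
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (SSTotal, SSModel, SSError) for a separate-means model."""
    all_values = [x for g in groups for x in g]
    overall = mean(all_values)
    # Total variation: squared deviations from the overall mean.
    ss_total = sum((x - overall) ** 2 for x in all_values)
    # Unexplained variation: squared deviations from each group's own mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Explained variation: squared deviations of group means from the overall mean.
    ss_model = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)
    return ss_total, ss_model, ss_error


ss_total, ss_model, ss_error = sums_of_squares([abby, lauren])
print(ss_total, ss_model, ss_error)  # 1000.0 600.0 400.0
```

Note that SSModel + SSError = SSTotal (600 + 400 = 1000), as the partition of variation requires.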
Questions 4 and 5: Dog agility is a sport where trainers guide their dogs through an obstacle course as quickly as possible. Two trainers, Abby and Lauren, have dogs participating in agility competitions. Each dog completes the same agility course, and they measure the time it takes each dog to complete the course in seconds. Compare the sums of squares for two possible datasets that could occur in this context.
Dataset 1:
Times for Abby’s dogs: 30, 40, 50
Times for Lauren’s dogs: 50, 60, 70
Dataset 2:
Times for Abby’s dogs: 20, 40, 60
Times for Lauren’s dogs: 40, 60, 80
- SSTotal for Dataset 1 __________ (<, >, or =) SSTotal for Dataset 2
SSModel for Dataset 1 __________ (<, >, or =) SSModel for Dataset 2
SSError for Dataset 1 __________ (<, >, or =) SSError for Dataset 2
- The R² value for Dataset 1 __________ (<, >, or =) the R² value for Dataset 2
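The comparisons in Questions 4 and 5 can be verified numerically. The following is a minimal sketch, assuming a separate-means model with one mean per trainer. The helper name `decompose` and the dictionary keys are illustrative choices, not notation from the test bank.

```python
def decompose(groups):
    """Return SSTotal, SSModel, SSError and R^2 for a separate-means model."""
    values = [x for g in groups for x in g]
    overall_mean = sum(values) / len(values)
    ss_total = sum((x - overall_mean) ** 2 for x in values)
    ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ss_model = ss_total - ss_error
    return {"SSTotal": ss_total, "SSModel": ss_model,
            "SSError": ss_error, "R2": ss_model / ss_total}


# Times (in seconds) copied from Dataset 1 and Dataset 2 in the question.
dataset1 = decompose([[30, 40, 50], [50, 60, 70]])
dataset2 = decompose([[20, 40, 60], [40, 60, 80]])

for name in ("SSTotal", "SSModel", "SSError", "R2"):
    print(name, round(dataset1[name], 3), round(dataset2[name], 3))
```

Both datasets have the same group means (40 and 60), so SSModel is identical. Dataset 2 has more spread within each group, which raises SSError and SSTotal and lowers R².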
Questions 6 and 7: An office designer claims that a new ergonomic desk chair makes typing at a computer terminal faster and easier. A client company plans to test it by asking 30 employees who do a lot of typing to take part in an experiment. They will randomly assign 15 employees to use the new ergonomic chair and 15 to use a regular chair. The 30 employees will then type a selected passage for 5 minutes, recording the total number of words that are typed correctly.
Consider two hypothetical data sets that could result from this experiment:
- Which of the datasets would result in a larger R² value?
- Dataset 1 would result in a larger R² value, because SSModel for Dataset 1 is larger than SSModel for Dataset 2.
- Dataset 1 would result in a larger R² value, because SSError for Dataset 1 is smaller than SSError for Dataset 2.
- Dataset 2 would result in a larger R² value, because SSModel for Dataset 2 is larger than SSModel for Dataset 1.
- Dataset 2 would result in a larger R² value, because SSError for Dataset 2 is smaller than SSError for Dataset 1.
- Compare the value of the effects for the two datasets.
- The effects for Dataset 1 would be larger (in absolute value), because in Dataset 1 there is a smaller difference between the group means.
- The effects for Dataset 2 would be larger (in absolute value), because in Dataset 2 there is a larger difference between the group means.
- The effects for Dataset 1 would be the same size as the effects for Dataset 2, because the standard deviations are the same for both datasets.
- The effects for Dataset 1 would be the same size as the effects for Dataset 2, because the sample sizes are the same for both datasets.
Questions 8 through 11: The graphs below display the outcomes of three different experiments to compare a treatment group with a control group.
- For which of the experiments does SSModel = 0? Select one or more than one.
- Experiment A
- Experiment B
- Experiment C
- For which of the experiments does SSError = 0? Select one or more than one.
- Experiment A
- Experiment B
- Experiment C
- The R² value for Experiment B is _______ (0, 0.5, 1), because ________ (none, half, all) of the variability in outcomes is explained by the treatment group model.
The R² value for Experiment C is ________ (0, 0.5, 1), because ________ (none, half, all) of the variability in outcomes is explained by the treatment group model.
- The value of the effect for the treatment group in Experiment B is _____ (-4, -2, 0, 2, or 4).
The value of the effect for the treatment group in Experiment C is _____ (-4, -2, 0, 2, or 4).
Questions 12 and 13: A statistics class conducted an experiment to investigate whether standing heart rates tend to be higher than sitting heart rates. Students were randomly assigned to either sit or stand, then the students measured their heart rates (in beats per minute). They used software to calculate the sums of squares and found that SSModel = 614.8 and SSTotal = 13232.1.
- Calculate SSError.
Solution:
- Calculate the R² value. Give your answer as a proportion.
Solution:
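Both calculations in Questions 12 and 13 follow from the partition SSTotal = SSModel + SSError. A minimal sketch of the arithmetic, using only the two values given in the question (variable names are illustrative):

```python
# Values reported by the software in the question.
ss_model = 614.8
ss_total = 13232.1

# Unexplained variation is whatever the model does not account for.
ss_error = ss_total - ss_model

# R^2 is the proportion of total variation explained by the model.
r_squared = ss_model / ss_total

print(round(ss_error, 1), round(r_squared, 4))  # 12617.3 0.0465
```

An R² of about 0.046 means that position (sitting or standing) explains roughly 4.6% of the variation in heart rates in this sample.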
- An online retailer is using an experiment to decide whether to modify their website. When visitors type in the web address or click a link to the site, they are randomly re-directed to one of two versions of the website: the version that has been in use for the last year (version A) or an updated version (version B). The retailer’s goal is to maximize the amount of time (in minutes) visitors stay on the site.
Single-mean model:
Separate-means model:
Calculate the effect for Version A.
Solution:
Questions 15 and 16: The following output displays the amount (in dollars) that a sample of male and female college students spent on their most recent haircuts.
- Calculate SSModel. Note that the sample sizes are the same for the two groups.
Solution:
- Calculate the standard error of the residuals. Note that the sample sizes are the same for the two groups.
Solution:
- Match each sum of squares (SS) with its description.
SSModel: ______ | A. Measures the amount of variability in the response variable without accounting for groups
SSError: ______ | B. Measures the variability within groups, the variability unexplained by the model
SSTotal: ______ | C. Measures the variability between groups, the variability explained by the model
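For reference, the three sums of squares can be written out as formulas. The notation here is an assumption made for illustration (it is not taken from the test bank): $y_{ij}$ is observation $j$ in group $i$, $\bar{y}_i$ is the mean of group $i$, $n_i$ is the size of group $i$, and $\bar{y}$ is the overall mean.

```latex
\begin{align*}
SS_{\text{Total}} &= \sum_{i}\sum_{j} (y_{ij} - \bar{y})^2 \\
SS_{\text{Model}} &= \sum_{i} n_i \, (\bar{y}_i - \bar{y})^2 \\
SS_{\text{Error}} &= \sum_{i}\sum_{j} (y_{ij} - \bar{y}_i)^2 \\
SS_{\text{Total}} &= SS_{\text{Model}} + SS_{\text{Error}},
\qquad R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
\end{align*}
```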
- In general, study results are considered practically significant when the R² value is ______ (large/small) and the effect sizes are ______ (large/small).
- What is the cutoff for determining whether study results are practically significant?
- Results are practically significant when the R² value is less than 0.05.
- Results are practically significant when the R² value is less than 0.5.
- Results are practically significant when the R² value is greater than 0.5.
- Results are practically significant when the R² value is greater than 0.95.
- There is no set cut-off for practical significance. It differs based on context.
- If a researcher were unhappy with their effect size or R² value, what steps could they take when planning a follow-up study?
- They could try to improve the model by adding new explanatory variables.
- They could try to reduce unexplained variation in the response through experimental controls or stricter inclusion criteria.
- Both of these strategies are reasonable.
- Neither of these strategies would impact the effect size or R² value.
Section 1.3: Is the Variation Explained Statistically Significant?
LO1.3-1: Carry out and evaluate a randomization test comparing two groups on a quantitative response variable.
LO1.3-2: Assess the statistical significance of a two-group comparison.
LO1.3-3: Apply two-sample t-procedures for tests of significance and confidence intervals.
Questions 1 through 3: The following output displays the amount (in dollars) that a sample of male and female college students spent on their most recent haircuts. You may assume that the sample is representative of a larger population.
- Does this data provide strong evidence that female students spend more per haircut than male students, on average? Choose the appropriate statement of the null hypothesis.
- Interpret the 95% confidence interval.
- We are 95% confident that the sample mean for females is between $6.10 and $38.20 higher than the sample mean for males.
- We are 95% confident that the population mean for females is between $6.10 and $38.20 higher than the population mean for males.
- If we randomly select one male student and one female student from this sample, we are 95% confident that the haircut value for the female student will be higher.
- If we randomly select one male student and one female student from the population, we are 95% confident that the haircut value for the female student will be higher.
- Based on the 95% confidence interval, we would expect the two-sided p-value to be _______ (>, <, or =) 0.05.
Questions 4 through 6: A study published in Psychological Science in 2007 examined a possible link between mindset and health. The following is an excerpt from the abstract of the article: “84 female room attendants working in seven different hotels were measured on physiological health variables affected by exercise. Those in the informed condition were told that the work they do (cleaning hotel rooms) is good exercise and satisfies the Surgeon General's recommendations for an active lifestyle. Examples of how their work was exercise were provided. Subjects in the control group were not given this information.”
Over the course of four weeks, the informed group lost an average of 1.79 lbs and the uninformed group lost an average of 0.20 lbs. Are these results statistically significant? To decide, we can use a randomization test. The dotplot below shows 1000 simulated differences in mean weight loss.
- Suppose we want to use the 3S Strategy to investigate whether being informed about how their work qualifies as exercise affects room attendants’ weight loss. How would we design the simulation?
- Write the weight loss amounts on 84 cards. Shuffle and deal them into two groups to represent the informed and uninformed groups. Calculate the difference of means. Repeat.
- Write the group labels (informed or uninformed) on 84 cards. Shuffle and deal them into groups to represent weight loss. Calculate the difference of means. Repeat.
- Both of these designs are appropriate in this context.
- Neither of these designs is appropriate in this context.
- Why is the distribution of simulated statistics centered at 0?
- Because some of the room attendants in the sample gained weight and others lost weight, but the average is close to 0
- Because if we repeated this study again, some of the results would be positive and some of the results would be negative, but the average is close to 0
- Because the statistics were simulated under the assumption that being informed about how their work qualifies as exercise doesn’t affect room attendants’ weight loss
- Because the data in this study do not provide sufficient evidence to conclude that being informed about how their work qualifies as exercise affects room attendants’ weight loss
- Estimate the p-value. Include three decimal places in your answer.
Solution: Approximately 8 out of 1000 simulated differences were greater than or equal to 1.59, so we estimate that the p-value is 0.008.
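The card-shuffling simulation described in Questions 4 through 6 can be sketched in code. The example below is an illustration only: the individual weight-loss values are HYPOTHETICAL (the test bank reports only the two group means), and the function name `randomization_p_value` is an assumed name. The logic mirrors the physical simulation: pool the responses, shuffle, deal into two groups, and record the difference in means.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_p_value(group_a, group_b, reps=1000, seed=1):
    """One-sided p-value: share of shuffled differences >= the observed one."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        # Shuffling the responses breaks any link between group and outcome,
        # which is exactly what the null hypothesis assumes.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            count += 1
    return observed, count / reps


# HYPOTHETICAL weight-loss values in lbs, not the study's actual data.
informed = [2.5, 1.0, 3.0, 0.5, 2.0, 1.5]
uninformed = [0.0, 0.5, -1.0, 1.0, 0.5, 0.0]

observed, p_value = randomization_p_value(informed, uninformed)
print(round(observed, 2), p_value)
```

Because the shuffled differences are generated under the assumption of no effect, their distribution is centered near 0 regardless of what the observed difference turns out to be.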
Questions 7 through 9: Researchers used an experiment to investigate whether cell phone use impairs drivers’ reaction times. 64 students who volunteered to participate in the study were assigned to one of two driving conditions: cell phone use (n1=32) or no distractions (n2=32). The students then participated in a simulation of driving situations, pressing a brake button as soon as they saw a red light. A device recorded their reaction times (in milliseconds).
- The standardized statistic for testing whether cell phone use increases mean reaction time is t = 2.72. Interpret.
- The standard deviation of the reaction times is 2.72 milliseconds.
- The standard deviation of the reaction times is 2.72 milliseconds higher than we would have expected based on the null hypothesis.
- The sample mean for the cell phone group is 2.72 milliseconds above the sample mean for the control group.
- The sample mean for the cell phone group is 2.72 standard errors above the sample mean for the control group.
- You want to use a theory-based pooled t-test to assess the statistical significance of the difference between these two groups, so you would use a t-distribution with ____ degrees of freedom.
- The graph below shows the appropriate t-distribution for assessing the statistical significance of the difference between these two groups. Which of the following statements includes a reasonable p-value and conclusion?
- The p-value = 0. This study provides only weak evidence to suggest that cell phone use impairs drivers’ reaction times.
- The p-value = 0.0042. This study provides strong evidence to suggest that cell phone use impairs drivers’ reaction times.
- The p-value = 0.2495. This study provides only weak evidence to suggest that cell phone use impairs drivers’ reaction times.
- The p-value = 0.4856. This study provides strong evidence to suggest that cell phone use impairs drivers’ reaction times.
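The degrees of freedom and p-value in Questions 8 and 9 can be checked without statistical software. The sketch below uses only the Python standard library and integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are illustrative, and the one-sided direction is an assumption based on the research question (whether cell phone use impairs, that is, lengthens, reaction times). The value t = 2.72 is taken from the answer choices in Question 7.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The distribution is symmetric, so half the area lies above 0.
    return 0.5 - area_0_to_t


n1, n2 = 32, 32
df = n1 + n2 - 2  # pooled t-test: total sample size minus two group means
p_one_sided = t_upper_tail(2.72, df)
print(df, round(p_one_sided, 4))
```

The printed tail area should land close to the 0.0042 listed among the answer choices.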
Questions 10 and 11: Anchoring is the common human tendency to rely too heavily, or “anchor”, on one trait or piece of information when making decisions. A group of statistics students from California were asked to guess the population of Milwaukee, Wisconsin. Some of the students were randomly chosen to be told that the nearby city of Chicago, Illinois, has a population of about 3 million people, while the rest of the students were told that the nearby city of Green Bay, Wisconsin, has a population of about 100,000.
City | N | Mean | SD |
Chicago | 35 | 1357.34 | 802.21 |
Green Bay | 34 | 271.38 | 370.96 |
- Based on the information given above, is the effect of anchoring statistically significant in this context?
- Yes, because the difference of means observed in this sample would be unlikely to occur if anchoring really had no effect.
- Yes, because the sample means in this study are different and the sample sizes are both larger than 20.
- No, because the shuffled differences in means are centered at 0, so it is reasonable to conclude that anchoring has no effect.
- No, because shuffled differences in means generally fall between -500 and 500. That means the results of this experiment were an unlikely fluke.
- Are the validity conditions met for a theory-based pooled t-test?
- No, because the samples are not independent of each other.
- No, because the sample sizes are not the same.
- No, because the sample standard deviations for the two groups are very different.
- Yes. The only potential violation is the skewness in the sample distributions, but this is not a problem, because the sample sizes are both larger than 20.
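One quick check for Question 11 is to compare the two sample standard deviations from the table. The sketch below assumes the common rule of thumb that a pooled t-test is reasonable only when the larger SD is no more than twice the smaller. That threshold is an assumption stated here for illustration and should be checked against the course text.

```python
# Sample standard deviations from the table in the question.
sd_chicago = 802.21
sd_green_bay = 370.96

ratio = max(sd_chicago, sd_green_bay) / min(sd_chicago, sd_green_bay)

# ASSUMED rule of thumb: SDs are "similar" if the ratio is at most 2.
sds_similar = ratio <= 2

print(round(ratio, 2), sds_similar)  # 2.16 False
```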
Questions 12 through 14: A psychology study (Rutchick, Slepian, and Ferris, 2010) investigated whether using a red pen causes people to assign lower scores than using a blue pen. A group of 128 students in an undergraduate psychology class were asked to grade 128 different eighth graders’ essays on a scale of 0 to 100. Half of the students were randomly assigned a red pen while grading, and the other half were given a blue pen. The results are given in the table below:
Pen Color | N | Mean Score | Standard Deviation |
Red | 64 | 76.20 | 12.29 |
Blue | 64 | 80.00 | 9.36 |
- State the alternative hypothesis.
- Is it appropriate to use a pooled t-test to compare these groups?
- Yes, a pooled t-test is appropriate, because the group means are fairly similar.
- Yes, a pooled t-test is appropriate, because the group standard deviations are fairly similar.
- No, an unpooled two-sample t-test is more appropriate, because it is not reasonable to assume that the group means are equal in the population.
- No, an unpooled two-sample t-test is more appropriate, because it is not reasonable to assume that the group standard deviations are equal in the population.
- Calculate the t-statistic. Note that the sample sizes are equal.
Solution: SE of residuals =
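The pooled t-statistic for Question 14 can be computed from the summary table alone. This is a minimal sketch under the usual pooled-variance formulas. Variable names are illustrative, and the difference is taken as red minus blue (an assumed ordering, chosen to match the research question about lower scores with a red pen).

```python
import math

# Summary statistics from the table in the question.
n_red, mean_red, sd_red = 64, 76.20, 12.29
n_blue, mean_blue, sd_blue = 64, 80.00, 9.36

# Pooled variance: weighted average of the two group variances.
pooled_var = (((n_red - 1) * sd_red ** 2 + (n_blue - 1) * sd_blue ** 2)
              / (n_red + n_blue - 2))
se_residuals = math.sqrt(pooled_var)

# Standard error of the difference in means under the pooled model.
se_diff = se_residuals * math.sqrt(1 / n_red + 1 / n_blue)

t_stat = (mean_red - mean_blue) / se_diff
print(round(se_residuals, 2), round(t_stat, 2))  # 10.92 -1.97
```

With equal sample sizes, the pooled variance reduces to the simple average of the two group variances, which is why the question notes that the sample sizes are equal.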
- In the context of a pooled t-test for comparing two groups, what does it mean to say that the study results are statistically significant? Select all that apply.
- It means there is a large difference between the means of the two groups.
- It means there is a small difference between the means of the two groups.
- It means the observed sample difference in means would be unlikely to occur if there were really no difference between the two groups in the population.
- It means the observed sample difference in means would be likely to occur if there were really no difference between the two groups in the population.
- Which of the following statistics reflect both the effect size and the sample size? In other words, which of the following statistics can be used to assess statistical significance? Select all that apply.
- R²
- Difference in means
- Standardized statistic, t
- p-value
- True or False: The p-value is the probability that the null hypothesis is true.
- True or False: A small p-value indicates strong evidence against the null hypothesis.
- True or False: A confidence interval is a range of plausible values for a parameter calculated using sample statistics.
Section 1.4: Comparing Several Groups
LO1.4-1: Compare more than two treatments using randomization tests.
LO1.4-2: Calculate an F-statistic and use the F-distribution to find theory-based p-values.
LO1.4-3: Assess the validity of an F-test.
LO1.4-4: Complete an Analysis of Variance table.
Questions 1 through 4: Does seeing a picture have any effect on college students’ understanding of ambiguous prose? 57 students were randomly assigned to three groups: 19 saw a picture before reading a difficult passage of text, 19 saw the picture after reading the passage, and 19 were shown no picture at all. The groups were then tested on their reading comprehension and assigned a quantitative score.
Does this data provide convincing evidence that seeing a picture has an effect on reading comprehension scores?
- Which of the following is an appropriate statement of the null hypothesis? Select all that apply.
- At least one μᵢ differs from the others
- There is an association between seeing a picture (before, after, or not at all) and reading comprehension scores.
- There is no association between seeing a picture (before, after, or not at all) and reading comprehension scores.
- Which of the following is an appropriate statement of the alternative hypothesis? Select all that apply.
- At least one μᵢ differs from the others
- There is an association between seeing a picture (before, after, or not at all) and reading comprehension scores.
- There is no association between seeing a picture (before, after, or not at all) and reading comprehension scores.
- What does the graph of shuffled R-squared statistics represent?
- The values of R² for all 57 students who participated in this study
- The values of R² that would occur if the null hypothesis were really true
- The values of R² that would occur if the alternative hypothesis were really true
- The values of R² that would occur if we repeated this study many times in the real world
- Does this study provide strong evidence that seeing a picture affects reading comprehension?
- Yes, because an R² value of 0.271 does not appear in the graph of shuffled R-squared statistics, which suggests it would be unlikely to occur by chance alone.
- Yes, because the mean comprehension scores are different in each group, and all the validity conditions for the significance test are satisfied.
- No, because the graph of shuffled R-squared statistics is centered at 0.036, which suggests that very little variability is explained by the model.
- No, because an R² value of 0.271 does not appear in the graph of shuffled R-squared statistics, which suggests the data from this experiment was a fluke that occurred by chance alone.
Questions 5 and 6: A study was carried out to investigate whether the type of message on the back of customer checks at a restaurant would affect tips (recorded as a percentage of the total bill). Sixty tables were selected to participate over a weekend at a restaurant in Philadelphia. Each table was randomly assigned to receive either (1) a picture of a happy face, (2) the words “Thank you!” written out, or (3) no message.
Does this study provide convincing evidence that the message written on the back of the check affects tip percentage? You can use a randomization test to decide.
- How would you design the physical simulation?
Write the tip percentages on cards. Shuffle and deal the cards into ____ groups.
- 2
- 3
- 20
- 60
- Which of the following statistics could be used to summarize each simulated sample? Select all that apply.
- Difference in means
- R²
- t-statistic
- F-statistic
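The card-shuffling procedure described in Questions 5 and 6 can be sketched in code. This is only an illustration: the tip percentages below are randomly generated placeholders (the study's data are not reproduced here), and the helper names `r_squared` and `shuffled_r_squared` are hypothetical. R² is used as the summary statistic, but an F-statistic could be computed from the same shuffled groups.

```python
import random

def r_squared(groups):
    """Proportion of total variation explained by group membership."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_model = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_model / ss_total

def shuffled_r_squared(values, n_groups, rng):
    """Shuffle the 'cards' and deal them into equal-sized groups."""
    cards = values[:]
    rng.shuffle(cards)
    size = len(cards) // n_groups
    groups = [cards[i * size:(i + 1) * size] for i in range(n_groups)]
    return r_squared(groups)

rng = random.Random(1)

# Hypothetical tip percentages for 60 tables (NOT the study's data).
tips = [round(rng.gauss(18, 4), 1) for _ in range(60)]

# Treat the first, second, and third blocks of 20 as the three message groups.
observed = r_squared([tips[0:20], tips[20:40], tips[40:60]])

# Each shuffle breaks any link between message type and tip percentage.
null_stats = [shuffled_r_squared(tips, 3, rng) for _ in range(1000)]
p_value = sum(s >= observed for s in null_stats) / len(null_stats)

print(f"observed R^2 = {observed:.3f}, approximate p-value = {p_value:.3f}")
```

The 60 cards are dealt into 3 groups of 20 because the experiment has three message conditions; the proportion of shuffled statistics at least as large as the observed one estimates the p-value.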
- A food company was interested in how texture might affect the palatability of a particular food. They set up an experiment in which they looked at whether the “coarseness” of the final product (coarse or fine) affected the palatability scores given by 50 people. A partially filled in ANOVA table is given below. Calculate the F-statistic.
Source | df | SS | MS | F |
Model | | | | ? |
Error | | 6113 | | |
Total | | 16722 | | |
Solution:
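One way to work through the arithmetic, as a sketch rather than the official solution. It assumes that 6113 and 16722 are the Error and Total sums of squares, and that the 50 people form two groups (coarse and fine), so the degrees of freedom follow from those counts.

```python
n_total = 50       # people giving palatability scores
n_groups = 2       # coarse vs. fine
ss_error = 6113    # assumed to be the Error sum of squares
ss_total = 16722   # assumed to be the Total sum of squares

df_model = n_groups - 1          # 1
df_error = n_total - n_groups    # 48

ss_model = ss_total - ss_error   # variation explained by the model
ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_stat = ms_model / ms_error
print(f"SSModel = {ss_model}, MSModel = {ms_model:.1f}, "
      f"MSError = {ms_error:.2f}, F = {f_stat:.2f}")
```

Under these assumptions the F-statistic is roughly 83.3.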
Questions 8 through 11: Multiple researchers have conducted studies to examine the time it takes for three different medications to register in a patient’s blood system (in minutes). Each researcher wants to test whether the type of medication affects time.
- Which researcher would obtain a larger F-statistic based on their results?
- Researcher 1 would obtain a larger F-statistic.
- Researcher 2 would obtain a larger F-statistic.
- Researchers 1 and 2 would obtain very similar F-statistics.
- There is not enough information to determine which F-statistic would be larger.
- Compare the results for Researcher 1 and Researcher 3.
Which researcher would obtain a larger F-statistic based on their results?
- Researcher 1 would obtain a larger F-statistic.
- Researcher 3 would obtain a larger F-statistic.
- Researchers 1 and 3 would obtain very similar F-statistics.
- There is not enough information to determine which F-statistic would be larger.
- Researcher 4 had a total sample size of 90, with 30 patients per treatment group, and a standardized statistic of F = 12.72. She intends to use the F-distribution to find a theory-based p-value. Which F-distribution should she use?
She should use an F-distribution with Model df = _______ and Error df = ________.
- Researcher 4 had a total sample size of 90, with 30 patients per treatment group, and a standardized statistic of F = 12.72. The graph below shows the appropriate F-distribution for assessing the statistical significance of the association between type of medication and time. Which of the following statements includes a reasonable p-value and conclusion?
- The p-value is close to 0. This study provides only weak evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
- The p-value is close to 1. This study provides only weak evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
- The p-value is close to 0. This study provides strong evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
- The p-value is close to 1. This study provides strong evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
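For Researcher 4, the degrees of freedom and an approximate p-value can be checked with a short sketch. The closed-form tail probability used here, (1 + 2F/df_error)^(-df_error/2), holds only when the Model df equals 2; it is a convenience for this case, not a general replacement for statistical software.

```python
n_total = 90
n_groups = 3
f_stat = 12.72

df_model = n_groups - 1         # number of groups minus 1
df_error = n_total - n_groups   # total sample size minus number of groups

# Upper-tail area of an F-distribution with 2 numerator df.
# This shortcut is valid ONLY because df_model == 2.
assert df_model == 2
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"Model df = {df_model}, Error df = {df_error}, p-value = {p_value:.2e}")
```

A p-value this close to 0 corresponds to strong evidence that type of medication affects time.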
- The table below summarizes the results of an experiment to compare yields (as measured by the dried weight of plants) obtained under a control and two different treatment conditions. Calculate SSModel.
Group | Sample Size | Mean | SD |
Full sample | 30 | 5.07 | 0.701 |
Treatment 1 | 10 | 5.03 | 0.583 |
Treatment 2 | 10 | 4.66 | 0.794 |
Treatment 3 | 10 | 5.53 | 0.443 |
Solution:
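A sketch of one way to compute SSModel from the summary table, using the definition SSModel = Σ nᵢ (group meanᵢ − overall mean)². It uses the rounded overall mean reported in the table (5.07), so the result may differ slightly from a calculation based on the raw data.

```python
overall_mean = 5.07  # "Full sample" mean as reported (rounded)

# (sample size, group mean) for each treatment, taken from the table
groups = [
    (10, 5.03),  # Treatment 1
    (10, 4.66),  # Treatment 2
    (10, 5.53),  # Treatment 3
]

# Each group contributes n_i * (group mean - overall mean)^2
ss_model = sum(n * (mean - overall_mean) ** 2 for n, mean in groups)
print(f"SSModel = {ss_model:.3f}")
```

With these rounded inputs, SSModel comes out to about 3.813.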
- The following output displays the amount (in dollars) that a sample of male and female college students spent on their most recent haircuts. You may assume that the sample is representative of a larger population. Calculate the F-statistic.
Solution:
- A randomized experiment was conducted exploring the effectiveness of acupuncture in treating chronic lower back pain. Patients in the study were randomly assigned to one of three treatment groups: Verum acupuncture (traditional Chinese medicine), Sham acupuncture (placebo), and nonacupuncture therapy (drugs, physical therapy, etc.). After six months, each patient’s pain reduction was measured on a quantitative scale. Which inference procedure(s) could you use to test for an association between type of treatment and pain reduction?
- Two-sample t-test (pooled or unpooled)
- ANOVA F-test
- Both of these tests are appropriate in this scenario.
- Neither of these tests is appropriate in this scenario.
- A random sample of 1450 birth records was selected from the state of North Carolina in the year 2001. One question of interest is whether the distribution of birth weights (in ounces) differs based on the race/ethnicity of the mother (White, Black, Hispanic, or other).
Would an ANOVA F-test be valid for these data?
- No, because the samples are not independent of each other.
- No, because the sample sizes for the groups are not similar enough.
- No, because the distribution of birth weights is slightly skewed left for two of the groups.
- Yes, because all of the validity conditions are met.
- True or False: The validity conditions for the ANOVA F-test are the same as the validity conditions for the two-sample pooled t-test.
Section 1.5: Confidence and Prediction Intervals
LO1.5-1: Apply post-hoc analysis after significant F-test (pairwise differences).
LO1.5-2: Calculate and interpret confidence intervals on single means and differences in two means.
LO1.5-3: Calculate and interpret prediction intervals on quantitative variables.
LO1.5-4: Identify factors that impact widths of confidence intervals and prediction intervals.
Questions 1 through 3: Does seeing a picture have any effect on college students’ understanding of ambiguous prose? 57 students were randomly assigned to three groups: 19 saw a picture before reading a difficult passage of text, 19 saw the picture after reading the passage, and 19 were shown no picture at all. The groups were then tested on their reading comprehension and assigned a quantitative score. The pairwise confidence intervals for the difference in mean comprehension scores are given below.
95% confidence interval for (-2.60, -0.88)
95% confidence interval for (-1.02, 0.70)
95% confidence interval for (0.72, 2.44)
- Based on the confidence intervals, which conditions are significantly different from each other? Select all that apply.
- After is significantly different from Before.
- After is significantly different from None.
- Before is significantly different from None.
- Which of the following letters tables is consistent with the confidence intervals given above?
- Group | Letters |
Before | A |
After | A |
None | A |
- Group | Letters |
Before | A |
After | B |
None | B |
- Group | Letters |
Before | A |
After | B |
None | C |
- The confidence intervals do not provide enough information to construct a letters table.
- How could you use a confidence interval to decide whether the difference in comprehension scores for the Before group and the None group is important in a practical sense?
- Subtract the endpoints to find the width of the interval. If the interval is wide, then the true difference is practically important.
- Subtract the endpoints to find the width of the interval. If the interval is narrow, then the true difference is practically important.
- Look to see whether the interval includes 0. If the interval does not include 0, then the true difference is practically important.
- Look to see whether the endpoints of the interval are close to 0 or far from 0. If the endpoints are far from 0, then the true difference may be practically important.
Questions 4 and 5: An experiment was conducted to compare yields (as measured by the dried weight of plants in kg) obtained under a control and two different treatment conditions. The pairwise confidence intervals for the difference in mean yields are given below.
95% confidence interval for (-1.44, -0.29)
95% confidence interval for (-0.94, 0.20)
95% confidence interval for (-0.08, 1.07)
- Based on the confidence intervals, which of the treatments are significantly different from each other?
- All three treatments are significantly different, because none of the confidence intervals have similar endpoints.
- All three treatments are significantly different, because none of the confidence intervals have midpoints that are equal to 0.
- None of the treatments are significantly different, because the widths of the confidence intervals are all very similar to each other.
- Treatment 1 is significantly different from Treatment 2, because the confidence interval for this comparison does not include 0.
- The grower’s goal is to find a treatment that produces yields of at least 1.4 kg, on average. Which of the treatments meets this goal?
- None of the treatments meet the grower’s goal.
- Only Treatment 2 meets the grower’s goal.
- Both Treatment 1 and Treatment 2 meet the grower’s goal.
- The pairwise confidence intervals for the difference in mean yields do not provide enough information to decide if the treatments meet the grower’s goal.
Questions 6 through 8: In 2018, a sample of academic faculty were surveyed about their salaries (in US dollars). The results were classified according to academic rank: instructor, assistant professor, associate professor, and full professor. The table below shows 95% confidence intervals for the population mean of each rank.
Rank | Sample size | Group Mean | 95% CI for population mean |
Instructor | 75 | 63680 | (54583, 72776) |
Assistant | 175 | 92029 | (86073, 97984) |
Associate | 145 | 105133 | (98591, 111676) |
Full Professor | 234 | 154509 | (149359, 159659) |
- Which of the following statements is an appropriate interpretation based on the confidence intervals? You may assume that this sample is representative of a larger population of academic faculty.
- The average salaries of all four ranks are significantly different from each other, because none of the confidence intervals include 0.
- We are 95% confident that the population mean salary for instructors is between $54,583 and $72,776.
- Roughly 95% of instructors in the population earn between $54,583 and $72,776 per year.
- More than one of these statements is an appropriate interpretation of the confidence intervals.
- Why is the 95% confidence interval for the mean salary of full professors narrower than the other intervals?
- The sample size for full professors is largest, and as sample size increases, the width of the confidence interval tends to decrease.
- The group mean for professors is largest, and as group mean increases, the width of the confidence interval tends to decrease.
- We can predict with a high level of certainty that full professors make more than other ranks, so the confidence interval provides a precise estimate.
- None of the justifications above are reasonable, so professors’ salaries must be less variable (lower SD) compared to the other ranks.
- If we changed the confidence level from 95% to 99% (holding everything else constant), would the width of the confidence intervals change?
- The width of the confidence intervals would decrease.
- The width of the confidence intervals would increase.
- The width of some intervals would increase and the width of the other intervals would decrease.
- The width of all the confidence intervals would stay the same.
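The effect of the confidence level on interval width can be illustrated with a rough sketch. It uses standard normal multipliers as an approximation (a t-based interval would use slightly larger multipliers) and rescales the instructor interval from the table; the 99% half-width shown is therefore an estimate, not a reported result.

```python
from statistics import NormalDist

z = NormalDist()
mult_95 = z.inv_cdf(0.975)   # about 1.96
mult_99 = z.inv_cdf(0.995)   # about 2.58

# Instructor interval from the table: (54583, 72776)
half_width_95 = (72776 - 54583) / 2

# Holding the standard error fixed, the half-width scales with the multiplier.
ratio = mult_99 / mult_95
half_width_99 = half_width_95 * ratio

print(f"multiplier ratio = {ratio:.3f}")
print(f"95% half-width = {half_width_95:.1f}, approx. 99% half-width = {half_width_99:.1f}")
```

Because the multiplier grows with the confidence level, every interval becomes wider when moving from 95% to 99%.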
- An online retailer is using an experiment to decide whether to modify their website. When visitors type in the web address or click a link to the site, they are randomly re-directed to one of three versions of the website. The retailer’s goal is to maximize the amount of time (in minutes) visitors stay on the site.
Version | Letters |
1 | A |
2 | AB |
3 | B |
True or False: In this study, there is a statistically significant difference in mean time spent on the site for Version 1 and Version 2.
- Body temperature measurements (in degrees Fahrenheit) were taken from 65 healthy female volunteers aged 18 to 40 who were participating in vaccine trials. Based on these data, researchers calculated a 95% prediction interval: (96.90, 99.89). Interpret the interval. You may assume that the sample is representative of a larger population.
- Roughly 95% of healthy females in this population would have body temperatures between 96.90 and 99.89.
- We are 95% confident that the sample mean body temperature is between 96.90 and 99.89.
- We are 95% confident that the population mean body temperature is between 96.90 and 99.89.
- If we were to collect another sample of size 65, we are 95% confident that the sample mean body temperature would be between 96.90 and 99.89.
- In a context with a quantitative response variable and a multi-level categorical explanatory variable, which of the following best describes the purpose of the ANOVA F-test?
- The F-test helps us assess whether or not there is convincing evidence of an association between the variables.
- The F-test helps us measure the strength of the association between the variables by indicating how much the groups differ in terms of the mean response.
- The F-test helps us determine the direction of the association between the variables by indicating which group means are higher than others.
- The F-test serves all three of the purposes listed above.
- A researcher has conducted an experiment to study four different treatments, and they decide to analyze the data by comparing each treatment group mean to every other treatment group mean. This involves tests for six pairwise comparisons. If the researcher uses a significance level of α = 0.05 for each test, then the probability of making at least one Type I error is ________ (>, <, =) 0.05.
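A quick sketch of why the experiment-wise error rate grows with the number of comparisons. It assumes a per-test significance level of 0.05 and, as a simplification, treats the six tests as independent; pairwise tests on the same data are not truly independent, so the figure is only a rough guide.

```python
from math import comb

n_treatments = 4
alpha = 0.05                      # assumed per-test significance level

n_pairs = comb(n_treatments, 2)   # number of pairwise comparisons

# P(at least one Type I error) = 1 - P(no Type I error in any test),
# assuming the tests are independent (a simplification).
familywise_error = 1 - (1 - alpha) ** n_pairs

print(f"{n_pairs} comparisons, P(at least one Type I error) = {familywise_error:.3f}")
```

Under these assumptions the chance of at least one false alarm is about 0.26, well above 0.05.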
- What should you do to protect against an inflated experiment-wise Type I error rate?
- Conduct pairwise comparisons using t-procedures first and only conduct an F-test if the p-values for the t-tests are all large.
- Conduct pairwise comparisons using t-procedures first and only conduct an F-test if the p-values for the t-tests are all small.
- Conduct an F-test first and only conduct pairwise comparisons using t-procedures if the p-value for the F-test is large.
- Conduct an F-test first and only conduct pairwise comparisons using t-procedures if the p-value for the F-test is small.
- How are prediction intervals different from confidence intervals?
- Prediction intervals predict the population mean for a particular group, thus they are wider than confidence intervals.
- Prediction intervals predict the population mean for a particular group, thus they are narrower than confidence intervals.
- Prediction intervals predict the response of a new individual observation, thus they are wider than confidence intervals.
- Prediction intervals predict the response of a new individual observation, thus they are narrower than confidence intervals.
- The validity conditions for confidence intervals and prediction intervals on means require that the data distribution be reasonably bell-shaped and symmetric. Is this condition always important, even when the sample size is large?
- This condition is not very important for confidence intervals or prediction intervals, as long as the sample size is large.
- This condition is very important for prediction intervals. It is less of a concern for confidence intervals, as long as the sample size is large.
- This condition is very important for confidence intervals. It is less of a concern for prediction intervals, as long as the sample size is large.
- This condition is very important for both confidence intervals and prediction intervals, regardless of sample size.
- Which of the following is the best way to reduce the width of a prediction interval?
- Increase the confidence level
- Increase the sample size
- Reduce the unexplained variation within groups
- Use a pooled estimate of the total variation
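The contrast between confidence and prediction intervals can be seen by comparing their half-widths. The sketch below uses a hypothetical standard deviation (0.74) and normal multipliers as an approximation; the function name `half_widths` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def half_widths(n, sd, level=0.95):
    """Return (CI half-width for a mean, PI half-width for one new observation)."""
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci_half = multiplier * sd / sqrt(n)          # uncertainty about the mean only
    pi_half = multiplier * sd * sqrt(1 + 1 / n)  # adds individual-to-individual variation
    return ci_half, pi_half

sd = 0.74  # hypothetical SD, not taken from the study

for n in (65, 650):
    ci_half, pi_half = half_widths(n, sd)
    print(f"n = {n}: CI half-width = {ci_half:.3f}, PI half-width = {pi_half:.3f}")

# Cutting the unexplained variation in half shrinks the PI far more than adding data does.
_, pi_small_sd = half_widths(65, sd / 2)
print(f"n = 65 with half the SD: PI half-width = {pi_small_sd:.3f}")
```

Increasing the sample size shrinks the confidence interval toward zero width, while the prediction interval barely changes because it is dominated by the variation among individuals.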
- True or False: Suppose we test H0: μ1 = μ2 = μ3 = μ4 using an ANOVA F-test. This is preferable to using 6 pairwise t-tests, because testing all parameters at once controls the probability of Type II error.
Section 1.6: More Study Design Considerations
LO1.6-1: Understand statistical power and how it is impacted by sample size, variability within groups, number of groups, and significance level.
LO1.6-2: Use statistical power analysis to plan the sample size of a study.
- A Type II error occurs when researchers _________ (do/don’t) find convincing evidence against the null hypothesis, when the null hypothesis is actually __________ (true/false).
- Suppose that you analyzed data from an experiment and obtained a large p-value. Which type of error is possible in this case?
- This result could be due to a Type I error.
- This result could be due to a Type II error.
- This result could be due to either a Type I error or a Type II error.
- As long as the validity conditions were met, the large p-value is not due to an error.
- The statistical power of a study is the probability that the researchers _______ (will/won’t) find convincing evidence against the null hypothesis, when the null hypothesis is actually __________ (true/false).
- Suppose researchers design a study such that the Type I error rate is 5% and the Type II error rate is 20%. Calculate the power of the study. Include the % sign in your answer.
Solution:
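A minimal sketch of the calculation, using the relationship power = 1 − P(Type II error). The Type I error rate does not enter into it.

```python
type_ii_error_rate = 0.20   # given in the question

# Power is the complement of the Type II error rate.
power = 1 - type_ii_error_rate

print(f"Power = {power:.0%}")
```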
- True or False: The aspects of a study that impact the strength of evidence (sample size, unexplained variation, number of groups, etc.) are the same ones that impact a study’s power.
- How are the probabilities of Type I and Type II error affected by sample size? Assume that variability within groups, number of groups, and significance level remain unchanged.
As the sample size increases, the probability of making a Type I error ________ (increases / decreases / stays the same), and the probability of making a Type II error ________ (increases / decreases / stays the same).
- How are the probabilities of Type I and Type II error affected by the significance level, α? Assume that sample size, variability within groups, and number of groups remain unchanged.
As the significance level increases, the probability of making a Type I error ________ (increases / decreases / stays the same), and the probability of making a Type II error ________ (increases / decreases / stays the same).
- True or False: When comparing groups with a quantitative response, using a smaller number of groups (fewer levels of the categorical variable) always increases the statistical power of the test.
- Suppose a researcher wants to design an experiment with a high level of statistical power. What should they do?
Include a _________ (large/small) number of experimental units in the study.
Choose a number of groups that is as _________ (large/small) as possible without compromising the amount of variability explained.
Take steps during study design and data collection to __________ (increase/decrease) the amount of variability within groups.
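The way sample size, within-group variability, and significance level affect power can be explored with a rough sketch. It assumes a one-sided comparison of two group means and a normal approximation; all input values are hypothetical, and `approx_power` is an illustrative helper rather than a standard library function.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, sd, true_diff, alpha):
    """Approximate power of a one-sided two-group comparison (normal approximation)."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)       # SE of the difference in means
    cutoff = z.inv_cdf(1 - alpha) * se    # smallest difference that rejects H0
    return 1 - z.cdf((cutoff - true_diff) / se)

# Hypothetical baseline: 20 per group, SD 10, true difference 5, alpha 0.05
baseline = approx_power(20, 10, 5, 0.05)
larger_n = approx_power(80, 10, 5, 0.05)        # more experimental units
smaller_sd = approx_power(20, 5, 5, 0.05)       # less variability within groups
stricter_alpha = approx_power(20, 10, 5, 0.01)  # smaller significance level

print(f"baseline power        = {baseline:.2f}")
print(f"larger sample size    = {larger_n:.2f}")
print(f"smaller within-group SD = {smaller_sd:.2f}")
print(f"smaller alpha         = {stricter_alpha:.2f}")
```

In this sketch the Type I error rate is fixed at α by construction, so changing the sample size only changes the Type II error rate (and therefore the power).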
Questions 10 and 11: For a class project, a statistics student plans to conduct an experiment to investigate whether standing heart rates tend to be higher than sitting heart rates. They will randomly assign their participants to either sit or stand, then the participants’ heart rates will be measured (in beats per minute).
The student considers a difference of 5 beats per minute to be practically important, so if the difference is 5 beats per minute or larger, they want to be able to detect it. The student expects the standard deviation of each group to be about 12 bpm and plans to use a significance level of and a sample size of 20 in each group.
- Which of the graphs above would the student use to find the rejection region?
- Graph A
- Graph B
- Either of these two graphs could be used to find the rejection region.
- Neither of these two graphs could be used to find the rejection region.
- The rejection region for this study is a difference of means of 9.1 or higher. Which of the values below is closest to the power of the test, given that the difference in standing and sitting heart rates is really 5 beats per minute?
- 1%
- 15%
- 50%
- 90%
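The power in Question 11 can be approximated from the stated rejection region. This sketch assumes the difference in sample means is approximately normal, centered at the true difference of 5 bpm, with a standard error based on an SD of 12 bpm and 20 participants per group.

```python
from math import sqrt
from statistics import NormalDist

sd = 12            # expected SD in each group (bpm)
n_per_group = 20
true_diff = 5      # true difference the student wants to detect (bpm)
cutoff = 9.1       # rejection region: observed difference of 9.1 or higher

se = sd * sqrt(2 / n_per_group)   # SE of the difference in means

# Power = P(observed difference >= cutoff) when the true difference is 5.
power = 1 - NormalDist(mu=true_diff, sigma=se).cdf(cutoff)

print(f"SE = {se:.2f}, approximate power = {power:.2f}")
```

Under these assumptions the power is roughly 0.14, which suggests the planned sample size is too small to reliably detect a 5 bpm difference.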
Questions 12 and 13: Do older adults (ages 65+) have lower body temperatures than younger adults (ages 18-64), on average? Researchers decide to conduct a test of vs.
at the
significance level with sample sizes of 25 in each group.
Suppose that the true mean body temperature for older adults is 97.5º F, the true mean body temperature for younger adults is 98.6º F, and both groups have a standard deviation of 0.75º F.
- Match each term to its representation in the graphs above.
Terms: Power; Prob(Type I error); Prob(Type II error)
A. The area to the left of the red line in Graph 1
B. The area to the left of the red line in Graph 2
C. The area to the right of the red line in Graph 2
- The rejection region for this study is a difference of means (
) of -0.51 or lower. Suppose the significance level was changed from
to
. The boundary of the rejection region (the red line) would shift to the _____ (left/right), and the power would ______ (increase/decrease).
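The power and Type II error probability for the body temperature study can be approximated from the stated rejection boundary of -0.51. This sketch assumes the difference in sample means (older minus younger) is approximately normal and uses the true means and standard deviation given in the scenario.

```python
from math import sqrt
from statistics import NormalDist

mean_older = 97.5
mean_younger = 98.6
sd = 0.75
n_per_group = 25
cutoff = -0.51     # rejection region: difference of -0.51 or lower

true_diff = mean_older - mean_younger   # -1.1
se = sd * sqrt(2 / n_per_group)         # SE of the difference in means

# Sampling distribution of the difference when the stated true means hold
alt_dist = NormalDist(mu=true_diff, sigma=se)

power = alt_dist.cdf(cutoff)    # area at or below the boundary
type_ii = 1 - power             # area above the boundary

print(f"SE = {se:.3f}, power = {power:.4f}, P(Type II error) = {type_ii:.4f}")
```

Because the true difference of -1.1 lies far below the boundary, nearly all of the alternative distribution falls in the rejection region and the power is close to 1.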
Questions 14 and 15: Olestra was approved by the FDA for use in snack foods as a fat substitute in the 1990s. Because there were anecdotal reports of stomach (GI) problems associated with Olestra consumption, researchers planned to carry out an experiment to compare GI symptoms after consuming Olestra potato chips or regular potato chips.
The researchers consider a difference of proportions of 0.05 to be practically significant. That is, if 20% of people experience GI problems when eating Olestra and 15% experience GI problems while eating regular potato chips, they want to be able to detect the difference between the two. They use software to conduct a power analysis to decide how large the sample size needs to be in order for the study to have 80% power with a significance level of and a one-sided alternative hypothesis.
- Statistical power is the probability of concluding that the risk of GI problems for those eating chips with Olestra is ___________ (higher than / the same as) the risk for those eating regular potato chips, given that the difference in proportions who experience GI problems is really equal to ____ (0 / 0.05).
- The power analysis suggests a sample size of at least 714. One of the researchers’ colleagues is surprised that such a large sample size is necessary. Which of the following is the best explanation?
- The difference between 15% and 20% is small, and small effect sizes are more difficult to detect, so a large sample size is required.
- The significance level is fairly high. If the researchers changed the significance level from
to
they wouldn’t need such a large sample size.
- 80% power is an unusually high value that demands an unusually large sample. Lowering the power would make their plan more acceptable to funding agencies.
- One-sided tests always require larger samples. If they used a two-sided test, the necessary sample size would be roughly half as large.
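A rough sketch of a sample size calculation for comparing two proportions. The significance level is missing from the text above, so α = 0.05 is an assumption here; the formula is one common normal-approximation version, and power analysis software may use a slightly different method. The result is a per-group sample size.

```python
from math import ceil, sqrt
from statistics import NormalDist

p_olestra = 0.20
p_regular = 0.15
alpha = 0.05       # ASSUMED; the value is not recoverable from the text
power = 0.80

z = NormalDist()
z_alpha = z.inv_cdf(1 - alpha)   # one-sided test
z_beta = z.inv_cdf(power)

p_bar = (p_olestra + p_regular) / 2
diff = p_olestra - p_regular

# Null-hypothesis SD term uses the pooled proportion;
# alternative-hypothesis SD term uses the two separate proportions.
null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
alt_term = z_beta * sqrt(p_olestra * (1 - p_olestra) + p_regular * (1 - p_regular))

n_per_group = ceil((null_term + alt_term) ** 2 / diff ** 2)
print(f"approximate sample size per group = {n_per_group}")
```

Under these assumptions the result lands close to the 714 quoted in the question. The required sample size grows with the inverse square of the difference, which is why a small effect such as 0.05 demands so many participants.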