Ch1 + Intermediate Statistical | Test Questions & Answers - Intermediate Statistical Investigations 1st Ed - Exam Bank by Nathan Tintle.

Chapter 1

Intermediate Statistical Investigations Test Bank

Question types:

FIB = Fill in the blank
Calc = Calculation
Ma = Matching
MS = Multiple select
MC = Multiple choice
TF = True-false

CHAPTER 1 TERMINAL LEARNING OUTCOMES

TLO1-1: Apply the six-step investigative process in the context of a well-designed experiment.

TLO1-2: Partition variation in the response variable into variation explained by the model and unexplained variation, and measure and report the percentage of variation explained.

TLO1-3: Assess the statistical significance of the difference between two groups on a quantitative response variable using both simulation and theory-based approaches.

TLO1-4: Compare more than two treatments on a quantitative response using both simulation and theory-based approaches.

TLO1-5: Apply post-hoc analysis after a significant F-test (pairwise differences, as well as confidence and prediction intervals for single means).

TLO1-6: Understand statistical power and how it is impacted by sample size, variability within groups, number of groups, and significance level.

Section 1.1: Sources of Variation in an Experiment

LO1.1-1: Apply the six-step investigative process.

LO1.1-2: Distinguish experiments and observational studies.

LO1.1-3: Review basic study design principles such as inclusion criteria and random assignment.

LO1.1-4: Define terminology specific to an experimental study (e.g., treatments).

LO1.1-5: Produce a Sources of Variation diagram for an experiment.

Questions 1 through 3: A study published in Psychological Science in 2007 examined a possible link between mindset and health. The following is an excerpt from the abstract of the article: “84 female room attendants working in seven different hotels were measured on physiological health variables affected by exercise. Those in the informed condition were told that the work they do (cleaning hotel rooms) is good exercise and satisfies the Surgeon General's recommendations for an active lifestyle. Examples of how their work was exercise were provided. Subjects in the control group were not given this information.”

  1. Identify the experimental units in this study.
    1. The eighty-four room attendants
    2. The seven different hotels
    3. The physiological health variables
    4. The two groups (informed and control)
  2. The researchers chose to include room attendants from seven different hotels (as opposed to using stricter inclusion criteria that would limit the study to room attendants at one particular hotel). Describe the consequences of this decision.

Using broader inclusion criteria may ______ (increase/decrease) the amount of variation in the observed health outcomes. However, this decision also __________ (supports/limits) generalizability to a larger population of room attendants.

  3. Room attendants were randomly assigned to either the informed condition or the control group. What is the most important reason for the random assignment?
    1. Random assignment ensures that the study is double-blind.
    2. Random assignment reduces the impact of outliers.
    3. Random assignment creates two groups of room attendants that are as similar as possible, which supports cause-and-effect conclusions.
    4. Random assignment makes it possible to generalize the results to the population.

Questions 4 through 6: An online retailer is using an experiment to decide whether to modify their website. When visitors type in the web address or click a link to the site, they are randomly re-directed to one of two versions of the website: the version that has been in use for the last year (version A) or an updated version (version B). The retailer’s goal is to maximize the amount of time (in minutes) visitors stay on the site.

  4. Identify the experimental units and variables. Note: One of the answer choices will not be used.

Experimental units: ______

Explanatory variable: ______

Response variable: ______

A. Version of the website (A and B)

B. Online retailers

C. Visitors to the website

D. Time spent on the website (in minutes)

  5. Consider two possible models for analyzing time spent on this retailer’s website.

Single-mean model:

Separate-means model:

Does the version of the website appear to explain any of the variation in time spent on the site? Note: If more than one of these justifications is appropriate, select multiple answers.

    1. Yes, because the mean time spent on the site is higher for Version B than for Version A.
    2. Yes, because the SE of the residuals is smaller for the separate-means model than for the single-mean model.
    3. No, because the mean time spent on the site is not the same for Version A and for Version B.
    4. No, because the SE of the residuals is smaller for the separate-means model than for the single-mean model.
  6. The researcher decides that the difference between Version A and Version B in this study is meaningful. Is it reasonable to generalize these results to all customers of this retailer?
    1. Yes, because visitors to the website were randomly assigned to either Version A or Version B.
    2. Yes, because the study’s inclusion criteria would exclude potential subjects who are not customers.
    3. It depends on whether visitors to the website knew about the research question being investigated. The study may not be double-blind.
    4. It depends on who visited the website during the study period. The sample may not be representative.

Questions 7 through 8: Researchers at a university were interested in the effectiveness of a calculus workshop program for students who fail Calculus I and need to retake the course. As part of the study, students who were retaking Calculus I were allowed to enroll in a calculus workshop at their own discretion. At the end of the grading term, all students (even those with different instructors) took the same final exam. The researchers then compared the scores for those who enrolled in the workshop while re-taking calculus to those who re-took calculus without enrolling in the workshop.

  7. Is this an experiment? Justify your answer.
    1. Yes, this is an experiment, because there was a treatment group who enrolled in the calculus workshop and a control group that did not.
    2. Yes, this is an experiment, because the study was double-blind (as long as the calculus teachers did not know which students enrolled in the workshop).
    3. No, this is an observational study, because it does not take place in a laboratory or other tightly controlled research environment.
    4. No, this is an observational study, because the choice of whether to participate in the workshop was made by the students, not the researchers.
  8. Which of the following are sources of unexplained variation in this study? Select all that apply.
    1. Whether or not students enrolled in the workshop
    2. Whether or not students had failed a calculus class in the past
    3. Student attendance in class (number of absences)
    4. Student motivation to study calculus
    5. Calculus instructor
    6. Difficulty of the final exam
  9. A study published in Athletic Training examined the effects of three different types of knee stabilizing braces on agility test speed. College football players from all different positions (running back, wide receiver, linebacker, lineman, etc.) were recruited to participate in the study. All players in the study had torn their ACL (anterior cruciate ligament) in the past, and needed to wear a knee brace to play football. Agility tests were administered in an outdoor football stadium, and the time to complete the test was recorded by a Lafayette photoelectric Cell and Light Time Unit (in seconds).

Put the components of the study into the correct boxes in the Sources of Variation Diagram. Note: Some boxes will include more than one answer.

Observed variation in:

Sources of explained variation:

Sources of unexplained variation:

Inclusion criteria:

Design:

    1. College football players from all different positions
    2. Type of knee brace
    3. Players’ current condition (health, mood, motivation, etc.)
    4. Time to complete agility test (in seconds)
    5. Measurement error
    6. History of torn ACL and need for a knee brace
    7. Details of the agility test
    8. Players’ natural speed and agility
    9. Environmental factors (weather, wind, etc.)
  10. In a separate-means model, the standard error of the residuals can be thought of as the typical deviation of an observed response from:
    1. The residuals (prediction errors)
    2. The response predicted by the model (group mean)
    3. The overall mean of the response variable
    4. The overall mean of the explanatory variable
  11. A study published in the Journal of Sports Science & Medicine tested the effectiveness of the Power Balance © bracelet, which has been marketed as a way to improve balance, flexibility, strength, and power through the use of hologram technology. Subjects, who were all college athletes, completed tests of their athletic performance while wearing either a Power Balance © bracelet or a plain rubber placebo bracelet. The bracelets were covered with a wristband, so the athletes and those measuring their performance could not see which bracelet was being worn. Only the researcher who analyzed the data knew which measurements corresponded to the Power Balance © bracelet and which to the placebo bracelet. Classify this study.
    1. This study is not blinded.
    2. This is a single-blind study.
    3. This is a double-blind study.
    4. There is not enough information to classify this study.
  12. Which of the following is an experiment? Select all that apply.
    1. Executives at a large department store chain selected 100 stores and randomly assigned 50 of them to reduce their hours, opening an hour later than before; hours for the other 50 stores were not changed. After six months, the executives compared revenue for the two groups of stores.
    2. A researcher recruited a group of American adults whose demographics were similar to the American population. The researcher measured each subject’s forced expiratory volume, an indicator of lung function. Then each subject was asked whether or not they smoke cigarettes.
    3. A survey was administered to a large group of high school students. The survey asked whether the students were employed outside of school (in a paying job) and how much sleep they got the night before (in hours).
  13. A university professor teaches two sections of introductory statistics: one section meets at 8:00 am and the other meets at 11:00 am. She wants to evaluate the effectiveness of a new method of teaching statistics compared to the standard method she has used in the past. She flips a coin to assign her sections to teaching methods and determines that she will use the standard method at 8:00 am and the new method at 11:00 am. At the end of the term, she compares the final exam scores for the two sections. Which of the following best describes the potential for confounding in this scenario?
    1. Different students have different levels of talent and motivation, so it is impossible to attribute differences in final exam scores to the teaching method.
    2. The two sections may not be exactly the same size, which would lead to inappropriate comparisons between the treatment and control groups.
    3. Students may find it difficult to pay attention at 8:00 am, which may negatively impact the exam scores of those taught with the standard method.
    4. Confounding is not a concern in this scenario, because the professor used random assignment as part of the study design.
  14. True or False: The standard error of the residuals is a way to measure the amount of variation in the response variable that remains unexplained after applying the model.
  15. True or False: Because of the possibility of confounding, you should always avoid using causal language (action verbs like “affect” and “lead to”) in your conclusions.

Section 1.2: Quantifying Sources of Variation

LO1.2-1: Partition variation in the response variable into variation explained by the model and unexplained variation.

LO1.2-2: Measure the percentage of variation explained.

LO1.2-3: Understand effect size and practical significance.

Questions 1 through 3: Dog agility is a sport where trainers guide their dogs through an obstacle course as quickly as possible. Two trainers, Abby and Lauren, have dogs participating in agility competitions. Each dog completes the same agility course, and they measure the time it takes each dog to complete the course (in seconds). The times for Abby’s three dogs were 30, 40, and 50. The times for Lauren’s three dogs were 50, 60, and 70.

A dotplot describes the results of dogs' agility for two trainers. The horizontal axis is labeled Time (in seconds) and has markings from 10 to 90 in increments of 10. The vertical axis is labeled Trainer and has two markings, Abby and Lauren, in the order from top to bottom. For Abby, the dots are plotted as follows: 1 dot above 30; 1 dot above 40; and 1 dot above 50. There are no dots above 10, 20, 60, 70, 80, and 90. For Lauren, the dots are plotted as follows: 1 dot above 50; 1 dot above 60; and 1 dot above 70. There are no dots above 10, 20, 30, 40, 80, and 90.

  1. Calculate the Sum of Squares Total (SSTotal).

Solution:

  2. Calculate the Sum of Squared Errors (SSError).

Solution:

  3. Calculate the Sum of Squares for the Model (SSModel).

Solution:
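
The three sums of squares in Questions 1 through 3 can be checked directly from the six times listed above. The following is a minimal sketch rather than the test bank's own worked solution; the function name `sums_of_squares` and the dictionary layout are illustrative assumptions.

```python
def sums_of_squares(groups):
    """Return (SSTotal, SSModel, SSError) for a dict of group name -> responses."""
    all_values = [y for values in groups.values() for y in values]
    overall_mean = sum(all_values) / len(all_values)

    # SSTotal: squared deviations of every observation from the overall mean.
    ss_total = sum((y - overall_mean) ** 2 for y in all_values)

    ss_model = 0.0
    ss_error = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # SSModel: squared deviation of the group mean from the overall mean,
        # counted once per observation in the group.
        ss_model += len(values) * (group_mean - overall_mean) ** 2
        # SSError: squared deviations of observations from their own group mean.
        ss_error += sum((y - group_mean) ** 2 for y in values)
    return ss_total, ss_model, ss_error


dog_times = {"Abby": [30, 40, 50], "Lauren": [50, 60, 70]}
ss_total, ss_model, ss_error = sums_of_squares(dog_times)
print(ss_total, ss_model, ss_error)  # 1000.0 600.0 400.0
```

The output also illustrates the partition SSTotal = SSModel + SSError (1000 = 600 + 400).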

Questions 4 and 5: Dog agility is a sport where trainers guide their dogs through an obstacle course as quickly as possible. Two trainers, Abby and Lauren, have dogs participating in agility competitions. Each dog completes the same agility course, and they measure the time it takes each dog to complete the course in seconds. Compare the sums of squares for two possible datasets that could occur in this context.

A dotplot titled, Dataset 1, describes the results of dogs' agility for two trainers. The horizontal axis is labeled Time (in seconds) and has markings from 10 to 90 in increments of 10. The vertical axis is labeled Trainer and has two markings, Abby and Lauren, in the order from top to bottom. For Abby, the dots are plotted as follows: 1 dot above 30; 1 dot above 40; and 1 dot above 50. There are no dots above 10, 20, 60, 70, 80, and 90. For Lauren, the dots are plotted as follows: 1 dot above 50; 1 dot above 60; and 1 dot above 70. There are no dots above 10, 20, 30, 40, 80, and 90.

Dataset 1:

Times for Abby’s dogs: 30, 40, 50

Times for Lauren’s dogs: 50, 60, 70

A dotplot titled, Dataset 2, describes the results of dogs' agility for two trainers. The horizontal axis is labeled Time (in seconds) and has markings from 10 to 90 in increments of 10. The vertical axis is labeled Trainer and has two markings, Abby and Lauren, in the order from top to bottom. For Abby, the dots are plotted as follows: 1 dot above 20; 1 dot above 40; and 1 dot above 60. There are no dots above 10, 30, 50, 70, 80, and 90. For Lauren, the dots are plotted as follows: 1 dot above 40; 1 dot above 60; and 1 dot above 80. There are no dots above 10, 20, 30, 50, 70, and 90.

Dataset 2:

Times for Abby’s dogs: 20, 40, 60

Times for Lauren’s dogs: 40, 60, 80

  4. SSTotal for Dataset 1 __________ (<, >, or =) SSTotal for Dataset 2

SSModel for Dataset 1 __________ (<, >, or =) SSModel for Dataset 2

SSError for Dataset 1 __________ (<, >, or =) SSError for Dataset 2

  5. The R² value for Dataset 1 __________ (<, >, or =) the R² value for Dataset 2
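
One way to check the comparisons in Questions 4 and 5 is to compute the sums of squares and R² for both datasets side by side. This is a rough sketch under the assumption that R² is SSModel divided by SSTotal; the helper name `ss_summary` is hypothetical.

```python
def ss_summary(groups):
    """Return SSTotal, SSModel, SSError, and R^2 for a dict of group -> responses."""
    all_values = [y for values in groups.values() for y in values]
    overall_mean = sum(all_values) / len(all_values)
    ss_total = sum((y - overall_mean) ** 2 for y in all_values)
    # Residuals are measured from each observation's own group mean.
    ss_error = sum(
        (y - sum(values) / len(values)) ** 2
        for values in groups.values()
        for y in values
    )
    ss_model = ss_total - ss_error
    return ss_total, ss_model, ss_error, ss_model / ss_total


datasets = {
    "Dataset 1": {"Abby": [30, 40, 50], "Lauren": [50, 60, 70]},
    "Dataset 2": {"Abby": [20, 40, 60], "Lauren": [40, 60, 80]},
}
results = {name: ss_summary(groups) for name, groups in datasets.items()}
for name, (ss_total, ss_model, ss_error, r2) in results.items():
    print(f"{name}: SSTotal={ss_total:.0f}, SSModel={ss_model:.0f}, "
          f"SSError={ss_error:.0f}, R^2={r2:.3f}")
```

Under these assumptions the two datasets share the same SSModel, while Dataset 2 has the larger SSError and SSTotal, so its R² is smaller.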

Questions 6 and 7: An office designer claims that a new ergonomic desk chair makes typing at a computer terminal faster and easier. A client company plans to test it by asking 30 employees who do a lot of typing to take part in an experiment. They will randomly assign 15 employees to use the new ergonomic chair and 15 to use a regular chair. The 30 employees will then type a selected passage for 5 minutes, recording the total number of words that are typed correctly.

Consider two hypothetical data sets that could result from this experiment:

Two sets of two parallel dotplots with overlaid boxplots, each with a summary table, describe the results of the hypothetical datasets.

Dataset 1: The horizontal axis is labeled Words and has markings from 120 to 240 in increments of 60. The vertical axis is labeled Chair and has two markings, Regular and Ergonomic, from top to bottom. For Regular, the dots are plotted as follows: 1 dot above each of 125, 140, 150, 155, 165, 175, 180, 185, 195, 200, 210, 220, and 235, and 2 dots above 170. The whiskers of the boxplot range from 125 to 235, and the box ranges from 165 to 200 with the median at 175. For Ergonomic, the dots are plotted as follows: 1 dot above each of 148, 163, 180, 185, 205, 210, 220, 225, 233, 250, and 255, and 2 dots above each of 165, 170, 175, 195, and 200. The whiskers of the boxplot range from 148 to 255, and the box ranges from 170 to 200 with the median at 195. All values are approximate.

Summary statistics (Dataset 1):

Group        n     Mean      SD
Regular      30    180.73    25.43
Ergonomic    30    191.17    25.81
Residuals    60    0         25.62

Dataset 2: The horizontal axis is labeled Words and has markings from 140 to 280 in increments of 70. The vertical axis is labeled Chair and has two markings, Regular and Ergonomic, from top to bottom. For Regular, the dots are plotted as follows: 1 dot above each of 115, 135, 145, 150, 160, 170, 175, 180, 190, 195, 205, 215, and 225, and 2 dots above 165. The whiskers of the boxplot range from 115 to 225, and the box ranges from 160 to 195 with the median at 170. For Ergonomic, the dots are plotted as follows: 1 dot above each of 155, 170, 195, 200, 205, 213, 215, 220, 225, 230, 270, and 275, and 2 dots above each of 175, 180, 190, 205, and 210. The whiskers of the boxplot range from 155 to 275, and the box ranges from 180 to 210 with the median at 205. All values are approximate.

Summary statistics (Dataset 2):

Group        n     Mean      SD
Regular      30    170.73    25.43
Ergonomic    30    201.17    25.81
Residuals    60    0         25.62

  6. Which of the datasets would result in a larger R² value?
    1. Dataset 1 would result in a larger R² value, because SSModel for Dataset 1 is larger than SSModel for Dataset 2.
    2. Dataset 1 would result in a larger R² value, because SSError for Dataset 1 is smaller than SSError for Dataset 2.
    3. Dataset 2 would result in a larger R² value, because SSModel for Dataset 2 is larger than SSModel for Dataset 1.
    4. Dataset 2 would result in a larger R² value, because SSError for Dataset 2 is smaller than SSError for Dataset 1.
  7. Compare the value of the effects for the two datasets.
    1. The effects for Dataset 1 would be larger (in absolute value), because in Dataset 1 there is a smaller difference between the group means.
    2. The effects for Dataset 2 would be larger (in absolute value), because in Dataset 2 there is a larger difference between the group means.
    3. The effects for Dataset 1 would be the same size as the effects for Dataset 2, because the standard deviations are the same for both datasets.
    4. The effects for Dataset 1 would be the same size as the effects for Dataset 2, because the sample sizes are the same for both datasets.
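
The effects in the two chair datasets can be worked out from the group means in the summary tables alone. The sketch below assumes equal group sizes, so the overall mean is the average of the two group means, and takes n = 30 per group from the tables; the function name `effects_and_ss_model` is an illustrative choice.

```python
def effects_and_ss_model(group_means, n_per_group):
    """Effects (group mean minus overall mean) and SSModel for equal-sized groups."""
    overall = sum(group_means.values()) / len(group_means)
    effects = {group: m - overall for group, m in group_means.items()}
    # Each observation contributes its group's squared effect to SSModel.
    ss_model = sum(n_per_group * e ** 2 for e in effects.values())
    return effects, ss_model


dataset1 = {"Regular": 180.73, "Ergonomic": 191.17}
dataset2 = {"Regular": 170.73, "Ergonomic": 201.17}

effects1, ss_model1 = effects_and_ss_model(dataset1, n_per_group=30)
effects2, ss_model2 = effects_and_ss_model(dataset2, n_per_group=30)

print({g: round(e, 2) for g, e in effects1.items()}, round(ss_model1, 1))
print({g: round(e, 2) for g, e in effects2.items()}, round(ss_model2, 1))
```

Because the group standard deviations are identical in the two tables, SSError is the same for both datasets, so the dataset with the larger SSModel also has the larger R².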

Questions 8 through 11: The graphs below display the outcomes of three different experiments to compare a treatment group with a control group.

Two parallel dotplots compare the results of a treatment group with a control group. The horizontal axis is labeled Experiment A and has markings from 10 to 14 in increments of 1. The vertical axis is labeled Group and has two markings, Treatment and Control, in the order from top to bottom. For Treatment, the dots are plotted as follows: 9 dots above 12. There are no dots above 10, 11, 13, and 14. For Control, the dots are plotted as follows: 9 dots above 12. There are no dots above 10, 11, 13, and 14.

Two parallel dotplots compare the results of a treatment group with a control group. The horizontal axis is labeled Experiment B and has markings from 10 to 14 in increments of 1. The vertical axis is labeled Group and has two markings, Treatment and Control, in the order from top to bottom. For Treatment, the dots are plotted as follows: 9 dots above 10. There are no dots above 11, 12, 13, and 14. For Control, the dots are plotted as follows: 9 dots above 14. There are no dots above 10, 11, 12, and 13.

Two parallel dotplots compare the results of a treatment group with a control group. The horizontal axis is labeled Experiment C and has markings from 10 to 14 in increments of 1. The vertical axis is labeled Group and has two markings, Treatment and Control, in the order from top to bottom. For Treatment, the dots are plotted as follows: 1 dot above 10; 2 dots above 11; 3 dots above 12; 2 dots above 13; and 1 dot above 14. For Control, the dots are plotted as follows: 1 dot above 10; 2 dots above 11; 3 dots above 12; 2 dots above 13; and 1 dot above 14.

  8. For which of the experiments does SSModel = 0? Select one or more than one.
    1. Experiment A
    2. Experiment B
    3. Experiment C
  9. For which of the experiments does SSError = 0? Select one or more than one.
    1. Experiment A
    2. Experiment B
    3. Experiment C
  10. The R² value for Experiment B is _______ (0, 0.5, 1), because ________ (none, half, all) of the variability in outcomes is explained by the treatment group model.

The R² value for Experiment C is ________ (0, 0.5, 1), because ________ (none, half, all) of the variability in outcomes is explained by the treatment group model.

  11. The value of the effect for the treatment group in Experiment B is _____ (-4, -2, 0, 2, or 4).

The value of the effect for the treatment group in Experiment C is _____ (-4, -2, 0, 2, or 4).
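
The quantities asked about in Questions 8 through 11 can be tabulated from the dot counts described for Experiments A, B, and C. The following is a sketch under the assumption that the effect for a group is its mean minus the overall mean; the data lists are transcribed from the dotplot descriptions, and the variable names are illustrative.

```python
experiments = {
    "A": {"Treatment": [12] * 9, "Control": [12] * 9},
    "B": {"Treatment": [10] * 9, "Control": [14] * 9},
    "C": {
        "Treatment": [10, 11, 11, 12, 12, 12, 13, 13, 14],
        "Control": [10, 11, 11, 12, 12, 12, 13, 13, 14],
    },
}


def mean(values):
    return sum(values) / len(values)


summary = {}
for name, groups in experiments.items():
    overall = mean([y for values in groups.values() for y in values])
    effects = {g: mean(values) - overall for g, values in groups.items()}
    ss_model = sum(len(values) * effects[g] ** 2 for g, values in groups.items())
    ss_error = sum((y - mean(values)) ** 2 for values in groups.values() for y in values)
    ss_total = ss_model + ss_error
    # R^2 is undefined when the response does not vary at all (Experiment A).
    r2 = ss_model / ss_total if ss_total > 0 else None
    summary[name] = (ss_model, ss_error, r2, effects["Treatment"])
    print(name, summary[name])
```

Each printed tuple lists SSModel, SSError, R², and the treatment effect, in that order.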

Questions 12 and 13: A statistics class conducted an experiment to investigate whether standing heart rates tend to be higher than sitting heart rates. Students were randomly assigned to either sit or stand, then the students measured their heart rates (in beats per minute). They used software to calculate the sums of squares and found that SSModel = 614.8 and SSTotal = 13232.1.

  12. Calculate SSError.

Solution:

  13. Calculate the R² value. Give your answer as a proportion.

Solution:
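
Questions 12 and 13 follow from the partition SSTotal = SSModel + SSError and from R² = SSModel / SSTotal. A minimal arithmetic sketch, using only the two values given in the prompt:

```python
ss_model = 614.8
ss_total = 13232.1

# SSTotal = SSModel + SSError, so the unexplained part is the remainder.
ss_error = ss_total - ss_model
# R^2 is the proportion of total variation explained by the model.
r_squared = ss_model / ss_total

print(f"SSError = {ss_error:.1f}")  # SSError = 12617.3
print(f"R^2 = {r_squared:.3f}")     # R^2 = 0.046
```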

  14. An online retailer is using an experiment to decide whether to modify their website. When visitors type in the web address or click a link to the site, they are randomly re-directed to one of two versions of the website: the version that has been in use for the last year (version A) or an updated version (version B). The retailer’s goal is to maximize the amount of time (in minutes) visitors stay on the site.

Single-mean model:

Separate-means model:

Calculate the effect for Version A.

Solution:

Questions 15 and 16: The following output displays the amount (in dollars) that a sample of male and female college students spent on their most recent haircuts.

Two parallel dotplots with overlaid boxplots and a table describe the amounts spent on haircuts by the male and female college students. The horizontal axis is labeled haircut_cost and ranges from 0 to 120 in increments of 60. The vertical axis is labeled gender and has two markings, Male and Female, from top to bottom. For Male, the dots are plotted as follows: 1 dot above 1 and 9.5; 2 dots above 10; 3 dots above 15; 2 dots above 20; 6 dots above 25; 1 dot above 35, 40, and 55. The whiskers of the boxplot range from 1 to 55, and the box ranges from 15 to 24 with the median at 24. For Female, the dots are plotted as follows: 2 dots above 1; 1 dot above 20; 2 dots above 25; 1 dot above 30 and 35; 3 dots above 40; 1 dot above 45 and 50; 2 dots above 55; 1 dot above 70, 75, 100, and 125. The whiskers of the boxplot range from 1 to 125, and the box ranges from 35 to 55 with the median at 40. All values are approximate.

Summary statistics:

Group        n     Mean     SD
Male         20    19.20    11.05
Female       20    41.25    32.90
Residuals    40    0        24.54

  15. Calculate SSModel. Note that the sample sizes are the same for the two groups.

Solution:

  16. Calculate the standard error of the residuals. Note that the sample sizes are the same for the two groups.

Solution:
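
Questions 15 and 16 can be approached from the summary statistics alone. The sketch below is not the test bank's own solution; it assumes the group means and SDs as read from the summary table above, equal group sizes of 20, and that the standard error of the residuals is the square root of SSError divided by (total sample size minus number of groups).

```python
import math

n = 20  # students per group, from the summary table
means = {"Male": 19.20, "Female": 41.25}
sds = {"Male": 11.05, "Female": 32.90}

# With equal group sizes, the overall mean is the average of the group means.
overall_mean = sum(means.values()) / len(means)
ss_model = sum(n * (m - overall_mean) ** 2 for m in means.values())

# Each group's SD satisfies SD^2 = (sum of squared residuals) / (n - 1).
ss_error = sum((n - 1) * sd ** 2 for sd in sds.values())
df_error = n * len(means) - len(means)  # total sample size minus number of groups
se_residuals = math.sqrt(ss_error / df_error)

print(f"SSModel = {ss_model:.1f}")
print(f"SE of residuals = {se_residuals:.2f}")  # SE of residuals = 24.54
```

The computed standard error agrees with the SD reported in the Residuals row of the table.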

  17. Match each sum of squares (SS) with its description.

SSModel: ______

SSError: ______

SSTotal: ______

A. Measures the amount of variability in the response variable without accounting for groups

B. Measures the variability within groups, the variability unexplained by the model

C. Measures the variability between groups, the variability explained by the model

  18. In general, study results are considered practically significant when the R² value is ______ (large/small) and the effect sizes are ______ (large/small).
  19. What is the cutoff for determining whether study results are practically significant?
    1. Results are practically significant when the R² value is less than 0.05.
    2. Results are practically significant when the R² value is less than 0.5.
    3. Results are practically significant when the R² value is greater than 0.5.
    4. Results are practically significant when the R² value is greater than 0.95.
    5. There is no set cutoff for practical significance. It differs based on context.
  20. If a researcher were unhappy with their effect size or R² value, what steps could they take when planning a follow-up study?
    1. They could try to improve the model by adding new explanatory variables.
    2. They could try to reduce unexplained variation in the response through experimental controls or stricter inclusion criteria.
    3. Both of these strategies are reasonable.
    4. Neither of these strategies would impact the effect size or R² value.

Section 1.3: Is the Variation Explained Statistically Significant?

LO1.3-1: Carry out and evaluate a randomization test comparing two groups on a quantitative response variable.

LO1.3-2: Assess the statistical significance of a two-group comparison.

LO1.3-3: Apply two-sample t-procedures for tests of significance and confidence intervals.

Questions 1 through 3: The following output displays the amount (in dollars) that a sample of male and female college students spent on their most recent haircuts. You may assume that the sample is representative of a larger population.

Two parallel dotplots with overlaid boxplots and a table describe the amounts spent on haircuts by the male and female college students. The horizontal axis is labeled haircut_cost and ranges from 0 to 120 in increments of 60. The vertical axis is labeled gender and has two markings, Male and Female, from top to bottom. For Male, the dots are plotted as follows: 1 dot above 1 and 9.5; 2 dots above 10; 3 dots above 15; 2 dots above 20; 6 dots above 25; 1 dot above 35, 40, and 55. The whiskers of the boxplot range from 1 to 55, and the box ranges from 15 to 24 with the median at 24. For Female, the dots are plotted as follows: 2 dots above 1; 1 dot above 20; 2 dots above 25; 1 dot above 30 and 35; 3 dots above 40; 1 dot above 45 and 50; 2 dots above 55; 1 dot above 70, 75, 100, and 125. The whiskers of the boxplot range from 1 to 125, and the box ranges from 35 to 55 with the median at 40. All values are approximate.

Summary statistics:

Group     n     Mean     SD
Male      20    19.10    11.05
Female    20    41.25    32.90
Pooled    40    30.18    24.54

Observed t-statistic = 2.85

95% CI(s) for difference in means (checkbox selected): Female - Male: (6.10, 38.20)*

  1. Does this data provide strong evidence that female students spend more per haircut than male students, on average? Choose the appropriate statement of the null hypothesis.
  2. Interpret the 95% confidence interval.
    1. We are 95% confident that the sample mean for females is between $6.10 and $38.20 higher than the sample mean for males.
    2. We are 95% confident that the population mean for females is between $6.10 and $38.20 higher than the population mean for males.
    3. If we randomly select one male student and one female student from this sample, we are 95% confident that the haircut value for the female student will be higher.
    4. If we randomly select one male student and one female student from the population, we are 95% confident that the haircut value for the female student will be higher.
  3. Based on the 95% confidence interval, we would expect the two-sided p-value to be _______ (>, <, or =) 0.05.
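
The reported t-statistic and confidence interval can be cross-checked against the summary statistics. This is a sketch that assumes the unpooled (separate-variance) standard error for the difference in means; the test bank does not state which formula the software used, and the variable names are illustrative.

```python
import math

n_male, mean_male, sd_male = 20, 19.10, 11.05
n_female, mean_female, sd_female = 20, 41.25, 32.90

diff = mean_female - mean_male
# Unpooled standard error of the difference in sample means (an assumption).
se_diff = math.sqrt(sd_male ** 2 / n_male + sd_female ** 2 / n_female)
t_stat = diff / se_diff

# The reported interval should be centered on the observed difference.
ci_low, ci_high = 6.10, 38.20
ci_center = (ci_low + ci_high) / 2
multiplier = (ci_high - ci_low) / 2 / se_diff  # implied critical value

print(f"t = {t_stat:.2f}")  # t = 2.85
print(f"CI center = {ci_center:.2f}, implied multiplier = {multiplier:.2f}")
```

Since the whole interval lies above 0, a difference of 0 is not a plausible value at the 95% confidence level, which is the link Question 3 relies on.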

Questions 4 through 6: A study published in Psychological Science in 2007 examined a possible link between mindset and health. The following is an excerpt from the abstract of the article: “84 female room attendants working in seven different hotels were measured on physiological health variables affected by exercise. Those in the informed condition were told that the work they do (cleaning hotel rooms) is good exercise and satisfies the Surgeon General's recommendations for an active lifestyle. Examples of how their work was exercise were provided. Subjects in the control group were not given this information.”

Over the course of four weeks, the informed group lost an average of 1.79 lbs and the uninformed group lost an average of 0.20 lbs. Are these results statistically significant? To decide, we can use a randomization test. The dotplot below shows 1000 simulated differences in mean weight loss.

A dotplot depicts the results of simulated differences in mean weight loss. The horizontal axis has markings from negative 2 to 2 in increments of 0.5. The vertical axis ranges from 0 to 20 in increments of 10. A series of dots is plotted vertically for certain markings on the horizontal axis. The plotted dots are approximately bell-shaped. The series of dots begins at negative 2 on the horizontal axis. The heights of the plotted dots increase gradually from the left with a few ups and downs and reach the peak of 21 at 0.0 on the horizontal axis. Then the heights of the dots decrease gradually to the right with a few ups and downs and end at 2.0 on the horizontal axis. A highlighted arrow from the expression, null equals 0, points toward 0.0 on the horizontal axis. The dots are plotted as follows: 1 dot above negative 1.9, negative 1.8, negative 1.76; 2 dots above negative 1.72; 1 dot above negative 1.68; 2 dots above negative 1.64; 1 dot above negative 1.58, negative 1.54, negative 1.50, negative 1.48; 2 dots above negative 1.46; 1 dot above negative 1.44; 3 dots above negative 1.42; 2 dots above negative 1.40; 1 dot above negative 1.38; 2 dots above negative 1.34; 1 dot above negative 1.32, negative 1.28; 2 dots above negative 1.26; 1 dot above negative 1.24; 2 dots above negative 1.22; 1 dot above negative 1.20 and negative 1.18; 3 dots above negative 1.16; 2 dots above negative 1.14 and negative 1.12; 4 dots above negative 1.10; 1 dot above negative 1.08; 4 dots above negative 1.04; 2 dots above negative 1.02; 3 dots above negative 1.0; 4 dots above negative 0.98; 3 dots above negative 0.96; 4 dots above negative 0.94; 3 dots above negative 0.92; 4 dots above negative 0.90; 2 dots above negative 0.88; 7 dots above negative 0.86; 8 dots above negative 0.84; 10 dots above negative 0.80; 3 dots above negative 0.78; 6 dots above negative 0.76; 5 dots above negative 0.74; 6 dots above negative 0.72; 5 dots above negative 0.70; 13 dots above negative 0.68; 9 dots 
above negative 0.66; 3 dots above negative 0.64; 7 dots above negative 0.62 and negative 0.60; 6 dots above negative 0.58; 11 dots above negative 0.56; 3 dots above negative 0.54; 8 dots above negative 0.52 and negative 0.50; 10 dots above negative 0.48; 9 dots above negative 0.46 and negative 0.44; 8 dots above negative 0.42; 9 dots above negative 0.40; 10 dots above negative 0.38; 8 dots above negative 0.36; 3 dots above negative 0.34; 14 dots above negative 0.32; 13 dots above negative 0.30; 19 dots above negative 0.28; 11 dots above negative 0.26; 7 dots above negative 0.24; 16 dots above negative 0.22; 21 dots above negative 0.20; 5 dots above negative 0.18; 16 dots above negative 0.16; 12 dots above negative 0.14 and negative 0.12; 13 dots above negative 0.10; 8 dots above negative 0.08; 16 dots above negative 0.06 and negative 0.04; 14 dots above negative 0.02; 21 dots above 0.0; 13 dots above 0.02; 12 dots above 0.04; 8 dots above 0.06; 11 dots above 0.08; 14 dots above 0.10; 7 dots above 0.12; 18 dots above 0.14; 13 dots above 0.16; 12 dots above 0.18; 9 dots above 0.20; 11 dots above 0.22; 7 dots above 0.24; 19 dots above 0.26; 12 dots above 0.28; 16 dots above 0.30; 15 dots above 0.32; 10 dots above 0.34; 18 dots above 0.36; 11 dots above 0.38; 12 dots above 0.40; 13 dots above 0.42; 8 dots above 0.44; 10 dots above 0.46; 12 dots above 0.48; 6 dots above 0.50; 12 dots above 0.52; 9 dots above 0.54; 10 dots above 0.56; 9 dots above 0.58; 7 dots above 0.60; 9 dots above 0.62; 13 dots above 0.64; 6 dots above 0.66; 5 dots above 0.68; 9 dots above 0.70; 5 dots above 0.72; 6 dots above 0.74; 4 dots above 0.76; 3 dots above 0.78; 4 dots above 0.80 and 0.82; 6 dots above 0.84; 4 dots above 0.86; 5 dots above 0.88; 1 dot above 0.90; 6 dots above 0.92; 3 dots above 0.94; 2 dots above 0.96; 7 dots above 0.98; 4 dots above 1 and 1.02; 2 dots above 1.04; 3 dots above 1.06; 1 dot above 1.08; 3 dots above 1.10, 1.12, 1.14, and 1.16; 2 dots above 1.18 and 1.22; 3 dots 
above 1.24; 2 dots above 1.28 and 1.30; 1 dot above 1.32 and 1.34; 2 dots above 1.38; 1 dot above 1.40, 1.42, 1.44, and 1.48; 2 dots above 1.50; 1 dot above 1.54 and 1.60; 2 dots above 1.62; 1 dot above 1.66, 1.72, 1.82, 1.98, and 2.0. All values are approximate.

  1. Suppose we want to use the 3S Strategy to investigate whether being informed about how their work qualifies as exercise affects room attendants’ weight loss. How would we design the simulation?
    1. Write the weight loss amounts on 84 cards. Shuffle and deal them into two groups to represent the informed and uninformed groups. Calculate the difference of means. Repeat.
    2. Write the group labels (informed or uninformed) on 84 cards. Shuffle and deal them into groups to represent weight loss. Calculate the difference of means. Repeat.
    3. Both of these designs are appropriate in this context.
    4. Neither of these designs is appropriate in this context.
  2. Why is the distribution of simulated statistics centered at 0?
    1. Because some of the room attendants in the sample gained weight and others lost weight, but the average is close to 0
    2. Because if we repeated this study again, some of the results would be positive and some of the results would be negative, but the average is close to 0
    3. Because the statistics were simulated under the assumption that being informed about how their work qualifies as exercise doesn’t affect room attendants’ weight loss
    4. Because the data in this study do not provide sufficient evidence to conclude that being informed about how their work qualifies as exercise affects room attendants’ weight loss
  3. Estimate the p-value. Include three decimal places in your answer.

Solution: Approximately 8 out of 1000 simulated differences were greater than or equal to the observed difference of 1.59 (1.79 − 0.20), so we estimate that the p-value is 0.008.
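
The count behind this estimate can be checked directly. The sketch below is an illustration, not part of the original solution: it tallies the right-tail dots as transcribed from the dotplot description (approximate values), and it includes a hypothetical helper, `shuffled_diffs`, showing one way the shuffle-and-deal simulation could be coded. The raw weight-loss data are not given in this test bank, so the helper is not run on the study data.

```python
import random

# Right-tail dot counts transcribed from the dotplot description
# (simulated difference -> number of dots); values are approximate.
right_tail = {1.60: 1, 1.62: 2, 1.66: 1, 1.72: 1, 1.82: 1, 1.98: 1, 2.00: 1}

observed_diff = round(1.79 - 0.20, 2)  # informed minus uninformed, in lbs
n_sims = 1000

# Proportion of simulated differences at least as large as the observed one.
p_value = sum(c for d, c in right_tail.items() if d >= observed_diff) / n_sims
print(p_value)


def shuffled_diffs(responses, n_first, reps, seed=0):
    """Hypothetical helper: shuffle the responses, deal them into two
    groups, and record the difference in group means each time."""
    rng = random.Random(seed)
    pool = list(responses)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pool)
        first, second = pool[:n_first], pool[n_first:]
        diffs.append(sum(first) / len(first) - sum(second) / len(second))
    return diffs
```

Shuffling the response values while keeping the group sizes fixed mimics the null hypothesis that the group label has no effect on weight loss, which is why the simulated differences center at 0.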

Questions 7 through 9: Researchers used an experiment to investigate whether cell phone use impairs drivers’ reaction times. 64 students who volunteered to participate in the study were assigned to one of two driving conditions: cell phone use (n1=32) or no distractions (n2=32). The students then participated in a simulation of driving situations, pressing a brake button as soon as they saw a red light. A device recorded their reaction times (in milliseconds).

  1. The standardized statistic for testing whether $\mu_1 = \mu_2$ is $t = 2.72$. Interpret.
    1. The standard deviation of the reaction times is 2.72 milliseconds.
    2. The standard deviation of the reaction times is 2.72 milliseconds higher than we would have expected based on the null hypothesis.
    3. The sample mean for the cell phone group is 2.72 milliseconds above the sample mean for the control group.
    4. The sample mean for the cell phone group is 2.72 standard errors above the sample mean for the control group.
  2. You want to use a theory-based pooled t-test to assess the statistical significance of the difference between these two groups, so you would use a t-distribution with ____ degrees of freedom.
  3. The graph below shows the appropriate t-distribution for assessing the statistical significance of the difference between these two groups. Which of the following statements includes a reasonable p-value and conclusion?

A bell-shaped graph titled, t Distribution, describes the statistical significance of the difference between two groups. The horizontal axis has markings from negative 3 to 3 in increments of 1. The vertical axis is labeled, Density and ranges from 0.0 to 0.4 in increments of 0.1. A bell-shaped distribution curve starts from negative 3.1, reaches its peak of 0.4 at 0, and ends at 3.1. All values are approximate.

    1. The p-value = 0. This study provides only weak evidence to suggest that cell phone use impairs drivers’ reaction times.
    2. The p-value = 0.0042. This study provides strong evidence to suggest that cell phone use impairs drivers’ reaction times.
    3. The p-value = 0.2495. This study provides only weak evidence to suggest that cell phone use impairs drivers’ reaction times.
    4. The p-value = 0.4856. This study provides strong evidence to suggest that cell phone use impairs drivers’ reaction times.
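
The degrees of freedom and the tail area for these two questions can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `t_pdf` and `t_upper_tail` are illustrative, and the tail area is approximated with Simpson's rule rather than taken from a statistics package.

```python
import math

n1, n2 = 32, 32
df = n1 + n2 - 2      # pooled t-test degrees of freedom
t_stat = 2.72         # standardized statistic from the question


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_one_sided = t_upper_tail(t_stat, df)
print(df, round(p_one_sided, 4))
```

A one-sided tail is used here because the research question asks whether cell phone use impairs (slows) reaction times.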

Questions 10 and 11: Anchoring is the common human tendency to rely too heavily, or “anchor”, on one trait or piece of information when making decisions. A group of statistics students from California were asked to guess the population of Milwaukee, Wisconsin. Some of the students were randomly chosen to be told that the nearby city of Chicago, Illinois, has a population of about 3 million people, while the rest of the students were told that the nearby city of Green Bay, Wisconsin, has a population of about 100,000.

Two parallel dotplots with overlaid boxplots. In both sets, the horizontal axis is labeled Estimate and ranges from 0 to 4000 in increments of 1000. The vertical axis is labeled City and has two markings, Chicago and Green Bay, in the order from top to bottom. For Chicago, the dots are plotted as follows: 1 dot above 50, 100, 400; 2 dots above 450; 1 dot above 800, 850, 950; 2 dots above 1000; 4 dots above 1100; 1 dot above 1150; 6 dots above 1500; 1 dot above 1600, 1700; 2 dots above 1800; 5 dots above 2000; 1 dot above 2100, 2150, 2500, 2800, 3500. In addition, the whiskers of the boxplot range from 50 to 3500 and the box ranges from 850 to 2000 with median at 1480. For Green Bay, the dots are plotted as follows: 2 dots above 50, 100; 3 dots above 150; 1 dot above 200, 450; 2 dots above 500; 1 dot above 750; 2 dots above 950; and 1 dot above 2000. In addition, the whiskers of the boxplot range from 50 to 2000 and the box ranges from 100 to 200 with median at 140. All values are approximate. A histogram plots Shuffled differences in means against Count. The horizontal axis is labeled Shuffled differences in means and has markings from negative 1000 to 1000 in increments of 500. The vertical axis is labeled count and ranges from 0 to 1500 in increments of 300. The distribution of the bars is approximately normal. There is a bar with count 10 and 50 at the interval of negative 700 to negative 500. From negative 500 to 0, the bars extend up to counts 100, 250, 400, 600, 900, 1190, 1300, and 1350. From 0 to 500, the bars extend up to counts 1200, 1000, 750, 550, 300, 200, 40. There is a bar with count 10 and 5 at the interval of 500 to 700. The mean is 1.146, the standard deviation is 199.411, and the total number of shuffles is 10000. All values are approximate.

| City | N | Mean | SD |
|---|---|---|---|
| Chicago | 35 | 1357.34 | 802.21 |
| Green Bay | 34 | 271.38 | 370.96 |

  1. Based on the information given above, is the effect of anchoring statistically significant in this context?
    1. Yes, because the difference of means observed in this sample would be unlikely to occur if anchoring really had no effect.
    2. Yes, because the sample means in this study are different and the sample sizes are both larger than 20.
    3. No, because the shuffled differences in means are centered at 0, so it is reasonable to conclude that anchoring has no effect.
    4. No, because shuffled differences in means generally fall between -500 and 500. That means the results of this experiment were an unlikely fluke.
  2. Are the validity conditions met for a theory-based pooled t-test?
    1. No, because the samples are not independent of each other.
    2. No, because the sample sizes are not the same.
    3. No, because the sample standard deviations for the two groups are very different.
    4. Yes. The only potential violation is the skewness in the sample distributions, but this is not a problem, because the sample sizes are both larger than 20.
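
Two quick calculations from the summary table and the shuffle histogram help with these questions. This is a sketch only: the comparison of the SD ratio against 2 is a common rule of thumb for the equal-variance condition and is stated here as an assumption, not quoted from the test bank.

```python
# Summary statistics from the table above.
chicago = {"n": 35, "mean": 1357.34, "sd": 802.21}
green_bay = {"n": 34, "mean": 271.38, "sd": 370.96}
shuffle_sd = 199.411  # SD of the 10000 shuffled differences in means

# How many shuffle SDs the observed difference lies from 0.
observed_diff = chicago["mean"] - green_bay["mean"]
standardized = observed_diff / shuffle_sd

# Ratio of the larger to the smaller sample SD (rule-of-thumb cutoff: 2).
sd_ratio = max(chicago["sd"], green_bay["sd"]) / min(chicago["sd"], green_bay["sd"])

print(round(observed_diff, 2), round(standardized, 2), round(sd_ratio, 2))
```

An observed difference more than five shuffle SDs from 0 lies far outside the range of the shuffled differences, and an SD ratio above 2 is the kind of result that casts doubt on a pooled procedure.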

Questions 12 through 14: A psychology study (Rutchick, Slepian, and Ferris, 2010) investigated whether using a red pen causes people to assign lower scores than using a blue pen. A group of 128 students in an undergraduate psychology class were asked to grade 128 different eighth graders’ essays on a scale of 0 to 100. Half of the students were randomly assigned a red pen while grading, and the other half were given a blue pen. The results are given in the table below:

| Pen Color | N | Mean Score | Standard Deviation |
|---|---|---|---|
| Red | 64 | 76.20 | 12.29 |
| Blue | 64 | 80.00 | 9.36 |

  1. State the alternative hypothesis.
  2. Is it appropriate to use a pooled t-test to compare these groups?
    1. Yes, a pooled t-test is appropriate, because the group means are fairly similar.
    2. Yes, a pooled t-test is appropriate, because the group standard deviations are fairly similar.
    3. No, an unpooled two-sample t-test is more appropriate, because it is not reasonable to assume that the group means are equal in the population.
    4. No, an unpooled two-sample t-test is more appropriate, because it is not reasonable to assume that the group standard deviations are equal in the population.
  3. Calculate the t-statistic. Note that the sample sizes are equal.

Solution: SE of residuals =
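
The worked solution is incomplete in this copy. As a hedged sketch of the calculation (not the original solution), the pooled t-statistic can be computed from the table, assuming the standard pooled formula and taking the difference as red minus blue; with equal group sizes the pooled variance is the average of the two sample variances.

```python
import math

# Summary statistics from the pen color table.
n_red, mean_red, sd_red = 64, 76.20, 12.29
n_blue, mean_blue, sd_blue = 64, 80.00, 9.36

# Pooled SD (SE of residuals); with equal n this averages the two variances.
pooled_var = ((n_red - 1) * sd_red ** 2 + (n_blue - 1) * sd_blue ** 2) / (n_red + n_blue - 2)
pooled_sd = math.sqrt(pooled_var)

# Standard error of the difference in means, then the t-statistic.
se_diff = pooled_sd * math.sqrt(1 / n_red + 1 / n_blue)
t_stat = (mean_red - mean_blue) / se_diff

print(round(pooled_sd, 2), round(t_stat, 2))
```

The sign of the statistic depends on the order of subtraction; red minus blue gives a negative value because the red-pen group assigned lower scores on average.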

  1. In the context of a pooled t-test for comparing two groups, what does it mean to say that the study results are statistically significant? Select all that apply.
    1. It means there is a large difference between the means of the two groups.
    2. It means there is a small difference between the means of the two groups.
    3. It means the observed sample difference in means would be unlikely to occur if there were really no difference between the two groups in the population.
    4. It means the observed sample difference in means would be likely to occur if there were really no difference between the two groups in the population.
  2. Which of the following statistics reflect both the effect size and the sample size? In other words, which of the following statistics can be used to assess statistical significance? Select all that apply.
    1. $R^2$
    2. Difference in means, $\bar{x}_1 - \bar{x}_2$
    3. Standardized statistic, t
    4. p-value
  3. True or False: The p-value is the probability that the null hypothesis is true.
  4. True or False: A small p-value indicates strong evidence against the null hypothesis.
  5. True or False: A confidence interval is a range of plausible values for a parameter calculated using sample statistics.

Section 1.4: Comparing Several Groups

LO1.4-1: Compare more than two treatments using randomization tests.

LO1.4-2: Calculate an F-statistic and use the F-distribution to find theory-based p-values.

LO1.4-3: Assess the validity of an F-test.

LO1.4-4: Complete an Analysis of Variance table.

Questions 1 through 4: Does seeing a picture have any effect on college students’ understanding of ambiguous prose? 57 students were randomly assigned to three groups: 19 saw a picture before reading a difficult passage of text, 19 saw the picture after reading the passage, and 19 were shown no picture at all. The groups were then tested on their reading comprehension and assigned a quantitative score.

Does this data provide convincing evidence that seeing a picture has an effect on reading comprehension scores?

"Three parallel dotplots with overlaid boxplots, a histogram and a table. In all three sets, the horizontal axis is labeled Comprehension and ranges from 0 to 8 in increments of 2. The vertical axis is labeled Condition and has three markings, After, Before and None, in the order from top to bottom. For After, the dots are plotted as follows: 1 dot above 1, 4 dots above 2, 5 dots above 3, 6 dots above 4, 2 dots above 5, and 1 dot above 6. There is no dot above 0 and 8. In addition, the whiskers of the boxplot range from 1 to 6 and the box ranges from 2 to 3.95 with median at 2.95. For Before, the dots are plotted as follows: 1 dot above 2, 2 dots above 3, 3 dots above 4, 5 dots above 5, 7 dots above 6, and 1 dot above 7. There is no dot above 0 and 8. In addition, the whiskers of the boxplot range from 2 to 7 and the box ranges from 4 to 5.95 with median at 5. For None, the dots are plotted as follows: 2 dots above 1, 4 dots above 2, 6 dots above 3, 3 dots above 4, 3 dots above 5, and 1 dot above 6. There is no dot above 0 and 8. In addition, the whiskers of the boxplot range from 1 to 6 and the box ranges from 2 to 3.95 with median at 2.95. All values are approximate. 
The table below the dotplot is titled, Summary statistics. The table has four rows and three columns; and the column headers are n, Mean, and S D. The data from the table reads: Row 1: None: n, 19; Mean, 3.37; S D, 1.26. Row 2: Before: n, 19; Mean, 4.95; S D, 1.31. Row 3: After: n, 19; Mean, 3.21; S D, 1.40. Row 4: Residuals: n, 57; Mean, 0; S D, 1.32. 
To the right of the dotplot is a histogram labeled, Observed R squared Statistic equals 0.271. The horizontal axis is labeled Shuffled R-squared statistics and ranges from 0 to 0.280 in increments of 0.070. The distribution of the bars is approximately right-skewed; it starts from 0 and ends at 0.160. The longest bar is at 0 on the horizontal axis and the bars decrease in height to the right of 0. The mean is 0.036 and the standard deviation is 0.034. All values are approximate."

  1. Which of the following is an appropriate statement of the null hypothesis? Select all that apply.
    1. At least one group mean differs from the others
    2. There is an association between seeing a picture (before, after, or not at all) and reading comprehension scores.
    3. There is no association between seeing a picture (before, after, or not at all) and reading comprehension scores.
  2. Which of the following is an appropriate statement of the alternative hypothesis? Select all that apply.
    1. At least one group mean differs from the others
    2. There is an association between seeing a picture (before, after, or not at all) and reading comprehension scores.
    3. There is no association between seeing a picture (before, after, or not at all) and reading comprehension scores.
  3. What does the graph of shuffled R-squared statistics represent?
    1. The values of $R^2$ for all 57 students who participated in this study
    2. The values of $R^2$ that would occur if the null hypothesis were really true
    3. The values of $R^2$ that would occur if the alternative hypothesis were really true
    4. The values of $R^2$ that would occur if we repeated this study many times in the real world
  4. Does this study provide strong evidence that seeing a picture affects reading comprehension?
    1. Yes, because an $R^2$ value of 0.271 does not appear in the graph of shuffled R-squared statistics, which suggests it would be unlikely to occur by chance alone.
    2. Yes, because the mean comprehension scores are different in each group, and all the validity conditions for the significance test are satisfied.
    3. No, because the graph of shuffled R-squared statistics is centered at 0.036, which suggests that very little variability is explained by the model.
    4. No, because an $R^2$ value of 0.271 does not appear in the graph of shuffled R-squared statistics, which suggests the data from this experiment was a fluke that occurred by chance alone.
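
The observed R-squared statistic of 0.271 can be reproduced from the summary statistics alone. The sketch below assumes the usual partition SSTotal = SSModel + SSError; the variable names are illustrative.

```python
# (n, mean, SD) for each condition, from the Summary statistics table.
groups = {
    "None": (19, 3.37, 1.26),
    "Before": (19, 4.95, 1.31),
    "After": (19, 3.21, 1.40),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Variation explained by the model: group means around the grand mean.
ss_model = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Unexplained variation: spread of observations within each group.
ss_error = sum((n - 1) * s ** 2 for n, _, s in groups.values())

r_squared = ss_model / (ss_model + ss_error)
print(round(r_squared, 3))
```

Because the shuffled statistics in the histogram end at about 0.160, a value near 0.27 lies beyond every simulated result.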

Questions 5 and 6: A study was carried out to investigate whether the type of message on the back of customer checks at a restaurant would affect tips (recorded as a percentage of the total bill). Sixty tables were selected to participate over a weekend at a restaurant in Philadelphia. Each table was randomly assigned to receive either (1) a picture of a happy face, (2) the words “Thank you!” written out, or (3) no message.

Does this study provide convincing evidence that the message written on the back of the check affects tip percentage? You can use a randomization test to decide.

  1. How would you design the physical simulation?

Write the tip percentages on cards. Shuffle and deal the cards into ____ groups.

    1. 2
    2. 3
    3. 20
    4. 60
  1. Which of the following statistics could be used to summarize each simulated sample? Select all that apply.
    1. Difference in means
    2. $R^2$
    3. t-statistic
    4. F-statistic
  2. A food company was interested in how texture might affect the palatability of a particular food. They set up an experiment in which they looked at whether the “coarseness” of the final product (coarse or fine) affected the palatability scores given by 50 people. A partially filled in ANOVA table is given below. Calculate the F-statistic.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Model | | | | ? |
| Error | | 6113 | | |
| Total | | 16722 | | |

Solution:
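
The worked solution is missing in this copy. The following is a hedged sketch of how the table could be completed, assuming that 6113 and 16722 are the Error and Total sums of squares, that each of the 50 people contributes one score, and that there are two groups (coarse and fine).

```python
n_total = 50
n_groups = 2                       # coarse vs. fine (assumed from the question)

ss_error = 6113
ss_total = 16722
ss_model = ss_total - ss_error     # SSModel = SSTotal - SSError

df_model = n_groups - 1
df_error = n_total - n_groups

ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_stat = ms_model / ms_error
print(ss_model, df_model, df_error, round(f_stat, 2))
```

Each mean square is a sum of squares divided by its degrees of freedom, and the F-statistic is the ratio of the two mean squares.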

Questions 8 through 11: Multiple researchers have conducted studies to examine the time it takes for three different medications to register in a patient’s blood system (in minutes). Each researcher wants to test whether the type of medication affects time.

  1. Which researcher would obtain a larger F-statistic based on their results?

"Two sets of three parallel dotplots with overlaid boxplots. The first set of dotplots is titled as, Researcher 1. In all the three sets, the horizontal axis ranges from 0 to 40 in increments of 10. The vertical axis has three labels, Group 1 (n equals 10), Group 2 (n equals 10), and Group 3 (n equals 10) in the order from top to bottom. For Group 1 (n equals 10), the dots are plotted as follows: 1 dot above 19; 2 dots above 20; 1 dot above 21, 22, 23; 3 dots above 24; and 1 dot above 28. In addition, the whiskers of the boxplot range from 19 to 28 and the box ranges from 20 to 24 with median at 22.5. The mean is 22.500 and the standard deviation is 2.677. For Group 2 (n equals 10), the dots are plotted as follows: 2 dots above 19; 3 dots above 20; 1 dot above 21, 24; 2 dots above 25; and 1 dot above 29. In addition, the whiskers of the boxplot range from 19 to 29 and the box ranges from 20 to 25 with median at 20.5. The mean is 22.200 and the standard deviation is 3.360. For Group 3 (n equals 10), the dots are plotted as follows: 1 dot above 21; 2 dots above 22; 3 dots above 25; 2 dots above 26; and 1 dot above 27, 28. In addition, the whiskers of the boxplot range from 21 to 28 and the box ranges from 22 to 26 with median at 25. The mean is 25 and the standard deviation is 1.886.  
The second set of dotplots is titled as, Researcher 2. In all the three sets, the horizontal axis ranges from 0 to 40 in increments of 10. The vertical axis has three labels, Group 1 (n equals 30), Group 2 (n equals 30), and Group 3 (n equals 30) in the order from top to bottom. For Group 1 (n equals 30), the dots are plotted as follows: 4 dots above 18; 2 dots above 19, 20; 7 dots above 21; 6 dots above 22; 4 dots above 23; 2 dots above 24, 25; and 1 dot above 28. In addition, the whiskers of the boxplot range from 18 to 28 and the box ranges from 20 to 23 with median at 21.5. The mean is 21.533 and the standard deviation is 2.240. For Group 2 (n equals 30), the dots are plotted as follows: 2 dots above 18; 4 dots above 19; 1 dot above 20; 5 dots above 21; 4 dots above 22; 8 dots above 23; 2 dots above 24, 25, 26; and 1 dot above 27. In addition, the whiskers of the boxplot range from 18 to 27 and the box ranges from 21 to 23 with median at 22. The mean is 22.100 and the standard deviation is 2.398. For Group 3 (n equals 30), the dots are plotted as follows: 2 dots above 20; 1 dot above 21; 4 dots above 22, 23; 5 dots above 24; 7 dots above 26; 3 dots above 27; and 4 dots above 28. In addition, the whiskers of the boxplot range from 20 to 28 and the box ranges from 22 to 26 with median at 25. The mean is 25.067 and the standard deviation is 1.999. "

    1. Researcher 1 would obtain a larger F-statistic.
    2. Researcher 2 would obtain a larger F-statistic.
    3. Researchers 1 and 2 would obtain very similar F-statistics.
    4. There is not enough information to determine which F-statistic would be larger.
  1. Compare the results for Researcher 1 and Researcher 3.

"Two sets of three parallel dotplots with overlaid boxplots. The first set of dotplots is titled as, Researcher 1. In all the three sets, the horizontal axis ranges from 0 to 40 in increments of 10. The vertical axis has three labels, Group 1 (n equals 10), Group 2 (n equals 10), and Group 3 (n equals 10) in the order from top to bottom. For Group 1 (n equals 10), the dots are plotted as follows: 1 dot above 19; 2 dots above 20; 1 dot above 21, 22, 23; 3 dots above 24; and 1 dot above 28. In addition, the whiskers of the boxplot range from 19 to 28 and the box ranges from 20 to 24 with median at 22.5. The mean is 22.500 and the standard deviation is 2.677. For Group 2 (n equals 10), the dots are plotted as follows: 2 dots above 19; 3 dots above 20; 1 dot above 21, 24; 2 dots above 25; and 1 dot above 29. In addition, the whiskers of the boxplot range from 19 to 29 and the box ranges from 20 to 25 with median at 20.5. The mean is 22.200 and the standard deviation is 3.360. For Group 3 (n equals 10), the dots are plotted as follows: 1 dot above 21; 2 dots above 22; 3 dots above 25; 2 dots above 26; and 1 dot above 27, 28. In addition, the whiskers of the boxplot range from 21 to 28 and the box ranges from 22 to 26 with median at 25. The mean is 25 and the standard deviation is 1.886. 
The second set of dotplots is titled as, Researcher 3. In all the three sets, the horizontal axis ranges from 0 to 40 in increments of 10. The vertical axis has three labels, Group 1 (n equals 10), Group 2 (n equals 10), and Group 3 (n equals 10) in the order from top to bottom. For Group 1 (n equals 10), the dots are plotted as follows: 3 dots above 18; 1 dot above 19, 20; 2 dots above 22; and 1 dot above 25, 27, 29, 31. In addition, the whiskers of the boxplot range from 18 to 31 and the box ranges from 18 to 25 with median at 21.50. The mean is 22.500 and the standard deviation is 4.346. For Group 2 (n equals 10), the dots are plotted as follows: 1 dot above 13, 14, 19, 20, 21, 22, 23, 26; and 2 dots above 31. In addition, the whiskers of the boxplot range from 13 to 31 and the box ranges from 19 to 26 with median at 22. The mean is 22.200 and the standard deviation is 5.827. For Group 3 (n equals 10), the dots are plotted as follows: 1 dot above 15; 2 dots above 18; and 1 dot above 19, 22, 27, 29, 31, 32, 35. In addition, the whiskers of the boxplot range from 15 to 35 and the box ranges from 18 to 31 with median at 25. The mean is 24.900 and the standard deviation is 7.260. "

Which researcher would obtain a larger F-statistic based on their results?

    1. Researcher 1 would obtain a larger F-statistic.
    2. Researcher 3 would obtain a larger F-statistic.
    3. Researchers 1 and 3 would obtain very similar F-statistics.
    4. There is not enough information to determine which F-statistic would be larger.
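
The comparisons among Researchers 1, 2, and 3 can be checked by computing each F-statistic from the group sizes, means, and standard deviations reported in the plot descriptions. This is an illustrative sketch; `f_from_summary` is a hypothetical helper name, and the inputs are the approximate values given above.

```python
def f_from_summary(groups):
    """F-statistic from a list of (n, mean, SD) tuples, one per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_model = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_error = sum((n - 1) * s ** 2 for n, _, s in groups)
    ms_model = ss_model / (len(groups) - 1)
    ms_error = ss_error / (n_total - len(groups))
    return ms_model / ms_error


researcher_1 = [(10, 22.500, 2.677), (10, 22.200, 3.360), (10, 25.000, 1.886)]
researcher_2 = [(30, 21.533, 2.240), (30, 22.100, 2.398), (30, 25.067, 1.999)]
researcher_3 = [(10, 22.500, 4.346), (10, 22.200, 5.827), (10, 24.900, 7.260)]

f1 = f_from_summary(researcher_1)
f2 = f_from_summary(researcher_2)
f3 = f_from_summary(researcher_3)
print(round(f1, 2), round(f2, 2), round(f3, 2))
```

With similar group means, larger samples (Researcher 2) increase the F-statistic, while greater variability within groups (Researcher 3) decreases it.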
  1. Researcher 4 had a total sample size of 90, with 30 patients per treatment group, and a standardized statistic of F = 12.72. She intends to use the F-distribution to find a theory-based p-value. Which F-distribution should she use?

She should use an F-distribution with Model df = _______ and Error df = ________.

  1. Researcher 4 had a total sample size of 90, with 30 patients per treatment group, and a standardized statistic of F = 12.72. The graph below shows the appropriate F-distribution for assessing the statistical significance of the association between type of medication and time. Which of the following statements includes a reasonable p-value and conclusion?

A graph is titled F distribution. The horizontal axis ranges from 0 to 7 in increments of 1. The vertical axis ranges from 0.0 to 1.2 in increments of 0.2. A concave up, decreasing curve is drawn on the graph, which starts from (0, 1.0), gradually decreases to the right, and meets the horizontal axis at (5.2, 0.0) and then moves along the horizontal axis up to the point, (6, 0.0). All values are approximate.

    1. The p-value is close to 0. This study provides only weak evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
    2. The p-value is close to 1. This study provides only weak evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
    3. The p-value is close to 0. This study provides strong evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
    4. The p-value is close to 1. This study provides strong evidence to suggest that type of medication has an effect on the time it takes for the medication to register.
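
For Researcher 4, the degrees of freedom and the tail area can be sketched as follows. The closed-form tail used here holds only when the Model df equals 2, which is the case for three groups; the function name is illustrative.

```python
n_total, n_groups = 90, 3
f_stat = 12.72

model_df = n_groups - 1        # number of groups minus 1
error_df = n_total - n_groups  # total sample size minus number of groups


def f_upper_tail_model_df_2(f, error_df):
    """P(F >= f) for an F-distribution with 2 and error_df degrees of
    freedom, where the tail area has a simple closed form."""
    return (error_df / (error_df + 2 * f)) ** (error_df / 2)


p_value = f_upper_tail_model_df_2(f_stat, error_df)
print(model_df, error_df, p_value)
```

The F-distribution graph above is essentially flat at 0 beyond about 5, so a statistic of 12.72 leaves almost no area to its right.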
  1. The table below summarizes the results of an experiment to compare yields (as measured by the dried weight of plants) obtained under a control and two different treatment conditions. Calculate SSModel.

| | Sample Size | Mean | SD |
|---|---|---|---|
| Full sample | 30 | 5.07 | 0.701 |
| Treatment 1 | 10 | 5.03 | 0.583 |
| Treatment 2 | 10 | 4.66 | 0.794 |
| Treatment 3 | 10 | 5.53 | 0.443 |

Solution:
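
The worked solution is missing in this copy. As a hedged sketch, SSModel can be computed two ways from the table: directly from the group means, or as SSTotal minus SSError. The two routes differ slightly because the table values are rounded.

```python
# Values from the table: full-sample mean and SD, then (n, mean, SD) per treatment.
n_total, grand_mean, sd_total = 30, 5.07, 0.701
treatments = [(10, 5.03, 0.583), (10, 4.66, 0.794), (10, 5.53, 0.443)]

# Route 1: weighted squared deviations of group means from the overall mean.
ss_model_direct = sum(n * (m - grand_mean) ** 2 for n, m, _ in treatments)

# Route 2: SSTotal - SSError, using the sample standard deviations.
ss_total = (n_total - 1) * sd_total ** 2
ss_error = sum((n - 1) * s ** 2 for n, _, s in treatments)
ss_model_by_subtraction = ss_total - ss_error

print(round(ss_model_direct, 3), round(ss_model_by_subtraction, 3))
```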

  1. The following output displays the amount (in dollars) that a sample of male and female college students spent on their most recent haircuts. You may assume that the sample is representative of a larger population. Calculate the F-statistic.

"Two parallel dotplots with overlaid boxplots and a table, describe the results of amount spent on haircuts for the male and female college students. In both sets, the horizontal axis is labeled haircut underscore cost and has ranges from 0 to 120 in increments of 60. The vertical axis is labeled gender and has two markings, Male and Female, in the order from top to bottom. For Male, the dots are plotted as follows: 1 dot above 1, 10; 2 dots above 10; 3 dots above 15; 2 dots above 20; 6 dots above 25; 1 dot above 35, 40, and 55. In addition, the whiskers of the boxplot range from 1 to 55 and the box ranges from 15 to 24 with median at 24. For Female, the dots are plotted as follows: 2 dots above 1; 1 dot above 20; 2 dots above 25; 1 dot above 30, 35; 3 dots above 40; 1 dot above 45, 50; 2 dots above 55; 1 dot above 70, 75, 100, and 125. In addition, the whiskers of the boxplot range from 1 to 125 and the box ranges from 35 to 55 with median at 40. All values are approximate. 
The table beside the dotplot is titled, Summary statistics. The table has three rows and three columns; and the column headers are n, Mean, and S D. The data from the table reads: Row 1: Male: n, 20; Mean, 19.20; S D, 11.05. Row 2: Female: n, 20; Mean, 41.25; S D, 32.90. Row 3: Residuals: n, 40; Mean, 0; S D, 24.54. Below the table, is an expression, Observed t- statistic equals 2.85. Below the expression, is a selected checkbox for 95 percent C I (s) for difference in means which is followed by an expression, Female minus Male colon (6.10, 38.20) asterisk. "

Solution:
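
The worked solution is missing in this copy. One way to obtain the F-statistic, sketched here under the assumption that the reported t-statistic is from a pooled comparison, uses the fact that with two groups F equals the square of the pooled t-statistic. The second calculation rebuilds F from the summary table as a rough check.

```python
t_stat = 2.85                      # observed t-statistic from the output
f_from_t = t_stat ** 2             # with two groups, F = t^2 (pooled)

# Cross-check from the Summary statistics table.
groups = [(20, 19.20), (20, 41.25)]    # (n, mean) for Male and Female
residual_sd = 24.54                    # SD of residuals

n_total = sum(n for n, _ in groups)
grand_mean = sum(n * m for n, m in groups) / n_total
ss_model = sum(n * (m - grand_mean) ** 2 for n, m in groups)
ms_model = ss_model / (len(groups) - 1)
ms_error = residual_sd ** 2            # mean square error = (SD of residuals)^2

f_from_summary = ms_model / ms_error
print(round(f_from_t, 2), round(f_from_summary, 2))
```

The two values agree up to rounding in the reported statistics.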

  1. A randomized experiment was conducted exploring the effectiveness of acupuncture in treating chronic lower back pain. Patients in the study were randomly assigned to one of three treatment groups: Verum acupuncture (traditional Chinese medicine), Sham acupuncture (placebo), and nonacupuncture therapy (drugs, physical therapy, etc.). After six months, each patient’s pain reduction was measured on a quantitative scale. Which inference procedure(s) could you use to test for an association between type of treatment and pain reduction?
    1. Two sample t-test (pooled or unpooled)
    2. ANOVA F-test
    3. Both of these tests are appropriate in this scenario.
    4. Neither of these tests is appropriate in this scenario.
  2. A random sample of 1450 birth records was selected from the state of North Carolina in the year 2001. One question of interest is whether the distribution of birth weights (in ounces) differs based on the race/ethnicity of the mother (White, Black, Hispanic, or other).

"Four parallel dotplots with overlaid boxplots and a table describe the results of birth weights for the mom’s race. In all the sets, the horizontal axis is labeled Birth weight ounces and has ranges from 0 to 210 in increments of 70. The vertical axis is labeled Mom race and has four markings, Other, Black, Hispanic, and White, in the order from top to bottom. 
For Other, a series of individual and overlapping dots is plotted vertically for certain markings on the horizontal axis. The series of dots begins at 70 on the horizontal axis. The heights of the plotted dots increase gradually from the left with a few ups and downs and reach the peak at 120 on the horizontal axis. Then the heights of the dots decrease gradually to the right with a few ups and downs and end at 145 on the horizontal axis. In addition, the whiskers of the boxplot range from 70 to 145 and the box ranges from 105 to 139 with median at 110. All values are approximate. 

For Black, a series of individual and overlapping dots is plotted vertically for certain markings on the horizontal axis. The series of dots begins at 15 on the horizontal axis. The heights of the plotted dots increase gradually from the left with a few ups and downs and reach the peak at 108 on the horizontal axis. Then the heights of the dots decrease gradually to the right with a few ups and downs and end at 155 on the horizontal axis. In addition, the whiskers of the boxplot range from 15 to 155 and the box ranges from 90 to 120 with median at 105. All values are approximate. 
For Hispanic, a series of individual and overlapping dots is plotted vertically for certain markings on the horizontal axis. The series of dots begins at 70 on the horizontal axis. The heights of the plotted dots increase gradually from the left with a few ups and downs and reach the peak at 138 on the horizontal axis. Then the heights of the dots decrease gradually to the right with a few ups and downs and end at 155 on the horizontal axis. In addition, the whiskers of the boxplot range from 70 to 155 and the box ranges from 105 to 138 with median at 110. All values are approximate. 
For White, a series of densely plotted overlapping dots is plotted vertically for certain markings on the horizontal axis. The series of dots begins at 15 on the horizontal axis. The heights of the plotted dots increase gradually from the left with a few ups and downs and reach the peak at 100 on the horizontal axis. Then the heights of the dots decrease gradually to the right with a few ups and downs and end at 180 on the horizontal axis. In addition, the whiskers of the boxplot range from 15 to 180 and the box ranges from 90 to 130 with median at 110. All values are approximate. 
The table beside the dotplot is titled Summary statistics. The table has five rows and three columns; and the column headers are: n, Mean, and S D. The data from the table reads: Row 1: Other: n, 48; Mean, 117.15; S D, 17.60. Row 2: Black: n, 332; Mean, 110.56; S D, 23.40. Row 3: Hispanic: n, 164; Mean, 118.52; S D, 18.17. Row 4: White: n, 906; Mean, 117.87; S D, 22.52. Row 5: Pooled: n, 1450; Mean, 116.25; S D, 22.13. "

Would an ANOVA F-test be valid for these data?

    1. No, because the samples are not independent of each other.
    2. No, because the sample sizes for the groups are not similar enough.
    3. No, because the distribution of weights is slightly skewed left for two of the groups.
    4. Yes, because all of the validity conditions are met.
  1. True or False: The validity conditions for the ANOVA F-test are the same as the validity conditions for the two-sample pooled t-test.

Section 1.5: Confidence and Prediction Intervals

LO1.5-1: Apply post-hoc analysis after significant F-test (pairwise differences).

LO1.5-2: Calculate and interpret confidence intervals on single means and differences in two means.

LO1.5-3: Calculate and interpret prediction intervals on quantitative variables.

LO1.5-4: Identify factors that impact widths of confidence intervals and prediction intervals.

Questions 1 through 3: Does seeing a picture have any effect on college students’ understanding of ambiguous prose? 57 students were randomly assigned to three groups: 19 saw a picture before reading a difficult passage of text, 19 saw the picture after reading the passage, and 19 were shown no picture at all. The groups were then tested on their reading comprehension and assigned a quantitative score. The pairwise confidence intervals for the difference in mean comprehension scores are given below.

95% confidence interval for μ_After - μ_Before: (-2.60, -0.88)

95% confidence interval for μ_After - μ_None: (-1.02, 0.70)

95% confidence interval for μ_Before - μ_None: (0.72, 2.44)

  1. Based on the confidence intervals, which conditions are significantly different from each other? Select all that apply.
    1. After is significantly different from Before.
    2. After is significantly different from None.
    3. Before is significantly different from None.
  2. Which of the following letters tables is consistent with the confidence intervals given above?
    1. Group    Letters
       Before   A
       After    A
       None     A

    2. Group    Letters
       Before   A
       After    B
       None     B

    3. Group    Letters
       Before   A
       After    B
       None     C

    4. The confidence intervals do not provide enough information to construct a letters table.
  3. How could you use a confidence interval to decide whether the difference in comprehension scores for the Before group and the None group is important in a practical sense?
    1. Subtract the endpoints to find the width of the interval. If the interval is wide, then the true difference is practically important.
    2. Subtract the endpoints to find the width of the interval. If the interval is narrow, then the true difference is practically important.
    3. Look to see whether the interval includes 0. If the interval does not include 0, then the true difference is practically important.
    4. Look to see whether the endpoints of the interval are close to 0 or far from 0. If the endpoints are far from 0, then the true difference may be practically important.
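
The reasoning behind Questions 1 through 3 can be sketched in a few lines of Python. This is only an illustration: the interval names are generic placeholders to be matched to the pairs in the stem, and `excludes_zero` is a helper name invented here. An interval that excludes 0 signals a statistically significant difference, while the distance of the endpoints from 0 speaks to practical importance.

```python
# Pairwise 95% confidence intervals copied from the question stem, in the order listed.
# The keys are generic placeholders; match them to the pairs named in the stem.
intervals = {
    "interval 1": (-2.60, -0.88),
    "interval 2": (-1.02, 0.70),
    "interval 3": (0.72, 2.44),
}


def excludes_zero(interval):
    """Return True when 0 lies outside the interval (a significant difference)."""
    low, high = interval
    return low > 0 or high < 0


results = {name: excludes_zero(ci) for name, ci in intervals.items()}

for name, (low, high) in intervals.items():
    verdict = "significant" if results[name] else "not significant"
    # The endpoint closest to 0 is the smallest plausible size of the difference.
    nearest = min(abs(low), abs(high)) if results[name] else 0.0
    print(f"{name}: ({low}, {high}) -> {verdict}; smallest plausible |difference| = {nearest}")
```

Two groups whose interval includes 0 would share a letter in a letters table; two groups whose interval excludes 0 would not.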

Questions 4 and 5: An experiment was conducted to compare yields (as measured by the dried weight of plants in kg) obtained under a control and two different treatment conditions. The pairwise confidence intervals for the difference in mean yields are given below.

95% confidence interval for (-1.44, -0.29)

95% confidence interval for (-0.94, 0.20)

95% confidence interval for (-1.07, 0.08)

  4. Based on the confidence intervals, which of the treatments are significantly different from each other?
    1. All three treatments are significantly different, because none of the confidence intervals have similar endpoints.
    2. All three treatments are significantly different, because none of the confidence intervals have midpoints that are equal to 0.
    3. None of the treatments are significantly different, because the widths of the confidence intervals are all very similar to each other.
    4. Treatment 1 is significantly different from Treatment 2, because the confidence interval for this comparison does not include 0.
  5. The grower’s goal is to find a treatment that produces yields of at least 1.4 kg, on average. Which of the treatments meets this goal?
    1. None of the treatments meet the grower’s goal.
    2. Only Treatment 2 meets the grower’s goal.
    3. Both Treatment 1 and Treatment 2 meet the grower’s goal.
    4. The pairwise confidence intervals for the difference in mean yields do not provide enough information to decide if the treatments meet the grower’s goal.

Questions 6 through 8: In 2018, a sample of academic faculty were surveyed about their salaries (in US dollars). The results were classified according to academic rank: instructor, assistant professor, associate professor, and full professor. The table below shows 95% confidence intervals for the population mean salary of each rank.

Rank             Sample size   Group mean   95% CI for μ
Instructor       75            63680        (54583, 72776)
Assistant        175           92029        (86073, 97984)
Associate        145           105133       (98591, 111676)
Full Professor   234           154509       (149359, 159659)

  6. Which of the following statements is an appropriate interpretation based on the confidence intervals? You may assume that this sample is representative of a larger population of academic faculty.
    1. The average salaries of all four ranks are significantly different from each other, because none of the confidence intervals include 0.
    2. We are 95% confident that the population mean salary for instructors is between $54,583 and $72,776.
    3. Roughly 95% of instructors in the population earn between $54,583 and $72,776 per year.
    4. More than one of these statements is an appropriate interpretation of the confidence intervals.
  7. Why is the 95% confidence interval for full professors narrower than the other intervals?
    1. The sample size for full professors is largest, and as sample size increases, the width of the confidence interval tends to decrease.
    2. The group mean for professors is largest, and as group mean increases, the width of the confidence interval tends to decrease.
    3. We can predict with a high level of certainty that full professors make more than other ranks, so the confidence interval provides a precise estimate.
    4. None of the justifications above are reasonable, so professors’ salaries must be less variable (lower SD) compared to the other ranks.
  8. If we changed the confidence level from 95% to 99% (holding everything else constant), would the width of the confidence intervals change?
    1. The width of the confidence intervals would decrease.
    2. The width of the confidence intervals would increase.
    3. The width of some intervals would increase and the width of the other intervals would decrease.
    4. The width of all the confidence intervals would stay the same.
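
As a rough check on Question 8, the sketch below rescales each 95% interval from the salary table to a 99% level. It assumes the intervals are symmetric and uses standard normal multipliers in place of the t multipliers the original analysis presumably used, so the 99% endpoints are approximate. `rescale_interval` is a name made up for this illustration.

```python
from statistics import NormalDist


def rescale_interval(low, high, old_level=0.95, new_level=0.99):
    """Rescale a symmetric interval to a new confidence level (normal approximation)."""
    center = (low + high) / 2
    half_width = (high - low) / 2
    z_old = NormalDist().inv_cdf(0.5 + old_level / 2)
    z_new = NormalDist().inv_cdf(0.5 + new_level / 2)
    new_half = half_width * z_new / z_old
    return center - new_half, center + new_half


# 95% intervals copied from the salary table.
cis_95 = {
    "Instructor": (54583, 72776),
    "Assistant": (86073, 97984),
    "Associate": (98591, 111676),
    "Full Professor": (149359, 159659),
}

cis_99 = {rank: rescale_interval(low, high) for rank, (low, high) in cis_95.items()}

for rank, (low, high) in cis_99.items():
    old_low, old_high = cis_95[rank]
    print(f"{rank}: 95% width {old_high - old_low}, approx. 99% width {high - low:.0f}")
```

Every interval is stretched by the same factor (the ratio of the two multipliers), so the ordering of the widths across ranks does not change.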
  9. An online retailer is using an experiment to decide whether to modify their website. When visitors type in the web address or click a link to the site, they are randomly redirected to one of three versions of the website. The retailer’s goal is to maximize the amount of time (in minutes) visitors stay on the site.

Version   Letters
1         A
2         AB
3         B

True or False: In this study, there is a statistically significant difference in mean time spent on the site for Version 1 and Version 2.

  10. Body temperature measurements (in degrees Fahrenheit) were taken from 65 healthy female volunteers aged 18 to 40 who were participating in vaccine trials. Based on these data, researchers calculated a 95% prediction interval: (96.90, 99.89). Interpret the interval. You may assume that the sample is representative of a larger population.
    1. Roughly 95% of healthy females in this population would have body temperatures between 96.90 and 99.89.
    2. We are 95% confident that the sample mean body temperature is between 96.90 and 99.89.
    3. We are 95% confident that the population mean body temperature is between 96.90 and 99.89.
    4. If we were to collect another sample of size 65, we are 95% confident that the sample mean body temperature would be between 96.90 and 99.89.
  11. In a context with a quantitative response variable and a multi-level categorical explanatory variable, which of the following best describes the purpose of the ANOVA F-test?
    1. The F-test helps us assess whether or not there is convincing evidence of an association between the variables.
    2. The F-test helps us measure the strength of the association between the variables by indicating how much the groups differ in terms of the mean response.
    3. The F-test helps us determine the direction of the association between the variables by indicating which group means are higher than others.
    4. The F-test serves all three of the purposes listed above.
  12. A researcher has conducted an experiment to study four different treatments, and they decide to analyze the data by comparing each treatment group mean to every other treatment group mean. This involves tests for six pairwise comparisons. If the researcher uses a significance level of α = 0.05 for each test, then the probability of making at least one Type I error is ________ (>, <, =) 0.05.
  13. What should you do to protect against an inflated experiment-wise Type I error rate?
    1. Conduct pairwise comparisons using t-procedures first and only conduct an F-test if the p-values for the t-tests are all large.
    2. Conduct pairwise comparisons using t-procedures first and only conduct an F-test if the p-values for the t-tests are all small.
    3. Conduct an F-test first and only conduct pairwise comparisons using t-procedures if the p-value for the F-test is large.
    4. Conduct an F-test first and only conduct pairwise comparisons using t-procedures if the p-value for the F-test is small.
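
The inflation of the experiment-wise error rate across six pairwise comparisons can be illustrated with a short calculation. This is a sketch under a simplifying assumption: the six tests are treated as independent, which is not strictly true for pairwise comparisons that share groups, so the first figure is a rough guide rather than an exact rate.

```python
from math import comb

alpha = 0.05                      # 0.05 is the benchmark named in the question
groups = 4                        # four treatments
n_comparisons = comb(groups, 2)   # number of distinct pairs of treatments

# Assumption: tests treated as independent (a simplification for illustration).
familywise_independent = 1 - (1 - alpha) ** n_comparisons

# Bonferroni upper bound, which does not rely on independence.
bonferroni_bound = min(1.0, n_comparisons * alpha)

print(f"pairwise comparisons: {n_comparisons}")
print(f"P(at least one Type I error), independence assumed: {familywise_independent:.3f}")
print(f"Bonferroni upper bound: {bonferroni_bound:.2f}")
```

Either way, the chance of at least one false alarm across the family of tests is well above the per-test level, which is the motivation for running the overall F-test first.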
  14. How are prediction intervals different from confidence intervals?
    1. Prediction intervals predict the population mean for a particular group, thus they are wider than confidence intervals.
    2. Prediction intervals predict the population mean for a particular group, thus they are narrower than confidence intervals.
    3. Prediction intervals predict the response of a new individual observation, thus they are wider than confidence intervals.
    4. Prediction intervals predict the response of a new individual observation, thus they are narrower than confidence intervals.
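
To make the width comparison concrete, the sketch below reuses the body temperature prediction interval (96.90, 99.89) with n = 65 from earlier in this section. It assumes the interval came from the usual single-sample formulas, where the prediction half-width is multiplier × s × sqrt(1 + 1/n) and the confidence half-width is multiplier × s / sqrt(n). Under that assumption the multiplier and s cancel, and the two half-widths differ by a factor of sqrt(n + 1).

```python
from math import sqrt

n = 65
pi_low, pi_high = 96.90, 99.89          # 95% prediction interval from the question

center = (pi_low + pi_high) / 2          # implied sample mean
pi_half = (pi_high - pi_low) / 2         # prediction interval half-width

# Assumed formulas: PI half = m * s * sqrt(1 + 1/n), CI half = m * s / sqrt(n),
# so CI half = PI half / sqrt(n + 1) regardless of the multiplier m.
ci_half = pi_half / sqrt(n + 1)
ci = (center - ci_half, center + ci_half)

print(f"prediction interval half-width: {pi_half:.3f}")
print(f"implied confidence interval half-width: {ci_half:.3f}")
print(f"implied 95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The prediction interval has to cover the variability of a single new observation as well as the uncertainty in the mean, which is why it stays wide even when n is large.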
  15. The validity conditions for confidence intervals and prediction intervals on means require that the data distribution be reasonably bell-shaped and symmetric. Is this condition always important, even when the sample size is large?
    1. This condition is not very important for confidence intervals or prediction intervals, as long as the sample size is large.
    2. This condition is very important for prediction intervals. It is less of a concern for confidence intervals, as long as the sample size is large.
    3. This condition is very important for confidence intervals. It is less of a concern for prediction intervals, as long as the sample size is large.
    4. This condition is very important for both confidence intervals and prediction intervals, regardless of sample size.
  16. Which of the following is the best way to reduce the width of a prediction interval?
    1. Increase the confidence level
    2. Increase the sample size
    3. Reduce the unexplained variation within groups
    4. Use a pooled estimate of the total variation
  17. True or False: Suppose we test H0: μ1 = μ2 = μ3 = μ4 using an ANOVA F-test. This is preferable to using 6 pairwise t-tests, because testing all parameters at once controls the probability of Type II error.

Section 1.6: More Study Design Considerations


LO1.6-1: Understand statistical power and how it is impacted by sample size, variability within groups, number of groups, and significance level.

LO1.6-2: Use statistical power analysis to plan the sample size of a study.

  1. A Type II error occurs when researchers _________ (do/don’t) find convincing evidence against the null hypothesis, when the null hypothesis is actually __________ (true/false).
  2. Suppose that you analyzed data from an experiment and obtained a large p-value. Which type of error is possible in this case?
    1. This result could be due to a Type I error.
    2. This result could be due to a Type II error.
    3. This result could be due to either a Type I error or a Type II error.
    4. As long as the validity conditions were met, the large p-value is not due to an error.
  3. The statistical power of a study is the probability that the researchers _______ (will/won’t) find convincing evidence against the null hypothesis, when the null hypothesis is actually __________ (true/false).
  4. Suppose researchers design a study such that the Type I error rate is 5% and the Type II error rate is 20%. Calculate the power of the study. Include the % sign in your answer.

Solution:

  5. True or False: The aspects of a study that impact the strength of evidence (sample size, unexplained variation, number of groups, etc.) are the same ones that impact a study’s power.
  6. How are the probabilities of Type I and Type II error affected by sample size? Assume that variability within groups, number of groups, and significance level remain unchanged.

As the sample size increases, the probability of making a Type I error ________ (increases / decreases / stays the same), and the probability of making a Type II error ________ (increases / decreases / stays the same).

  7. How are the probabilities of Type I and Type II error affected by the significance level, α? Assume that sample size, variability within groups, and number of groups remain unchanged.

As the significance level increases, the probability of making a Type I error ________ (increases / decreases / stays the same), and the probability of making a Type II error ________ (increases / decreases / stays the same).

  8. True or False: When comparing groups with a quantitative response, using a smaller number of groups (fewer levels of the categorical variable) always increases the statistical power of the test.
  9. Suppose a researcher wants to design an experiment with a high level of statistical power. What should they do?

Include a _________ (large/small) number of experimental units in the study.

Choose a number of groups that is as _________ (large/small) as possible without compromising the amount of variability explained.

Take steps during study design and data collection to __________ (increase/decrease) the amount of variability within groups.
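
How sample size, significance level, and within-group variability move power can be explored with a small calculation. The sketch below covers only the two-group case with a one-sided test and uses a normal approximation. The function name `one_sided_power` and all numeric inputs (20 or 80 units per group, SD of 10 or 5, true difference of 5) are hypothetical values chosen for illustration, not figures from any question.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_power(n_per_group, alpha, sd, true_diff):
    """Approximate power of a one-sided two-group comparison of means."""
    se = sd * sqrt(2 / n_per_group)                  # SE of the difference in means
    cutoff = NormalDist().inv_cdf(1 - alpha) * se    # rejection boundary under the null
    return 1 - NormalDist(true_diff, se).cdf(cutoff)


# Hypothetical scenarios, each changing one design factor from the baseline.
baseline = one_sided_power(n_per_group=20, alpha=0.05, sd=10, true_diff=5)
more_units = one_sided_power(n_per_group=80, alpha=0.05, sd=10, true_diff=5)
larger_alpha = one_sided_power(n_per_group=20, alpha=0.10, sd=10, true_diff=5)
less_noise = one_sided_power(n_per_group=20, alpha=0.05, sd=5, true_diff=5)

for label, value in [("baseline", baseline), ("more units", more_units),
                     ("larger alpha", larger_alpha), ("less noise", less_noise)]:
    print(f"{label}: power = {value:.3f}")
```

In each scenario the Type II error probability is 1 minus the printed power, while the Type I error probability is fixed by the chosen alpha and does not depend on the sample size.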

Questions 10 and 11: For a class project, a statistics student plans to conduct an experiment to investigate whether standing heart rates tend to be higher than sitting heart rates. They will randomly assign their participants to either sit or stand, then the participants’ heart rates will be measured (in beats per minute).

The student considers a difference of 5 beats per minute to be practically important, so if the difference is 5 beats per minute or larger, they want to be able to detect it. The student expects the standard deviation of each group to be about 12 bpm and plans to use a significance level of and a sample size of 20 in each group.

"Two histograms plot the results of differences in sample means. The first histogram is titled, Graph A colon Simulated values of x bar subscript stand minus x bar subscript sit, assuming mu subscript stand minus mu subscript sit equals 0. The horizontal axis is labeled Difference in sample means and has markings from negative 15 to 10 in increments of 5. The distribution of the bars is approximately normal. The bars are distributed between the points, negative 12 and 13 on the horizontal axis. The bars from negative 12 to 0 are arranged in an increasing trend and the bars from 0 to 13 are arranged in a decreasing trend. The bar at negative 15 is very short and it almost collides with the horizontal axis. From negative 10 to negative 0, there are 10 bars with increase in heights. From 0 to 11, there are 10 bars with decrease in heights. The bars at 12 and 13 are very short and it almost collides with the horizontal axis. All values are approximate.

The second histogram is titled, Graph B colon Simulated values of x bar subscript stand minus x bar subscript sit, assuming mu subscript stand minus mu subscript sit equals 5. The horizontal axis is labeled Difference in sample means and has markings from negative 10 to 20 in increments of 5. The distribution of the bars is approximately normal. The bars are distributed between the points negative 8 and 20 on the horizontal axis. The bar at negative 8 is very short and it almost collides with the horizontal axis. There are 10 bars from negative 5 to 5 which are arranged in an increasing trend and there are 10 bars from 6 to 15 which are arranged in a decreasing trend. Two bars at 16 and 17 are shorter than the bar at 15 and they decrease in heights, respectively. Two bars at 18 and 19 are the shortest bars which almost collide with the horizontal axis. All values are approximate."

  10. Which of the graphs above would the student use to find the rejection region?
    1. Graph A
    2. Graph B
    3. Either of these two graphs could be used to find the rejection region.
    4. Neither of these two graphs could be used to find the rejection region.
  11. The rejection region for this study is a difference of means of 9.1 or higher. Which of the values below is closest to the power of the test, given that the difference in standing and sitting heart rates is really 5 beats per minute?
    1. 1%
    2. 15%
    3. 50%
    4. 90%
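
The power in Question 11 can be approximated without the simulation shown in Graph B. The sketch below assumes the difference in sample means is approximately normal with standard error sqrt(12²/20 + 12²/20), using the standard deviation and group size given in the stem, and takes the rejection boundary of 9.1 as given.

```python
from math import sqrt
from statistics import NormalDist

sd = 12.0            # expected SD in each group (bpm), from the stem
n_per_group = 20     # planned sample size per group
true_diff = 5.0      # difference the student wants to detect (bpm)
cutoff = 9.1         # rejection region: difference in means of 9.1 or higher

# Standard error of the difference in two independent sample means.
se = sqrt(sd**2 / n_per_group + sd**2 / n_per_group)

# Graph A: sampling distribution centered at 0 (null hypothesis true).
type_i_rate = 1 - NormalDist(0, se).cdf(cutoff)

# Graph B: sampling distribution centered at 5 (alternative true).
power = 1 - NormalDist(true_diff, se).cdf(cutoff)

print(f"standard error: {se:.2f}")
print(f"area in rejection region under Graph A: {type_i_rate:.4f}")
print(f"area in rejection region under Graph B (power): {power:.3f}")
```

The boundary itself comes from the null distribution (Graph A); the power is the share of the alternative distribution (Graph B) that falls at or beyond that boundary.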

Questions 12 and 13: Do older adults (ages 65+) have lower body temperatures than younger adults (ages 18-64), on average? Researchers decide to conduct a test of H0: μ_older - μ_younger = 0 vs. Ha: μ_older - μ_younger < 0 at the α = 0.01 significance level with sample sizes of 25 in each group.

Suppose that the true mean body temperature for older adults is 97.5º F, the true mean body temperature for younger adults is 98.6º F, and both groups have a standard deviation of 0.75º F.

"Two histograms plot the results of differences in sample means. The first histogram is titled, Graph 1 colon Simulated values of x bar subscript older minus x bar subscript younger, assuming mu subscript older minus mu subscript younger equals 0. The horizontal axis is labeled Difference in sample means and has markings from negative 0.5 to 0.5 in increments of 0.5. The distribution of the bars is approximately normal. The bars are distributed between the points negative 0.75 and 0.75 on the horizontal axis. The bars at negative 0.75 and 0.75 are very short and they almost collide with the horizontal axis. There are 13 bars from negative 0.7 to 0 which are arranged in an increasing trend and there are 13 bars from 0.05 to 0.7 which are arranged in a decreasing trend. The bar at 0.1 is shorter than the bars at 0.05 and 0.15. From 0.55 to 0.7, there are 3 bars with decreasing heights. A red vertical line extends from negative 0.5 and covers the whole range of the vertical axis. All values are approximate. Below the histogram, is a field labeled count samples, which displays a less than or equal to symbol in a dropdown option, and to its right is a value text box which displays the value negative 0.51. An expression given at the bottom reads, Count equals 100 over 10000 equals 0.0100 (highlighted).

The second histogram is titled, Graph 2 colon Simulated values of x bar subscript older minus x bar subscript younger, assuming mu subscript older minus mu subscript younger equals 97.5 minus 98.6 equals negative 1.1. The horizontal axis is labeled Difference in sample means and has markings from negative 1.5 to negative 0.5 in increments of 0.5. The distribution of the bars is approximately normal. The bars are distributed between the points negative 1.75 and 0.49 on the horizontal axis. The bar at 0.49 is very short and it almost collides with the horizontal axis. There are 11 bars from negative 1.75 to negative 1.2 which are arranged in an increasing trend and there are 11 bars from negative 1.1 to negative 1.1 which are arranged in a decreasing trend. The bar at negative 1.15 is shorter than the bar at negative 1.2 and larger than the bar at negative 1.1. The bar at negative 0.5 is slightly higher than the bar at negative 0.49. A red vertical line extends from negative 0.5 and covers the whole range of the vertical axis. All values are approximate. Below the histogram, is a field labeled count samples, which displays a less than or equal to symbol in a dropdown option, and to its right is a value text box which displays the value negative 0.51. An expression given at the bottom reads, Count equals 9967 over 10000 equals 0.9967 (highlighted). "

  12. Match each term to its representation in the graphs above.

Terms: Power; Prob(Type I error); Prob(Type II error)

A. The area to the left of the red line in Graph 1

B. The area to the left of the red line in Graph 2

C. The area to the right of the red line in Graph 2

  13. The rejection region for this study is a difference of means (x̄_older - x̄_younger) of -0.51 or lower. Suppose the significance level was changed from to . The boundary of the rejection region (the red line) would shift to the _____ (left/right), and the power would ______ (increase/decrease).
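
The three areas in Graphs 1 and 2 can be approximated with a normal model in place of the simulation. The sketch below assumes the difference in sample means is approximately normal with standard error 0.75 × sqrt(2/25), taken from the stated standard deviation and group size, and uses the rejection boundary of -0.51 shown by the red line.

```python
from math import sqrt
from statistics import NormalDist

sd = 0.75             # SD in each group (degrees F), from the stem
n_per_group = 25
true_diff = 97.5 - 98.6   # older minus younger under the alternative
cutoff = -0.51            # reject when the difference in means is at or below this

se = sd * sqrt(2 / n_per_group)

# Graph 1 (null true): area to the left of the red line.
type_i_rate = NormalDist(0, se).cdf(cutoff)

# Graph 2 (alternative true): area to the left is power, area to the right is Type II.
power = NormalDist(true_diff, se).cdf(cutoff)
type_ii_rate = 1 - power

print(f"standard error: {se:.4f}")
print(f"P(Type I error): {type_i_rate:.4f}")
print(f"power: {power:.4f}")
print(f"P(Type II error): {type_ii_rate:.4f}")
```

These values sit close to the simulated proportions reported under the graphs (0.0100 and 0.9967); small gaps are expected because the simulation and the normal approximation are different methods.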

Questions 14 and 15: Olestra was approved by the FDA for use in snack foods as a fat substitute in the 1990s. Because there were anecdotal reports of stomach (GI) problems associated with Olestra consumption, researchers planned to carry out an experiment to compare GI symptoms after consuming Olestra potato chips or regular potato chips.

The researchers consider a difference of proportions of 0.05 to be practically significant. That is, if 20% of people experience GI problems when eating Olestra and 15% experience GI problems while eating regular potato chips, they want to be able to detect the difference between the two. They use software to conduct a power analysis to decide how large the sample size needs to be in order for the study to have 80% power with a significance level of and a one-sided alternative hypothesis.

  14. Statistical power is the probability of concluding that the risk of GI problems for those eating chips with Olestra is ___________ (higher than / the same as) the risk for those eating regular potato chips, given that the difference in proportions who experience GI problems is really equal to ____ (0 / 0.05).
  15. The power analysis suggests a sample size of at least 714. One of the researchers’ colleagues is surprised that such a large sample size is necessary. Which of the following is the best explanation?
    1. The difference between 15% and 20% is small, and small effect sizes are more difficult to detect, so a large sample size is required.
    2. The significance level is fairly high. If the researchers changed the significance level from to they wouldn’t need such a large sample size.
    3. 80% power is an unusually high value that demands an unusually large sample. Lowering the power would make their plan more acceptable to funding agencies.
    4. One-sided tests always require larger samples. If they used a two-sided test, the necessary sample size would be roughly half as large.
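
A power analysis of this kind can be sketched with a textbook sample size formula for comparing two proportions. Several things here are assumptions rather than details from the source: the significance level of 0.05 (the value is not legible in the question text), the specific formula (pooled variance under the null, unpooled under the alternative), the reading of the result as a per-group size, and the function name `n_per_group_two_proportions`. The researchers' software may use a different method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha, power):
    """Approximate per-group n for a one-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))             # SD factor when H0 is true
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))        # SD factor when Ha is true
    return ceil(((z_alpha * null_sd + z_beta * alt_sd) / (p1 - p2)) ** 2)


# Assumed alpha = 0.05; proportions and 80% power come from the stem.
n_small_effect = n_per_group_two_proportions(0.20, 0.15, alpha=0.05, power=0.80)

# Hypothetical larger effect, for contrast only.
n_large_effect = n_per_group_two_proportions(0.30, 0.15, alpha=0.05, power=0.80)

print(f"n per group to detect 0.20 vs 0.15: {n_small_effect}")
print(f"n per group to detect 0.30 vs 0.15: {n_large_effect}")
```

Under these assumptions the first figure agrees with the 714 quoted in Question 15, and the contrast case shows how quickly the required sample shrinks when the difference to be detected is larger.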

Document Information

Document Type:
DOCX
Chapter Number:
1
Created Date:
Aug 21, 2025
Chapter Name:
Chapter 1 Intermediate Statistical Investigations Test Bank
Author:
Nathan Tintle
