Ch2 Exam Questions Intermediate Statistical Investigations - Intermediate Statistical Investigations 1st Ed - Exam Bank by Nathan Tintle.

Chapter 2

Intermediate Statistical Investigations Test Bank

Question types: FIB = Fill in the blank; Calc = Calculation; Ma = Matching; MS = Multiple select; MC = Multiple choice; TF = True-false

CHAPTER 2 TERMINAL LEARNING OUTCOMES

TLO2-1: Analyze paired data as a way to control observational unit variation appropriately.

TLO2-2: Analyze data on quantitative response from a randomized complete block design using both simulation and theory-based approaches.

TLO2-3: Analyze an observational study with two sources of explained variation understanding adjustment for another variable in the analysis and covariation.

Section 2.1: Paired Data

LO2.1-1: Use pairing to potentially reduce unexplained variation and increase the power of a study.

LO2.1-2: Explain how to analyze paired data appropriately.

Questions 1 through 3: Statisticians recommend randomizing the order of treatments in paired experiments, but is order really important? To investigate, 22 students in a statistics class each completed a memory puzzle twice, recording the amount of time (in seconds) it took to complete the puzzle on each attempt. They also calculated the difference between their two attempts (first – second). The results of the experiment are shown below.

| | First Attempts (n=22) | Second Attempts (n=22) | Differences (n=22) |
|---|---|---|---|
| Mean | 83.755 | 75.963 | 7.792 |
| SD | 18.318 | 18.008 | 8.283 |

  1. What is the value of the t-statistic for this study?

Sol: t = 7.792/(8.283/√22) = 7.792/1.766 ≈ 4.41

  2. What is the value of the F-statistic for this study?

Sol: For paired data with two conditions, F = t² ≈ (4.412)² ≈ 19.47

  3. The 95% confidence interval for the mean difference between first and second attempts is (4.120, 11.465). Does this interval provide strong evidence of an order effect in this context? You may assume α = 0.05.
    1. Yes, there is strong evidence of an order effect, because the sample mean (7.792) is inside the interval.
    2. Yes, there is strong evidence of an order effect, because 0 is outside the interval.
    3. No, there is not strong evidence of an order effect, because the sample mean (7.792) is inside the interval.
    4. No, there is not strong evidence of an order effect, because 0 is outside the interval.
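
The three puzzle questions above can be checked numerically from the summary statistics alone. The sketch below is illustrative, not part of the original test bank: it assumes the usual paired t formula, uses the fact that F = t² when each block has two conditions, and hard-codes a tabled t multiplier of 2.0796 for 21 degrees of freedom (an assumption, since the source gives only the finished interval).

```python
import math

n = 22              # number of students (pairs)
mean_diff = 7.792   # mean of the differences (first - second), in seconds
sd_diff = 8.283     # standard deviation of the differences

# Standard error of the mean difference
se = sd_diff / math.sqrt(n)

# Paired t-statistic for the null hypothesis of no mean difference
t_stat = mean_diff / se

# With two conditions per participant, the ANOVA F-statistic equals t squared
f_stat = t_stat ** 2

# 95% interval; 2.0796 is an assumed tabled multiplier for 21 df
t_mult = 2.0796
ci = (mean_diff - t_mult * se, mean_diff + t_mult * se)

print(round(t_stat, 2), round(f_stat, 2))
print(tuple(round(x, 3) for x in ci))
print("0 outside interval:", not (ci[0] <= 0 <= ci[1]))
```

Because the interval is rebuilt from rounded summaries, its endpoints may differ from the stated (4.120, 11.465) in the last decimal place.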

Questions 4 and 5: An economist at Vanderbilt University designed a study to compare different types of online auctions. In one experiment, he compared a Dutch auction to a first-price sealed bid auction. In the Dutch auction, the item for sale starts at a very high price and is lowered gradually until someone finds the price low enough to buy. In the first-price sealed bid auction, each bidder submits a single sealed bid, and the highest bid wins.

The researcher auctioned off collectible trading cards from the game Magic: The Gathering. He placed pairs of identical cards up for auction; one would go into the Dutch auction and the other into the first-price sealed bid auction. He then looked at the difference in the prices he received on each pair. He repeated this for a total of 88 pairs.

"Three histograms. The first histogram describes the results of selling price of Dutch auction. The horizontal axis is labeled Selling price (Dutch auction) and ranges from 0 to 25 in increments of 5. The distribution of the bars is approximately right-skewed. From 0 to 10, there are 5 bars with their heights progressively decreasing. From 10 to 15, 2 small bars of same heights are distributed. From 15 to 20, there is a bar which slightly extends above the horizontal axis. From 20 to 25, 2 bars of different heights are randomly distributed. From 25 to 27.5, there is a bar which slightly extends above the horizontal axis. The longest bar is at 0 and the shortest bars are at 12.5, 15, and 22.5. The tip of the bar above 0, 5, 12.5, 15, and 22.5 is highlighted. All values are approximate.
The second histogram describes the results of selling price of first price sealed bid. The horizontal axis is labeled Selling Price (First price sealed bid) and ranges from 0 to 25 in increments of 5. The distribution of the bars is approximately right-skewed. From 0 to 10, there are 5 bars with their heights progressively decreasing. From 10 to 15, there is a bar which slightly extends above the horizontal axis. From 15 to 20, there are 2 bars which almost coincide with the horizontal axis. The bars at 15, 17.5 are of the same heights which extend slightly above the horizontal axis. From 20 to 25, there is a bar which slightly extends above the horizontal axis. From 25 to 27.5, there is a bar which slightly extends above the horizontal axis. The longest bar is at 0 and the shortest bars are at 15, and 17.5. The tip of the bar above 0, 5, 10, 22.5, and 27.5 is highlighted. All values are approximate. 
The third histogram describes the results of difference of Dutch and first price. The horizontal axis is labeled Difference (Dutch minus First price) and has markings from negative 1.5 to 2.5 in increments of 0.5. The distribution of the bars is approximately normal and it starts from negative 1.5, and ends at 2.5. The longest bar is at 0 on the horizontal axis and the bars decrease in height to the left of 0 and to the right of 0. The tip of the bar above negative 1, negative 0.5, 0, 0.5, and 1 is highlighted. All values are approximate."

  4. Which of the following are appropriate ways to state the null hypothesis in this scenario? Select all that apply.
    1. There is an association between type of auction and selling price.
    2. There is no association between type of auction and selling price.
  5. Are the validity conditions met for a one-sample t-test for paired data?
    1. No, because the distributions of selling prices are skewed right for both types of auctions.
    2. No, because the spread of the differences is less than half the spread of the original selling prices.
    3. Yes, because the distribution of the differences is unimodal and approximately symmetric.
    4. Yes, because the spread of the selling prices is roughly the same for both types of auctions.

Questions 6 through 10: Is there a difference between standing and sitting heart rates? To decide, 52 students measured their heart rates (in beats per minute) in two positions: once while standing and once while sitting. Each student then calculated the difference in their measurements: standing heart rate – sitting heart rate. Data are shown below.

"Two side by side dotplots compare the student’s heart rates for two positions, namely, standing and sitting. The number line is labeled outcomes and ranges from 48 to 102 in increments of 6. For standing, the dots are plotted as follows: 1 dot above 57; 3 dots above 60; 1 dot above 61, 62; 2 dots above 65.9; 3 dots above 68; 1 dot above 69; 4 dots above 70; 5 dots above 72; 2 dots above 74; 3 dots above 76; 5 dots above 77.8; 3 dots above 80; 1 dot above 81; 2 dots above 82; 3 dots above 84; 4 dots above 86; 2 dots above 88; 2 dots above 92; and 1 dot above 94, 98, 100, 102. The mean is 77.019 and the standard deviation is 10.687. For sitting, the dots are plotted as follows: 1 dot above 48, 54; 3 dots above 57; 1 dot above 58; 3 dots above 60.1; 1 dot above 62; 7 dots above 64; 4 dots above 66; 4 dots above 68; 1 dot above 69; 4 dots above 70; 2 dots above 72.1; 3 dots above 73.2; 6 dots above 76; 3 dots above 78; 3 dots above 80; 4 dots above 82; and 1 dot above 84.1, 88. The mean is 70.000 and the standard deviation is 8.670. Multiple lines connecting the dots of standing and sitting heart rate, respectively, describe pairings. The pairing lines starts from 48 and 60 and ends at 88 and 79.9 on the horizontal axis. The concentration of the pairing lines is more in between the points, 62 to 84 on the horizontal axis. Four significant pairing lines are at the point, 57 and 64, 61 and 70, 75 and 82, 80 and 88. All values are approximate. 
Below this, another dotplot plots the differences between standing and sitting heart rate. The number line is labeled differences has markings from negative 0.24 to 0.24 in increments of 6. The dots are plotted as follows: 2 dot above negative 10, negative 9; 4 dots above negative 0.1; 8 dots above 2; 5 dots above 4, 6; 2 dots above 7; 4 dots above 8; 6 dots above 10; 5 dots above 12; 3 dots above 14, 18.1; and 1 dot above 22, 24.1, 25. The mean is 7.019 and the standard deviation is 7.681. All values are approximate. "

  6. Which of the following inference methods are appropriate for this scenario? Select all that apply.
    1. Two-sample t-test
    2. One-way ANOVA F-test
    3. t-test for paired data
  7. In this study, it is helpful to control for person-to-person variability, removing it as a source of unexplained variation. Which features of the graph above indicate that pairing will lead to a more powerful test in this scenario? Select all that apply.
    1. The connected lines show that those with high standing heart rates also tend to have high sitting heart rates and vice versa.
    2. On average, standing heart rates tend to be higher than sitting heart rates (based on sample means).
    3. The variability of the differences is less than the variability within treatment groups (based on sample standard deviations).
    4. The distribution of differences is unimodal and roughly symmetrical, so it is reasonable to assume the residuals are Normally distributed.
  8. The following statistical model could be used to predict heart rates.

Predicted heart rate = overall mean + position effect + person effect

One participant had a standing heart rate of 88 and a sitting heart rate of 82. Using this information and the data shown above, calculate the person effect for this participant.

Sol: Person mean = (88+82)/2 = 85; Overall mean = (77.0+70.0)/2 = 73.5; Person effect = 85–73.5 = 11.5

  9. Based on the ANOVA table below, what proportion of the variability in heart rates is due to participants (person-to-person variability)? Give your answer as a decimal.

| Source | DF | Sum of Squares | Mean Squares | F | p-value |
|---|---|---|---|---|---|
| Position | 1 | 1281.0 | 1281.0 | 43.4243 | <0.001 |
| Participant | 51 | 8154.5 | 159.89 | 5.4201 | <0.001 |
| Residuals | 51 | 1504.5 | 29.50 | | |
| Total | 103 | 10940.0 | | | |

Sol: 8154.5/10940.0 ≈ 0.745

  10. As shown in the ANOVA table below, there is strong evidence of a position effect – a difference between standing and sitting heart rates. Suppose you had used only position as a predictor of heart rate, ignoring person-to-person variability in your model. How would the strength of evidence change?

The evidence of a position effect would be ______ (stronger/weaker). Specifically, SSError would be ______ (larger/smaller), which would lead to a ______ (larger/smaller) F-statistic for testing the effect of position.

| Source | DF | Sum of Squares | Mean Squares | F | p-value |
|---|---|---|---|---|---|
| Position | 1 | 1281.0 | 1281.0 | 43.4243 | <0.001 |
| Participant | 51 | 8154.5 | 159.89 | 5.4201 | <0.001 |
| Residuals | 51 | 1504.5 | 29.50 | | |
| Total | 103 | 10940.0 | | | |
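
The comparison asked for in this question can be sketched numerically from the heart-rate ANOVA table. The snippet below is an illustration rather than part of the original key: it assumes that dropping Participant from the model moves its sum of squares and degrees of freedom into the error term, and the variable names are made up for the example.

```python
# Sums of squares and degrees of freedom from the heart-rate ANOVA table
ss_position, df_position = 1281.0, 1
ss_participant, df_participant = 8154.5, 51
ss_resid, df_resid = 1504.5, 51

# Two-variable model: participant variation is explained, not left in the error
f_blocked = (ss_position / df_position) / (ss_resid / df_resid)

# One-variable model: participant variation is folded into the error term
ss_error_unblocked = ss_resid + ss_participant
df_error_unblocked = df_resid + df_participant
f_unblocked = (ss_position / df_position) / (ss_error_unblocked / df_error_unblocked)

# Share of total variation attributable to participants
ss_total = ss_position + ss_participant + ss_resid
prop_participant = ss_participant / ss_total

print(round(f_blocked, 2), round(f_unblocked, 2), round(prop_participant, 3))
```

The blocked F-statistic computed this way differs slightly from the tabled 43.4243 because the table entries are rounded.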

  11. In a paired experiment to compare sleep aids, participants all try two different treatments. The order of the treatments is randomized, and sleep quality is measured for each treatment. A partially completed ANOVA table is given below.

| Source | DF | Sum of Squares | Mean Squares | F |
|---|---|---|---|---|
| Treatment | 1 | 322 | | |
| Person | 19 | 561 | | |
| Error | 19 | | | |
| Total | 39 | 1211 | | |

Calculate the standard error of the residuals.

Sol: SSError = 1211 – 322 – 561 = 328; SE of residuals = √(328/19) ≈ 4.15
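
As a quick check on the sleep-aid calculation, the following sketch recovers the missing error sum of squares by subtraction and takes the square root of the mean square error. It assumes the standard error of the residuals is defined as √(SSError/DFError); the variable names are illustrative.

```python
import math

# Entries given in the partially completed ANOVA table
ss_total, ss_treatment, ss_person = 1211, 322, 561
df_error = 19

# Error sum of squares is whatever the model terms leave unexplained
ss_error = ss_total - ss_treatment - ss_person

# Mean square error and its square root (standard error of the residuals)
ms_error = ss_error / df_error
se_residuals = math.sqrt(ms_error)

print(ss_error, round(se_residuals, 2))
```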

  12. Which of the following are examples of paired designs? Select all that apply.
    1. A teacher gives each of her students two cookies: one made with eggs and butter and one made with a vegan recipe. The students rate the taste of each cookie on a scale of 1-10.
    2. The managing partner of a consulting firm wants to compare the effectiveness of two computer skills training programs. Employees are randomly assigned to complete one of the two programs. Afterward, both groups take the same computer skills test.
    3. A college student is trying to decide which of two grocery stores – Publix or Kroger – offers better prices. The student selects 50 items that are available in both stores, and for each of these items, records both the Publix price and the Kroger price.
  13. Which of the following are benefits of a paired design compared to an independent groups design?
    1. Paired designs allow us to generalize results to a larger population.
    2. Paired designs always support cause-and-effect conclusions.
    3. Paired designs are always practical, whereas independent groups designs do not make sense in some contexts.
    4. Paired designs often lead to tests with more statistical power to detect an effect.
  14. In a paired design where each participant is observed under two conditions (treatment and control), why is it inappropriate to conduct a two-sample t-test?
    1. Because it is not reasonable to consider multiple observations on the same individual as independent
    2. Because it is not reasonable to consider multiple observations on the same individual as Normally distributed
    3. Because this design leads to large differences in standard deviations for the treatment and control groups
    4. Because this design leads to large differences in means for the treatment and control group
  15. True or False: In a paired design where each participant is observed under two conditions (treatment and control), the sum of the person effects across all participants will be zero.

Section 2.2: Randomized Complete Block Designs

LO2.2-1: Analyze a randomized complete block design using simulation-based methods.

LO2.2-2: Analyze a randomized complete block design using theory-based methods.

  1. True or False: In a randomized complete block design, each block should be a diverse group of observational units, roughly mirroring the population of interest.
  2. A matched pairs design is a specific kind of block design. When can the term “matched pairs” be used to describe a block design?
    1. When there are only two blocks of experimental units
    2. When there are only two observations in each block
    3. When there are two independent groups of experimental units
    4. When the alternative hypothesis in the test for a treatment effect is two-sided
  3. A blocking variable is a variable that the researcher believes is ________ (related to/independent of) the response variable; this variable ________ (is/is not) manipulated by the researcher.
  4. The table below shows the means and standard deviations for each block in a randomized block design.

| Block | Mean Response | Standard Deviation of Responses |
|---|---|---|
| 1 | 500 | 12 |
| 2 | 501 | 14 |
| 3 | 498 | 9 |
| 4 | 497 | 12 |

Was the choice of blocking variable effective in this study?

    1. Yes, because the mean responses are very similar for the four blocks.
    2. No, because the mean responses are very similar for the four blocks.
    3. Yes, because the standard deviations are fairly similar for the four blocks.
    4. No, because the standard deviations are not similar enough for the four blocks.

Questions 5 through 7: A farmer has a field with three rows, where each row contains nine plants. The farmer suspects that the row may affect plants’ yield, because of variations in sunlight across his property, so he decides to use a randomized block design to test the effectiveness of three different fertilizers.

  5. Describe an appropriate randomized complete block design.
    1. Randomly assign the plants to one of the three fertilizers, which results in nine plants per treatment. Record both the yield and the row for each plant.
    2. Randomly sample one plant from each of the three rows, then randomly assign the three selected plants to the three different fertilizers.
    3. Randomly assign each of the three rows to one of the three fertilizers, so that all plants in a given row will receive the same treatment.
    4. For each row, randomly assign the nine plants to the three fertilizers, which results in three plants per treatment in each row.
  6. The null distribution of F-statistics was produced by randomly re-assigning the plant yields to treatments within blocks and calculating the F-statistic for each shuffle. The observed F-statistic for the real study was 0.92. What do you conclude?

A histogram describes the simulated null distribution of Shuffled F-statistics. The horizontal axis is labeled Shuffled F-statistics and ranges from 0 to 1.000 in increments of 0.200. The vertical axis is labeled Count. The distribution of the bars is approximately right-skewed. From 0 to 0.200, there are 6 bars with their heights progressively decreasing. From 0.200 to 0.400, 5 bars of different heights are randomly distributed. From 0.400 to 0.600, 5 small bars of different heights are randomly distributed. From 0.600 to 0.800, there are 5 bars which slightly extend above the horizontal axis. The bars at 0.68, 0.72 are of the same heights which extend slightly above the horizontal axis and the bar above 0.76 is slightly longer than the other 4 bars. From 0.800 to 1.000, 5 bars which almost coincide with the horizontal axis. The longest bar at 0 and the shortest bars are at 0.84, 0.92, and 1.000. The mean is 0.179, the standard deviation is 0.169, and the total shuffles is 1000. All values are approximate.

    1. After adjusting for block effects, this data provides strong evidence that all three fertilizers are the same in terms of their effect on yield.
    2. After adjusting for block effects, this data provides strong evidence that all three fertilizers are different in terms of their effect on yield.
    3. After adjusting for block effects, this data provides strong evidence that at least one of the fertilizers is different in terms of its effect on yield.
    4. Even after adjusting for block effects, this data provides only weak evidence of a fertilizer effect.
  7. The null distribution of F-statistics was produced by randomly re-assigning the plant yields to treatments within blocks and calculating the F-statistic for each shuffle. How would the null distribution change if the plant yields were randomly re-assigned to treatments completely at random ignoring blocks?

A histogram describes the simulated null distribution of Shuffled F-statistics. The horizontal axis is labeled Shuffled F-statistics and ranges from 0 to 1.000 in increments of 0.200. The vertical axis is labeled Count. The distribution of the bars is approximately right-skewed. From 0 to 0.200, there are 6 bars with their heights progressively decreasing. From 0.200 to 0.400, 5 bars of different heights are randomly distributed. From 0.400 to 0.600, 5 small bars of different heights are randomly distributed. From 0.600 to 0.800, there are 5 bars which slightly extend above the horizontal axis. The bars at 0.68, 0.72 are of the same heights which extend slightly above the horizontal axis and the bar above 0.76 is slightly longer than the other 4 bars. From 0.800 to 1.000, 5 bars which almost coincide with the horizontal axis. The longest bar at 0 and the shortest bars are at 0.84, 0.92, and 1.000. The mean is 0.179, the standard deviation is 0.169, and the total shuffles is 1000. All values are approximate.

    1. The null distribution of F-statistics would have a larger spread.
    2. The null distribution of F-statistics would have a smaller spread.
    3. The null distribution of F-statistics would be skewed left instead of skewed right.
    4. The null distribution of F-statistics would be approximately Normal.
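
The within-block shuffling described in the two questions above can be sketched as follows. This is a minimal illustration, not the procedure used to produce the histogram: the yields are made-up values generated with a row effect and no fertilizer effect, the function names `f_statistic` and `shuffle_within_blocks` are hypothetical, and the F-statistic assumes an additive model on a balanced layout (three plants per fertilizer in each row).

```python
import random
from statistics import mean

def f_statistic(yields, fertilizer, row):
    """F-statistic for fertilizer after adjusting for row (balanced design)."""
    grand = mean(yields)
    ss_total = sum((y - grand) ** 2 for y in yields)

    def ss_between(labels):
        # Sum over groups of (group size) * (group mean - grand mean)^2
        total = 0.0
        for g in set(labels):
            group = [y for y, lab in zip(yields, labels) if lab == g]
            total += len(group) * (mean(group) - grand) ** 2
        return total

    ss_fert, ss_row = ss_between(fertilizer), ss_between(row)
    df_fert = len(set(fertilizer)) - 1
    df_row = len(set(row)) - 1
    df_error = len(yields) - 1 - df_fert - df_row
    ss_error = ss_total - ss_fert - ss_row
    return (ss_fert / df_fert) / (ss_error / df_error)

def shuffle_within_blocks(yields, row, rng):
    """Permute yields separately inside each row, leaving row totals intact."""
    shuffled = list(yields)
    for r in set(row):
        idx = [i for i, lab in enumerate(row) if lab == r]
        values = [yields[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            shuffled[i] = v
    return shuffled

rng = random.Random(1)

# Hypothetical layout: 3 rows x 9 plants, 3 plants per fertilizer in each row
row = [r for r in range(3) for _ in range(9)]
fertilizer = ["A", "B", "C"] * 9
# Made-up yields: a row effect plus noise, with no fertilizer effect
yields = [10 + 2 * r + rng.gauss(0, 1) for r in row]

observed = f_statistic(yields, fertilizer, row)
null_fs = [
    f_statistic(shuffle_within_blocks(yields, row, rng), fertilizer, row)
    for _ in range(1000)
]
# Simulated p-value: share of shuffled F-statistics at least as large as observed
p_value = sum(f >= observed for f in null_fs) / len(null_fs)
print(round(observed, 3), p_value)
```

Shuffling inside each row keeps every row's set of yields fixed, so the row-to-row variation never leaks into the shuffled treatment comparison. Replacing `shuffle_within_blocks` with a single shuffle of the whole list would give the "ignoring blocks" version asked about in the second question.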

Questions 8 through 12: Concentration is a one-person memory game in which cards are laid face down on a surface and two cards are flipped face up at a time. The object of the game is to turn over matching pairs of cards.

An online version of this game includes three different sets of cards: one has images of animals on the cards, one has images of babies, and one has images of holiday scenes. Are these three variations equally difficult? To investigate, eight students tried all three versions of the game in random order. They recorded the amount of time (in seconds) it took to complete the game. The overall mean time to complete the game was 79.31 seconds. The graphs below display the raw data and the data after adjusting for block effects.

"Two set of dotplots with three parallel dotplots compare the results of raw and adjusted data for block effects. The first dotplot is titled, Raw data (not adjusted for block effects). In all the three sets, the horizontal axis is labeled Time and has markings from 50 to 100 in increments of 10. The vertical axis has three markings, Group Holiday (n equals 8), Group Baby (n equals 8), and Group Animal (n equals 8), in the order from top to bottom. For Group Holiday (n equals 8), the dots are plotted as follows: 1 dot above 57, 59, 69, 82, 86, 88, 89, and 90. There are no dots above 50, 60, 70, 80, and 100. The mean is 78.003 and the standard deviation is 14.152. A block arrow points to the point, 78 on the horizontal axis. For Group Baby (n equals 8), the dots are plotted as follows: 1 dot above 59, 73, 74, 88, 89, 92, 98, and 99. There are no dots above 50, 60, 70, 80, 90, and 100. The mean is 83.925 and the standard deviation is 13.875. A block arrow points to the point, 83.5 on the horizontal axis. For Group Animal (n equals 8), the dots are plotted as follows: 1 dot above 61; 3 overlapping dots at 63, 64, and 65; 1 dot above 78, 82, 84, and 94. There are no dots above 50, 60, 70, 80, 90, and 100. The mean is 75.985 and the standard deviation is 11.460. A block arrow points to the point, 76 on the horizontal axis. All values are approximate.
The second dotplot is titled, Adjusted data (adjusted for block effects). In all the three sets, the horizontal axis is labeled Time and has markings from 50 to 100 in increments of 10. The vertical axis has three markings, Group Holiday (n equals 8), Group Baby (n equals 8), and Group Animal (n equals 8), in the order from top to bottom. For Group Holiday (n equals 8), the dots are plotted as follows: 1 dot above 74, 75, 77, 78, 79, 81, and 82. There are no dots above 50, 60, 70, 80, and 100. The mean is 78.003 and the standard deviation is 3.520. A block arrow points to the point, 78 on the horizontal axis. For Group Baby (n equals 8), the dots are plotted as follows: 1 dot above 77, 79; 3 overlapping dots at 83, 83.5, and 84; 3 overlapping dots at 86, 86.5, 87; 1 dot above 90. There are no dots above 50, 60, 70, 80, and 100. The mean is 83.925 and the standard deviation is 4.242. A block arrow points to the point, 83.5 on the horizontal axis. For Group Animal (n equals 8), the dots are plotted as follows: 1 dot above 70 and 73; 2 overlapping dots at 74 and 74.5; 1 dot above 79, 82 and 83. There are no dots above 50, 60, 80, 90, and 100. The mean is 75.985 and the standard deviation is 4.870. A block arrow points to the point, 76 on the horizontal axis. All values are approximate."

  8. To find adjusted data values, it is first necessary to calculate block effects for each student in the sample. One student completed the animal version of the game in 65 seconds, the baby version in 73 seconds, and the holiday version in 59 seconds. Calculate the block effect for this student.

Sol: Student average = (65+73+59)/3 = 65.67; Student effect = 65.67–79.31 = -13.64
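
The block-effect arithmetic above, and the adjustment step it feeds into, can be sketched in a few lines. The snippet assumes that an adjusted value is the raw time minus the student's block effect, which is the usual convention but is not spelled out in the question; the dictionary keys are illustrative.

```python
overall_mean = 79.31  # overall mean time reported for the study (seconds)
times = {"animal": 65, "baby": 73, "holiday": 59}  # this student's raw times

# Block (student) effect: student's mean time minus the overall mean
student_mean = sum(times.values()) / len(times)
block_effect = student_mean - overall_mean

# Adjusted value: raw time with the student's block effect removed (assumed convention)
adjusted = {version: t - block_effect for version, t in times.items()}

print(round(block_effect, 2))
print({version: round(value, 2) for version, value in adjusted.items()})
```

After adjustment, this student's three times average to the overall mean, which is why the adjusted dotplots are much less spread out than the raw ones.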

  9. Compare the variability in the adjusted data to the variability in the raw data.

In the adjusted data, the between-groups variability, as measured by SSModel, is ______ (>, <, =) the between-groups variability in the raw data.

In the adjusted data, the within-groups variability, as measured by SSError, is ______ (>, <, =) the within-groups variability in the raw data.

  10. In a test to decide whether the three versions of the game are equally difficult, which dataset would produce a larger F-statistic?
    1. The raw dataset (not accounting for blocks) would produce a larger F-statistic.
    2. The adjusted dataset (accounting for blocks) would produce a larger F-statistic.
    3. The F-statistics for the two datasets would be equal.
  11. You want to use simulation to test whether the three versions of the game are equally difficult. Which of the following are appropriate designs for the simulation? Select all that apply.
    1. Write the times from the raw dataset on cards. Shuffle and deal them into three piles representing the three versions of the game ignoring blocks. Calculate the F-statistic based on a one-variable ANOVA. Repeat.
    2. Write the times from the raw dataset on cards. Randomly re-assign the times to versions within blocks. Calculate the F-statistic based on a one-variable ANOVA. Repeat.
    3. Write the times from the adjusted dataset on cards. Shuffle and deal them into three piles representing the three versions of the game ignoring blocks. Calculate the F-statistic based on a one-variable ANOVA. Repeat.
  12. True or False: The validity conditions for ANOVA with a blocking variable are the same as the validity conditions for ANOVA without a blocking variable, except that the conditions are checked based on the adjusted data not the raw data.

Questions 13 through 15: Sixty-eight allergy sufferers were recruited to participate in a study comparing the effectiveness of different allergy treatments. During allergy season, each participant tried three treatments with active ingredients plus one placebo treatment, with the order of the treatments randomized and adequate time between treatments. Participants rated the severity of their allergy symptoms while taking each treatment. A partially filled in ANOVA table is given below.

| Source | DF | Sum of Squares | Mean Squares | F |
|---|---|---|---|---|
| Treatment | | 2174.4 | | |
| Participant | | 3913.6 | | |
| Error | 201 | | | |
| Total | 271 | 8697.4 | | |

  13. In this study, what are the blocks?
    1. The 68 allergy sufferers
    2. The 3 allergy treatments with active ingredients
    3. The 4 allergy treatments (including the placebo)
    4. The 272 ratings recorded across participants
  14. Calculate the F-statistic that could be used to test for a treatment effect.

Sol: SSError = 8697.4 – 2174.4 – 3913.6 = 2609.4; MSTreatment = 2174.4/3 = 724.8; F = 724.8/(2609.4/201) ≈ 55.83

  15. What if this analysis had failed to account for person-to-person variability in allergy symptoms? How would the ANOVA table change?

The row for Participant would be removed, and the amount of variation due to individual differences in allergy symptoms (SSParticipants = 3913.6) would be added to _________ (SSTreatment/SSError), which would make the F-statistic _________ (larger/smaller).
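
The two allergy-study calculations above can be sketched together. The snippet fills in the missing degrees of freedom from the design (4 treatments, 68 participants) and, as an assumption for illustration, treats the unblocked analysis as one where the Participant sum of squares and degrees of freedom are absorbed into the error term.

```python
n_participants, n_treatments = 68, 4

# Entries given in the partially filled ANOVA table
ss_total, ss_treatment, ss_participant = 8697.4, 2174.4, 3913.6
df_error = 201

# Degrees of freedom implied by the design
df_treatment = n_treatments - 1      # 3
df_participant = n_participants - 1  # 67

# Blocked analysis: participants are in the model
ss_error = ss_total - ss_treatment - ss_participant
ms_treatment = ss_treatment / df_treatment
f_blocked = ms_treatment / (ss_error / df_error)

# Unblocked analysis: participant variation is left in the error term
f_unblocked = ms_treatment / (
    (ss_error + ss_participant) / (df_error + df_participant)
)

print(round(ss_error, 1), round(f_blocked, 2), round(f_unblocked, 2))
```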

Section 2.3: Observational Study with Two Explanatory Variables

LO2.3-1: Analyze an observational study with two sources of explained variation.

LO2.3-2: Understand what it means to adjust for another variable in the analysis.

LO2.3-3: Explore effects of covariation in explanatory variables on the analysis.

Questions 1 through 4: In 2018, a sample of 628 academic faculty from universities across the country were surveyed about their salaries (in US dollars). The results were classified according to each faculty member’s academic rank (instructor, assistant professor, associate professor, and full professor) and gender (male, female).

  1. Calculate the effect of “female” on wages. Note that the sample sizes for male and female faculty members are not equal.

| Group | Mean Salary | SD of Salary |
|---|---|---|
| Full sample (n = 628) | $114,946 | $51,681 |
| Male (n = 428) | $123,953 | $55,281 |
| Female (n = 200) | $95,671 | $55,281 |

Sol: ;

  2. In the sample, male faculty members are paid more than female faculty members. Does this analysis provide evidence that gender causes differences in salary?
    1. It depends whether the results are statistically significant. If this difference results in a small p-value, then the analysis supports causal conclusions.
    2. It depends how the sample was selected. If the sample is representative of the population, then the analysis supports causal conclusions.
    3. No, because this sample is not large enough. There are over a million academic faculty in the U.S., so it is not appropriate to draw conclusions based on only 628.
    4. No, because data come from an observational study not a randomized experiment, so there may be confounding variables present.

Ans D; LO2.3-1; Difficulty: Medium; Type: MC

  3. The ANOVA table below summarizes a model that uses both gender and academic rank to predict a faculty member’s salary (in $1000s).

| Source | DF | Adj SS | Adj MS | F | p-value |
|---|---|---|---|---|---|
| Gender | 1 | 15,167 | 15,167 | 9.54 | 0.0021 |
| Rank | 3 | 575,905 | 191,968 | 120.81 | <0.0001 |
| Error | 623 | 989,757 | 1,589 | | |
| Total | 627 | 1,674,691 | | | |

Calculate the sum of squares for the covariation - the amount of variation in salaries that is attributed jointly to gender and rank.

    1. SSCovariation = SSGender + SSRank = 591,072
    2. SSCovariation = SSGender + SSRank + SSError = 1,580,829
    3. SSCovariation = SSTotal – SSGender – SSRank = 1,083,619
    4. SSCovariation = SSTotal – SSGender – SSRank – SSError = 93,862

Ans D; LO2.3-3; Difficulty: Medium; Type: MC
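
The keyed answer can be verified with a one-line subtraction. The sketch below only restates the table entries; it assumes, as the answer choices do, that covariation is whatever part of the total sum of squares is not assigned to Gender, Rank, or Error.

```python
# Adjusted sums of squares from the two-variable ANOVA table (salary in $1000s)
ss_total = 1_674_691
ss_gender, ss_rank, ss_error = 15_167, 575_905, 989_757

# Variation attributed jointly to gender and rank
ss_covariation = ss_total - ss_gender - ss_rank - ss_error

# Share of total variation that is shared between the two explanatory variables
share = ss_covariation / ss_total

print(ss_covariation, round(share, 3))
```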

  4. Statistical software was used to calculate the adjusted effects of gender and rank. The statistical model is given below.

with

According to this model, if we compare two faculty members of the same rank, on average, the male faculty member earns _______ dollars more per year than the female faculty member.

Questions 5 through 8: A real estate agent collected information on 100 recent home sales in their town. In addition to selling prices (in $1000s), the agent recorded information about the home’s location (north side of town or south side of town) and the number of bedrooms (at least 3 or less than 3). A partially filled in ANOVA table for the two-variable model is given below.

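| Source | DF | Adj SS | Adj MS | F | p-value | R² |
|---|---|---|---|---|---|---|
| Location | 1 | 19,843 | | | 0.0100 | 0.0631 |
| Bedrooms | 1 | 22,180 | | | 0.0066 | 0.0705 |
| Error | 97 | 278,799 | | | | |
| Total | 99 | 314,433 | | | | |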

  5. What does the adjusted sum of squares for Location represent?

The variation in prices that is explained by Location, _________ (excluding/including) the variation explained jointly by Location and Bedrooms.

Ans excluding; LO2.3-2; Difficulty: Easy; Type: FIB

  6. For a one-variable model that uses only Location as a predictor of price, R² = 0.09. Is there an association between Location and Bedrooms?
    1. Yes, there is an association between Location and Bedrooms.
    2. No, there is not an association between Location and Bedrooms.
    3. We do not have enough information to decide if there is an association between Location and Bedrooms.

Ans A; LO2.3-3; Difficulty: Medium; Type: MC

  7. Calculate the F-statistic for Location.

Sol: MSError = 278,799/97 ≈ 2874.2; F = 19,843/2874.2 ≈ 6.90

  8. After adjusting for the number of bedrooms, does this data provide strong evidence of a difference in the population mean prices for houses on the north side and houses on the south side? You may assume that the sample is representative of all houses in this town.
    1. No, because the p-value for Location is small.
    2. No, because the R² value for Location is small.
    3. Yes, because the p-value for Location is small.
    4. Yes, because the R² value for Location is small.
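
The home-sales questions above lean on two quantities that follow directly from the ANOVA table: the adjusted F-statistic for Location and its adjusted R². The sketch below recomputes both and compares the adjusted R² with the one-variable value of 0.09 quoted in the question; the variable names are illustrative.

```python
# Entries from the two-variable ANOVA table (prices in $1000s)
ss_location, df_location = 19_843, 1
ss_error, df_error = 278_799, 97
ss_total = 314_433

# Adjusted F-statistic for Location
ms_error = ss_error / df_error
f_location = (ss_location / df_location) / ms_error

# Adjusted R-squared for Location versus the one-variable value from the question
r2_adjusted = ss_location / ss_total
r2_location_alone = 0.09

# A gap between the two suggests variation shared with Bedrooms (covariation)
r2_gap = r2_location_alone - r2_adjusted

print(round(f_location, 2), round(r2_adjusted, 4), round(r2_gap, 4))
```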

Questions 9 and 10: Sixty-eight allergy sufferers were recruited to participate in a study comparing the effectiveness of different allergy medications. During allergy season, all 68 participants tried three medications with active ingredients plus one placebo medication, with the order of the treatments randomized and adequate time in between. Participants rated the severity of their allergy symptoms while taking each treatment.

  9. True or False: The mean of the four treatment means will be the same as the overall mean rating.
  10. True or False: The design of this study ensures there is no relationship between the treatments and the blocks; thus there is no covariation.

Questions 11 and 12: In an observational study, two categorical explanatory variables are used to predict a response. You may assume that there is a moderate amount of covariation.

| Source | DF | Adj SS | F | p-value |
|---|---|---|---|---|
| Variable 1 | | | | |
| Variable 2 | | | | |
| Error | | | | |
| Total | | | | |

  11. How do you calculate SSModel for this study?
    A. SSModel = SSVariable1 + SSVariable2
    B. SSModel = SSTotal – SSError
    C. Both (A) and (B) would produce the correct value for SSModel.
    D. Neither (A) nor (B) would produce the correct value for SSModel.
  12. Suppose Variable 2 were removed from the model. Would the adjusted sum of squares and p-value for Variable 1 change?
    A. Removing Variable 2 would not affect the sum of squares or p-value for Variable 1.
    B. Removing Variable 2 would not affect the sum of squares for Variable 1, but the p-value for Variable 1 may change.
    C. Removing Variable 2 would not affect the p-value for Variable 1, but the sum of squares for Variable 1 may change.
    D. Removing Variable 2 may change both the sum of squares and the p-value for Variable 1.
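
The quantities in Questions 11 and 12 can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the cell counts, effect sizes, seed, and the helper `sse` are all hypothetical, and each adjusted sum of squares is computed as the drop in SSError when that variable is added last to the model. The unbalanced cell counts make the two explanatory variables associated, which is what produces covariation.

```python
import numpy as np

def sse(X, y):
    """Sum of squared residuals from a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(1)

# Hypothetical unbalanced cell counts, keyed by (Variable 1 level, Variable 2 level).
counts = {(0, 0): 30, (0, 1): 10, (1, 0): 10, (1, 1): 30}
v1 = np.concatenate([np.full(n, a) for (a, b), n in counts.items()])
v2 = np.concatenate([np.full(n, b) for (a, b), n in counts.items()])

# Invented response: both variables matter, plus random noise.
y = 10 + 2 * v1 + 3 * v2 + rng.normal(0, 2, size=v1.size)

ones = np.ones_like(y)
ss_total = sse(ones[:, None], y)                     # intercept-only model
ss_error = sse(np.column_stack([ones, v1, v2]), y)   # full two-variable model
ss_model = ss_total - ss_error

# Adjusted SS: extra variation explained when a variable is added last.
adj_ss_v1 = sse(np.column_stack([ones, v2]), y) - ss_error
adj_ss_v2 = sse(np.column_stack([ones, v1]), y) - ss_error

# SS for Variable 1 when Variable 2 is left out of the model entirely.
ss_v1_alone = ss_total - sse(np.column_stack([ones, v1]), y)

print(f"SSModel = {ss_model:.1f}")
print(f"Adj SS V1 + Adj SS V2 = {adj_ss_v1 + adj_ss_v2:.1f}")
print(f"SS V1 alone = {ss_v1_alone:.1f}, Adj SS V1 = {adj_ss_v1:.1f}")
```

With these associated variables, the two adjusted sums of squares do not add up to SSModel, and the sum of squares for Variable 1 differs depending on whether Variable 2 is in the model. Setting all four cell counts equal removes the association, and the discrepancies vanish up to rounding.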

Questions 13 through 15: Students in a statistics class wanted to know how customers rate one of their favorite coffee shops. The students administered surveys and asked customers to rate their experience in the shop on a scale of 1-10. They also asked whether the survey respondent was a college student and how often they visit coffee shops (at least once a week or less than once a week).

  13. The students decide to use a two-variable model that predicts ratings based on whether the customer is a student and the frequency of their coffee shop visits. Based on the graph below, do you expect covariation – variation in ratings that is attributed jointly to Student and Frequency?

[Figure: A mosaic plot of frequency of coffee shop visits by student status, with two categories: At least once per week and Less than once per week. The horizontal axis is labeled Student, with categories No and Yes from left to right. The vertical axis is labeled Frequency and ranges from 0 to 100 percent in increments of 20. For No: at least once per week, 31 percent; less than once per week, 69 percent. For Yes: at least once per week, 55 percent; less than once per week, 45 percent. The bar for Yes is about 2.3 times as wide as the bar for No. All values are approximate.]

    A. Yes, because more than half of the customers in the sample are students.
    B. Yes, because students tend to give higher ratings than non-students.
    C. Yes, because students are more likely to visit coffee shops at least once per week.
    D. No, because the sample is evenly divided between customers who visit coffee shops at least once per week and those who visit less than once per week.
  14. Consider two possible models for predicting ratings. A one-variable model that predicts ratings based on frequency of coffee shop visits has an R² value of 0.06. A one-variable model that predicts ratings based on whether the customer is a student has an R² value of 0.03. What is the R² value of a two-variable model that uses both Frequency and Student to predict ratings?
    A. R² = 0.06 + 0.03 = 0.09
    B. R² = 0.06 – 0.03 = 0.03
    C. R² = (0.06 + 0.03)/2 = 0.045
    D. There is not enough information to calculate R² for the two-variable model.
  15. Consider two possible models to predict ratings:

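Model 1: [equation missing from source], with [value missing from source]

Model 2: [equation missing from source], with [value missing from source]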

Compare the adjusted sum of squares for student in each of these two models.

The adjusted sum of squares for student in Model 1 ________ (>, <, =) the adjusted sum of squares for student in Model 2.
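
How one-variable and two-variable R² values relate when the explanatory variables are associated can be illustrated with simulated data. In the sketch below, the cell counts loosely mimic the approximate percentages described for the mosaic plot (about 100 non-students and 230 students); the rating model, effect sizes, seed, and the helper `r_squared` are assumptions made for illustration and are not taken from the students' survey.

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variation in y explained by a least-squares fit on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(2)

# Hypothetical counts keyed by (student, visits at least once per week),
# chosen to resemble the approximate mosaic-plot percentages.
counts = {(0, 1): 31, (0, 0): 69, (1, 1): 127, (1, 0): 103}
student = np.concatenate([np.full(n, s) for (s, f), n in counts.items()])
frequent = np.concatenate([np.full(n, f) for (s, f), n in counts.items()])

# Invented rating model: both variables raise ratings slightly, plus noise.
rating = 6 + 0.5 * student + 0.8 * frequent + rng.normal(0, 1.5, size=student.size)

ones = np.ones(student.size)
r2_student = r_squared(np.column_stack([ones, student]), rating)
r2_frequency = r_squared(np.column_stack([ones, frequent]), rating)
r2_both = r_squared(np.column_stack([ones, student, frequent]), rating)

print(f"R2 (Student only)   = {r2_student:.3f}")
print(f"R2 (Frequency only) = {r2_frequency:.3f}")
print(f"R2 (both)           = {r2_both:.3f}")
print(f"Sum of one-variable R2 values = {r2_student + r2_frequency:.3f}")
```

In this simulated setting the two-variable R² is at least as large as either one-variable value but does not equal their sum, because part of the explained variation is shared between Student and Frequency.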

Document Information

Document Type: DOCX
Chapter Number: 2
Created Date: Aug 21, 2025
Chapter Name: Chapter 2 Intermediate Statistical Investigations Test Bank
Author: Nathan Tintle
