Full Test Bank | Data Processing and Elementary Data – Ch.15 - Social Research 6e | Test Bank Singleton by Royce A. Singleton.
CHAPTER 15
Data Processing and Elementary Data Analysis
Multiple Choice
- What is the order of steps in the quantitative analysis of survey data?
- inspect/modify data → data processing → bivariate analysis → multivariate testing
- inspect/modify data → data processing → multivariate testing → bivariate analysis
- data processing → inspect/modify data → bivariate analysis → multivariate testing
- data processing → inspect/modify data → multivariate testing → bivariate analysis
- Editing of survey data
- involves checking for inconsistencies and omitted responses.
- is carried out prior to the process of data collection.
- is applied mostly to computer-assisted interviewing surveys.
- is the sole responsibility of the project supervisor.
- Editing may include all but which one of the following activities?
- evaluating interviewers and detecting interview problems
- checking for improper responses such as multiple answers to a single item
- correcting and coding missing data
- checking for wild code data-entry errors
- Which of the following statements is not true of coding responses to open-ended questions?
- Coding categories are usually developed from a sample of 50 to 100 responses.
- The number of coding categories usually exceeds 100.
- Both theory and data guide the construction of coding categories.
- Unique numbers or codes are assigned to each category of response.
- Generally, the coding of closed-ended questions takes place
- before data collection.
- during data collection.
- after data collection.
- after data processing.
- Obtaining frequency distributions for all the variables in a data file is one way to
- do wild-code checking.
- do consistency checking.
- verify data entry.
- edit the data.
- Wild-code checking and consistency checking are techniques for
- data entry.
- data modification.
- coding data.
- cleaning data.
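The two cleaning checks named in the items above can be sketched as follows. This is a minimal illustration, not a procedure taken from the textbook: the records, variable names, and code schemes (`sex`, `ever_pregnant`, `VALID_CODES`) are hypothetical and chosen only to show the idea. Wild-code checking looks for out-of-range codes in a frequency distribution; consistency checking looks for combinations of answers that cannot logically occur together.

```python
from collections import Counter

# Hypothetical coded records (assumed codes: sex 1 = male, 2 = female;
# ever_pregnant 0 = no, 1 = yes).
records = [
    {"id": 1, "sex": 1, "ever_pregnant": 0},
    {"id": 2, "sex": 2, "ever_pregnant": 1},
    {"id": 3, "sex": 7, "ever_pregnant": 0},  # 7 is not a legitimate code for sex
    {"id": 4, "sex": 1, "ever_pregnant": 1},  # logically inconsistent answers
]

# Legitimate codes for each variable, as a codebook would list them.
VALID_CODES = {"sex": {1, 2}, "ever_pregnant": {0, 1}}


def frequency_distribution(rows, variable):
    """Count how often each code appears for one variable."""
    return Counter(row[variable] for row in rows)


def wild_codes(rows, variable):
    """Wild-code check: codes in the frequency distribution that are out of range."""
    freq = frequency_distribution(rows, variable)
    return sorted(code for code in freq if code not in VALID_CODES[variable])


def inconsistent_ids(rows):
    """Consistency check: cases coded male that also report a pregnancy."""
    return [row["id"] for row in rows if row["sex"] == 1 and row["ever_pregnant"] == 1]


print(wild_codes(records, "sex"))
print(inconsistent_ids(records))
```

Running the frequency distribution for every variable in the file and scanning for codes outside the codebook's range is what makes this a wild-code check; the second function needs a rule written for each pair of logically linked questions.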
- What is the usual order of steps in processing completed survey interviews or questionnaires?
- data entry → coding → editing → cleaning
- editing → coding → data entry → cleaning
- cleaning → coding → data entry → editing
- coding → data entry → cleaning → editing
- In terms of data processing, one advantage of computer-assisted interviewing over paper-and-pencil questionnaire surveys is that
- it is easier to determine if interviewers are recording answers accurately and adequately.
- there is no need to code responses.
- open-ended questions can be coded more easily.
- data entry occurs directly when interviewers record respondents’ answers.
- In a data matrix, __________ are placed in rows and __________ are placed in columns.
- variables; missing data
- cases or observations; variables
- dependent variables; independent variables
- independent variables; dependent variables
- Codebooks may contain all but which one of the following?
- raw survey data
- numerical codes for each response
- question wording
- editing and coding rules
- interviewer directions
- Which type(s) of statistical analysis did Broh use to examine the relationship between playing interscholastic sports and academic achievement?
- descriptive statistics
- inferential statistics
- both descriptive and inferential statistics
- neither descriptive nor inferential statistics
- A researcher measures job satisfaction among a random sample of employees at XYZ Company and finds that 65 percent are “very satisfied” with their jobs. To estimate job satisfaction among all XYZ employees from this information, the researcher should
- compare means.
- study more employees.
- use descriptive statistics.
- use inferential statistics.
- Percentage distributions
- may be applied only to interval-/ratio-scale variables.
- should include missing values in the computation of percentages.
- cannot be computed when there are missing data.
- provide an explicit comparative framework for interpreting distributions.
- Consider the following survey question: “How satisfied are you with the direction that the country is going at this time? Would you say you are very satisfied, somewhat satisfied, not very satisfied, or not at all satisfied?” To collapse the responses into two categories, you would be best advised to
- collapse and divide according to response similarity, such as “satisfied” versus “dissatisfied.”
- make the most frequently selected “polar” response (“very satisfied” or “not at all satisfied”) one category and combine the remaining three responses into a second category.
- combine adjacent responses to obtain an approximately equal proportion of cases in each category.
- place valid responses in one category and missing “values” in the other.
- Univariate distributions of interval-/ratio-scale variables include all but which one of the following properties?
- regression
- central tendency
- dispersion
- shape
- If the median in a distribution is 75, this means that
- 75 percent of the cases scored above the median.
- a score of 75 has the highest frequency.
- 75 is the average score.
- a score of 75 divides the frequency distribution in half.
- What is the mode in the following set of data? 1, 2, 2, 3, 5, 6, 9
- 1
- 2
- 3
- 4
- 5
- What is the median in the following set of data? 1, 2, 2, 3, 5, 6, 9
- 1
- 2
- 3
- 4
- 5
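For instructors who want to verify the two preceding items, the mode and median of the data set 1, 2, 2, 3, 5, 6, 9 can be checked with Python's standard `statistics` module. This is only a checking aid; the variable names are illustrative.

```python
import statistics

# The seven scores given in the mode and median items above.
scores = [1, 2, 2, 3, 5, 6, 9]

mode_value = statistics.mode(scores)      # most frequently occurring score
median_value = statistics.median(scores)  # middle score of the sorted data
mean_value = statistics.mean(scores)      # arithmetic average, for comparison

print(mode_value, median_value, mean_value)
```

With seven sorted scores, the median is simply the fourth value. The mean is included because comparing it with the median shows which way a distribution leans: here the mean is larger than the median.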
- In a distribution that is positively skewed, the
- mean is greater than the median.
- standard deviation is greater than the range.
- median is greater than the mean.
- median and mean are equal.
- In the 2014 GSS, the distribution of reported number of hours of television watched per day was
- skewed.
- normal.
- abnormal.
- bell-shaped.
- Which of the following methods is not an option for handling missing data?
- index construction
- listwise deletion
- recoding
- imputation
- One method of reducing data complexity through data modification is
- imputation.
- listwise deletion.
- index or scale construction.
- the use of dummy variables.
- Contingency tables
- are temporary tables produced to make collapsing decisions.
- are designed to analyze responses to contingency questions.
- contain data on two or more variables.
- work best in summarizing relationships between interval-/ratio-scale variables.
- What are the marginals in a cross-tabulation or contingency table?
- outliers
- standard deviates
- cell frequencies
- lowest and highest frequencies
- row and column totals
- To analyze the relationship in a contingency table, the rule for calculating percentages is to compute percentages based on the
- total number of cases in the table.
- number of cases in each category of the dependent variable.
- number of cases in each category of the independent variable.
- column variable, regardless of whether it is independent or dependent.
- Consider the following table from the 2016 GSS, which shows the relationship between race and whether someone favors or opposes “the death penalty for persons convicted of murder.”
Attitude Toward Capital Punishment | Race: White | Race: Black | Total
Favor | 1,300 | 191 | 1,491
Oppose | 697 | 257 | 954
Total | 1,997 | 448 | 2,445
The data in this table suggest that (the answer may require some calculation)
- there is a near-zero association between race and support for the death penalty.
- whites are more likely to favor the death penalty than blacks.
- blacks are more likely to favor the death penalty than whites.
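The calculation this item calls for is to percentage within each category of the independent variable (race) and then compare across categories. A minimal sketch using the cell frequencies from the table above; the function and variable names are illustrative, not from the text.

```python
# Cell frequencies from the race table above, grouped by the independent variable.
counts = {
    "White": {"Favor": 1300, "Oppose": 697},
    "Black": {"Favor": 191, "Oppose": 257},
}


def percent_within(category_counts):
    """Percentage distribution within one category of the independent variable."""
    total = sum(category_counts.values())
    return {answer: round(100 * n / total, 1) for answer, n in category_counts.items()}


percentages = {race: percent_within(cells) for race, cells in counts.items()}

# Percentage difference in favoring the death penalty, White minus Black.
difference = percentages["White"]["Favor"] - percentages["Black"]["Favor"]

for race, dist in percentages.items():
    print(race, dist)
print(round(difference, 1))
```

Percentaging on the column totals (1,997 and 448) rather than the grand total is what makes the two groups comparable despite their very different sizes.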
- Consider the following table from the 2016 GSS, which shows the relationship between age and whether someone favors or opposes “the death penalty for persons convicted of murder.”
Attitude Toward Capital Punishment | Age: 18–50 | Age: Over 50 | Total
Favor | 833 | 792 | 1,625
Oppose | 559 | 502 | 1,061
Total | 1,392 | 1,294 | 2,686
The data in this table suggest that (the answer may require some calculation)
- there is a near-zero association between age and support for the death penalty.
- older people are more likely to favor the death penalty than younger people.
- younger people are more likely to favor the death penalty than older people.
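One way to check whether the small percentage difference in the age table could plausibly be due to chance is to compute the chi-square statistic by hand. The sketch below applies the standard formula, the sum of (observed − expected)² / expected over all cells, to the frequencies above. The function name is illustrative, and 3.841 is the conventional critical value for one degree of freedom at the .05 level.

```python
# Observed cell frequencies from the age table above.
# Rows: Favor, Oppose. Columns: 18-50, Over 50.
observed = [
    [833, 792],
    [559, 502],
]

CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05


def chi_square(table):
    """Chi-square statistic for a two-way table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected frequency if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    return statistic


statistic = chi_square(observed)
print(round(statistic, 2), statistic > CRITICAL_05_DF1)
```

A statistic below the critical value means the hypothesis of no relationship in the population cannot be rejected at the .05 level. The same function can be applied to any other two-way table of counts.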
- Suppose a researcher finds a statistically significant relationship between salary and job satisfaction among a random sample of employees. From this information, he can conclude that
- there is likely to be a relationship between job satisfaction and salary.
- differences in salary cause differences in job satisfaction.
- salary is the most important factor in job satisfaction.
- the relationship between salary and job satisfaction probably occurred at random.
- there is no relationship between job satisfaction and salary.
- The chi-square test for independence indicates
- how two variables are related to one another.
- whether a relationship exists between variables.
- the strength of the relationship between variables.
- the direction of the relationship between variables.
- The chi-square test for independence in a contingency table addresses which of the following questions?
- How independent are the contingencies?
- What is the probability that these data came from a population in which the two variables are not related?
- In the given sample, what is the degree of association between the variables?
- Is the relationship positive or negative?
- In the general formula for a linear relationship, Y = a + bX, “a” is called the
- least squares point.
- Y-intercept.
- regression coefficient.
- slope.
- For the 2016 GSS, you regress number of hours of television watched on the average day (Y) on number of years of education completed (X) and obtain the following result: Y = 5.37 – .18X. How much change in hours of television watched is associated with a change of one year in a respondent’s education?
- 1
- 5.37 – .18
- 5.37
- –.18
- For the 2016 GSS, you regress number of hours of television watched on the average day (Y) on number of years of education completed (X) and obtain the following result: Y = 5.37 – .18X. What would be the predicted number of hours of television watched per day if the respondent has completed 12 years of schooling?
- 2.16
- 3.21
- 5.19
- 6.57
- For the 2016 GSS, you regress respondent’s years of education completed (Y) on father’s years of education completed (X) and obtain the following result: Y = 10.24 + .32X. What would be the predicted years of a respondent’s education if his or her father had completed 12 years of schooling?
- 10.24
- 10.56
- 12.00
- 14.08
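The prediction items above all substitute a value of X into Y = a + bX. A short sketch that evaluates the two equations given in those items at X = 12; the helper name `predict` is illustrative.

```python
def predict(intercept, slope, x):
    """Predicted Y from the linear equation Y = a + bX."""
    return intercept + slope * x


# Y = 5.37 - .18X: hours of television per day, X = years of education.
tv_hours = predict(5.37, -0.18, 12)

# Y = 10.24 + .32X: respondent's education, X = father's years of education.
education_years = predict(10.24, 0.32, 12)

print(round(tv_hours, 2), round(education_years, 2))
```

The slope is the change in predicted Y for a one-unit change in X, so it can also be read off as `predict(a, b, x + 1) - predict(a, b, x)` for any x.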
- Consider the regression equation Y = −5.43 + 4.16X. This equation tells us that
- a one-unit increase in X is associated with a 5.43-unit decrease in Y.
- a one-unit increase in X is associated with a 4.16-unit increase in Y.
- a one-unit increase in Y is associated with a 5.43-unit decrease in X.
- a one-unit increase in Y is associated with a 4.16-unit increase in X.
- The difference between an actual score and the score predicted by the regression equation is called
- a slope.
- the explained variation.
- a residual.
- a regression coefficient.
- Suppose two variables are negatively related. Which of the following regression equations might describe this relationship?
- Y = 3.21 + 2.41X
- Y = −.45 + 4.12X
- Y = 18.62 – 1.21X
- A correlation of −.85 indicates a __________ relationship, and a correlation of +.10 indicates a __________ relationship.
- strong; weak
- weak; strong
- weak; moderate
- weak; weak
- Which of the following is an example of an inferential statistic?
- range
- mean
- correlation coefficient
- chi-square test for independence
- Suppose a small campus survey found that the correlation between alcohol consumption and GPA was −.20 with p < .18. This means that
- there is no relationship between drinking and grades at this college.
- there is a weak relationship between drinking and grades at this college.
- there is a strong relationship between drinking and grades at this college.
- the relationship between drinking and grades is not statistically significant.
- For the 2016 GSS, you code marital status as 0 = not married and 1 = married, then you regress the number of hours of television viewing per day (Y) on marital status (X) with the following result: Y = 3.1 − .32X. This result indicates that
- married people watch, on average, 3.1 more hours of television than unmarried people.
- married people watch, on average, 2.78 fewer hours of television than unmarried people.
- married people watch, on average, .32 fewer hours of television than unmarried people.
- there is no relationship between marital status and television viewing.
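With a dummy-coded predictor, the regression equation yields one predicted value per group, and the slope equals the difference between them. A brief sketch for the equation in the item above, Y = 3.1 − .32X with X = 0 for not married and X = 1 for married; the names are illustrative.

```python
def predicted_tv_hours(married):
    """Predicted daily TV hours from Y = 3.1 - .32X, where X is 0 or 1."""
    return 3.1 - 0.32 * married


not_married_mean = predicted_tv_hours(0)  # the intercept
married_mean = predicted_tv_hours(1)      # intercept plus slope

# The slope is the difference between the two group predictions.
difference = married_mean - not_married_mean

print(round(not_married_mean, 2), round(married_mean, 2), round(difference, 2))
```

The intercept is the predicted value for the group coded 0, which is why the intercept by itself is not the difference between the groups.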
True and False
T F 1. Closed-ended questions usually are coded before data collection.
T F 2. Editing is carried out after all the data have been entered into a data file.
T F 3. The coding of open-ended questions is usually based on both theoretical and empirical considerations.
T F 4. Using computer-assisted interviewing eliminates the need for coding and data cleaning.
T F 5. Consistency checking is a cleaning process used to identify out-of-range codes.
T F 6. Verification may involve entering the data twice into separate files and then comparing the two files for noncomparable entries.
T F 7. Consistency checking compares entries in a data file with entries in the interview schedule or questionnaire.
T F 8. Descriptive and inferential statistics correspond to the scientific goals of description and explanation.
T F 9. Percentage distributions provide an explicit frame of reference for making comparisons among variable categories.
T F 10. In the absence of theoretical criteria, the best strategy for collapsing categories is to try to obtain an approximately equal proportion of cases in each category.
T F 11. Calculations in a percentage distribution usually are based on the total number of responses, including those coded “don’t know” and “not applicable.”
T F 12. The mean is a statistical property of the distribution of a nominal-scale variable.
T F 13. Outliers are unusual or suspicious values that are far removed from the preponderance of observations for a variable.
T F 14. Listwise deletion is the best method of handling missing values, regardless of the number of missing cases.
T F 15. Bivariate distributions may be constructed for variables with nominal and ordinal as well as interval and ratio measurement.
T F 16. In a cross-tabulation, the row totals and the column totals each describe univariate distributions.
T F 17. To interpret the relationship between variables in a contingency table, the rule is “percentage across, read across; percentage down, read down.”
T F 18. Tests of statistical significance may be applied only to interval- and ratio-scale variables.
T F 19. The chi-square test is a measure of degree of association.
T F 20. The chi-square statistic indicates whether a relationship between two variables is likely to exist.
T F 21. Direction is a statistical property that describes the relationship between variables with nominal measurement.
T F 22. Linear regression analysis should be used only if a straight line provides a reasonable fit to the data.
T F 23. Regression coefficients indicate, among other things, the direction of the relationship between two variables.
T F 24. The correlation coefficient measures the direction and strength of association between variables.
T F 25. A dummy variable has only two coding categories.
Essay
- The quality of data is affected at several stages of social research, including data processing. What techniques do survey researchers apply to avoid errors and enhance data quality during data processing? Are data processing errors unavoidable, like random sampling error? Explain.
- Describe the differences in the univariate analysis of nominal/ordinal variables and interval/ratio variables. What descriptive statistics are used to describe each type of variable?
- Describe the differences in the bivariate analysis of nominal/ordinal variables and interval/ratio variables. What descriptive and inferential statistics are used to describe each type of variable?
- The 1994 GSS asked the following question: “Do you sometimes drink more than you think you should?” The table below breaks down responses to this question by sex.
Ever Drink Too Much | Sex: Male | Sex: Female
Yes | 47.3% | 29.0%
No | 52.7% | 71.0%
Total | 100.0% | 100.0%
(N) | (171) | (170)
- What is the percentage difference for determining the association between these variables?
- Who is more likely to say that they sometimes drink more than they think they should?
- Chi-square for this table is 12.50, which is significant at p < .001. What does this indicate about the relationship between the variables?
- The value of phi for this table is .19. What does this statistic tell us about the relationship?
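The figures quoted in this essay item can be cross-checked against one another. The sketch below computes the percentage difference from the table and recovers phi from the reported chi-square using the standard relation for a 2 × 2 table, phi = √(χ² / N). The variable names are illustrative; all numbers come from the item itself.

```python
import math

# Percent answering "Yes" by sex, and the column Ns, from the table above.
percent_yes = {"Male": 47.3, "Female": 29.0}
n_cases = 171 + 170

chi_square = 12.50  # value reported in the item

# Percentage difference: compare "Yes" across categories of the independent variable.
percentage_difference = percent_yes["Male"] - percent_yes["Female"]

# Phi for a 2 x 2 table: square root of chi-square divided by N.
phi = math.sqrt(chi_square / n_cases)

print(round(percentage_difference, 1), round(phi, 2))
```

Chi-square addresses whether a relationship is likely to exist in the population, while the percentage difference and phi describe how strong the association is in the sample.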
- For the 2012 GSS, the regression of respondent’s income in constant dollars (Y) on years of education (X) yields the following equation: Y = –45,204.31 + 5,293.56X.
- What is the value of the regression coefficient?
- How much does income (Y) increase for each increase of one year of education?
- What is the predicted income in 2012 for a person with a bachelor’s degree (16 years of education)?
- The correlation between income and years of education is .28. What does this tell you about this relationship?
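The arithmetic behind the income item can be sketched as follows, using the equation Y = –45,204.31 + 5,293.56X and the correlation of .28 given above. The constant and function names are illustrative. Squaring the correlation gives the proportion of variance in Y associated with X.

```python
INTERCEPT = -45204.31  # "a" in Y = a + bX, from the item above
SLOPE = 5293.56        # regression coefficient "b": dollars per year of education


def predicted_income(years_of_education):
    """Predicted income in constant dollars for a given number of years of education."""
    return INTERCEPT + SLOPE * years_of_education


income_16 = predicted_income(16)  # bachelor's degree, 16 years

r = 0.28
r_squared = r ** 2  # proportion of variance in income associated with education

print(round(income_16, 2), round(r_squared, 4))
```

The negative intercept is the predicted value at zero years of education, which lies outside the range where the line is a meaningful summary, so it should not be read as a realistic income.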