Ch8 Predictive Analytics Helping To Make Sense Of Test Bank - Forecasting with Forecast X 7e Complete Test Bank by Barry Keating. DOCX document preview.


Forecasting and Predictive Analytics with Forecast X, 7e (Keating)

Chapter 8 Predictive Analytics: Helping to Make Sense of Big Data

1) Decile-Wise

A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).

The decile-wise lift chart for a transaction data model:

Consider the decile-wise lift chart above. Interpret the meaning of the first and second bars from the left.

A) The first variable in the model is more predictive than the second variable.

B) These bars are never interpreted for the validation data set; they are only interpreted for the training data set.

C) Since only two bars rise above unity, little explanatory power is exhibited by the model.

D) The first two bars show that this model outperforms a naïve model.

2) Decile-Wise

A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).

The decile-wise lift chart for a transaction data model:

Consider the decile-wise lift chart above. An analyst comments that you could improve the accuracy of the model by classifying everything as nonfraudulent. What will the overall error rate (also called the misclassification rate) be if you follow her advice?

A) The overall error rate will increase.

B) The overall error rate will decrease.

C) The change in the overall error rate cannot be determined.

D) The overall error rate will arbitrarily change.
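The comparison in this question can be checked with a short Python sketch. It assumes the stem's counts map onto the four confusion-matrix cells as follows: of the 88 records flagged fraudulent, 30 are true positives and 58 false positives; of the 952 flagged nonfraudulent, 920 are true negatives and 32 false negatives. The helper name `error_rate` is hypothetical.

```python
def error_rate(tp: int, fp: int, fn: int, tn: int) -> float:
    """Hypothetical helper: overall error (misclassification) rate,
    i.e. misclassified records divided by all records."""
    return (fp + fn) / (tp + fp + fn + tn)


# Cells implied by the stem: 88 flagged fraudulent (30 correct),
# 952 flagged nonfraudulent (920 correct).
tp, fp = 30, 88 - 30
tn, fn = 920, 952 - 920

model_rate = error_rate(tp, fp, fn, tn)

# "Classify everything as nonfraudulent": every actual fraud becomes a
# false negative and every actual nonfraud becomes a true negative.
actual_fraud = tp + fn
actual_nonfraud = fp + tn
all_negative_rate = error_rate(0, 0, actual_fraud, actual_nonfraud)

print(f"model: {model_rate:.2%}  all-nonfraudulent: {all_negative_rate:.2%}")
```

A lower overall error rate does not make the all-nonfraudulent rule useful, since it never flags a single fraudulent record; this is one reason lift charts and class-specific error rates are read alongside the overall rate.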

3) Decile-Wise

A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).

The decile-wise lift chart for a transaction data model:

Which of the following represents the confusion matrix for the transaction data described above?

A

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                58      920
0                30      32

B

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                32      30
0                58      920

C

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                30      32
0                58      920

D

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                920     58
0                30      32

A) A

B) B

C) C

D) D

4) What is the misclassification error rate for the following XLMiner confusion matrix?

Confusion Matrix
Actual\Predicted   0      1
0                  970    20
1                  2      8

A) 2.2%

B) 0.82%

C) 10%

D) 0.21%

E) Impossible to determine from information given.

5) Consider the Toyota Corolla data below:

Id  Model          Price   Age_08_04  Mfg_Month  Mfg_Year
1   RA 2/3-Doors   13500   23         10         2002
2   RA 2/3-Doors   13750   23         10         2002
3   RA 2/3-Doors   13950   24         9          2002
4   RA 2/3-Doors   14950   26         7          2002
5   OL 2/3-Doors   13750   30         3          2002
6   OL 2/3-Doors   12950   32         1          2002
7   RA 2/3-Doors   16900   27         6          2002

Id  KM      Fuel_Type  HP  Met_Color  Color_Black
1   46986   Diesel     90  1          0
2   72937   Diesel     90  1          0
3   41711   Diesel     90  1          0
4   48000   Diesel     90  0          1
5   38500   Diesel     90  0          1
6   61000   Diesel     90  0          0
7   94612   Diesel     90  1          0

Which variable is a dummy variable?

A) Fuel_Type

B) Color_Black

C) KM

D) HP

E) Both Fuel_Type and Color_Black qualify as dummy variables.

6) Consider the Toyota Corolla data below:

Id  Model          Price   Age_08_04  Mfg_Month  Mfg_Year
1   RA 2/3-Doors   13500   23         10         2002
2   RA 2/3-Doors   13750   23         10         2002
3   RA 2/3-Doors   13950   24         9          2002
4   RA 2/3-Doors   14950   26         7          2002
5   OL 2/3-Doors   13750   30         3          2002
6   OL 2/3-Doors   12950   32         1          2002
7   RA 2/3-Doors   16900   27         6          2002

Id  KM      Fuel_Type  HP  Met_Color  Color_Black
1   46986   Diesel     90  1          0
2   72937   Diesel     90  1          0
3   41711   Diesel     90  1          0
4   48000   Diesel     90  0          1
5   38500   Diesel     90  0          1
6   61000   Diesel     90  0          0
7   94612   Diesel     90  1          0

Which of the variables below (from the Toyota Corolla dataset) is a categorical variable?

A) Fuel_Type

B) Color_Black

C) KM

D) HP
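A text-valued categorical column such as Fuel_Type is usually converted into 0/1 dummy columns before it enters a regression or classification model. The Python sketch below shows one way to do that; the function `dummy_code` and the Petrol and CNG values are hypothetical illustrations, not data from the Toyota Corolla table above. In pandas, `pandas.get_dummies` (with its `prefix` and `drop_first` parameters) performs a comparable transformation.

```python
def dummy_code(values, prefix, drop_first=False):
    """Hypothetical helper: one 0/1 dummy column per category of a
    categorical variable, named '<prefix>_<category>'."""
    categories = sorted(set(values))
    if drop_first:
        # Leave out one reference category so the dummies are not perfectly
        # collinear with a regression intercept (the "dummy variable trap").
        categories = categories[1:]
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical Fuel_Type values; the seven rows shown above are all Diesel.
fuel_type = ["Diesel", "Petrol", "CNG", "Petrol", "Diesel"]
for name, column in dummy_code(fuel_type, "Fuel_Type", drop_first=True).items():
    print(name, column)
```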

7) Consider the following confusion matrix.

             Predict class 1   Predict class 0
Actual 1     8                 2
Actual 0     20                970

How much better did this data mining technique do as compared to a naïve model?

A) No better than a naïve model.

B) 1.2% better than a naïve model.

C) 5.6% better than a naïve model.

D) 7.8% better than a naïve model.

E) 10.1% better than a naïve model.
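One way to frame this comparison is in terms of error rates. The Python sketch below assumes the naïve model assigns every record to the most common actual class (here class 0) and compares its error rate with the model's; the helper names are hypothetical, and the matrix is keyed as `matrix[actual][predicted]`.

```python
def misclassification_rate(matrix):
    """Hypothetical helper: off-diagonal counts divided by the total,
    for counts keyed as matrix[actual][predicted]."""
    total = sum(sum(row.values()) for row in matrix.values())
    errors = sum(n for actual, row in matrix.items()
                 for predicted, n in row.items() if predicted != actual)
    return errors / total


def naive_rate(matrix):
    """Hypothetical helper: error rate of predicting the most common
    actual class for every record."""
    actual_totals = {actual: sum(row.values()) for actual, row in matrix.items()}
    majority = max(actual_totals, key=actual_totals.get)
    return 1 - actual_totals[majority] / sum(actual_totals.values())


# Counts from the matrix above, keyed as matrix[actual][predicted].
matrix = {1: {1: 8, 0: 2}, 0: {1: 20, 0: 970}}
print(f"model error: {misclassification_rate(matrix):.1%}")
print(f"naive error: {naive_rate(matrix):.1%}")
```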

8) "Overfitting" refers to

A) estimating a model that explains the training set data points perfectly and leaves little error but that is unlikely to be accurate in prediction.

B) using too many attributes or classifiers in a model.

C) the process used to test data mining models for accuracy.

D) the process of estimating or scoring of new data.

9) A "training data set" is

A) used to compare models and pick the best one.

B) used to build various models of interest.

C) used to assess the performance of the normalization procedure.

D) None of the options are correct.

10) A "validation data set" is

A) used to compare models and pick the best one.

B) used to build various models of interest.

C) used to assess the performance of the normalization procedure.

D) None of the options are correct.

11)

Classification Confusion Matrix
                 Predicted Class
Actual Class     owner    non-owner
owner            10       0
non-owner        0        9

The misclassification rate in the confusion matrix above is

A) 0 percent.

B) 10 percent.

C) 9 percent.

D) 19 percent.

E) None of the options are correct.

12)

Data
Data source            Data!$A$5:$N$2504
Selected variables     ID, Age, Experience, Income, Zip Code, Family, CCAvg, Education
Partitioning Method    Randomly chosen
Random Seed            12345
# training rows        1250
# Validation rows      750
# test rows            500

Selected Variables
Row Id  ID  Age  Experience  Income  Zip Code  Family  CCAvg  Education
1       1   25   1           49      91107     4       1.60   1
4       4   35   9           100     94112     1       2.70   2
5       5   35   8           45      91330     4       1.00   2
6       6   37   13          29      92121     4       0.40   2
9       9   35   10          81      90089     3       0.60   2
10      10  34   9           180     93023     1       8.90   3
12      12  29   5           45      90277     3       0.10   2
17      17  38   14          130     95010     4       4.70   3
18      18  42   18          81      94305     4       2.40   1
21      21  56   31          25      94015     4       0.90   2
26      26  43   19          29      94305     3       0.50   1

 

The Universal Bank data represented above has been partitioned with what percentages?

A) 50%, 30%, 20% in training, validation, and test sets

B) 60%, 40% in training and validation sets

C) 60%, 20%, 20% in training, validation, and test sets

D) 50%, 20%, 30% in training, validation, and test sets

E) None of the options are correct.
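The partition percentages follow from dividing each row count by the total number of rows. The Python sketch below does that and then draws a random split of the same sizes with Python's `random` module. The `partition` helper is hypothetical, and Python's generator should not be expected to reproduce XLMiner's row selection even when given the same seed value (12345).

```python
import random


def partition(ids, train_frac, valid_frac, seed):
    """Hypothetical helper: randomly split record ids into training,
    validation, and test sets; the test set receives the remainder."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


# Row counts reported in the partition output above.
counts = {"training": 1250, "validation": 750, "test": 500}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n / total:.0%}")

# A split of the same sizes; the rows chosen will differ from XLMiner's.
train, valid, test = partition(range(1, total + 1), 0.5, 0.3, seed=12345)
print(len(train), len(valid), len(test))
```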

13) In data mining, the model should be applied to a data set that was not used in the estimation process in order to measure its accuracy on unseen data; that "unseen" data set is called

A) the training data set.

B) the validation data set.

C) the test data set.

D) the holdout data set.

E) None of the options are correct.

14)

The lift chart above shows that the data mining classification model

A) is working well in classifying unseen data.

B) is working well in classifying training data.

C) is working quite poorly.

D) is doing no better at classifying than a naïve model.

15) With most data mining techniques we "partition" the data

A) into "success" and "failure" results in order to create a target that is a dummy variable.

B) only when we require a confusion matrix to be created.

C) after estimating the appropriate technique.

D) in order to judge how our model will do when we apply it to new data.

16) Consider the following lift chart, in which the cumulative percentage of hits is on the Y-axis and the percentage of the entire list is on the X-axis.

What is the "Lift" at 5%?

A) Exactly 4

B) About 5

C) Exactly 20

D) About 25

E) Unable to determine from information given.
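The chart is not reproduced in this preview, so the Python sketch below uses hypothetical scores and labels only to show how lift at a given depth is computed: rank records by predicted score, find the share of all hits captured in the top x% of the list, and divide by x%. The function name `cumulative_lift` is an assumption, not an XLMiner feature.

```python
def cumulative_lift(scores, labels, depth):
    """Hypothetical helper: lift at `depth`, a fraction of the ranked list
    (0.05 means the top 5%). It is the share of all hits found in the top
    of the list divided by the share of the list taken; a random (naive)
    ordering scores about 1.0."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    total_hits = sum(ranked)
    if total_hits == 0:
        raise ValueError("lift is undefined when there are no hits")
    n_top = max(1, round(len(ranked) * depth))
    share_of_hits = sum(ranked[:n_top]) / total_hits
    share_of_list = n_top / len(ranked)
    return share_of_hits / share_of_list


# Hypothetical data: 20 records ranked by score, 4 of them actual hits.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
          0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
          0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
for depth in (0.05, 0.25, 1.0):
    print(f"lift at {depth:.0%}: {cumulative_lift(scores, labels, depth):.2f}")
```

On a cumulative gains chart, this ratio is the height of the model's curve at x% divided by the height of the straight naïve-model line at the same point.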

17) Consider the printout below:

Validation Data scoring - Summary Report (for k = 7)

Cut off Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix
                 Predicted Class
Actual Class     owner    non-owner
owner            4        0
non-owner        3        3

Error Report
Class        #Cases   #Errors   %Error
owner        4        0         0.00
non-owner    6        3         50.00
Overall      10       3         30.00

 

What is the "Misclassification Rate"?

A) 0

B) 3

C) 50

D) 30

E) It is not shown in this printout.
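The Error Report rows are derived directly from the confusion matrix: for each actual class, #Errors counts the records predicted as the other class, and the Overall row pools both classes. A minimal Python sketch, with the hypothetical helper `error_report` and counts keyed as `matrix[actual][predicted]`:

```python
def error_report(matrix):
    """Hypothetical helper: rebuild an Error Report as rows of
    (class, #cases, #errors, %error) from matrix[actual][predicted]."""
    rows = []
    total_cases = total_errors = 0
    for actual, predicted_counts in matrix.items():
        cases = sum(predicted_counts.values())
        errors = cases - predicted_counts.get(actual, 0)  # everything off the diagonal
        rows.append((actual, cases, errors, 100 * errors / cases))
        total_cases += cases
        total_errors += errors
    rows.append(("Overall", total_cases, total_errors, 100 * total_errors / total_cases))
    return rows


# Counts from the k = 7 validation matrix above.
matrix = {"owner": {"owner": 4, "non-owner": 0},
          "non-owner": {"owner": 3, "non-owner": 3}}
for cls, cases, errors, pct in error_report(matrix):
    print(f"{cls:<10}{cases:>7}{errors:>8}{pct:>9.2f}")
```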

18)

The chart above is a decile-wise lift chart. The first bar on the left indicates

A) that our attribute or attributes did little to explain predicted success in this model.

B) that the lift will not vary with the number of cases we consider.

C) that taking the 10% of the records that are ranked by the model as the most probable successes yields about as much as a naïve model.

D) that taking the 10% of the records that are ranked by the model as the most probable successes yields twice as many successes as would a random selection of 10% of the records.
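Each bar in a decile-wise lift chart compares the hit rate in one tenth of the ranked records with the overall hit rate. Because the chart is not reproduced here, the Python sketch below uses hypothetical scores and labels; the function `decile_lift` is illustrative, and XLMiner's treatment of ties or uneven decile sizes may differ.

```python
def decile_lift(scores, labels):
    """Hypothetical helper: decile-wise lift for a 0/1 target.

    Rank records by score (highest first), cut the ranked list into ten
    nearly equal groups, and divide each group's hit rate by the overall
    hit rate. A bar of 1.0 means the group does no better than a random
    selection of the same size.
    """
    if len(scores) < 10:
        raise ValueError("need at least ten records to form deciles")
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    overall_rate = sum(ranked) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no hits")
    lifts = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        lifts.append((sum(group) / len(group)) / overall_rate)
    return lifts


# Hypothetical data: 100 records with scores falling from 1.00 to 0.01
# and 10 hits concentrated near the top of the ranking.
scores = [(100 - i) / 100 for i in range(100)]
hits = {0, 2, 5, 7, 11, 16, 24, 40, 63, 88}
labels = [1 if i in hits else 0 for i in range(100)]
print([round(x, 2) for x in decile_lift(scores, labels)])
```

When the ten groups are equal in size, the bars average exactly 1.0, so a tall first bar must be offset by shorter bars further right.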

19) Suppose that a data mining routine has an adjustable cutoff (threshold) mechanism by which you can alter the proportion of records classified as owner. Three cases are described below.

Describe how moving the cutoff up or down from a starting point of 0.5 affects the misclassification error rate.

Cut off Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix
                 Predicted Class
Actual Class     owner    non-owner
owner            11       1
non-owner        2        10

Cut off Prob. Val. for Success (Updatable): 0.25

Classification Confusion Matrix
                 Predicted Class
Actual Class     owner    non-owner
owner            11       1
non-owner        4        8

Cut off Prob. Val. for Success (Updatable): 0.75

Classification Confusion Matrix
                 Predicted Class
Actual Class     owner    non-owner
owner            7        5
non-owner        1        11

 

A) The misclassification error rate dropped as the threshold dropped.

B) The misclassification error rate dropped as the threshold increased.

C) The misclassification error rate remained unchanged as the threshold changed.

D) The misclassification error rate changed when the threshold either increased or decreased.
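The three matrices can be compared by computing each one's misclassification rate. The Python sketch below does that and also shows the cutoff rule itself. The helper names are hypothetical, and treating a probability exactly equal to the cutoff as a success is an assumption rather than documented XLMiner behavior.

```python
def classify(probabilities, cutoff):
    """Hypothetical helper: label a record 'owner' when its estimated
    probability of success is at least `cutoff` (the >= is an assumption)."""
    return ["owner" if p >= cutoff else "non-owner" for p in probabilities]


def error_rate(matrix):
    """Hypothetical helper: off-diagonal share of matrix[actual][predicted]."""
    total = sum(sum(row.values()) for row in matrix.values())
    errors = sum(n for a, row in matrix.items() for p, n in row.items() if p != a)
    return errors / total


# The three confusion matrices above, keyed by cutoff.
matrices = {
    0.25: {"owner": {"owner": 11, "non-owner": 1}, "non-owner": {"owner": 4, "non-owner": 8}},
    0.50: {"owner": {"owner": 11, "non-owner": 1}, "non-owner": {"owner": 2, "non-owner": 10}},
    0.75: {"owner": {"owner": 7, "non-owner": 5}, "non-owner": {"owner": 1, "non-owner": 11}},
}
for cutoff, m in matrices.items():
    print(f"cutoff {cutoff:.2f}: error rate {error_rate(m):.1%}")

# Raising the cutoff moves borderline records from 'owner' to 'non-owner'.
print(classify([0.2, 0.4, 0.6, 0.8], cutoff=0.5))
```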

20) Data mining traditionally uses

A) only variations of s-curve models.

B) only time series data sets.

C) only classical statistical techniques.

D) only very large data sets.

21) When using data mining techniques we usually

A) impose a model on the data.

B) choose among classical statistical models for the underlying explanation.

C) impose not a model, but a pattern, on the data.

D) don't know what pattern may fit a particular set of data.

22) One of the premises of data mining is

A) that we are able to deduce patterns with very small amounts of data.

B) that classical statistical techniques are inappropriate for most business forecasting.

C) that there is a great deal of information locked up in any database.

D) that very few techniques may be used appropriately to analyze most business data sets.

23) Data today is collected

A) when users click a button on the World Wide Web.

B) when a credit card is swiped.

C) when inventory moves through a warehouse.

D) when an individual checks out at a grocery store.

E) All of the options are correct.

24) Data mining

A) is also called Online Transaction Processing (OLTP).

B) is also called Online Analytical Processing (OLAP).

C) is also called Knowledge Discovery in Databases (KDD).

D) None of the options are correct.

25) The job of a data miner

A) is the extraction of explicit intelligence in a data set.

B) is to use limited data to extract meaningful patterns that might exist in large data sets.

C) is to make sense of the available mounds of data by examining the data for patterns.

D) None of the options are correct.

26) SQL or structured query language

A) is a common data mining technique.

B) could be used to "Find all the customers likely to miss a future payment."

C) allows well-defined queries of existing databases.

D) could "Group all customers with similar buying habits."

27) The three standard categories of data mining tools are

A) regression, categorization, and association.

B) prediction, categorization, and association.

C) classification, clustering, and association.

D) association, clustering, and analytics.

28) A "target" in data mining

A) is most like a dependent variable in business forecasting.

B) is another name for an attribute.

C) is synonymous with a record.

D) None of the options are correct.

29) In data mining, to "score"

A) is to partition a data set.

B) is the same as creating a target variable in business forecasting.

C) is to predict.

D) None of the options are correct.

30) In standard "business forecasting" we have been seeking verification of previously held hypotheses. In data mining, on the other hand,

A) multiple hypotheses are examined and the optimal one is selected or suggested.

B) the rules are explicitly set by the forecaster.

C) we seek the discovery of new knowledge from the data.

D) None of the options are correct.

31) A "validation set lift chart"

A) is similar to an "out of sample" test statistic used in classical business forecasting.

B) may not be used to judge the predictive power of a k-Nearest-Neighbor model.

C) is rarely used to make an inference about the predictive capability of a k-Nearest-Neighbor model.

D) None of the options are correct.

32)

Consider the Lift Chart above.

The straight line emanating from the origin and labeled "Cumulative Personal Loan using average"

A) is a reference line indicating the use of a linear model.

B) is a reference line that shows an average misclassification rate for this particular data.

C) represents the expected number of correct classifications of any class we would predict if we used the naïve model.

D) represents the expected number of positives we would predict if we used the naïve model.

33) In the following confusion matrix, how many mistakes are made when participants are predicted to be female?

                   Actually male   Actually female
Predicted male     57              4
Predicted female   6               32

 

A) 24

B) 4

C) 6

D) 8

34) A confusion matrix:

A) helps the researcher classify a variable into its component categories.

B) indicates how well the attributes correlate with the target.

C) indicates how well a model has predicted group membership.

D) helps the researcher assess statistical significance.

35) A dummy variable:

A) combines several characteristics together to give a score.

B) is a variable known to have a zero correlation with the target.

C) merely indicates whether a case has a particular characteristic or not.

D) indicates the extent to which people differ on a particular characteristic.

36) Of all the data available today, it is estimated that about 90% of data is

A) unstructured.

B) structured.

C) numerical.

D) graphical.

37) In text mining, "knowledge discovery" refers to

A) extraction of codified features.

B) analysis of feature distribution.

C) counting the number of unknown terms used in a document.

D) the measurement of single-use terms present in the text.

38)

In the accompanying diagram of an insect we wish to classify, "Spiracle Diameter" is

A) a dependent variable.

B) a target.

C) a record.

D) an attribute.

39) In data mining terminology, "scoring" refers to

A) using the algorithm to predict.

B) the estimation of the parameters of the algorithm.

C) evaluating the appropriateness and accuracy of the algorithm.

D) an examination of the misclassification rate.

40) Which of the following statements is an example of one that a data scientist would seek, as opposed to one that a database manager would seek?

A) Find all customers that use Mastercard.

B) Find all customers that are likely to miss one payment.

C) Find all customers that live in South Bend.

D) Find all customers that missed one payment.

41) SAS Enterprise Miner uses the "SEMMA" approach to the data mining process. The second "M" in SEMMA refers to

A) Model.

B) Mitigate.

C) Mix.

D) Manage.

42)

Doctors in ER ("BEFORE")
                          Actual
                          Heart Attack   No Heart Attack
Predict Heart Attack      0.89           0.75
Predict No Heart Attack   0.11           0.25

Tree Algorithm (Goldman) ("AFTER")
                          Actual
                          Heart Attack   No Heart Attack
Predict Heart Attack      0.96           0.08
Predict No Heart Attack   0.04           0.92

In the book Blink by Malcolm Gladwell, predictive analytics enabled emergency room physicians at Cook County Hospital to better help prospective heart attack victims. Which statement best describes what happened after predictive analytics was introduced?

A) While the study showed more accurate classification was possible, the required information needed took too much time to collect.

B) Twenty-six attributes were able to predict heart attacks (and non-heart attacks) almost perfectly.

C) The physicians began concentrating on only three "risk factors."

D) Examining only two attributes allowed about a 67% increase in accuracy of classification.

43)

"Lift" is represented in the diagram by the distance between

A) ab.

B) bc.

C) bd.

D) bf.

E) be.

44)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                162     32
0                4       1802

Error Report
Class        #Cases   #Errors   % Error
1            194      32        16.49
0            1806     4         0.22
Overall      2000     36        1.80

 

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank personal loan customers were misclassified as non-personal loan customers?

A) 16.5%

B) 0.22%

C) 83.5%

D) 99.78%
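Questions 44 through 47 all turn on class-specific rates: each actual class's correct count divided by that class's total. A minimal Python sketch, with counts keyed as `matrix[actual][predicted]` and the hypothetical helper `class_rates`:

```python
def class_rates(matrix):
    """Hypothetical helper: for each actual class, the share classified
    correctly and the share misclassified (matrix[actual][predicted])."""
    rates = {}
    for actual, row in matrix.items():
        cases = sum(row.values())
        correct = row.get(actual, 0)
        rates[actual] = {"correct": correct / cases,
                         "misclassified": (cases - correct) / cases}
    return rates


# Universal Bank validation matrix above: class 1 = personal loan customer.
matrix = {1: {1: 162, 0: 32}, 0: {1: 4, 0: 1802}}
for cls, r in class_rates(matrix).items():
    print(f"class {cls}: {r['correct']:.2%} correct, {r['misclassified']:.2%} misclassified")
```

The share of class 1 classified correctly is commonly called sensitivity (or recall), and the share of class 0 classified correctly is called specificity.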

45)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                162     32
0                4       1802

Error Report
Class        #Cases   #Errors   % Error
1            194      32        16.49
0            1806     4         0.22
Overall      2000     36        1.80

 

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank personal loan customers were classified correctly as personal loan customers?

A) 16.5%

B) 0.22%

C) 83.5%

D) 99.78%

46)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                162     32
0                4       1802

Error Report
Class        #Cases   #Errors   % Error
1            194      32        16.49
0            1806     4         0.22
Overall      2000     36        1.80

 

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank non-personal loan customers were correctly classified?

A) 16.5%

B) 0.22%

C) 83.51%

D) 99.8%

47)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                162     32
0                4       1802

Error Report
Class        #Cases   #Errors   % Error
1            194      32        16.49
0            1806     4         0.22
Overall      2000     36        1.80

 

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank non-personal loan customers were misclassified?

A) 16.49%

B) 0.22%

C) 83.5%

D) 99.8%

48)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                1078    302
0                311     1134

Error Report
Class        #Cases   #Errors   % Error
1            1380     302       21.88
0            1445     311       21.52
Overall      2825     613       21.70

 

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did churn were correctly classified?

A) 78.12%

B) 21.5%

C) 78.48%

D) 21.2%
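In this printout class 0, not class 1, denotes the customers who did churn. The Python sketch below keys its results by meaning rather than by code so the two cannot be confused; the helper `rates_by_meaning` and the `LABELS` mapping are hypothetical.

```python
def rates_by_meaning(matrix, labels):
    """Hypothetical helper: per actual class, (#cases, share correct,
    share incorrect), keyed by the class's meaning instead of its code."""
    result = {}
    for code, meaning in labels.items():
        row = matrix[code]
        cases = sum(row.values())
        correct = row[code]
        result[meaning] = (cases, correct / cases, (cases - correct) / cases)
    return result


# Counts from the churn matrix above, keyed as matrix[actual][predicted].
# Hypothetical mapping taken from the class definitions: 0 means the customer DID churn.
LABELS = {0: "did churn", 1: "did not churn"}
matrix = {1: {1: 1078, 0: 302}, 0: {1: 311, 0: 1134}}
for meaning, (cases, right, wrong) in rates_by_meaning(matrix, LABELS).items():
    print(f"actually {meaning}: {cases} cases, {right:.2%} correct, {wrong:.2%} incorrect")
```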

49)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                1078    302
0                311     1134

Error Report
Class        #Cases   #Errors   % Error
1            1380     302       21.88
0            1445     311       21.52
Overall      2825     613       21.70

 

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did not churn were correctly classified?

A) 78.12%

B) 21.5%

C) 78.48%

D) 21.2%

50)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                1078    302
0                311     1134

Error Report
Class        #Cases   #Errors   % Error
1            1380     302       21.88
0            1445     311       21.52
Overall      2825     613       21.70

 

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did not churn were incorrectly classified?

A) 78.12%

B) 21.5%

C) 78.48%

D) 21.88%

51)

Classification Confusion Matrix
                 Predicted Class
Actual Class     1       0
1                1078    302
0                311     1134

Error Report
Class        #Cases   #Errors   % Error
1            1380     302       21.88
0            1445     311       21.52
Overall      2825     613       21.70

 

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did churn were incorrectly classified?

A) 78.12%

B) 21.52%

C) 78.48%

D) 21.2%

52) Data mining algorithms require _______.

A) an efficient sampling method

B) storage of intermediate results

C) the capacity to handle large amounts of data

D) All of the options are correct.

53) The goal of data mining is to build _______ models.

A) retrospective

B) interrogative

C) predictive

D) imperative

54) Data mining is best described as the process of

A) identifying patterns in data.

B) deducing relationships in data.

C) representing data.

D) simulating trends in data.

55) Data used to build a data mining model is called _______.

A) validation data

B) training data

C) test data

D) unseen data

56) Assume you are asked to create a model that predicts the number of new babies born per period according to the size of the stork population. In this case, the number of babies is

A) a target.

B) a feature.

C) an outcome.

D) an observation.

57) Data mining is best described as the process of

A) identifying patterns in data.

B) deducing relationships in data.

C) representing data.

D) simulating trends in data.

58) What is another name for an algorithm's output?

A) Predictive variable

B) Independent variable

C) Estimated variable

D) Target variable

59) KDD describes the _______.

A) whole process of extraction of knowledge from data

B) extraction of data

C) extraction of information

D) extraction of rules

60) An algorithm that is controlled by a human during its execution is a(n) _______ algorithm.

A) unsupervised learning

B) supervised learning

C) batch learning

D) incremental

61) SQL stands for _______.

A) simple query language

B) structured query language

C) strong query language

D) simple language

62) _______ analysis divides data into groups that are meaningful, useful, or both.

A) A clustering

B) An association rule mining

C) A classification

D) A relational

Document Information

Document Type: DOCX
Chapter Number: 8
Created Date: Aug 21, 2025
Chapter Name: Chapter 8 Predictive Analytics Helping To Make Sense Of Big Data
Author: Barry Keating
