Ch8 Predictive Analytics Helping To Make Sense Of Test Bank

Forecasting with Forecast X 7e Complete Test Bank

Forecasting and Predictive Analytics with Forecast X, 7e (Keating)

Chapter 8 Predictive Analytics: Helping to Make Sense of Big Data

1) Decile-Wise

A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).

The decile-wise lift chart for a transaction data model:

Consider the decile-wise lift chart above. Interpret the meaning of the first and second bars from the left.

A) The first variable in the model is more predictive than the second variable.

B) These bars are never interpreted for the validation data set; they are only interpreted for the training data set.

C) Since only two bars rise above unity, little explanatory power is exhibited by the model.

D) The first two bars show that this model outperforms a naïve model.

2) Decile-Wise

A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).

The decile-wise lift chart for a transaction data model:

Consider the decile-wise lift chart above. An analyst comments that you could improve the accuracy of the model by classifying everything as nonfraudulent. What will the overall error rate (also called the misclassification rate) be if you follow her advice?

A) The overall error rate will increase.

B) The overall error rate will decrease.

C) The change in the overall error rate cannot be determined.

D) The overall error rate will arbitrarily change.
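
The arithmetic behind this question can be sketched in a few lines of Python. The counts come from the question stem; the variable names are illustrative, and the sketch assumes that "classifying everything as nonfraudulent" means every actual fraud becomes an error and nothing else does.

```python
# Counts taken from the question stem.
predicted_fraud, correct_fraud = 88, 30
predicted_nonfraud, correct_nonfraud = 952, 920

false_positives = predicted_fraud - correct_fraud        # nonfraud flagged as fraud
false_negatives = predicted_nonfraud - correct_nonfraud  # fraud that was missed
total = predicted_fraud + predicted_nonfraud

# Actual frauds are the correctly flagged ones plus the missed ones.
actual_fraud = correct_fraud + false_negatives

model_error_rate = (false_positives + false_negatives) / total
# Labeling every record nonfraudulent misclassifies exactly the actual frauds.
all_nonfraud_error_rate = actual_fraud / total

print(f"model: {model_error_rate:.2%}")                   # model: 8.65%
print(f"all nonfraudulent: {all_nonfraud_error_rate:.2%}")  # all nonfraudulent: 5.96%
```

The same four counts (30, 58, 32, 920) are the cells of the confusion matrix for this data set.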

3) Decile-Wise

A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).

The decile-wise lift chart for a transaction data model:

Which of the following situations represents the confusion matrix for the transactions data mentioned above?

A) A

B) B

C) C

D) D

4) What is the misclassification error rate for the following XLMiner confusion matrix?

Confusion Matrix
Actual\Predicted	0		1
0		970		20
1		2		8

A) 2.2%

B) 0.82%

C) 10%

D) 0.21%

E) Impossible to determine from information given.
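
As a rough check, the misclassification rate is the share of off-diagonal records in the matrix. The sketch below assumes rows are actual classes and columns are predicted classes, as the "Actual\Predicted" label suggests; the function name is illustrative.

```python
# Keys are (actual class, predicted class); values are record counts.
confusion = {
    (0, 0): 970, (0, 1): 20,
    (1, 0): 2,   (1, 1): 8,
}

def misclassification_rate(matrix):
    """Share of records whose predicted class differs from the actual class."""
    total = sum(matrix.values())
    errors = sum(n for (actual, predicted), n in matrix.items() if actual != predicted)
    return errors / total

rate = misclassification_rate(confusion)
print(f"{rate:.1%}")
```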

5) Consider the Toyota Corolla data below:

Id	Model	Price	Age_08_04	Mfg_Month	Mfg_Year
1	RA 2/3-Doors	13500	23	10	2002
2	RA 2/3-Doors	13750	23	10	2002
3	RA 2/3-Doors	13950	24	9	2002
4	RA 2/3-Doors	14950	26	7	2002
5	OL 2/3-Doors	13750	30	3	2002
6	OL 2/3-Doors	12950	32	1	2002
7	RA 2/3-Doors	16900	27	6	2002

Id	KM	Fuel_Type	HP	Met_Color	Color_Black
1	46986	Diesel	90	1	0
2	72937	Diesel	90	1	0
3	41711	Diesel	90	1	0
4	48000	Diesel	90	0	1
5	38500	Diesel	90	0	1
6	61000	Diesel	90	0	0
7	94612	Diesel	90	1	0

Which variable is a dummy variable?

A) Fuel_Type

B) Color_Black

C) KM

D) HP

E) Both Fuel_Type and Color_Black qualify as dummy variables.
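
One way to see the distinction is to test which columns are coded purely as 0/1 indicators. The values below are copied from the seven sample rows for the four columns named in the options; the helper `looks_like_dummy` is a hypothetical name, and a 0/1 check is only a heuristic for spotting indicator coding.

```python
# Column values copied from the seven sample rows in the question.
columns = {
    "Fuel_Type": ["Diesel"] * 7,
    "Color_Black": [0, 0, 0, 1, 1, 0, 0],
    "KM": [46986, 72937, 41711, 48000, 38500, 61000, 94612],
    "HP": [90] * 7,
}

def looks_like_dummy(values):
    """True when every value is the integer 0 or 1 (an indicator coding)."""
    return all(isinstance(v, int) and v in (0, 1) for v in values)

dummy_like = [name for name, values in columns.items() if looks_like_dummy(values)]
print(dummy_like)  # ['Color_Black']
```

Fuel_Type holds text labels rather than a 0/1 code, so it fails the check even though it is categorical.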

6) Consider the Toyota Corolla data below:

Id	Model	Price	Age_08_04	Mfg_Month	Mfg_Year
1	RA 2/3-Doors	13500	23	10	2002
2	RA 2/3-Doors	13750	23	10	2002
3	RA 2/3-Doors	13950	24	9	2002
4	RA 2/3-Doors	14950	26	7	2002
5	OL 2/3-Doors	13750	30	3	2002
6	OL 2/3-Doors	12950	32	1	2002
7	RA 2/3-Doors	16900	27	6	2002

Id	KM	Fuel_Type	HP	Met_Color	Color_Black
1	46986	Diesel	90	1	0
2	72937	Diesel	90	1	0
3	41711	Diesel	90	1	0
4	48000	Diesel	90	0	1
5	38500	Diesel	90	0	1
6	61000	Diesel	90	0	0
7	94612	Diesel	90	1	0

Which of the variables below (from the Toyota Corolla dataset) is a categorical variable?

A) Fuel_Type

B) Color_Black

C) KM

D) HP

7) Consider the following confusion matrix.

	Predict class 1		Predict class 0
Actual 1		8		2
Actual 0		20		970

How much better did this data mining technique do as compared to a naïve model?

A) No better than a naïve model.

B) 1.2% better than a naïve model.

C) 5.6% better than a naïve model.

D) 7.8% better than a naïve model.

E) 10.1% better than a naïve model.
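
A quick comparison can be sketched as follows. It assumes the naïve model is the majority-class rule (predict class 0 for every record), which is the usual benchmark but is not spelled out in the question.

```python
# Cell counts from the confusion matrix in the question.
tp, fn = 8, 2      # actual class 1: predicted 1, predicted 0
fp, tn = 20, 970   # actual class 0: predicted 1, predicted 0
total = tp + fn + fp + tn

model_errors = fn + fp
# Assumed naive rule: predict class 0 for everything, so only the
# actual class 1 records are misclassified.
naive_errors = tp + fn

model_rate = model_errors / total
naive_rate = naive_errors / total
print(f"model {model_rate:.1%}, naive {naive_rate:.1%}")  # model 2.2%, naive 1.0%
```

Under that assumption the technique makes more errors overall than the naïve rule, even though it finds 8 of the 10 class 1 records.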

8) "Overfitting" refers to

A) estimating a model that explains the training set data points perfectly and leaves little error but that is unlikely to be accurate in prediction.

B) using too many attributes or classifiers in a model.

C) the process used to test data mining models for accuracy.

D) the process of estimating or scoring of new data.

9) A "training data set" is

A) used to compare models and pick the best one.

B) used to build various models of interest.

C) used to assess the performance of the normalization procedure.

D) None of the options are correct.

10) A "validation data set" is

A) used to compare models and pick the best one.

B) used to build various models of interest.

C) used to assess the performance of the normalization procedure.

D) None of the options are correct.

11)

The misclassification rate in the confusion matrix above is

A) 0 percent.

B) 10 percent.

C) 9 percent.

D) 19 percent.

E) None of the options are correct.

12)

Data
Data source	Data!$A$5:$N$2504
Selected variables	ID	Age	Experience	Income	Zip Code	Family	CCAvg	Education
Partitioning Method	Randomly chosen
Random Seed	12345
# training rows	1250
# Validation rows	750
# test rows	500

Selected Variables
Row Id	ID	Age	Experience	Income	Zip Code	Family	CCAvg	Education
1	1	25	1	49	91107	4	1.60	1
4	4	35	9	100	94112	1	2.70	2
5	5	35	8	45	91330	4	1.00	2
6	6	37	13	29	92121	4	0.40	2
9	9	35	10	81	90089	3	0.60	2
10	10	34	9	180	93023	1	8.90	3
12	12	29	5	45	90277	3	0.10	2
17	17	38	14	130	95010	4	4.70	3
18	18	42	18	81	94305	4	2.40	1
21	21	56	31	25	94015	4	0.90	2
26	26	43	19	29	94305	3	0.50	1

The Universal Bank data represented above has been partitioned with what percentages?

A) 50%, 30%, 20% in training, validation, and test sets

B) 60%, 40% in training and validation sets

C) 60%, 20%, 20% in training, validation, and test sets

D) 50%, 20%, 30% in training, validation, and test sets

E) None of the options are correct.
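
The partition shares follow directly from the row counts in the printout. A minimal sketch, with illustrative variable names:

```python
# Row counts from the partition summary in the printout.
partition_rows = {"training": 1250, "validation": 750, "test": 500}
total_rows = sum(partition_rows.values())

shares = {name: rows / total_rows for name, rows in partition_rows.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

The total of 2,500 rows agrees with the data source range, which runs from row 5 to row 2504.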

13) In data mining the model should be applied to a data set that was not used in the estimation process in order to find out the accuracy on unseen data; that "unseen" data set is called

A) the training data set.

B) the validation data set.

C) the test data set.

D) the holdout data set.

E) None of the options are correct.

14)

The lift chart above shows that the data mining classification model

A) is working well in classifying unseen data.

B) is working well in classifying training data.

C) is working quite poorly.

D) is doing no better at classifying than a naïve model.

15) With most data mining techniques we "partition" the data

A) into "success" and "failure" results in order to create a target that is a dummy variable.

B) only when we require a confusion matrix to be created.

C) after estimating the appropriate technique.

D) in order to judge how our model will do when we apply it to new data.

16) Consider the following Lift Chart. Cumulative percentage of hits is the Y-axis variable. Percent of the entire list is the X-axis variable.

What is the "Lift" at 5%?

A) Exactly 4

B) About 5

C) Exactly 20

D) About 25

E) Unable to determine from information given.

17) Consider the printout below:

Validation Data scoring - Summary Report (for k = 7)

Cut off Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix
	Predicted Class
Actual Class	owner		non-owner
owner		4		0
non-owner		3		3

Error Report
Class	#Cases		#Errors		%Error
owner		4		0		0.00
non-owner		6		3		50.00
Overall		10		3		30.00

What is the "Misclassification Rate"?

A) 0

B) 3

C) 50

D) 30

E) It is not shown in this printout.

18)

The figure above is a decile-wise lift chart. The first bar on the left indicates

A) that our attribute or attributes did little to explain predicted success in this model.

B) that the lift will not vary with the number of cases we consider.

C) that taking the 10% of the records that are ranked by the model as the most probable successes yields about as much as a naïve model.

D) that taking the 10% of the records that are ranked by the model as the most probable successes yields twice as many successes as would a random selection of 10% of the records.
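
For readers who want to see how the bars of a decile-wise lift chart are computed, here is a minimal sketch. The records are synthetic and purely hypothetical (they are not the data behind the chart in the question), and `decile_lifts` is an illustrative name. Each bar is the success rate within a score-ranked decile divided by the overall success rate, so a bar of 2 means twice as many successes as a random 10% of the records would yield.

```python
import random

random.seed(0)
# Hypothetical records: (model score, actual outcome). Higher scores are made
# more likely to be successes so that the ranking carries some signal.
records = []
for _ in range(1000):
    score = random.random()
    outcome = 1 if random.random() < score ** 2 else 0
    records.append((score, outcome))

def decile_lifts(records):
    """Success rate in each score-ranked decile divided by the overall rate."""
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    overall_rate = sum(outcome for _, outcome in ranked) / len(ranked)
    size = len(ranked) // 10
    lifts = []
    for d in range(10):
        chunk = ranked[d * size:(d + 1) * size]
        decile_rate = sum(outcome for _, outcome in chunk) / size
        lifts.append(decile_rate / overall_rate)
    return lifts

lifts = decile_lifts(records)
for d, lift in enumerate(lifts, start=1):
    print(f"decile {d}: lift {lift:.2f}")
```

Because the deciles are equal in size, the ten lift values always average to 1; a naïve model would show every bar at roughly that level.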

19) Suppose that a data mining routine has an adjustable cutoff (threshold) mechanism by which you can alter the proportion of records classified as owner. Three cases are described below.

Describe how moving the cutoff up or down from a starting point of 0.5 affects the misclassification error rate.

Cut off Prob. Val. for Success (Updatable): 0.5

Classification Confusion Matrix
	Predicted Class
Actual Class	owner		non-owner
owner		11		1
non-owner		2		10

Cut off Prob. Val. for Success (Updatable): 0.25

Classification Confusion Matrix
	Predicted Class
Actual Class	owner		non-owner
owner		11		1
non-owner		4		8

Cut off Prob. Val. for Success (Updatable): 0.75

Classification Confusion Matrix
	Predicted Class
Actual Class	owner		non-owner
owner		7		5
non-owner		1		11

A) The misclassification error rate dropped as the threshold dropped.

B) The misclassification error rate dropped as the threshold increased.

C) The misclassification error rate remained unchanged as the threshold changed.

D) The misclassification error rate changed when the threshold either increased or decreased.
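
The three cases can be compared by computing the error rate at each cutoff. The cell counts below are taken from the three confusion matrices; the tuple layout and names are illustrative.

```python
# (cutoff, owner->owner, owner->non-owner, non-owner->owner, non-owner->non-owner)
cases = [
    (0.25, 11, 1, 4, 8),
    (0.50, 11, 1, 2, 10),
    (0.75, 7, 5, 1, 11),
]

error_rates = {}
for cutoff, own_own, own_non, non_own, non_non in cases:
    total = own_own + own_non + non_own + non_non
    # Errors are the off-diagonal cells of each matrix.
    error_rates[cutoff] = (own_non + non_own) / total
    print(f"cutoff {cutoff:.2f}: {error_rates[cutoff]:.2%}")
# cutoff 0.25: 20.83%
# cutoff 0.50: 12.50%
# cutoff 0.75: 25.00%
```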

20) Data Mining traditionally uses

A) only variations of s-curve models.

B) only time series data sets.

C) only classical statistical techniques.

D) only very large data sets.

21) When using data mining techniques we usually

A) impose a model on the data.

B) choose among classical statistical models for the underlying explanation.

C) impose not a model, but a pattern, on the data.

D) don't know what pattern may fit a particular set of data.

22) One of the premises of data mining is

A) that we are able to deduce patterns with very small amounts of data.

B) that classical statistical techniques are inappropriate for most business forecasting.

C) that there is a great deal of information locked up in any database.

D) that very few techniques may be used appropriately to analyze most business data sets.

23) Data today is collected

A) when users click a button on the World Wide Web.

B) when a credit card is swiped.

C) when inventory moves through a warehouse.

D) when an individual checks out at a grocery store.

E) All of the options are correct.

24) Data mining

A) is also called Online Transaction Processing (OLTP).

B) is also called Online Analytical Processing (OLAP).

C) is also called Knowledge Discovery in Databases (KDD).

D) None of the options are correct.

25) The job of a data miner

A) is the extraction of explicit intelligence in a data set.

B) is to use limited data to extract meaningful patterns that might exist in large data sets.

C) is to make sense of the available mounds of data by examining the data for patterns.

D) None of the options are correct.

26) SQL or structured query language

A) is a common data mining technique.

B) could be used to "Find all the customers likely to miss a future payment."

C) allows well defined queries of existing databases.

D) could "Group all customers with similar buying habits."

27) The three standard categories of data mining tools are

A) regression, categorization, and association.

B) prediction, categorization, and association.

C) classification, clustering, and association.

D) association, clustering, and analytics.

28) A "target" in data mining

A) is most like a dependent variable in business forecasting.

B) is another name for an attribute.

C) is synonymous with a record.

D) None of the options are correct.

29) In data mining, to "score"

A) is to partition a data set.

B) is the same as creating a target variable in business forecasting.

C) is to predict.

D) None of the options are correct.

30) In standard "business forecasting" we have been seeking verification of previously held hypotheses. In data mining, on the other hand,

A) multiple hypotheses are examined and the optimal one is selected or suggested.

B) the rules are explicitly set by the forecaster.

C) we seek the discovery of new knowledge from the data.

D) None of the options are correct.

31) A "validation set lift chart"

A) is similar to an "out of sample" test statistic used in classical business forecasting.

B) may not be used to judge the predictive power of a k-Nearest-Neighbor model.

C) is rarely used to make an inference about the predictive capability of a k-Nearest-Neighbor model.

D) None of the options are correct.

32)

Consider the Lift Chart above.

The straight line emanating from the origin and labeled "Cumulative Personal Loan using average"

A) is a reference line indicating the use of a linear model.

B) is a reference line that shows an average misclassification rate for this particular data.

C) represents the expected number of correct classifications of any class we would predict if we used the naïve model.

D) represents the expected number of positives we would predict if we used the naïve model.

33) In the following confusion matrix, how many mistakes are made when participants are predicted to be female?

	Actually male	Actually female
Predicted male
Predicted female

A) 24

B) 4

C) 6

D) 8

34) A confusion matrix:

A) helps the researcher classify a variable into its component categories.

B) indicates how well the attributes correlate with the target.

C) indicates how well a model has predicted group membership.

D) helps the researcher assess statistical significance.

35) A dummy variable:

A) combines several characteristics together to give a score.

B) is a variable known to have a zero correlation with the target.

C) merely indicates whether a case has a particular characteristic or not.

D) indicates the extent to which people differ on a particular characteristic.

36) Of all the data available today, it is estimated that about 90% of data is

A) unstructured.

B) structured.

C) numerical.

D) graphical.

37) In text mining, "knowledge discovery" refers to

A) extraction of codified features.

B) analysis of feature distribution.

C) counting the number of unknown terms used in a document.

D) the measurement of single-use terms present in the text.

38)

In the accompanying diagram of an insect we wish to classify, "Spiracle Diameter" is

A) a dependent variable.

B) a target.

C) a record.

D) an attribute.

39) In data mining terminology, "scoring" refers to

A) using the algorithm to predict.

B) the estimation of the parameters of the algorithm.

C) evaluating the appropriateness and accuracy of the algorithm.

D) an examination of the misclassification rate.

40) Which of the following statements is an example of one that a data scientist would seek, as opposed to one that a database manager would seek?

A) Find all customers that use Mastercard.

B) Find all customers that are likely to miss one payment.

C) Find all customers that live in South Bend.

D) Find all customers that missed one payment.

41) SAS Enterprise Miner uses the "SEMMA" approach to the data mining process. The second "M" in SEMMA refers to

A) Model.

B) Mitigate.

C) Mix.

D) Manage.

42)

"BEFORE": Doctors in ER
	Actual Heart Attack	No Heart Attack
Predict Heart Attack	0.89	0.75
Predict No Heart Attack	0.11	0.25

"AFTER": Tree Algorithm (Goldman)
	Actual Heart Attack	No Heart Attack
Predict Heart Attack	0.96	0.08
Predict No Heart Attack	0.04	0.92

In the book Blink by Malcolm Gladwell, predictive analytics allowed emergency room physicians to be better able to help prospective heart attack victims in Cook County Hospital. Which statement best describes what happened after the predictive analytics?

A) While the study showed more accurate classification was possible, the required information needed took too much time to collect.

B) Twenty-six attributes were able to predict heart attacks (and non-heart attacks) almost perfectly.

C) The physicians began concentrating on only three "risk factors."

D) Examining only two attributes allowed about a 67% increase in accuracy of classification.

43)

"Lift" is represented in the diagram by the distance between

A) ab.

B) bc.

C) bd.

D) bf.

E) be.

44)

Classification Confusion Matrix
	Predicted Class
Actual Class	1		0
1		162		32
0		4		1802

Error Report
Class	#Cases		#Errors		% Error
1		194		32		16.49
0		1806		4		0.22
Overall		2000		36		1.80

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank personal loan customers were misclassified as non-personal loan customers?

A) 16.5%

B) 0.22%

C) 83.5%

D) 99.78%

45)

Classification Confusion Matrix
	Predicted Class
Actual Class	1		0
1		162		32
0		4		1802

Error Report
Class	#Cases		#Errors		% Error
1		194		32		16.49
0		1806		4		0.22
Overall		2000		36		1.80

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank personal loan customers were classified correctly as personal loan customers?

A) 16.5%

B) 0.22%

C) 83.5%

D) 99.78%

46)

Classification Confusion Matrix
	Predicted Class
Actual Class	1		0
1		162		32
0		4		1802

Error Report
Class	#Cases		#Errors		% Error
1		194		32		16.49
0		1806		4		0.22
Overall		2000		36		1.80

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank non-personal loan customers were correctly classified?

A) 16.5%

B) 0.22%

C) 83.51%

D) 99.8%

47)

Classification Confusion Matrix
	Predicted Class
Actual Class	1		0
1		162		32
0		4		1802

Error Report
Class	#Cases		#Errors		% Error
1		194		32		16.49
0		1806		4		0.22
Overall		2000		36		1.80

Validation Confusion Matrix Universal Bank

Target = personal loan customer

Class 1 = personal loan customer

Class 0 = not a personal loan customer

What percentage of actual bank non-loan customers were misclassified?

A) 16.49%

B) 0.22%

C) 83.5%

D) 99.8%
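
The class-wise percentages asked about in the Universal Bank questions all come from dividing one cell of the confusion matrix by its row total. A short sketch, with an illustrative helper name:

```python
def row_rates(correct, wrong):
    """Return (share classified correctly, share misclassified) for one actual class."""
    n = correct + wrong
    return correct / n, wrong / n

# Actual class 1 (personal loan customers): 162 predicted 1, 32 predicted 0.
loan_correct, loan_missed = row_rates(162, 32)
# Actual class 0 (not personal loan customers): 1802 predicted 0, 4 predicted 1.
nonloan_correct, nonloan_wrong = row_rates(1802, 4)

print(f"class 1: {loan_correct:.2%} correct, {loan_missed:.2%} misclassified")
print(f"class 0: {nonloan_correct:.2%} correct, {nonloan_wrong:.2%} misclassified")
```

The misclassified shares match the % Error column of the error report, and each correct share is 100% minus the corresponding error.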

48)

Error Report
Class	#Cases		#Errors		% Error
1		1380		302		21.88
0		1445		311		21.52
Overall		2825		613		21.70

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did churn were correctly classified?

A) 78.12%

B) 21.5%

C) 78.48%

D) 21.2%

49)

Error Report
Class	#Cases		#Errors		% Error
1		1380		302		21.88
0		1445		311		21.52
Overall		2825		613		21.70

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did not churn were correctly classified?

A) 78.12%

B) 21.5%

C) 78.48%

D) 21.2%

50)

Error Report
Class	#Cases		#Errors		% Error
1		1380		302		21.88
0		1445		311		21.52
Overall		2825		613		21.70

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did not churn were incorrectly classified?

A) 78.12%

B) 21.5%

C) 78.48%

D) 21.88%

51)

Error Report
Class	#Cases		#Errors		% Error
1		1380		302		21.88
0		1445		311		21.52
Overall		2825		613		21.70

Validation Confusion Matrix Telecommunications Churn

Target = Churn

Class 0 = customer did churn

Class 1 = customer did not churn

What percentage of customers who actually did churn were incorrectly classified?

A) 78.12%

B) 21.52%

C) 78.48%

D) 21.2%
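
For the churn questions, only the error report is given, so each class's correct-classification rate is recovered as 100% minus its error rate. A sketch under that reading, keeping the class coding stated in the question (class 0 = did churn, class 1 = did not churn):

```python
# Error report rows: class -> (number of cases, number of errors).
error_report = {1: (1380, 302), 0: (1445, 311)}
labels = {0: "did churn", 1: "did not churn"}

summary = {}
for cls, (cases, errors) in error_report.items():
    error_pct = 100 * errors / cases
    # Store (percent misclassified, percent correctly classified).
    summary[cls] = (round(error_pct, 2), round(100 - error_pct, 2))
    print(f"class {cls} ({labels[cls]}): "
          f"{summary[cls][0]}% wrong, {summary[cls][1]}% correct")
```

Note that the class coding is easy to misread here: class 0 is the churn class, so the churn percentages come from the second row of the report.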

52) Data mining algorithms require _______.

A) an efficient sampling method

B) storage of intermediate results

C) the capacity to handle large amounts of data

D) All of the options are correct.

53) The goal of data mining is to build _______ models.

A) retrospective

B) interrogative

C) predictive

D) imperative

54) Data mining is best described as the process of

A) identifying patterns in data.

B) deducing relationships in data.

C) representing data.

D) simulating trends in data.

55) Data used to build a data mining model is called _______.

A) validation data

B) training data

C) test data

D) unseen data

56) Assume you are asked to create a model that predicts the number of new babies born per period according to the size of the stork population. In this case, the number of babies is

A) a target.

B) a feature.

C) an outcome.

D) an observation.

57) Data mining is best described as the process of

A) identifying patterns in data.

B) deducing relationships in data.

C) representing data.

D) simulating trends in data.

58) Which of the following is another name for an algorithm's output?

A) Predictive variable

B) Independent variable

C) Estimated variable

D) Target variable

59) KDD describes the _______.

A) whole process of extraction of knowledge from data

B) extraction of data

C) extraction of information

D) extraction of rules

60) An algorithm that is controlled by a human during its execution is a(n) _______ algorithm.

A) unsupervised learning

B) supervised learning

C) batch learning

D) incremental

61) SQL stands for _______.

A) simple query language

B) structured query language

C) strong query language

D) simple language

62) _______ analysis divides data into groups that are meaningful, useful, or both.

A) A clustering

B) An association rule mining

C) A classification

D) A relational

Document Type: DOCX

Created Date: Aug 21, 2025

Chapter Name: Chapter 8 Predictive Analytics Helping To Make Sense Of Big Data

Author: Barry Keating
