Forecasting and Predictive Analytics with Forecast X, 7e (Keating)
Chapter 9 Classification Models: The Most Used Models in Analytics
1) Flight Delays Data (Naïve Bayes Model)
N.B.
Success = 1 = Delayed
Failure = 0 = Ontime
Prior class probabilities
According to relative occurrences in training data

| Class | Prob |
| --- | --- |
| 1 | 0.193792581 <-- Success Class |
| 0 | 0.806207419 |

Conditional probabilities

| Input Variables | Value | Prob (Class 1 = Delayed) | Prob (Class 0 = Ontime) |
| --- | --- | --- | --- |
| CARRIER | CO | 0.06640625 | 0.038497653 |
| | DH | 0.33984375 | 0.243192488 |
| | DL | 0.109375 | 0.2 |
| | MQ | 0.1796875 | 0.112676056 |
| | OH | 0.01171875 | 0.017840376 |
| | RU | 0.21484375 | 0.170892019 |
| | UA | 0.0078125 | 0.016901408 |
| | US | 0.0703125 | 0.2 |
| DAY_OF_WEEK | 1 | 0.203125 | 0.128638498 |
| | 2 | 0.16015625 | 0.139906103 |
| | 3 | 0.12890625 | 0.152112676 |
| | 4 | 0.12890625 | 0.159624413 |
| | 5 | 0.1640625 | 0.181220557 |
| | 6 | 0.0703125 | 0.131455399 |
| | 7 | 0.14453125 | 0.107042254 |
| DEP_TIME_BLK | 0600-0659 | 0.03515625 | 0.061971831 |
| | 0700-0759 | 0.05078125 | 0.060093897 |
| | 0800-0859 | 0.0546875 | 0.071361502 |
| | 0900-0959 | 0.0234375 | 0.053521127 |
| | 1000-1059 | 0.01953125 | 0.057276995 |
| | 1100-1159 | 0.01953125 | 0.038497653 |
| | 1200-1259 | 0.0546875 | 0.062910798 |
| | 1300-1359 | 0.05078125 | 0.068544601 |
| | 1400-1459 | 0.15234375 | 0.110798122 |
| | 1500-1559 | 0.08203125 | 0.064788732 |
| | 1600-1659 | 0.07421875 | 0.078873239 |
| | 1700-1759 | 0.15625 | 0.094835681 |
| | 1800-1859 | 0.03125 | 0.043192488 |
| | 1900-1959 | 0.08984375 | 0.040375587 |
| | 2000-2059 | 0.01953125 | 0.030985915 |
| | 2100-2159 | 0.0859375 | 0.051971831 |
| DEST | EWR | 0.38671875 | 0.283568075 |
| | JFK | 0.1875 | 0.176525822 |
| | LGA | 0.42578125 | 0.539906103 |
| ORIGIN | BWI | 0.09375 | 0.068544601 |
| | DCA | 0.484375 | 0.635680751 |
| | IAD | 0.421875 | 0.295774648 |
| Weather | 0 | 0.92578125 | 1 |
| | 1 | 0.07421875 | 0 |
Using the Flight Delays output above, computed with a Naïve Bayes model, calculate the ontime probability for the following flight:
Carrier = DL
Day of Week = 7
Departure Time = 1000 - 1059
Destination = LGA
Origin = DCA
Weather = 0
A) 87%
B) 92%
C) 95%
D) 97%
E) 99%
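The arithmetic behind this question can be verified with a short script. Below is a minimal sketch (plain Python; the variable names are illustrative) that multiplies each class prior by the conditional probabilities read from the table above for this flight, then normalizes:

```python
# Naive Bayes score for the flight: DL, day 7, 1000-1059, LGA, DCA, weather 0.
# All probabilities are copied from the output above.
priors = {"delayed": 0.193792581, "ontime": 0.806207419}
conditionals = {
    "delayed": [0.109375, 0.14453125, 0.01953125, 0.42578125, 0.484375, 0.92578125],
    "ontime":  [0.2, 0.107042254, 0.057276995, 0.539906103, 0.635680751, 1.0],
}

score = {}
for cls, prior in priors.items():
    s = prior
    for p in conditionals[cls]:   # naive Bayes multiplies the conditionals
        s *= p
    score[cls] = s

p_ontime = score["ontime"] / (score["ontime"] + score["delayed"])
print(round(p_ontime, 4))  # ~0.967, i.e. about 97%
```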
2) Flight Delays Data (Naïve Bayes Model)
Refer to the Flight Delays output shown in question 1.
"Bayesian Probability" as used in the Naїve Bayes Model
A) uses naїve probabilities to estimate class probabilities.
B) uses only a single classifying variable to estimate the class probabilities.
C) uses simple probabilities instead of conditional probabilities.
D) uses derived probabilities to obtain class probabilities.
3) How does a "k-nearest neighbor" model work?
A) It uses conditional probabilities to estimate the prior probability of interest.
B) It uses geometric distances from observations in the data to select a class for an unknown.
C) It uses a continuous target estimated with any type of attribute.
D) It is based upon the concept of algorithmic minimization.
4) Logistic Regression
The following table is a logistic regression coefficient table for the Universal Bank data. The "Y" variable is the dichotomous variable Loan Offer (success = 1). The multiple R2 for this logistic regression is reported as 0.6544.
The Regression Model

| Input variables | Coefficient | Std. Error | p-value | Odds |
| --- | --- | --- | --- | --- |
| Constant term | −13.20165825 | 2.46772742 | 0.00000009 | * |
| Age | −0.04453737 | 0.09096102 | 0.62439483 | 0.95643985 |
| Experience | 0.05657264 | 0.09005365 | 0.5298661 | 1.05820346 |
| Income | 0.0657607 | 0.00422134 | 0 | 1.06797111 |
| Family | 0.57155931 | 0.10119002 | 0.00000002 | 1.77102649 |
| CCAvg | 0.18724874 | 0.06153848 | 0.00234395 | 1.20592725 |
| Mortgage | 0.00175308 | 0.00080375 | 0.02917421 | 1.00175464 |
| Securities Account | −0.85484785 | 0.41863668 | 0.04115349 | 0.42534789 |
| CD Account | 3.46900773 | 0.44893095 | 0 | 32.10486984 |
| Online | −0.84355801 | 0.22832377 | 0.00022026 | 0.43017724 |
| CreditCard | −0.96406376 | 0.28254223 | 0.00064463 | 0.38134006 |
| EducGrad | 4.58909273 | 0.38708162 | 0 | 98.40509796 |
| EducProf | 4.52272701 | 0.38425466 | 0 | 92.08635712 |
For the Logistic Regression Model, the positive coefficients for dummy variables CD Account, EducGrad, and EducProf
A) are associated with higher probabilities of accepting the loan offer.
B) are insignificant because of their p-values and therefore irrelevant.
C) have Odds that are too high to be considered relevant.
D) are proved to be causally related to the loan offer variable.
5) Logistic Regression
Refer to the Universal Bank coefficient table shown in question 4.
Consider the Logistic Regression Model for the Universal Bank data. The coefficient on the continuous variable Income means that
A) Income is inversely related to the loan offer variable.
B) Income is irrelevant because of its p-value.
C) higher values of Income are associated with greater probability of accepting the loan offer.
D) Income is likely not associated with the loan offer variable.
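The Odds column in the output is the exponential of the Coefficient column, which is how a logit coefficient is usually read. A minimal sketch (plain Python) reproducing three of the odds values from the table:

```python
import math

# odds ratio = exp(coefficient); coefficient values copied from the table above
for name, b in [("Income", 0.0657607),
                ("CD Account", 3.46900773),
                ("EducGrad", 4.58909273)]:
    print(name, round(math.exp(b), 8))
# Income 1.06797111  -> each extra unit of income multiplies the odds by ~1.068
# CD Account 32.10486984
# EducGrad 98.40509796
```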
6) For the Logistic Regression using the Universal Bank data, the Pseudo-R2 reported by XLMiner™ was 0.6544, and a lift chart was also provided.
A) Neither the lift chart nor the Pseudo-R2 indicate a high degree of confidence in the model.
B) Both the lift chart and the Pseudo-R2 indicate a high degree of confidence in the model.
C) The lift chart indicates high confidence in the model but the Pseudo-R2 is at odds with this conclusion.
D) Because only a single bar of the decile-wise lift chart is above 1, there is little confidence in the model.
7) Consider the Logistic Regression Model for the Universal Bank data.
Which variable or variables appear to be insignificant?
A) Only Age
B) Age and Experience
C) Income, CD Account, EducGrad, and EducProf
D) All variables with "odds" less than one
8) Consider the Logistic Regression Model for the Universal Bank data.
A) Strong collinearity can lead to problems with the model.
B) Strong correlation among the attributes is not a difficulty when using Logit.
C) The Logit Model automatically adjusts for collinearity.
D) None of the options are correct.
9) Riding Lawnmower Problem

| Household number | Income ($000s) | Lot Size (000s ft²) | Ownership of riding mower |
| --- | --- | --- | --- |
| 1 | 60 | 18.4 | Owner |
| 2 | 85.5 | 16.8 | Owner |
| 3 | 64.8 | 21.6 | Owner |
| 4 | 61.5 | 20.8 | Owner |
| 5 | 87 | 23.6 | Owner |
| 6 | 110.1 | 19.2 | Owner |
| 7 | 108 | 17.6 | Owner |
| 8 | 82.8 | 22.4 | Owner |
| 9 | 69 | 20 | Owner |
| 10 | 93 | 20.8 | Owner |
| 11 | 51 | 22 | Owner |
| 12 | 81 | 20 | Owner |
| 13 | 75 | 19.6 | Non-Owner |
| 14 | 52.8 | 20.8 | Non-Owner |
| 15 | 64.8 | 17.2 | Non-Owner |
| 16 | 43.2 | 20.4 | Non-Owner |
| 17 | 84 | 17.6 | Non-Owner |
| 18 | 49.2 | 17.6 | Non-Owner |
| 19 | 59.4 | 16 | Non-Owner |
| 20 | 66 | 18.4 | Non-Owner |
| 21 | 47.4 | 16.4 | Non-Owner |
| 22 | 33 | 18.8 | Non-Owner |
| 23 | 51 | 14 | Non-Owner |
| 24 | 63 | 14.8 | Non-Owner |

Validation error log for different k

| Value of k | % Error Training | % Error Validation |
| --- | --- | --- |
| 1 | 0.00 | 33.33 |
| 2 | 16.67 | 33.33 |
| 3 | 11.11 | 33.33 |
| 4 | 22.22 | 33.33 |
| 5 | 11.11 | 33.33 |
| 6 | 27.78 | 33.33 |
| 7 | 22.22 | 33.33 |
| 8 | 22.22 | 16.67 <-- Best k |
| 9 | 22.22 | 16.67 |
| 10 | 22.22 | 16.67 |
| 11 | 16.67 | 33.33 |
| 12 | 16.67 | 16.67 |
| 13 | 11.11 | 33.33 |
| 14 | 11.11 | 16.67 |
| 15 | 5.56 | 33.33 |
| 16 | 16.67 | 33.33 |
| 17 | 11.11 | 33.33 |
| 18 | 50.00 | 50.00 |
Consider the Riding Lawnmower data and the K-Nearest Neighbor Model results shown.
A) The best value of k was 8 because there was an almost even split between owners and non-owners.
B) The best value of k should always be less than the number of attributes.
C) The best value of k is the number of "neighbors" the model has chosen to poll when selecting a category choice.
D) The best value of k is irrelevant since we most often let k = 1.
10) Riding Lawnmower Problem
Refer to the Riding Lawnmower data and validation error log shown in question 9.
Consider a new household with $60,000 income and a lot size of 20,000 ft². Using k = 1, would you classify this household as an owner or a non-owner?
A) Owner
B) Non-owner
C) Impossible to tell.
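Questions 10 and 11 can be checked with any kNN implementation. Below is a minimal sketch using scikit-learn (an assumption; the text itself uses XLMiner). Note that the raw attribute values are used here, whereas XLMiner normalizes the attributes before computing distances, which can change which neighbors are selected.

```python
from sklearn.neighbors import KNeighborsClassifier

# (Income, Lot Size) pairs from the Riding Lawnmower table, households 1-24
X = [[60, 18.4], [85.5, 16.8], [64.8, 21.6], [61.5, 20.8], [87, 23.6],
     [110.1, 19.2], [108, 17.6], [82.8, 22.4], [69, 20], [93, 20.8],
     [51, 22], [81, 20],                        # households 1-12: owners
     [75, 19.6], [52.8, 20.8], [64.8, 17.2], [43.2, 20.4], [84, 17.6],
     [49.2, 17.6], [59.4, 16], [66, 18.4], [47.4, 16.4], [33, 18.8],
     [51, 14], [63, 14.8]]                      # households 13-24: non-owners
y = ["Owner"] * 12 + ["Non-Owner"] * 12

new_household = [[60, 20]]  # $60,000 income, 20,000 sq. ft. lot
for k in (1, 3):
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"k={k}:", model.predict(new_household)[0])
```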
11) Riding Lawnmower Problem
Refer to the Riding Lawnmower data and validation error log shown in question 9.
Consider a new household with $60,000 income and a lot size of 20,000 ft². Using k = 3, would you classify this household as an owner or a non-owner?
A) Owner
B) Non-owner
C) Impossible to tell.
12) Riding Lawnmower Problem
Refer to the Riding Lawnmower data and validation error log shown in question 9.
Consider the Riding Lawnmower data above. Why would the model choose a higher value of k than k = 1?
A) The model will rarely choose higher values of k unless there is collinearity in the attributes.
B) The model only chooses higher values of k if the data set is large.
C) The choice of k is made by the researcher alone and not the software.
D) Higher values of k may provide smoothing that reduces the risk of overfitting due to noise in the training data.
13)
The diagram above represents which data mining technique?
A) kNN
B) Regression tree
C) Naïve Bayes
D) Logit
14)
The above diagram represents what data mining classification scheme?
A) kNN
B) Classification tree
C) Naïve Bayes
D) Logit
15)
The information above was provided for an e-mail that was classified as spam. What data mining algorithm was probably used to make the classification?
A) kNN
B) Regression tree
C) Naïve Bayes
D) Logit
16)
Which data mining algorithm (represented above) uses a quadratic classifier?
A) kNN
B) Regression tree
C) Naïve Bayes
D) Logit
17)
What data mining technique is represented in the diagram of a classification scheme above?
A) kNN
B) Classification tree
C) Naïve Bayes
D) Logit
18) In the k-Nearest Neighbor technique in data mining, the "k" refers to
A) the originator of the technique, Jonathan Knowlton.
B) the number of neighbors used.
C) the number of classes into which the variable may be divided.
D) the weight of the target.
E) None of the options are correct.
19)
The data mining technique represented above is probably
A) a k-nearest neighbor model.
B) a Naïve Bayes model.
C) a classification tree.
D) a logistic regression.
20)
In setting up this k-nearest neighbor model
A) the user is allowing XLMiner™ to select the optimal value of k.
B) the optimal k is set by the user at 10.
C) the data is normalized in order to take into account the categorical variables.
D) it is necessary to set an optimal value for k.
21) The diagram below depicts the probability that a person takes out a loan given their level of income. The function shown is
A) an ordinary least squares model (OLS).
B) a linear probability model (LPM).
C) the odds function.
D) a logit.
22) Consider the equation below.
p(cj | d) = p(d | cj) × p(cj) / p(d)
This equation is the basis of
A) the logit model.
B) the Naïve Bayes Model.
C) the k-nearest neighbor model.
D) classification tree models.
23) "Pruning" is used in what data mining model?
A) Naïve Bayes
B) Logit
C) K-Nearest Neighbor
D) Classification Trees
24) "Pruning" is used
A) to overcome correlation among the attributes.
B) only when the attributes are dichotomous.
C) to prevent the model from overfitting the data.
D) as a "data utility" in order to create a validation set.
25) "Entropy" measures are used in which data mining algorithm?
A) Logit
B) Classification Trees
C) Naïve Bayes
D) K-Nearest Neighbor
E) Neural Networks
26) "Information Gain" and "Entropy"
A) are used in Classification Trees to determine when to stop the algorithm.
B) are two components of Bayes Theorem.
C) are related ways of categorizing risk.
D) are unrelated.
27) If I choose to classify Insects as either Katydids or Grasshoppers by examining the distribution of the lengths of the antennas of a sample of the two insects (as shown below), this would be the beginning analysis of what data mining algorithm?
A) Naïve Bayes
B) Logit
C) Regression Tree
D) K-Nearest Neighbor
28) What data mining algorithm result is being depicted below?
A) K-Nearest Neighbor
B) Naïve Bayes
C) Decision Tree
D) Logit
E) Neural Net
29) Consider the calculations below:
p(male) = (number of males) / (total number of records)
p(female) = (number of females) / (total number of records)
These are examples of
A) prior probability calculations.
B) posterior probability calculations.
C) entropy calculations.
D) information gain calculations.
30) XLMiner: Naïve Bayes
Prior class probabilities
According to relative occurrences in training data

| Class | Prob. |
| --- | --- |
| Alive | 0.314912945 <-- Success Class |
| Dead | 0.685087055 |

Conditional probabilities

| Input Variables | Value | Prob (Alive) | Prob (Dead) |
| --- | --- | --- | --- |
| Age | Adult | 0.9375 | 0.964640884 |
| | Child | 0.0625 | 0.035359116 |
| Sex | Female | 0.46875 | 0.082872928 |
| | Male | 0.53125 | 0.917127072 |
| Class | Crew | 0.295673077 | 0.450828729 |
| | First | 0.295673077 | 0.071823204 |
| | Second | 0.146634615 | 0.10718232 |
| | Third | 0.262019231 | 0.370165746 |
Examine the Naïve Bayes output that describes the Titanic survival model.
What is the probability of survival if you are a crew member, male, and adult?
A) 0.613324957
B) 0.001352846
C) 0.442673445
D) 0.145090831
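A minimal sketch (plain Python) of the calculation, using the prior and conditional probabilities copied from the output above; the two class scores are multiplied out and then normalized:

```python
# P(Alive) * P(Crew|Alive) * P(Male|Alive) * P(Adult|Alive), and same for Dead
score_alive = 0.314912945 * 0.295673077 * 0.53125 * 0.9375
score_dead  = 0.685087055 * 0.450828729 * 0.917127072 * 0.964640884
print(round(score_alive / (score_alive + score_dead), 9))  # ~0.145090831
```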
31) XLMiner: Naïve Bayes
Refer to the Titanic survival output shown in question 30.
Examine the Naïve Bayes output that describes the Titanic survival model.
What is the conditional probability of being dead if you are a crew member, male, and adult?
A) 0.3988
B) 0.917127072
C) 0.450828729
D) 0.685087055
32) The "logit" is
A) a linear function with a Z distribution.
B) an attribute in a logistic regression.
C) the natural log of an odds ratio.
D) the conditional probability that the success rate is greater than the cutoff value.
33)
The diagram above represents
A) the locus of all points that could cause the success rate to be above 50 percent.
B) a logistic regression output from XLMiner.
C) the Naïve Bayes classifier as being between zero and one.
D) a graph of the possible values of the logit in a logistic regression.
34) In logistic regression data mining, P/(1-P) represents
A) the logit.
B) the log likelihood of success.
C) the odds of success.
D) the cutoff value.
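Questions 32 and 34 turn on the definitions odds = P/(1 − P) and logit = ln(odds). A minimal sketch (plain Python; the function names are illustrative):

```python
import math

def odds(p):
    return p / (1 - p)          # odds of success

def logit(p):
    return math.log(odds(p))    # natural log of the odds ratio

print(odds(0.5), logit(0.5))            # 1.0 0.0 -> even odds at P = 0.5
print(odds(0.8), round(logit(0.8), 4))  # 4.0 1.3863
```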
35)
The regression line shown above was estimated using an ordinary least squares regression technique. This regression is inappropriate to use on this data because
A) the attribute measured here is dichotomous.
B) there is no apparent relationship between hours of study and outcome.
C) there is only a single attribute in the model.
D) the target variable is categorical.
36) Among the advantages of using the Naïve Bayes model is that
A) it is quite sensitive to irrelevant features.
B) it is fast at classification.
C) it can be used in situations in which the target variable is continuous.
D) All of the options are correct.
37) Naïve Bayes is called "Naïve" because
A) very few attributes are needed to obtain accurate classifications.
B) the model assumes that only continuous variables can be used as attributes.
C) it tends to be used only as a "baseline" model in order to measure the effectiveness of other data mining techniques.
D) the attributes are assumed to be independent of one another.
38) In a Naïve Bayes model it is necessary
A) that all attributes be categorical.
B) to partition the data into three parts (training, validation, and scoring).
C) to set cutoff values to less than 0.75.
D) to have a continuous target variable.
39) Naïve Bayes models
A) use a linear classifier.
B) use a nonlinear classifier.
C) use a waveform classifier.
D) use a logit as a classifier.
40) Which classification technique that we covered assumed that the attributes had independent distributions?
A) k-Nearest Neighbor
B) Classification trees
C) Naïve Bayes
D) Logistic Regression
41) Our confidence that X is an apple given that we have seen X is red and round
A) is a coincident probability.
B) could lead us to misclassify similar objects.
C) is a prior probability.
D) is a posterior probability.
42)
• We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it?
• We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid?
• There is a formal way to discuss the most probable classification...
p(cj|d) = probability of class cj, given that we have observed d
What data mining technique is demonstrated here?
A) k-Nearest Neighbor
B) Classification Tree
C) Naïve Bayes
D) Logistic Regression
43)
#Decision Nodes: 17    #Terminal Nodes: 18

| Level | NodeID | ParentID | SplitVar | SplitValue | Cases | Left Child | Right Child | Class | Node Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | N/A | Income | 100.5 | 2000 | 1 | 2 | 0 | Decision |
| 1 | 1 | 0 | CCAvg | 2.95 | 1507 | 3 | 4 | 0 | Decision |
| 1 | 2 | 0 | Education | 1.5 | 493 | 5 | 6 | 0 | Decision |
| 2 | 3 | 1 | N/A | N/A | 1422 | N/A | N/A | 0 | Terminal |
| 2 | 4 | 1 | Income | 82.5 | 85 | 7 | 8 | 0 | Decision |
| 2 | 5 | 2 | Family | 2.5 | 316 | 9 | 10 | 0 | Decision |
| 2 | 6 | 2 | Income | 116.5 | 177 | 11 | 12 | 1 | Decision |
| 3 | 7 | 4 | Age | 30.5 | 45 | 13 | 14 | 0 | Decision |
| 3 | 8 | 4 | CCAvg | 4.35 | 40 | 15 | 16 | 0 | Decision |
| 3 | 9 | 5 | N/A | N/A | 276 | N/A | N/A | 0 | Terminal |
| 3 | 10 | 5 | Income | 116 | 40 | 17 | 18 | 1 | Decision |
| 3 | 11 | 6 | CCAvg | 2.45 | 56 | 19 | 20 | 0 | Decision |
| 3 | 12 | 6 | N/A | N/A | 121 | N/A | N/A | 1 | Terminal |
| 4 | 13 | 7 | N/A | N/A | 1 | N/A | N/A | 1 | Terminal |
| 4 | 14 | 7 | N/A | N/A | 44 | N/A | N/A | 0 | Terminal |
The table above is part of the output from a data mining algorithm seeking to predict whether an individual will take out a personal loan given a set of attributes. What data mining technique is probably being used here?
A) k-Nearest Neighbor
B) Classification Tree
C) Naïve Bayes
D) Regression Tree
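The node table is read by starting at NodeID 0 and following the Left Child / Right Child pointers until a terminal node is reached. Below is a minimal sketch (plain Python) encoding just the subtree needed to score one hypothetical applicant; the convention that a value at or below SplitValue goes to the left child is an assumption about how this output is read:

```python
# decision rows: NodeID -> (SplitVar, SplitValue, LeftChild, RightChild)
decision = {
    0: ("Income", 100.5, 1, 2),
    1: ("CCAvg", 2.95, 3, 4),
}
terminal = {3: 0}  # NodeID 3 -> class 0 (does not take the loan)

record = {"Income": 60.0, "CCAvg": 1.5}  # hypothetical applicant

node = 0
while node not in terminal:
    var, split, left, right = decision[node]
    node = left if record[var] <= split else right  # assumed split convention
print("class", terminal[node])  # class 0
```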
44)
| #Decision Nodes | % Error | |
| --- | --- | --- |
| 40 | 2.35 | |
| 39 | 2.35 | |
| 38 | 2.20 | |
| 37 | 2.20 | |
| 36 | 2.05 | |
| 35 | 1.90 | |
| 34 | 1.90 | |
| 33 | 1.90 | |
| 32 | 1.90 | |
| 31 | 1.90 | |
| 30 | 1.90 | |
| 29 | 1.90 | |
| 28 | 1.90 | |
| 27 | 1.90 | |
| 26 | 1.75 | |
| 25 | 1.75 | |
| 24 | 1.95 | |
| 23 | 1.95 | |
| 22 | 1.80 | |
| 21 | 1.65 | |
| 20 | 1.70 | |
| 19 | 1.65 | |
| 18 | 1.65 | |
| 17 | 1.65 | <-- Min. Err. Tree (Std. Err. 0.002848486) |
| 16 | 1.80 | |
The above is a prune log for a data mining technique. What technique would have this type of output?
A) k-Nearest Neighbor
B) Classification Tree
C) Naïve Bayes
D) Logistic Regression
45)
| Attribute | Information Gain (Reduction in Entropy) |
| --- | --- |
| Hair Length | 0.0911 |
| Weight | 0.5900 |
| Age | 0.0183 |
Which attribute above provides the greatest reduction in entropy?
A) Hair Length
B) Weight
C) Age
D) The above are not reasonable entropy measures; rather they show information gain.
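Information gain is the parent node's entropy minus the weighted average entropy of the child nodes after a split. A minimal sketch (plain Python; the split counts below are hypothetical, not the ones behind the table above):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

parent = entropy([5, 5])               # 1.0 bit: an even class mix
children = [([4, 1], 5), ([1, 4], 5)]  # a hypothetical binary split
weighted = sum(n / 10 * entropy(c) for c, n in children)
print(round(parent - weighted, 4))     # information gain ~0.2781
```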
46)
Refer to the Information Gain table shown in question 45.
When a collection of objects is completely uniform,
A) entropy is at a maximum.
B) entropy is at a minimum.
C) entropy would be about 0.5.
D) Uniformity has nothing to do with entropy.
47) When estimating a k-Nearest Neighbor model we use a subset of the total data we have available
A) called the training data.
B) because "in-sample" test statistics are not available.
C) because most software packages are incapable of handling the entire data set.
D) called the verification data set.
E) None of the options are correct.
48)
| Securities Account | CD Account | Online | CreditCard | Binned_Zip Code |
| --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0 | 4 |
| 1 | 0 | 0 | 0 | 2 |
| 0 | 0 | 0 | 0 | 16 |
| 0 | 0 | 0 | 0 | 13 |
| 0 | 0 | 0 | 1 | 4 |
| 0 | 0 | 1 | 0 | 7 |
| 0 | 0 | 1 | 0 | 5 |
| 0 | 0 | 0 | 1 | 11 |
| 0 | 0 | 1 | 0 | 2 |
| 0 | 0 | 0 | 0 | 10 |
The data above is a portion of the Universal Bank data we used in class. The last column of data is "binned." This refers to
A) the use of a binary (or "binned") variable.
B) the fact that this variable has been "normalized."
C) the creation of "categories" of zip codes that cover a number of individual zip codes.
D) None of the options are correct.
49) The k-Nearest Neighbor technique
A) is a classification technique.
B) is a regression technique.
C) is an association technique.
D) is a verification technique.
50) The k-Nearest Neighbor technique
A) may only use a maximum of two features.
B) selects using only the nearest neighbor.
C) uses "k" attributes.
D) None of the options are correct.
51) The Logit is
A) an instruction to record the data.
B) the cube root of the sample size.
C) the natural logarithm of the odds ratio.
D) a logarithm of a digit.
52) Which of the following is true?
A) Logistic regression is analogous to multiple regression.
B) Logistic regression is estimated like a linear least squares regression.
C) Logistic regression is just another name for multiple regression.
D) Logistic regression can only be used with a continuous target.
53) In logistic regression, the target
A) is expressed in bits.
B) is a score.
C) consists of two categories.
D) is like the median and is split into two equal halves.
54) A model in logistic regression is
A) a miniature version of the analysis based on a small number of records.
B) a set of attributes which classify cases of the target well.
C) the most common score.
D) a worked example.
55) An attribute used in logistic regression consists of three categories: black, white, and red. Which of the following would not be a dummy variable in the analysis?
A) Red versus others
B) Black versus others
C) White versus others
D) Any of the previous three
56) Multinomial logistic regression is just a special case of
A) kNN Models.
B) multiple regression.
C) multiple comparisons tests such as the Duncan test.
D) None of the options are correct.
57) Which of the following statements is false concerning the linear probability model (LPM)?
A) There is nothing in the model to ensure that the estimated probabilities lie between zero and one.
B) Even if the probabilities are truncated at zero and one, there will probably be many observations for which the probability is either exactly zero or exactly one.
C) The error terms will be heteroscedastic and not normally distributed.
D) The model is much harder to estimate than a standard regression model with a continuous target.
58) The process of partitioning the ranges of quantitative attributes into intervals is called _________.
A) splitting
B) grouping
C) binning
D) None of the options are correct.
59) In order for a linear classification model to classify N classes,
A) at least N attributes must be included in the data set.
B) the data set must contain about an even number of records of each classification.
C) at least N records must be present.
D) only N − 1 lines need to be fitted.
60) Linear classifiers
A) may only be two-dimensional (e.g., two attributes).
B) may have any number of dimensions.
C) are limited to relatively small data sets (e.g., less than about 100 records).
D) are among the most complicated of data mining classification algorithms.
61)
In the accompanying diagram of an insect we wish to classify, "Spiracle Diameter" is
A) a dependent variable.
B) a target.
C) a record.
D) an attribute.
62) In data mining terminology, "scoring" refers to
A) using the algorithm to predict.
B) the estimation of the parameters of the algorithm.
C) evaluating the appropriateness and accuracy of the algorithm.
D) an examination of the misclassification rate.
63) Classification Models in data mining include all of the following except
A) Neural nets.
B) Naïve Bayes.
C) Logit.
D) K-Nearest Neighbor.
E) Association rule mining.
64) In the k-Nearest Neighbor algorithm the attributes are translated to _________.
A) values
B) points in multidimensional space
C) strings of characters
D) nodes
65) In a CART model, classification rules are extracted from _________.
A) the root node
B) the decision tree
C) the siblings
D) the leaves
66) Which of the following is a valid production rule for the decision tree below?
A) IF Business Appointment = No and Temp above 70 = No THEN Decision = wear slacks
B) IF Business Appointment = Yes and Temp above 70 = Yes THEN Decision = wear shorts
C) IF Temp above 70 = No THEN Decision = wear shorts
D) IF Business Appointment = No and Temp above 70 = No THEN Decision = wear jeans
67) A nearest neighbor approach is best used _________.
A) with large-sized data sets
B) when irrelevant attributes have been removed from the data
C) when a generalized model of the data is desirable
D) when an explanation of what has been found is of primary importance
68) Classification problems are distinguished from estimation problems in that
A) classification problems require the output attribute to be numeric.
B) classification problems require the output attribute to be categorical.
C) classification problems do not allow an output attribute.
D) classification problems are designed to predict future outcome.
69) Logistic regression is a ________ regression technique that is used to model data having a ________ outcome.
A) linear; numeric
B) linear; binary
C) nonlinear; numeric
D) nonlinear; binary
70) This technique associates a conditional probability value with each data instance.
A) Linear regression
B) Logistic regression
C) Simple regression
D) Multiple linear regression
71) This supervised learning technique can process both numeric and categorical input attributes.
A) Linear regression
B) Bayes classifier
C) Logistic regression
D) Backpropagation learning
72) With Bayes classifier, missing data items are
A) treated as equal compares.
B) treated as unequal compares.
C) replaced with a default value.
D) ignored.
73) The table below contains counts and ratios for a set of data instances to be used for supervised Bayesian learning. The output attribute is sex with possible values male and female. Consider an individual who has said no to the life insurance promotion, yes to the magazine promotion, yes to the watch promotion and has credit card insurance. Use the values in the table together with Bayes classifier to determine which of a, b, c or d represents the probability that this individual is male.
| Magazine Promotion | Watch Promotion | Life Insurance Promotion | Credit Card Insurance | ||||||||
| male | female | male | female | male | female | male | female | ||||
Yes | 4 | 3 | 2 | 2 | 2 | 3 | 2 | 1 | ||||
No | 2 | 1 | 4 | 2 | 4 | 1 | 4 | 3 | ||||
Yes | 4/6 | 3/4 | 2/6 | 2/4 | 2/6 | 3/4 | 2/6 | 1/4 | ||||
No | 2/6 | 1/4 | 4/6 | 2/4 | 4/6 | 1/4 | 4/6 | 3/4 |
A) (4/6) (2/6) (2/6) (2/6) (6/10)/P(E)
B) (4/6) (2/6) (3/4) (2/6) (3/4)/P(E)
C) (4/6) (4/6) (2/6) (2/6) (6/10)/P(E)
D) (2/6) (4/6) (4/6) (2/6) (4/10)/P(E)
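A minimal sketch (plain Python) of the score for "male" implied by the table: the prior is 6/10 (six males out of ten records), and the conditionals are the ratio rows for magazine = yes, watch = yes, life insurance = no, and credit card insurance = yes:

```python
score_male = (4/6) * (2/6) * (4/6) * (2/6) * (6/10)
score_female = (3/4) * (2/4) * (1/4) * (1/4) * (4/10)
# dividing by P(E), approximated by the sum of the two scores, normalizes them
print(score_male / (score_male + score_female))  # ~0.76
```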
74) The method in which previously calculated probabilities are revised with new probabilities is known as
A) updating theorem.
B) revised theorem.
C) Bayes theorem.
D) dependency theorem.
75) In the k-Nearest Neighbors method, when the value of k is set to 1,
A) the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
B) the new observation's class is naïvely assigned to the most common class in the training set.
C) the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set.
D) the classification or prediction of a new observation is subject to the smallest possible classification error.
76) This is Officer Drew.
Is Officer Drew male or female?
Luckily, we have a small database with names and sex.
We can use it to apply Bayes Theorem.
Database
| Name | Sex |
| --- | --- |
| Drew | Male |
| Claudia | Female |
| Drew | Female |
| Drew | Female |
| Alberto | Male |
| Karin | Female |
| Nina | Female |
| Sergio | Male |
p(cj | d) = p(d | cj) × p(cj) / p(d)
Consider the "Officer Drew" situation we discussed in class. Bayes Theorem is also shown in the diagram. Calculate the probability that the individual shown is "Male" given that the person's name is "Drew." Use the database information as a basis for your calculation.
A) ~ 41%
B) ~ 62%
C) ~ 25%
D) ~ 33%
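A minimal sketch (plain Python) of the Bayes calculation from the eight-row database above:

```python
rows = [("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
        ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
        ("Nina", "Female"), ("Sergio", "Male")]

males = [name for name, sex in rows if sex == "Male"]
p_drew_given_male = males.count("Drew") / len(males)          # 1/3
p_male = len(males) / len(rows)                               # 3/8
p_drew = sum(name == "Drew" for name, _ in rows) / len(rows)  # 3/8

print(p_drew_given_male * p_male / p_drew)  # 0.333..., i.e. ~33%
```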
77) Refer to the Officer Drew database and Bayes Theorem shown in question 76.
Consider the "Officer Drew" situation we discussed in class. Bayes Theorem is also shown in the diagram. Calculate the probability that the individual shown is "Female" given that the person's name is "Drew." Use the database information as a basis for your calculation.
A) ~ 66%
B) ~ 59%
C) ~ 75%
D) ~ 38%
78) Consider the following situation as a Bayesian; the situation is very similar to the "Kahneman's Cab" situation we discussed in class. In a given population 1% of the people might have cancer. Tests can be taken to identify cancer. The following are the details about the test:
1) Test will be positive 90% of the time if someone has cancer.
2) Test will be negative 90% of the time if someone does not have cancer.
If you take the cancer test and it comes back positive, what is the probability you actually have cancer?
A) ~ 90%
B) ~ 8%
C) ~ 81%
D) ~ 18%
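A minimal sketch (plain Python) of the Bayesian update for this question:

```python
p_cancer = 0.01
p_pos_given_cancer = 0.90
p_pos_given_healthy = 0.10   # the test is negative 90% of the time when healthy

p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy
print(p_cancer * p_pos_given_cancer / p_pos)  # ~0.083, i.e. about 8%
```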
79) Assume you are asked to create a model that predicts the number of new babies born per period according to the size of the stork population. In this case, the number of babies is
A) a target.
B) a feature.
C) an outcome.
D) an observation.
80) A nearest neighbor approach is best used
A) with large-sized data sets.
B) when irrelevant attributes have been removed from the data.
C) when a generalized model of the data is desirable.
D) when an explanation of what has been found is of primary importance.
81) Classification problems are distinguished from estimation problems in that
A) classification problems require the output attribute to be numeric.
B) classification problems require the output attribute to be categorical.
C) classification problems do not allow an output attribute.
D) classification problems are designed to predict future outcome.
82) Which statement is true about the decision tree attribute selection process?
A) A categorical attribute may appear in a tree node several times but a numeric attribute may appear at most once.
B) A numeric attribute may appear in several tree nodes but a categorical attribute may appear at most once.
C) Both numeric and categorical attributes may appear in several tree nodes.
D) Numeric and categorical attributes may appear in at most one tree node.
83) Consider the following situation as a Bayesian; the situation is very similar to the "Kahneman's Cab" situation we discussed in class. In a given population, 1% of the people might have cancer. Tests can be taken to identify cancer. The following are the details about the test:
1) The test will be positive 80% of the time if someone has cancer (and therefore misses it 20% of the time).
2) 9.6% of tests detect cancer when it's not there (and therefore 90.4% correctly return a negative result).
Suppose you get a positive test. What is the actual probability you have cancer?
A) ~80%
B) ~1%
C) ~99%
D) ~8%
84)
The equation shown above
A) is called a "normal" equation.
B) calculates a test statistic for autocorrelation.
C) measures entropy for classification models.
D) is useful for transforming nonstationary data.
85) "Tree Pruning" seeks to
A) reduce the possibility of misclassification error.
B) reduce the AUC.
C) identify and remove branches that reflect noise or outliers.
D) reduce the number of categories being classified.
86)
Use the tree created in XLMiner to classify the unknown Iris whose attributes are listed. Recall that there are only three classes of Iris to choose in this system: Versicolor, Virginica, or Setosa.
A) Versicolor
B) Virginica
C) Setosa
D) This tree is incapable of classifying the listed Iris.
87)
Some of the nodes in the Iris Classification Tree are called "Terminal Nodes." A Terminal Node
A) contains only a single instance of the items being classified.
B) contains classes that are nonhomogeneous.
C) contains a roughly even number of each class of item.
D) contains only a single class of the items being classified.
88) The "right-sized" classification tree
A) is rarely a pruned tree.
B) is sometimes determined by using v-fold cross validation.
C) is the one with the lowest misclassification rate in the training partition.
D) is never more than 7 levels deep.
89) Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard Distance
D) Both "Least Square Error" and "Maximum Likelihood" are correct.
90) Which of the following diagnostic metrics cannot be applied in case of logistic regression output to compare with target?
A) AUC-ROC
B) Lift Chart
C) Cumulative Gains Chart
D) Mean Squared Error
91) Suppose you have been given a fair coin and you want to find out the odds of getting heads. Which of the following options is true for such a case?
A) Odds will be 0.
B) Odds will be 0.5.
C) Odds will be 1.
D) None of the options are correct.
92) Suppose you applied a Logistic Regression model on a given data and got a training accuracy X and testing accuracy Y. Now, you want to add a few new features in the same data. Select the option(s) which is/are correct in such a case.
A) Training accuracy increases.
B) Training accuracy increases or remains the same.
C) Testing accuracy decreases.
D) Testing accuracy increases or remains the same.
E) Both "Training accuracy increases" and "Training accuracy increases or remains the same" are correct.
93) The below figure shows three ROC curves for three different logistic regression models. Which of the following ROC curves signifies the best result?
A) The top (highest) ROC curve
B) The middle ROC curve
C) The bottom (lowest) ROC curve
D) Unable to determine an answer from information given.
94) Below are the 8 actual values of target variable in the training partition.
[0, 0, 0, 1, 1, 1, 1, 1]
What is the entropy of the target variable?
A) −5/8 log2(5/8) − 3/8 log2(3/8)
B) 5/8 log2(5/8) − 3/8 log2(3/8)
C) −3/8 log2(5/8) + 5/8 log2(3/8)
D) −5/8 log2(3/8) + 3/8 log2(5/8)
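A minimal sketch (plain Python) checking the entropy of the target variable:

```python
import math

values = [0, 0, 0, 1, 1, 1, 1, 1]
p1 = values.count(1) / len(values)   # 5/8
p0 = values.count(0) / len(values)   # 3/8
print(-p1 * math.log2(p1) - p0 * math.log2(p0))  # ~0.9544
```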
95) What is a "decision tree"?
A) It is a flowchart-like tree structure, where each internal node denotes a test on an attribute and each branch represents an outcome of the test.
B) It is a clustering algorithm based upon reducing entropy from one branch to the succeeding branches.
C) Decision trees are part of the pre-processing necessary when beginning to engage in text mining; it is a form of dimension reduction.
D) None of the options are correct.
96) _________ is an example of case-based learning.
A) A decision tree
B) A k-Nearest Neighbor algorithm
C) A neural network
D) None of the options are correct.
97) In the kNN algorithm the input is translated to _________.
A) values
B) points in multidimensional space
C) strings of characters
D) nodes
98) Claude Shannon's name is most closely associated with
A) information entropy.
B) Bayesian algorithms.
C) decision trees.
D) natural language processing.