Forecasting and Predictive Analytics with Forecast X, 7e (Keating)
Chapter 9 Classification Models: The Most Used Models in Analytics
1) Flight Delays Data (Naïve Bayes Model)
N.B.
Success = 1 = Delayed
Failure = 0 = Ontime
Prior class probabilities
According to relative occurrences in training data

| Class | Prob |
| --- | --- |
| 1 | 0.193792581 <-- Success Class |
| 0 | 0.806207419 |

Conditional probabilities

| Input Variables | Value | Prob (Class 1 = Delayed) | Prob (Class 0 = Ontime) |
| --- | --- | --- | --- |
| CARRIER | CO | 0.06640625 | 0.038497653 |
| | DH | 0.33984375 | 0.243192488 |
| | DL | 0.109375 | 0.2 |
| | MQ | 0.1796875 | 0.112676056 |
| | OH | 0.01171875 | 0.017840376 |
| | RU | 0.21484375 | 0.170892019 |
| | UA | 0.0078125 | 0.016901408 |
| | US | 0.0703125 | 0.2 |
| DAY_OF_WEEK | 1 | 0.203125 | 0.128638498 |
| | 2 | 0.16015625 | 0.139906103 |
| | 3 | 0.12890625 | 0.152112676 |
| | 4 | 0.12890625 | 0.159624413 |
| | 5 | 0.1640625 | 0.181220557 |
| | 6 | 0.0703125 | 0.131455399 |
| | 7 | 0.14453125 | 0.107042254 |
| DEP_TIME_BLK | 0600-0659 | 0.03515625 | 0.061971831 |
| | 0700-0759 | 0.05078125 | 0.060093897 |
| | 0800-0859 | 0.0546875 | 0.071361502 |
| | 0900-0959 | 0.0234375 | 0.053521127 |
| | 1000-1059 | 0.01953125 | 0.057276995 |
| | 1100-1159 | 0.01953125 | 0.038497653 |
| | 1200-1259 | 0.0546875 | 0.062910798 |
| | 1300-1359 | 0.05078125 | 0.068544601 |
| | 1400-1459 | 0.15234375 | 0.110798122 |
| | 1500-1559 | 0.08203125 | 0.064788732 |
| | 1600-1659 | 0.07421875 | 0.078873239 |
| | 1700-1759 | 0.15625 | 0.094835681 |
| | 1800-1859 | 0.03125 | 0.043192488 |
| | 1900-1959 | 0.08984375 | 0.040375587 |
| | 2000-2059 | 0.01953125 | 0.030985915 |
| | 2100-2159 | 0.0859375 | 0.051971831 |
| DEST | EWR | 0.38671875 | 0.283568075 |
| | JFK | 0.1875 | 0.176525822 |
| | LGA | 0.42578125 | 0.539906103 |
| ORIGIN | BWI | 0.09375 | 0.068544601 |
| | DCA | 0.484375 | 0.635680751 |
| | IAD | 0.421875 | 0.295774648 |
| Weather | 0 | 0.92578125 | 1 |
| | 1 | 0.07421875 | 0 |
Using the Flight Delays output above, computed with a Naïve Bayes model, calculate the ontime probability for the following flight:
Carrier = DL
Day of Week = 7
Departure Time = 1000 - 1059
Destination = LGA
Origin = DCA
Weather = 0
A) 87%
B) 92%
C) 95%
D) 97%
E) 99%
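The arithmetic behind this question can be verified with a short script. Below is a minimal sketch (plain Python; the variable names are illustrative) that multiplies each class prior by the conditional probabilities read from the table above for this flight, then normalizes:

```python
# Naive Bayes score for the flight: DL, day 7, 1000-1059, LGA, DCA, weather 0.
# All probabilities are copied from the output above.
priors = {"delayed": 0.193792581, "ontime": 0.806207419}
conditionals = {
    "delayed": [0.109375, 0.14453125, 0.01953125, 0.42578125, 0.484375, 0.92578125],
    "ontime":  [0.2, 0.107042254, 0.057276995, 0.539906103, 0.635680751, 1.0],
}

score = {}
for cls, prior in priors.items():
    s = prior
    for p in conditionals[cls]:   # naive Bayes multiplies the conditionals
        s *= p
    score[cls] = s

p_ontime = score["ontime"] / (score["ontime"] + score["delayed"])
print(round(p_ontime, 4))  # ~0.967, i.e. about 97%
```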
2) Flight Delays Data (Naïve Bayes Model)
Refer to the Flight Delays output shown in question 1.
"Bayesian Probability" as used in the Naїve Bayes Model
A) uses naїve probabilities to estimate class probabilities.
B) uses only a single classifying variable to estimate the class probabilities.
C) uses simple probabilities instead of conditional probabilities.
D) uses derived probabilities to obtain class probabilities.
3) How does a "k-nearest neighbor" model work?
A) It uses conditional probabilities to estimate the prior probability of interest.
B) It uses geometric distances from observations in the data to select a class for an unknown.
C) It uses a continuous target estimated with any type of attribute.
D) It is based upon the concept of algorithmic minimization.
4) Logistic Regression
The following table is a logistic regression coefficient table for the Universal Bank data. The "Y" variable is the dichotomous variable Loan Offer (success = 1). The multiple R2 for this logistic regression is reported as 0.6544.
The Regression Model

| Input variables | Coefficient | Std. Error | p-value | Odds |
| --- | --- | --- | --- | --- |
| Constant term | −13.20165825 | 2.46772742 | 0.00000009 | * |
| Age | −0.04453737 | 0.09096102 | 0.62439483 | 0.95643985 |
| Experience | 0.05657264 | 0.09005365 | 0.5298661 | 1.05820346 |
| Income | 0.0657607 | 0.00422134 | 0 | 1.06797111 |
| Family | 0.57155931 | 0.10119002 | 0.00000002 | 1.77102649 |
| CCAvg | 0.18724874 | 0.06153848 | 0.00234395 | 1.20592725 |
| Mortgage | 0.00175308 | 0.00080375 | 0.02917421 | 1.00175464 |
| Securities Account | −0.85484785 | 0.41863668 | 0.04115349 | 0.42534789 |
| CD Account | 3.46900773 | 0.44893095 | 0 | 32.10486984 |
| Online | −0.84355801 | 0.22832377 | 0.00022026 | 0.43017724 |
| CreditCard | −0.96406376 | 0.28254223 | 0.00064463 | 0.38134006 |
| EducGrad | 4.58909273 | 0.38708162 | 0 | 98.40509796 |
| EducProf | 4.52272701 | 0.38425466 | 0 | 92.08635712 |
For the Logistic Regression Model, the positive coefficients for dummy variables CD Account, EducGrad, and EducProf
A) are associated with higher probabilities of accepting the loan offer.
B) are insignificant because of their p-values and therefore irrelevant.
C) have Odds that are too high to be considered relevant.
D) are proved to be causally related to the loan offer variable.
5) Logistic Regression
Refer to the Universal Bank coefficient table shown in question 4.
Consider the Logistic Regression Model for the Universal Bank data. The coefficient on the continuous variable Income means that
A) Income is inversely related to the loan offer variable.
B) Income is irrelevant because of its p-value.
C) higher values of Income are associated with greater probability of accepting the loan offer.
D) Income is likely not associated with the loan offer variable.
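The Odds column in the output is the exponential of the Coefficient column, which is how a logit coefficient is usually read. A minimal sketch (plain Python) reproducing three of the odds values from the table:

```python
import math

# odds ratio = exp(coefficient); coefficient values copied from the table above
for name, b in [("Income", 0.0657607),
                ("CD Account", 3.46900773),
                ("EducGrad", 4.58909273)]:
    print(name, round(math.exp(b), 8))
# Income 1.06797111  -> each extra unit of income multiplies the odds by ~1.068
# CD Account 32.10486984
# EducGrad 98.40509796
```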
6) For the Logistic Regression using the Universal Bank data, the Pseudo-R2 reported by XLMiner™ was 0.6544, and a lift chart was also provided.
A) Neither the lift chart nor the Pseudo-R2 indicate a high degree of confidence in the model.
B) Both the lift chart and the Pseudo-R2 indicate a high degree of confidence in the model.
C) The lift chart indicates high confidence in the model but the Pseudo-R2 is at odds with this conclusion.
D) Because only a single bar of the decile-wise lift chart is above 1, there is little confidence in the model.
7) Consider the Logistic Regression Model for the Universal Bank data.
Which variable or variables appear to be insignificant?
A) Only Age
B) Age and Experience
C) Income, CD Account, EducGrad, and EducProf
D) All variables with "odds" less than one
8) Consider the Logistic Regression Model for the Universal Bank data.
A) Strong collinearity can lead to problems with the model.
B) Strong correlation among the attributes is not a difficulty when using Logit.
C) The Logit Model automatically adjusts for collinearity.
D) None of the options are correct.
9) Riding Lawnmower Problem

| Household number | Income ($000s) | Lot Size (000s ft²) | Ownership of riding mower |
| --- | --- | --- | --- |
| 1 | 60 | 18.4 | Owner |
| 2 | 85.5 | 16.8 | Owner |
| 3 | 64.8 | 21.6 | Owner |
| 4 | 61.5 | 20.8 | Owner |
| 5 | 87 | 23.6 | Owner |
| 6 | 110.1 | 19.2 | Owner |
| 7 | 108 | 17.6 | Owner |
| 8 | 82.8 | 22.4 | Owner |
| 9 | 69 | 20 | Owner |
| 10 | 93 | 20.8 | Owner |
| 11 | 51 | 22 | Owner |
| 12 | 81 | 20 | Owner |
| 13 | 75 | 19.6 | Non-Owner |
| 14 | 52.8 | 20.8 | Non-Owner |
| 15 | 64.8 | 17.2 | Non-Owner |
| 16 | 43.2 | 20.4 | Non-Owner |
| 17 | 84 | 17.6 | Non-Owner |
| 18 | 49.2 | 17.6 | Non-Owner |
| 19 | 59.4 | 16 | Non-Owner |
| 20 | 66 | 18.4 | Non-Owner |
| 21 | 47.4 | 16.4 | Non-Owner |
| 22 | 33 | 18.8 | Non-Owner |
| 23 | 51 | 14 | Non-Owner |
| 24 | 63 | 14.8 | Non-Owner |

Validation error log for different k

| Value of k | % Error Training | % Error Validation |
| --- | --- | --- |
| 1 | 0.00 | 33.33 |
| 2 | 16.67 | 33.33 |
| 3 | 11.11 | 33.33 |
| 4 | 22.22 | 33.33 |
| 5 | 11.11 | 33.33 |
| 6 | 27.78 | 33.33 |
| 7 | 22.22 | 33.33 |
| 8 | 22.22 | 16.67 <-- Best k |
| 9 | 22.22 | 16.67 |
| 10 | 22.22 | 16.67 |
| 11 | 16.67 | 33.33 |
| 12 | 16.67 | 16.67 |
| 13 | 11.11 | 33.33 |
| 14 | 11.11 | 16.67 |
| 15 | 5.56 | 33.33 |
| 16 | 16.67 | 33.33 |
| 17 | 11.11 | 33.33 |
| 18 | 50.00 | 50.00 |
Consider the Riding Lawnmower data and the K-Nearest Neighbor Model results shown.
A) The best value of k was 8 because there was an almost even split between owners and non-owners.
B) The best value of k should always be less than the number of attributes.
C) The best value of k is the number of "neighbors" the model has chosen to poll when selecting a category choice.
D) The best value of k is irrelevant since we most often let k = 1.
10) Riding Lawnmower Problem
Refer to the Riding Lawnmower data and validation error log shown in question 9.
Consider a new household with $60,000 income and a lot size of 20,000 ft². Using k = 1, would you classify this household as an owner or a non-owner?
A) Owner
B) Non-owner
C) Impossible to tell.
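Questions 10 and 11 can be checked with any kNN implementation. Below is a minimal sketch using scikit-learn (an assumption; the text itself uses XLMiner). Note that the raw attribute values are used here, whereas XLMiner normalizes the attributes before computing distances, which can change which neighbors are selected.

```python
from sklearn.neighbors import KNeighborsClassifier

# (Income, Lot Size) pairs from the Riding Lawnmower table, households 1-24
X = [[60, 18.4], [85.5, 16.8], [64.8, 21.6], [61.5, 20.8], [87, 23.6],
     [110.1, 19.2], [108, 17.6], [82.8, 22.4], [69, 20], [93, 20.8],
     [51, 22], [81, 20],                        # households 1-12: owners
     [75, 19.6], [52.8, 20.8], [64.8, 17.2], [43.2, 20.4], [84, 17.6],
     [49.2, 17.6], [59.4, 16], [66, 18.4], [47.4, 16.4], [33, 18.8],
     [51, 14], [63, 14.8]]                      # households 13-24: non-owners
y = ["Owner"] * 12 + ["Non-Owner"] * 12

new_household = [[60, 20]]  # $60,000 income, 20,000 sq. ft. lot
for k in (1, 3):
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"k={k}:", model.predict(new_household)[0])
```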
11) Riding Lawnmower Problem
Refer to the Riding Lawnmower data and validation error log shown in question 9.
Consider a new household with $60,000 income and a lot size of 20,000 ft². Using k = 3, would you classify this household as an owner or a non-owner?
A) Owner
B) Non-owner
C) Impossible to tell.
12) Riding Lawnmower Problem
Refer to the Riding Lawnmower data and validation error log shown in question 9.
Consider the Riding Lawnmower data above. Why would the model choose a higher value of k than k = 1?
A) The model will rarely choose higher values of k unless there is collinearity in the attributes.
B) The model only chooses higher values of k if the data set is large.
C) The choice of k is made by the researcher alone and not the software.
D) Higher values of k may provide smoothing that reduces the risk of overfitting due to noise in the training data.
13)
The diagram above represents which data mining technique?
A) kNN
B) Regression tree
C) Naïve Bayes
D) Logit
14)
The above diagram represents what data mining classification scheme?
A) kNN
B) Classification tree
C) Naïve Bayes
D) Logit
15)
The information above was provided for an e-mail that was classified as spam. What data mining algorithm was probably used to make the classification?
A) kNN
B) Regression tree
C) Naïve Bayes
D) Logit
16)
Which data mining algorithm (represented above) uses a quadratic classifier?
A) kNN
B) Regression tree
C) Naïve Bayes
D) Logit
17)
What data mining technique is represented in the diagram of a classification scheme above?
A) kNN
B) Classification tree
C) Naïve Bayes
D) Logit
18) In the k-Nearest Neighbor technique in data mining, the "k" refers to
A) the originator of the technique, Jonathan Knowlton.
B) the number of neighbors used.
C) the number of classes into which the variable may be divided.
D) the weight of the target.
E) None of the options are correct.
19)
The data mining technique represented above is probably
A) a k-nearest neighbor model.
B) a Naïve Bayes model.
C) a classification tree.
D) a logistic regression.
20)
In setting up this k-nearest neighbor model
A) the user is allowing XLMiner™ to select the optimal value of k.
B) the optimal k is set by the user at 10.
C) the data is normalized in order to take into account the categorical variables.
D) it is necessary to set an optimal value for k.
21) The diagram below depicts the probability that a person takes out a loan given their level of income. The function shown is
A) an ordinary least squares model (OLS).
B) a linear probability model (LPM).
C) the odds function.
D) a logit.
22) Consider the equation below.
p(cj | d) = p(d | cj) × p(cj) / p(d)
This equation is the basis of
A) the logit model.
B) the Naïve Bayes Model.
C) the k-nearest neighbor model.
D) classification tree models.
23) "Pruning" is used in what data mining model?
A) Naïve Bayes
B) Logit
C) K-Nearest Neighbor
D) Classification Trees
24) "Pruning" is used
A) to overcome correlation among the attributes.
B) only when the attributes are dichotomous.
C) to prevent the model from overfitting the data.
D) as a "data utility" in order to create a validation set.
25) "Entropy" measures are used in which data mining algorithm?
A) Logit
B) Classification Trees
C) Naïve Bayes
D) K-Nearest Neighbor
E) Neural Networks
26) "Information Gain" and "Entropy"
A) are used in Classification Trees to determine when to stop the algorithm.
B) are two components of Bayes Theorem.
C) are related ways of categorizing risk.
D) are unrelated.
27) If I choose to classify Insects as either Katydids or Grasshoppers by examining the distribution of the lengths of the antennas of a sample of the two insects (as shown below), this would be the beginning analysis of what data mining algorithm?
A) Naïve Bayes
B) Logit
C) Regression Tree
D) K-Nearest Neighbor
28) What data mining algorithm result is being depicted below?
A) K-Nearest Neighbor
B) Naïve Bayes
C) Decision Tree
D) Logit
E) Neural Net
29) Consider the calculations below:
p(male) = (number of males) / (total number of records)
p(female) = (number of females) / (total number of records)
These are examples of
A) prior probability calculations.
B) posterior probability calculations.
C) entropy calculations.
D) information gain calculations.
30) XLMiner: Naïve Bayes
Prior class probabilities
According to relative occurrences in training data

| Class | Prob. |
| --- | --- |
| Alive | 0.314912945 <-- Success Class |
| Dead | 0.685087055 |

Conditional probabilities

| Input Variables | Value | Prob (Alive) | Prob (Dead) |
| --- | --- | --- | --- |
| Age | Adult | 0.9375 | 0.964640884 |
| | Child | 0.0625 | 0.035359116 |
| Sex | Female | 0.46875 | 0.082872928 |
| | Male | 0.53125 | 0.917127072 |
| Class | Crew | 0.295673077 | 0.450828729 |
| | First | 0.295673077 | 0.071823204 |
| | Second | 0.146634615 | 0.10718232 |
| | Third | 0.262019231 | 0.370165746 |
Examine the Naïve Bayes output that describes the Titanic survival model.
What is the probability of survival if you are a crew member, male, and adult?
A) 0.613324957
B) 0.001352846
C) 0.442673445
D) 0.145090831
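A minimal sketch (plain Python) of the calculation, using the prior and conditional probabilities copied from the output above; the two class scores are multiplied out and then normalized:

```python
# P(Alive) * P(Crew|Alive) * P(Male|Alive) * P(Adult|Alive), and same for Dead
score_alive = 0.314912945 * 0.295673077 * 0.53125 * 0.9375
score_dead  = 0.685087055 * 0.450828729 * 0.917127072 * 0.964640884
print(round(score_alive / (score_alive + score_dead), 9))  # ~0.145090831
```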
31) XLMiner: Naïve Bayes
Refer to the Titanic survival output shown in question 30.
Examine the Naïve Bayes output that describes the Titanic survival model.
What is the conditional probability of being dead if you are a crew member, male, and adult?
A) 0.3988
B) 0.917127072
C) 0.450828729
D) 0.685087055
32) The "logit" is
A) a linear function with a Z distribution.
B) an attribute in a logistic regression.
C) the natural log of an odds ratio.
D) the conditional probability that the success rate is greater than the cutoff value.
33)
The diagram above represents
A) the locus of all points that could cause the success rate to be above 50 percent.
B) a logistic regression output from XLMiner.
C) the Naïve Bayes classifier as being between zero and one.
D) a graph of the possible values of the logit in a logistic regression.
34) In logistic regression data mining, P/(1-P) represents
A) the logit.
B) the log likelihood of success.
C) the odds of success.
D) the cutoff value.
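Questions 32 and 34 turn on the definitions odds = P/(1 − P) and logit = ln(odds). A minimal sketch (plain Python; the function names are illustrative):

```python
import math

def odds(p):
    return p / (1 - p)          # odds of success

def logit(p):
    return math.log(odds(p))    # natural log of the odds ratio

print(odds(0.5), logit(0.5))            # 1.0 0.0 -> even odds at P = 0.5
print(odds(0.8), round(logit(0.8), 4))  # 4.0 1.3863
```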
35)
The regression line shown above was estimated using an ordinary least squares regression technique. This regression is inappropriate to use on this data because
A) the attribute measured here is dichotomous.
B) there is no apparent relationship between hours of study and outcome.
C) there is only a single attribute in the model.
D) the target variable is categorical.
36) Among the advantages of using the Naïve Bayes model is that
A) it is quite sensitive to irrelevant features.
B) it is fast at classification.
C) it can be used in situations in which the target variable is continuous.
D) All of the options are correct.
37) Naïve Bayes is called "Naïve" because
A) very few attributes are needed to obtain accurate classifications.
B) the model assumes that only continuous variables can be used as attributes.
C) it tends to be used only as a "baseline" model in order to measure the effectiveness of other data mining techniques.
D) the attributes are assumed to be independent of one another.
38) In a Naïve Bayes model it is necessary
A) that all attributes be categorical.
B) to partition the data into three parts (training, validation, and scoring).
C) to set cutoff values to less than 0.75.
D) to have a continuous target variable.
39) Naïve Bayes models
A) use a linear classifier.
B) use a nonlinear classifier.
C) use a waveform classifier.
D) use a logit as a classifier.
40) Which classification technique that we covered assumed that the attributes had independent distributions?
A) k-Nearest Neighbor
B) Classification trees
C) Naïve Bayes
D) Logistic Regression
41) Our confidence that X is an apple given that we have seen X is red and round
A) is a coincident probability.
B) could lead us to misclassify similar objects.
C) is a prior probability.
D) is a posterior probability.
42)
• We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it?
• We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid?
• There is a formal way to discuss the most probable classification...
p(cj|d) = probability of class cj, given that we have observed d
What data mining technique is demonstrated here?
A) k-Nearest Neighbor
B) Classification Tree
C) Naïve Bayes
D) Logistic Regression
43)
#Decision Nodes: 17    #Terminal Nodes: 18

| Level | NodeID | ParentID | SplitVar | SplitValue | Cases | Left Child | Right Child | Class | Node Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | N/A | Income | 100.5 | 2000 | 1 | 2 | 0 | Decision |
| 1 | 1 | 0 | CCAvg | 2.95 | 1507 | 3 | 4 | 0 | Decision |
| 1 | 2 | 0 | Education | 1.5 | 493 | 5 | 6 | 0 | Decision |
| 2 | 3 | 1 | N/A | N/A | 1422 | N/A | N/A | 0 | Terminal |
| 2 | 4 | 1 | Income | 82.5 | 85 | 7 | 8 | 0 | Decision |
| 2 | 5 | 2 | Family | 2.5 | 316 | 9 | 10 | 0 | Decision |
| 2 | 6 | 2 | Income | 116.5 | 177 | 11 | 12 | 1 | Decision |
| 3 | 7 | 4 | Age | 30.5 | 45 | 13 | 14 | 0 | Decision |
| 3 | 8 | 4 | CCAvg | 4.35 | 40 | 15 | 16 | 0 | Decision |
| 3 | 9 | 5 | N/A | N/A | 276 | N/A | N/A | 0 | Terminal |
| 3 | 10 | 5 | Income | 116 | 40 | 17 | 18 | 1 | Decision |
| 3 | 11 | 6 | CCAvg | 2.45 | 56 | 19 | 20 | 0 | Decision |
| 3 | 12 | 6 | N/A | N/A | 121 | N/A | N/A | 1 | Terminal |
| 4 | 13 | 7 | N/A | N/A | 1 | N/A | N/A | 1 | Terminal |
| 4 | 14 | 7 | N/A | N/A | 44 | N/A | N/A | 0 | Terminal |
The table above is part of the output from a data mining algorithm seeking to predict whether an individual will take out a personal loan given a set of attributes. What data mining technique is probably being used here?
A) k-Nearest Neighbor
B) Classification Tree
C) Naïve Bayes
D) Regression Tree
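The node table is read by starting at NodeID 0 and following the Left Child / Right Child pointers until a terminal node is reached. Below is a minimal sketch (plain Python) encoding just the subtree needed to score one hypothetical applicant; the convention that a value at or below SplitValue goes to the left child is an assumption about how this output is read:

```python
# decision rows: NodeID -> (SplitVar, SplitValue, LeftChild, RightChild)
decision = {
    0: ("Income", 100.5, 1, 2),
    1: ("CCAvg", 2.95, 3, 4),
}
terminal = {3: 0}  # NodeID 3 -> class 0 (does not take the loan)

record = {"Income": 60.0, "CCAvg": 1.5}  # hypothetical applicant

node = 0
while node not in terminal:
    var, split, left, right = decision[node]
    node = left if record[var] <= split else right  # assumed split convention
print("class", terminal[node])  # class 0
```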
44)
| #Decision Nodes | % Error | |
| --- | --- | --- |
| 40 | 2.35 | |
| 39 | 2.35 | |
| 38 | 2.20 | |
| 37 | 2.20 | |
| 36 | 2.05 | |
| 35 | 1.90 | |
| 34 | 1.90 | |
| 33 | 1.90 | |
| 32 | 1.90 | |
| 31 | 1.90 | |
| 30 | 1.90 | |
| 29 | 1.90 | |
| 28 | 1.90 | |
| 27 | 1.90 | |
| 26 | 1.75 | |
| 25 | 1.75 | |
| 24 | 1.95 | |
| 23 | 1.95 | |
| 22 | 1.80 | |
| 21 | 1.65 | |
| 20 | 1.70 | |
| 19 | 1.65 | |
| 18 | 1.65 | |
| 17 | 1.65 | <-- Min. Err. Tree (Std. Err. 0.002848486) |
| 16 | 1.80 | |
The above is a prune log for a data mining technique. What technique would have this type of output?
A) k-Nearest Neighbor
B) Classification Tree
C) Naïve Bayes
D) Logistic Regression
45)
| Attribute | Information Gain (Reduction in Entropy) |
| --- | --- |
| Hair Length | 0.0911 |
| Weight | 0.5900 |
| Age | 0.0183 |
Which attribute above provides the greatest reduction in entropy?
A) Hair Length
B) Weight
C) Age
D) The above are not reasonable entropy measures; rather they show information gain.
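Information gain is the parent node's entropy minus the weighted average entropy of the child nodes after a split. A minimal sketch (plain Python; the split counts below are hypothetical, not the ones behind the table above):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

parent = entropy([5, 5])               # 1.0 bit: an even class mix
children = [([4, 1], 5), ([1, 4], 5)]  # a hypothetical binary split
weighted = sum(n / 10 * entropy(c) for c, n in children)
print(round(parent - weighted, 4))     # information gain ~0.2781
```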
46)
Refer to the Information Gain table shown in question 45.
When a collection of objects is completely uniform,
A) entropy is at a maximum.
B) entropy is at a minimum.
C) entropy would be about 0.5.
D) Uniformity has nothing to do with entropy.
47) When estimating a k-Nearest Neighbor model we use a subset of the total data we have available
A) called the training data.
B) because "in-sample" test statistics are not available.
C) because most software packages are incapable of handling the entire data set.
D) called the verification data set.
E) None of the options are correct.
48)
| Securities Account | CD Account | Online | CreditCard | Binned_Zip Code |
| --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0 | 4 |
| 1 | 0 | 0 | 0 | 2 |
| 0 | 0 | 0 | 0 | 16 |
| 0 | 0 | 0 | 0 | 13 |
| 0 | 0 | 0 | 1 | 4 |
| 0 | 0 | 1 | 0 | 7 |
| 0 | 0 | 1 | 0 | 5 |
| 0 | 0 | 0 | 1 | 11 |
| 0 | 0 | 1 | 0 | 2 |
| 0 | 0 | 0 | 0 | 10 |
The data above is a portion of the Universal Bank data we used in class. The last column of data is "binned." This refers to
A) the use of a binary (or "binned") variable.
B) the fact that this variable has been "normalized."
C) the creation of "categories" of zip codes that cover a number of individual zip codes.
D) None of the options are correct.
49) The k-Nearest Neighbor technique
A) is a classification technique.
B) is a regression technique.
C) is an association technique.
D) is a verification technique.
50) The k-Nearest Neighbor technique
A) may only use a maximum of two features.
B) selects using only the nearest neighbor.
C) uses "k" attributes.
D) None of the options are correct.
51) The Logit is
A) an instruction to record the data.
B) the cube root of the sample size.
C) the natural logarithm of the odds ratio.
D) a logarithm of a digit.
52) Which of the following is true?
A) Logistic regression is analogous to multiple regression.
B) Logistic regression is estimated like a linear least squares regression.
C) Logistic regression is just another name for multiple regression.
D) Logistic regression can only be used with a continuous target.
53) In logistic regression, the target
A) is expressed in bits.
B) is a score.
C) consists of two categories.
D) is like the median and is split into two equal halves.
54) A model in logistic regression is
A) a miniature version of the analysis based on a small number of records.
B) a set of attributes which classify cases of the target well.
C) the most common score.
D) a worked example.
55) An attribute used in logistic regression consists of three categories: black, white, and red. Which of the following would not be a dummy variable in the analysis?
A) Red versus others
B) Black versus others
C) White versus others
D) Any of the previous three
56) Multinomial logistic regression is just a special case of
A) kNN Models.
B) multiple regression.
C) multiple comparisons tests such as the Duncan test.
D) None of the options are correct.
57) Which of the following statements is false concerning the linear probability model (LPM)?
A) There is nothing in the model to ensure that the estimated probabilities lie between zero and one.
B) Even if the probabilities are truncated at zero and one, there will probably be many observations for which the probability is either exactly zero or exactly one.
C) The error terms will be heteroscedastic and not normally distributed.
D) The model is much harder to estimate than a standard regression model with a continuous target.
58) The process of partitioning the ranges of quantitative attributes into intervals is called _________.
A) splitting
B) grouping
C) binning
D) None of the options are correct.
59) In order for a linear classification model to classify N classes,
A) at least N attributes must be included in the data set.
B) the data set must contain about an even number of records of each classification.
C) at least N records must be present.
D) only N − 1 lines need to be fitted.
60) Linear classifiers
A) may only be two-dimensional (e.g., two attributes).
B) may have any number of dimensions.
C) are limited to relatively small data sets (e.g., less than about 100 records).
D) are among the most complicated of data mining classification algorithms.
61)
In the accompanying diagram of an insect we wish to classify, "Spiracle Diameter" is
A) a dependent variable.
B) a target.
C) a record.
D) an attribute.
62) In data mining terminology, "scoring" refers to
A) using the algorithm to predict.
B) the estimation of the parameters of the algorithm.
C) evaluating the appropriateness and accuracy of the algorithm.
D) an examination of the misclassification rate.
63) Classification Models in data mining include all of the following except
A) Neural nets.
B) Naïve Bayes.
C) Logit.
D) K-Nearest Neighbor.
E) Association rule mining.
64) In the k-Nearest Neighbor algorithm the attributes are translated to _________.
A) values
B) points in multidimensional space
C) strings of characters
D) nodes
65) In a CART model, classification rules are extracted from _________.
A) the root node
B) the decision tree
C) the siblings
D) the leaves
66) Which of the following is a valid production rule for the decision tree below?
A) IF Business Appointment = No and Temp above 70 = No THEN Decision = wear slacks
B) IF Business Appointment = Yes and Temp above 70 = Yes THEN Decision = wear shorts
C) IF Temp above 70 = No THEN Decision = wear shorts
D) IF Business Appointment = No and Temp above 70 = No THEN Decision = wear jeans
67) A nearest neighbor approach is best used _________.
A) with large-sized data sets
B) when irrelevant attributes have been removed from the data
C) when a generalized model of the data is desirable
D) when an explanation of what has been found is of primary importance
68) Classification problems are distinguished from estimation problems in that
A) classification problems require the output attribute to be numeric.
B) classification problems require the output attribute to be categorical.
C) classification problems do not allow an output attribute.
D) classification problems are designed to predict future outcome.
69) Logistic regression is a ________ regression technique that is used to model data having a ________ outcome.
A) linear; numeric
B) linear; binary
C) nonlinear; numeric
D) nonlinear; binary
70) This technique associates a conditional probability value with each data instance.
A) Linear regression
B) Logistic regression
C) Simple regression
D) Multiple linear regression
71) This supervised learning technique can process both numeric and categorical input attributes.
A) Linear regression
B) Bayes classifier
C) Logistic regression
D) Backpropagation learning
72) With Bayes classifier, missing data items are
A) treated as equal compares.
B) treated as unequal compares.
C) replaced with a default value.
D) ignored.
73) The table below contains counts and ratios for a set of data instances to be used for supervised Bayesian learning. The output attribute is sex with possible values male and female. Consider an individual who has said no to the life insurance promotion, yes to the magazine promotion, yes to the watch promotion and has credit card insurance. Use the values in the table together with Bayes classifier to determine which of a, b, c or d represents the probability that this individual is male.
| Magazine Promotion | Watch Promotion | Life Insurance Promotion | Credit Card Insurance | ||||||||
| male | female | male | female | male | female | male | female | ||||
Yes | 4 | 3 | 2 | 2 | 2 | 3 | 2 | 1 | ||||
No | 2 | 1 | 4 | 2 | 4 | 1 | 4 | 3 | ||||
Yes | 4/6 | 3/4 | 2/6 | 2/4 | 2/6 | 3/4 | 2/6 | 1/4 | ||||
No | 2/6 | 1/4 | 4/6 | 2/4 | 4/6 | 1/4 | 4/6 | 3/4 |
A) (4/6) (2/6) (2/6) (2/6) (6/10)/P(E)
B) (4/6) (2/6) (3/4) (2/6) (3/4)/P(E)
C) (4/6) (4/6) (2/6) (2/6) (6/10)/P(E)
D) (2/6) (4/6) (4/6) (2/6) (4/10)/P(E)
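A minimal sketch (plain Python) of the score for "male" implied by the table: the prior is 6/10 (six males out of ten records), and the conditionals are the ratio rows for magazine = yes, watch = yes, life insurance = no, and credit card insurance = yes:

```python
score_male = (4/6) * (2/6) * (4/6) * (2/6) * (6/10)
score_female = (3/4) * (2/4) * (1/4) * (1/4) * (4/10)
# dividing by P(E), approximated by the sum of the two scores, normalizes them
print(score_male / (score_male + score_female))  # ~0.76
```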
74) The method in which previously calculated probabilities are revised with new probabilities is known as
A) updating theorem.
B) revised theorem.
C) Bayes theorem.
D) dependency theorem.
75) In the k-Nearest Neighbors method, when the value of k is set to 1,
A) the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
B) the new observation's class is naïvely assigned to the most common class in the training set.
C) the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set.
D) the classification or prediction of a new observation is subject to the smallest possible classification error.
76) This is Officer Drew.
Is Officer Drew male or female?
Luckily, we have a small database with names and sex.
We can use it to apply Bayes Theorem.
Database
| Name | Sex |
| --- | --- |
| Drew | Male |
| Claudia | Female |
| Drew | Female |
| Drew | Female |
| Alberto | Male |
| Karin | Female |
| Nina | Female |
| Sergio | Male |
p(cj | d) = p(d | cj) × p(cj) / p(d)
Consider the "Officer Drew" situation we discussed in class. Bayes Theorem is also shown in the diagram. Calculate the probability that the individual shown is "Male" given that the person's name is "Drew." Use the database information as a basis for your calculation.
A) ~ 41%
B) ~ 62%
C) ~ 25%
D) ~ 33%
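A minimal sketch (plain Python) of the Bayes calculation from the eight-row database above:

```python
rows = [("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
        ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
        ("Nina", "Female"), ("Sergio", "Male")]

males = [name for name, sex in rows if sex == "Male"]
p_drew_given_male = males.count("Drew") / len(males)          # 1/3
p_male = len(males) / len(rows)                               # 3/8
p_drew = sum(name == "Drew" for name, _ in rows) / len(rows)  # 3/8

print(p_drew_given_male * p_male / p_drew)  # 0.333..., i.e. ~33%
```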
77) Refer to the Officer Drew database and Bayes Theorem shown in question 76.
Consider the "Officer Drew" situation we discussed in class. Bayes Theorem is also shown in the diagram. Calculate the probability that the individual shown is "Female" given that the person's name is "Drew." Use the database information as a basis for your calculation.
A) ~ 66%
B) ~ 59%
C) ~ 75%
D) ~ 38%
78) Consider the following situation as a Bayesian; the situation is very similar to the "Kahneman's Cab" situation we discussed in class. In a given population 1% of the people might have cancer. Tests can be taken to identify cancer. The following are the details about the test:
1) Test will be positive 90% of the time if someone has cancer.
2) Test will be negative 90% of the time if someone does not have cancer.
If you take the cancer test and it comes back positive, what is the probability you actually have cancer?
A) ~ 90%
B) ~ 8%
C) ~ 81%
D) ~ 18%
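A minimal sketch (plain Python) of the Bayesian update for this question:

```python
p_cancer = 0.01
p_pos_given_cancer = 0.90
p_pos_given_healthy = 0.10   # the test is negative 90% of the time when healthy

p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy
print(p_cancer * p_pos_given_cancer / p_pos)  # ~0.083, i.e. about 8%
```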
79) Assume you are asked to create a model that predicts the number of new babies born per period according to the size of the stork population. In this case, the number of babies is
A) a target.
B) a feature.
C) an outcome.
D) an observation.
80) A nearest neighbor approach is best used
A) with large-sized data sets.
B) when irrelevant attributes have been removed from the data.
C) when a generalized model of the data is desirable.
D) when an explanation of what has been found is of primary importance.
81) Classification problems are distinguished from estimation problems in that
A) classification problems require the output attribute to be numeric.
B) classification problems require the output attribute to be categorical.
C) classification problems do not allow an output attribute.
D) classification problems are designed to predict future outcome.
82) Which statement is true about the decision tree attribute selection process?
A) A categorical attribute may appear in a tree node several times but a numeric attribute may appear at most once.
B) A numeric attribute may appear in several tree nodes but a categorical attribute may appear at most once.
C) Both numeric and categorical attributes may appear in several tree nodes.
D) Numeric and categorical attributes may appear in at most one tree node.
83) Consider the following situation as a Bayesian; the situation is very similar to the "Kahneman's Cab" situation we discussed in class. In a given population, 1% of the people might have cancer. Tests can be taken to identify cancer. The following are the details about the test:
1) The test will be positive 80% of the time if someone has cancer (and therefore misses it 20% of the time).
2) 9.6% of tests detect cancer when it's not there (and therefore 90.4% correctly return a negative result).
Suppose you get a positive test. What is the actual probability you have cancer?
A) ~80%
B) ~1%
C) ~99%
D) ~8%
84)
The equation shown above
A) is called a "normal" equation.
B) calculates a test statistic for autocorrelation.
C) measures entropy for classification models.
D) is useful for transforming nonstationary data.
85) "Tree Pruning" seeks to
A) reduce the possibility of misclassification error.
B) reduce the AUC.
C) identify and remove branches that reflect noise or outliers.
D) reduce the number of categories being classified.
86)
Use the tree created in XLMiner to classify the unknown Iris whose attributes are listed. Recall that there are only three classes of Iris to choose in this system: Versicolor, Virginica, or Setosa.
A) Versicolor
B) Virginica
C) Setosa
D) This tree is incapable of classifying the listed Iris.
87)
Some of the nodes in the Iris Classification Tree are called "Terminal Nodes." A Terminal Node
A) contains only a single instance of the items being classified.
B) contains classes that are nonhomogeneous.
C) contains a roughly even number of each class of item.
D) contains only a single class of the items being classified.
88) The "right-sized" classification tree
A) is rarely a pruned tree.
B) is sometimes determined by using v-fold cross validation.
C) is the one with the lowest misclassification rate in the training partition.
D) is never more than 7 levels deep.
89) Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard Distance
D) Both "Least Square Error" and "Maximum Likelihood" are correct.
90) Which of the following diagnostic metrics cannot be applied in case of logistic regression output to compare with target?
A) AUC-ROC
B) Lift Chart
C) Cumulative Gains Chart
D) Mean Squared Error
91) Suppose you have been given a fair coin and you want to find out the odds of getting heads. Which of the following options is true for such a case?
A) Odds will be 0.
B) Odds will be 0.5.
C) Odds will be 1.
D) None of the options are correct.
92) Suppose you applied a Logistic Regression model on a given data and got a training accuracy X and testing accuracy Y. Now, you want to add a few new features in the same data. Select the option(s) which is/are correct in such a case.
A) Training accuracy increases.
B) Training accuracy increases or remains the same.
C) Testing accuracy decreases.
D) Testing accuracy increases or remains the same.
E) Both "Training accuracy increases" and "Training accuracy increases or remains the same" are correct.
93) The below figure shows three ROC curves for three different logistic regression models. Which of the following ROC curves signifies the best result?
A) The top (highest) ROC curve
B) The middle ROC curve
C) The bottom (lowest) ROC curve
D) Unable to determine an answer from information given.
94) Below are the 8 actual values of target variable in the training partition.
[0, 0, 0, 1, 1, 1, 1, 1]
What is the entropy of the target variable?
A) −5/8 log2(5/8) − 3/8 log2(3/8)
B) 5/8 log2(5/8) − 3/8 log2(3/8)
C) −3/8 log2(5/8) + 5/8 log2(3/8)
D) −5/8 log2(3/8) + 3/8 log2(5/8)
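A minimal sketch (plain Python) checking the entropy of the target variable:

```python
import math

values = [0, 0, 0, 1, 1, 1, 1, 1]
p1 = values.count(1) / len(values)   # 5/8
p0 = values.count(0) / len(values)   # 3/8
print(-p1 * math.log2(p1) - p0 * math.log2(p0))  # ~0.9544
```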
95) What is a "decision tree"?
A) It is a flowchart-like tree structure, where each internal node denotes a test on an attribute and each branch represents an outcome of the test.
B) It is a clustering algorithm based upon reducing entropy from one branch to the succeeding branches.
C) Decision trees are part of the pre-processing necessary when beginning to engage in text mining; it is a form of dimension reduction.
D) None of the options are correct.
96) _________ is an example of case-based learning.
A) A decision tree
B) A k-Nearest Neighbor algorithm
C) A neural network
D) None of the options are correct.
97) In the kNN algorithm the input is translated to _________.
A) values
B) points in multidimensional space
C) strings of characters
D) nodes
98) Claude Shannon's name is most closely associated with
A) information entropy.
B) Bayesian algorithms.
C) decision trees.
D) natural language processing.