Forecasting and Predictive Analytics with Forecast X, 7e (Keating)
Chapter 8 Predictive Analytics: Helping to Make Sense of Big Data
1) Decile-Wise
A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).
The decile-wise lift chart for a transaction data model:
Consider the decile-wise lift chart above. Interpret the meaning of the first and second bars from the left.
A) The first variable in the model is more predictive than the second variable.
B) These bars are never interpreted for the validation data set; they are only interpreted for the training dataset.
C) Since only two bars rise above unity, the model exhibits little explanatory power.
D) The first two bars show that this model outperforms a naïve model.
2) Decile-Wise
A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).
The decile-wise lift chart for a transaction data model:
Consider the decile-wise lift chart above. An analyst comments that you could improve the accuracy of the model by classifying everything as nonfraudulent. What will the overall error rate (also called the misclassification rate) be if you follow her advice?
A) The overall error rate will increase.
B) The overall error rate will decrease.
C) The change in the overall error rate cannot be determined.
D) The overall error rate will arbitrarily change.
3) Decile-Wise
A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as nonfraudulent (920 correctly so).
The decile-wise lift chart for a transaction data model:
Which of the following represents the confusion matrix for the transaction data mentioned above?
A
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 58 | 920
0 | 30 | 32

B
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 32 | 30
0 | 58 | 920

C
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 30 | 32
0 | 58 | 920

D
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 920 | 58
0 | 30 | 32
A) A
B) B
C) C
D) D
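For reference, a minimal Python sketch showing how the four cells of a 2x2 confusion matrix follow from the counts in the stem (88 records classified fraudulent, 30 correctly; 952 classified nonfraudulent, 920 correctly); the variable names are illustrative.

```python
# Reconstruct the 2x2 confusion matrix implied by the question stem.
predicted_fraud, fraud_correct = 88, 30          # classified fraudulent, of which correct
predicted_nonfraud, nonfraud_correct = 952, 920  # classified nonfraudulent, of which correct

tp = fraud_correct                          # actual 1, predicted 1
fp = predicted_fraud - fraud_correct        # actual 0, predicted 1
tn = nonfraud_correct                       # actual 0, predicted 0
fn = predicted_nonfraud - nonfraud_correct  # actual 1, predicted 0

print("Actual \\ Predicted |   1 |   0")
print(f"                 1 | {tp:3d} | {fn:3d}")
print(f"                 0 | {fp:3d} | {tn:3d}")
```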
4) What is the misclassification error rate for the following XLMiner confusion matrix?
Confusion Matrix
Actual \ Predicted | 0 | 1
0 | 970 | 20
1 | 2 | 8
A) 2.2%
B) 0.82%
C) 10%
D) 0.21%
E) Impossible to determine from information given.
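For reference, a minimal Python sketch of the overall misclassification-rate calculation for a 2x2 confusion matrix, using the counts above.

```python
# Overall misclassification (error) rate = off-diagonal count / total count.
# Keys are (actual, predicted) pairs, matching the matrix layout above.
matrix = {("0", "0"): 970, ("0", "1"): 20,
          ("1", "0"): 2,   ("1", "1"): 8}

errors = sum(n for (actual, predicted), n in matrix.items() if actual != predicted)
total = sum(matrix.values())
print(f"misclassification rate = {errors / total:.1%}")  # (20 + 2) / 1000
```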
5) Consider the Toyota Corolla data below:
Id | Model | Price | Age_08_04 | Mfg_Month | Mfg_Year |
1 | RA 2/3-Doors | 13500 | 23 | 10 | 2002 |
2 | RA 2/3-Doors | 13750 | 23 | 10 | 2002 |
3 | RA 2/3-Doors | 13950 | 24 | 9 | 2002 |
4 | RA 2/3-Doors | 14950 | 26 | 7 | 2002 |
5 | OL 2/3-Doors | 13750 | 30 | 3 | 2002 |
6 | OL 2/3-Doors | 12950 | 32 | 1 | 2002 |
7 | RA 2/3-Doors | 16900 | 27 | 6 | 2002 |
Id | KM | Fuel_Type | HP | Met_Color | Color_Black |
1 | 46986 | Diesel | 90 | 1 | 0 |
2 | 72937 | Diesel | 90 | 1 | 0 |
3 | 41711 | Diesel | 90 | 1 | 0 |
4 | 48000 | Diesel | 90 | 0 | 1 |
5 | 38500 | Diesel | 90 | 0 | 1 |
6 | 61000 | Diesel | 90 | 0 | 0 |
7 | 94612 | Diesel | 90 | 1 | 0 |
Which variable is a dummy variable?
A) Fuel_Type
B) Color_Black
C) KM
D) HP
E) Both Fuel_Type and Color_Black qualify as dummy variables.
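For reference, a minimal sketch (assuming the pandas library is available) of how 0/1 dummy variables such as Color_Black are generated from categorical columns; the "Petrol" and "Grey" values are illustrative additions, not taken from the excerpt above.

```python
import pandas as pd

# A few rows in the spirit of the Corolla table; extra category values are illustrative
# so that get_dummies produces more than one column per variable.
df = pd.DataFrame({
    "Fuel_Type": ["Diesel", "Diesel", "Petrol"],
    "Color": ["Grey", "Black", "Black"],
})

# One 0/1 dummy column per category, e.g. Color_Black = 1 when Color == "Black".
dummies = pd.get_dummies(df, dtype=int)
print(dummies)
```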
6) Consider the Toyota Corolla data below:
Id | Model | Price | Age_08_04 | Mfg_Month | Mfg_Year |
1 | RA 2/3-Doors | 13500 | 23 | 10 | 2002 |
2 | RA 2/3-Doors | 13750 | 23 | 10 | 2002 |
3 | RA 2/3-Doors | 13950 | 24 | 9 | 2002 |
4 | RA 2/3-Doors | 14950 | 26 | 7 | 2002 |
5 | OL 2/3-Doors | 13750 | 30 | 3 | 2002 |
6 | OL 2/3-Doors | 12950 | 32 | 1 | 2002 |
7 | RA 2/3-Doors | 16900 | 27 | 6 | 2002 |
Id | KM | Fuel_Type | HP | Met_Color | Color_Black |
1 | 46986 | Diesel | 90 | 1 | 0 |
2 | 72937 | Diesel | 90 | 1 | 0 |
3 | 41711 | Diesel | 90 | 1 | 0 |
4 | 48000 | Diesel | 90 | 0 | 1 |
5 | 38500 | Diesel | 90 | 0 | 1 |
6 | 61000 | Diesel | 90 | 0 | 0 |
7 | 94612 | Diesel | 90 | 1 | 0 |
Which of the variables below (from the Toyota Corolla dataset) is a categorical variable?
A) Fuel_Type
B) Color_Black
C) KM
D) HP
7) Consider the following confusion matrix.
Actual \ Predicted | Predict class 1 | Predict class 0
Actual 1 | 8 | 2
Actual 0 | 20 | 970
How much better did this data mining technique do as compared to a naïve model?
A) No better than a naïve model.
B) 1.2% better than a naïve model.
C) 5.6% better than a naïve model.
D) 7.8% better than a naïve model.
E) 10.1% better than a naïve model.
8) "Overfitting" refers to
A) estimating a model that explains the training set data points perfectly and leaves little error but that is unlikely to be accurate in prediction.
B) using too many attributes or classifiers in a model.
C) the process used to test data mining models for accuracy.
D) the process of estimating or scoring of new data.
9) A "training data set" is
A) used to compare models and pick the best one.
B) used to build various models of interest.
C) used to assess the performance of the normalization procedure.
D) None of the options are correct.
10) A "validation data set" is
A) used to compare models and pick the best one.
B) used to build various models of interest.
C) used to assess the performance of the normalization procedure.
D) None of the options are correct.
11)
Classification Confusion Matrix
Actual \ Predicted | owner | non-owner
owner | 10 | 0
non-owner | 0 | 9
The misclassification rate in the confusion matrix above is
A) 0 percent.
B) 10 percent.
C) 9 percent.
D) 19 percent.
E) None of the options are correct.
12)
Data
Data source | Data!$A$5:$N$2504
Selected variables | ID, Age, Experience, Income, Zip Code, Family, CCAvg, Education
Partitioning Method | Randomly chosen
Random Seed | 12345
# training rows | 1250
# Validation rows | 750
# test rows | 500

Selected Variables
Row Id | ID | Age | Experience | Income | Zip Code | Family | CCAvg | Education
1 | 1 | 25 | 1 | 49 | 91107 | 4 | 1.60 | 1
4 | 4 | 35 | 9 | 100 | 94112 | 1 | 2.70 | 2
5 | 5 | 35 | 8 | 45 | 91330 | 4 | 1.00 | 2
6 | 6 | 37 | 13 | 29 | 92121 | 4 | 0.40 | 2
9 | 9 | 35 | 10 | 81 | 90089 | 3 | 0.60 | 2
10 | 10 | 34 | 9 | 180 | 93023 | 1 | 8.90 | 3
12 | 12 | 29 | 5 | 45 | 90277 | 3 | 0.10 | 2
17 | 17 | 38 | 14 | 130 | 95010 | 4 | 4.70 | 3
18 | 18 | 42 | 18 | 81 | 94305 | 4 | 2.40 | 1
21 | 21 | 56 | 31 | 25 | 94015 | 4 | 0.90 | 2
26 | 26 | 43 | 19 | 29 | 94305 | 3 | 0.50 | 1
The Universal Bank data represented above has been partitioned with what percentages?
A) 50%, 30%, 20% in training, validation, and test sets
B) 60%, 40% in training and validation sets
C) 60%, 20%, 20% in training, validation, and test sets
D) 50%, 20%, 30% in training, validation, and test sets
E) None of the options are correct.
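For reference, a minimal Python sketch computing the partition shares implied by the row counts in the report above.

```python
# Partition shares implied by the reported row counts.
rows = {"training": 1250, "validation": 750, "test": 500}
total = sum(rows.values())                     # 2,500 partitioned rows
for name, n in rows.items():
    print(f"{name:>10}: {n:4d} rows = {n / total:.0%}")
```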
13) In data mining, the model should be applied to a data set that was not used in the estimation process in order to assess accuracy on unseen data; that "unseen" data set is called
A) the training data set.
B) the validation data set.
C) the test data set.
D) the holdout data set.
E) None of the options are correct.
14)
The lift chart above shows that the data mining classification model
A) is working well in classifying unseen data.
B) is working well in classifying training data.
C) is working quite poorly.
D) is doing no better at classifying than a naïve model.
15) With most data mining techniques we "partition" the data
A) into "success" and "failure" results in order to create a target that is a dummy variable.
B) only when we require a confusion matrix to be created.
C) after estimating the appropriate technique.
D) in order to judge how our model will do when we apply it to new data.
16) Consider the following Lift Chart. Cumulative percentage of hits is the Y-axis variable. Percent of the entire list is the X-axis variable.
What is the "Lift" at 5%?
A) Exactly 4
B) About 5
C) Exactly 20
D) About 25
E) Unable to determine from information given.
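For reference, a minimal Python sketch of the lift computation at a given list percentage; since the chart itself is not reproduced here, the y-axis reading used below is illustrative.

```python
# Lift at x% = (cumulative fraction of hits captured in the top x% of the ranked list) / x.
x_fraction = 0.05                # top 5% of the ranked list
cumulative_hits_fraction = 0.20  # illustrative y-axis reading at x = 5%
lift = cumulative_hits_fraction / x_fraction
print(f"lift at {x_fraction:.0%} = {lift:.1f}")  # 0.20 / 0.05 = 4.0
```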
17) Consider the printout below:
Validation Data scoring - Summary Report (for k = 7)
Cut off Prob. Val. for Success (Updatable): 0.5
Classification Confusion Matrix
Actual \ Predicted | owner | non-owner
owner | 4 | 0
non-owner | 3 | 3

Error Report
Class | #Cases | #Errors | %Error
owner | 4 | 0 | 0.00
non-owner | 6 | 3 | 50.00
Overall | 10 | 3 | 30.00
What is the "Misclassification Rate"?
A) 0
B) 3
C) 50
D) 30
E) It is not shown in this printout.
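For reference, a minimal Python sketch that rebuilds the error report (per-class and overall error rates) from the validation confusion matrix shown above.

```python
# Rebuild the error report from the confusion matrix (rows = actual class).
matrix = {"owner": {"owner": 4, "non-owner": 0},
          "non-owner": {"owner": 3, "non-owner": 3}}

total_cases = total_errors = 0
for actual, row in matrix.items():
    cases = sum(row.values())
    errors = cases - row[actual]            # off-diagonal entries in this row
    total_cases += cases
    total_errors += errors
    print(f"{actual:>10}: {cases} cases, {errors} errors, {errors / cases:.2%}")
print(f"   Overall: {total_cases} cases, {total_errors} errors, "
      f"{total_errors / total_cases:.2%}")
```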
18)
The chart above is a decile-wise lift chart. The first bar on the left indicates
A) that our attribute or attributes did little to explain predicted success in this model.
B) that the lift will not vary with the number of cases we consider.
C) that taking the 10% of the records that are ranked by the model as the most probable successes yields about as much as a naïve model.
D) that taking the 10% of the records that are ranked by the model as the most probable successes yields twice as many successes as would a random selection of 10% of the records.
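For reference, a minimal Python sketch of the decile-wise lift calculation; the record and success counts below are illustrative, not taken from the chart.

```python
# Decile-wise lift for a decile = successes captured in that decile /
# successes a random 10% of the records would be expected to capture.
total_successes = 100              # illustrative total number of successes in the data
first_decile_successes = 20        # successes among the top-ranked 10% of records

expected_from_random_10pct = total_successes * 0.10
lift = first_decile_successes / expected_from_random_10pct
print(f"first-decile lift = {lift:.1f}")  # 2.0: twice a random 10% selection
```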
19) Suppose that a data mining routine has an adjustable cutoff (threshold) mechanism by which you can alter the proportion of records classified as owner. Three cases are described below.
Describe how moving the cutoff up or down from a starting point of 0.5 affects the misclassification error rate.
Cut off Prob. Val. for Success (Updatable): 0.5
Classification Confusion Matrix
Actual \ Predicted | owner | non-owner
owner | 11 | 1
non-owner | 2 | 10

Cut off Prob. Val. for Success (Updatable): 0.25
Classification Confusion Matrix
Actual \ Predicted | owner | non-owner
owner | 11 | 1
non-owner | 4 | 8

Cut off Prob. Val. for Success (Updatable): 0.75
Classification Confusion Matrix
Actual \ Predicted | owner | non-owner
owner | 7 | 5
non-owner | 1 | 11
A) The misclassification error rate dropped as the threshold dropped.
B) The misclassification error rate dropped as the threshold increased.
C) The misclassification error rate remained unchanged as the threshold changed.
D) The misclassification error rate changed when the threshold either increased or decreased.
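For reference, a minimal Python sketch of how moving the probability cutoff reclassifies records and changes the misclassification rate; the predicted probabilities below are illustrative, not taken from the question.

```python
# Effect of the cutoff (threshold) on the misclassification rate.
# Probabilities are illustrative predicted probabilities of "owner".
actual = ["owner"] * 12 + ["non-owner"] * 12
probs = [0.95, 0.90, 0.88, 0.85, 0.80, 0.78, 0.74, 0.70, 0.65, 0.60, 0.55, 0.45,
         0.72, 0.55, 0.48, 0.40, 0.35, 0.30, 0.28, 0.22, 0.18, 0.15, 0.10, 0.05]

def error_rate(cutoff):
    predicted = ["owner" if p >= cutoff else "non-owner" for p in probs]
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)

for cutoff in (0.25, 0.5, 0.75):
    print(f"cutoff {cutoff:.2f}: misclassification rate {error_rate(cutoff):.1%}")
```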
20) Data Mining traditionally uses
A) only variations of s-curve models.
B) only time series data sets.
C) only classical statistical techniques.
D) only very large data sets.
21) When using data mining techniques we usually
A) impose a model on the data.
B) choose among classical statistical models for the underlying explanation.
C) impose not a model, but a pattern, on the data.
D) don't know what pattern may fit a particular set of data.
22) One of the premises of data mining is
A) that we are able to deduce patterns with very small amounts of data.
B) that classical statistical techniques are inappropriate for most business forecasting.
C) that there is a great deal of information locked up in any database.
D) that very few techniques may be used appropriately to analyze most business data sets.
23) Data today is collected
A) when users click a button on the World Wide Web.
B) when a credit card is swiped.
C) when inventory moves through a warehouse.
D) when an individual checks out at a grocery store.
E) All of the options are correct.
24) Data mining
A) is also called Online Transaction Processing (OLTP).
B) is also called Online Analytical Processing (OLAP).
C) is also called Knowledge Discovery in Databases (KDD).
D) None of the options are correct.
25) The job of a data miner
A) is the extraction of explicit intelligence in a data set.
B) is to use limited data to extract meaningful patterns that might exist in large data sets.
C) is to make sense of the available mounds of data by examining the data for patterns.
D) None of the options are correct.
26) SQL or structured query language
A) is a common data mining technique.
B) could be used to "Find all the customers likely to miss a future payment."
C) allows well defined queries of existing databases.
D) could "Group all customers with similar buying habits."
27) The three standard categories of data mining tools are
A) regression, categorization, and association.
B) prediction, categorization, and association.
C) classification, clustering, and association.
D) association, clustering, and analytics.
28) A "target" in data mining
A) is most like a dependent variable in business forecasting.
B) is another name for an attribute.
C) is synonymous with a record.
D) None of the options are correct.
29) In data mining, to "score"
A) is to partition a data set.
B) is the same as creating a target variable in business forecasting.
C) is to predict.
D) None of the options are correct.
30) In standard "business forecasting" we have been seeking verification of previously held hypotheses. In data mining, on the other hand,
A) multiple hypotheses are examined and the optimal one is selected or suggested.
B) the rules are explicitly set by the forecaster.
C) we seek the discovery of new knowledge from the data.
D) None of the options are correct.
31) A "validation set lift chart"
A) is similar to an "out of sample" test statistic used in classical business forecasting.
B) may not be used to judge the predictive power of a k-Nearest-Neighbor model.
C) is rarely used to make an inference about the predictive capability of a k-Nearest-Neighbor model.
D) None of the options are correct.
32)
Consider the Lift Chart above.
The straight line emanating from the origin and labeled "Cumulative Personal Loan using average"
A) is a reference line indicating the use of a linear model.
B) is a reference line that shows an average misclassification rate for this particular data.
C) represents the expected number of correct classifications of any class we would predict if we used the naïve model.
D) represents the expected number of positives we would predict if we used the naïve model.
33) In the following confusion matrix, how many mistakes are made when participants are predicted to be female?
Predicted \ Actual | Actually male | Actually female
Predicted male | 57 | 4
Predicted female | 6 | 32
A) 24
B) 4
C) 6
D) 8
34) A confusion matrix:
A) helps the researcher classify a variable into its component categories.
B) indicates how well the attributes correlate with the target.
C) indicates how well a model has predicted group membership.
D) helps the researcher assess statistical significance.
35) A dummy variable:
A) combines several characteristics together to give a score.
B) is a variable known to have a zero correlation with the target.
C) merely indicates whether a case has a particular characteristic or not.
D) indicates the extent to which people differ on a particular characteristic.
36) Of all the data available today, it is estimated that about 90% is
A) unstructured.
B) structured.
C) numerical.
D) graphical.
37) In text mining, "knowledge discovery" refers to
A) extraction of codified features.
B) analysis of feature distribution.
C) counting the number of unknown terms used in a document.
D) the measurement of single-use terms present in the text.
38)
In the accompanying diagram of an insect we wish to classify, "Spiracle Diameter" is
A) a dependent variable.
B) a target.
C) a record.
D) an attribute.
39) In data mining terminology, "scoring" refers to
A) using the algorithm to predict.
B) the estimation of the parameters of the algorithm.
C) evaluating the appropriateness and accuracy of the algorithm.
D) an examination of the misclassification rate.
40) Which of the following statements is an example of one that a data scientist would seek, as opposed to one that a database manager would seek?
A) Find all customers that use Mastercard.
B) Find all customers that are likely to miss one payment.
C) Find all customers that live in South Bend.
D) Find all customers that missed one payment.
41) IBM/SPSS Modeler uses the "SEMMA" approach to the data mining process. The second "M" in SEMMA refers to
A) Model.
B) Mitigate.
C) Mix.
D) Manage.
42)
"BEFORE": Doctors in ER | Actual Heart Attack | No Heart Attack
Predict Heart Attack | 0.89 | 0.75
Predict No Heart Attack | 0.11 | 0.25

"AFTER": Tree Algorithm (Goldman) | Actual Heart Attack | No Heart Attack
Predict Heart Attack | 0.96 | 0.08
Predict No Heart Attack | 0.04 | 0.92
In the book Blink by Malcolm Gladwell, predictive analytics allowed emergency room physicians to be better able to help prospective heart attack victims in Cook County Hospital. Which statement best describes what happened after the predictive analytics?
A) While the study showed more accurate classification was possible, the required information needed took too much time to collect.
B) Twenty-six attributes were able to predict heart attacks (and non-heart attacks) almost perfectly.
C) The physicians began concentrating on only three "risk factors."
D) Examining only two attributes allowed about a 67% increase in accuracy of classification.
43)
"Lift" is represented in the diagram by the distance between
A) ab.
B) bc.
C) bd.
D) bf.
E) be.
44)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 162 | 32
0 | 4 | 1802

Error Report
Class | #Cases | #Errors | % Error
1 | 194 | 32 | 16.49
0 | 1806 | 4 | 0.22
Overall | 2000 | 36 | 1.80
Validation Confusion Matrix Universal Bank
Target = personal loan customer
Class 1 = personal loan customer
Class 0 = not a personal loan customer
What percentage of actual bank personal loan customers were misclassified as non-personal loan customers?
A) 16.5%
B) 0.22%
C) 83.5%
D) 99.78%
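For reference, a minimal Python sketch of the per-class rates asked about in questions 44 through 47, computed from the Universal Bank validation matrix above.

```python
# Per-class rates from the validation confusion matrix (rows = actual class).
tp, fn = 162, 32     # actual 1 (loan customer):     predicted 1, predicted 0
fp, tn = 4, 1802     # actual 0 (non-loan customer): predicted 1, predicted 0

print(f"actual 1 misclassified as 0:   {fn / (tp + fn):.2%}")  # 32 / 194
print(f"actual 1 classified correctly: {tp / (tp + fn):.2%}")  # 162 / 194
print(f"actual 0 classified correctly: {tn / (fp + tn):.2%}")  # 1802 / 1806
print(f"actual 0 misclassified as 1:   {fp / (fp + tn):.2%}")  # 4 / 1806
```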
45)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 162 | 32
0 | 4 | 1802

Error Report
Class | #Cases | #Errors | % Error
1 | 194 | 32 | 16.49
0 | 1806 | 4 | 0.22
Overall | 2000 | 36 | 1.80

Validation Confusion Matrix Universal Bank
Target = personal loan customer
Class 1 = personal loan customer
Class 0 = not a personal loan customer
What percentage of actual bank personal loan customers were classified correctly as personal loan customers?
A) 16.5%
B) 0.22%
C) 83.5%
D) 99.78%
46)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 162 | 32
0 | 4 | 1802

Error Report
Class | #Cases | #Errors | % Error
1 | 194 | 32 | 16.49
0 | 1806 | 4 | 0.22
Overall | 2000 | 36 | 1.80

Validation Confusion Matrix Universal Bank
Target = personal loan customer
Class 1 = personal loan customer
Class 0 = not a personal loan customer
What percentage of actual bank non-personal loan customers were correctly classified?
A) 16.5%
B) 0.22%
C) 83.51%
D) 99.8%
47)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 162 | 32
0 | 4 | 1802

Error Report
Class | #Cases | #Errors | % Error
1 | 194 | 32 | 16.49
0 | 1806 | 4 | 0.22
Overall | 2000 | 36 | 1.80

Validation Confusion Matrix Universal Bank
Target = personal loan customer
Class 1 = personal loan customer
Class 0 = not a personal loan customer
What percentage of actual bank non-loan customers were misclassified?
A) 16.49%
B) 0.22%
C) 83.5%
D) 99.8%
48)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 1078 | 302
0 | 311 | 1134

Error Report
Class | #Cases | #Errors | % Error
1 | 1380 | 302 | 21.88
0 | 1445 | 311 | 21.52
Overall | 2825 | 613 | 21.70
Validation Confusion Matrix Telecommunications Churn
Target = Churn
Class 0 = customer did churn
Class 1 = customer did not churn
What percentage of customers who actually did churn were correctly classified?
A) 78.12%
B) 21.5%
C) 78.48%
D) 21.2%
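For reference, a minimal Python sketch of the per-class rates asked about in questions 48 through 51; note the label coding in this data set (class 0 = did churn, class 1 = did not churn).

```python
# Per-class rates from the churn validation matrix (rows = actual class).
no_churn_correct, no_churn_errors = 1078, 302  # actual class 1 (did not churn) row
churn_errors, churn_correct = 311, 1134        # actual class 0 (did churn) row

did_churn = churn_correct + churn_errors             # 1445 actual churners
did_not_churn = no_churn_correct + no_churn_errors   # 1380 actual non-churners

print(f"did churn, correctly classified:     {churn_correct / did_churn:.2%}")
print(f"did not churn, correctly classified: {no_churn_correct / did_not_churn:.2%}")
```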
49)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 1078 | 302
0 | 311 | 1134

Error Report
Class | #Cases | #Errors | % Error
1 | 1380 | 302 | 21.88
0 | 1445 | 311 | 21.52
Overall | 2825 | 613 | 21.70
Validation Confusion Matrix Telecommunications Churn
Target = Churn
Class 0 = customer did churn
Class 1 = customer did not churn
What percentage of customers who actually did not churn were correctly classified?
A) 78.12%
B) 21.5%
C) 78.48%
D) 21.2%
50)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 1078 | 302
0 | 311 | 1134

Error Report
Class | #Cases | #Errors | % Error
1 | 1380 | 302 | 21.88
0 | 1445 | 311 | 21.52
Overall | 2825 | 613 | 21.70
Validation Confusion Matrix Telecommunications Churn
Target = Churn
Class 0 = customer did churn
Class 1 = customer did not churn
What percentage of customers who actually did not churn were incorrectly classified?
A) 78.12%
B) 21.5%
C) 78.48%
D) 21.88%
51)
Classification Confusion Matrix
Actual \ Predicted | 1 | 0
1 | 1078 | 302
0 | 311 | 1134

Error Report
Class | #Cases | #Errors | % Error
1 | 1380 | 302 | 21.88
0 | 1445 | 311 | 21.52
Overall | 2825 | 613 | 21.70
Validation Confusion Matrix Telecommunications Churn
Target = Churn
Class 0 = customer did churn
Class 1 = customer did not churn
What percentage of customers who actually did churn were incorrectly classified?
A) 78.12%
B) 21.52%
C) 78.48%
D) 21.2%
52) Data mining algorithms require _______.
A) an efficient sampling method
B) storage of intermediate results
C) the capacity to handle large amounts of data
D) All of the options are correct.
53) The goal of data mining is to build _______ models.
A) retrospective
B) interrogative
C) predictive
D) imperative
54) Data mining is best described as the process of
A) identifying patterns in data.
B) deducing relationships in data.
C) representing data.
D) simulating trends in data.
55) Data used to build a data mining model is called _______.
A) validation data
B) training data
C) test data
D) unseen data
56) Assume you are asked to create a model that predicts the number of new babies born per period according to the size of the stork population. In this case, the number of babies is
A) a target.
B) a feature.
C) an outcome.
D) an observation.
57) Data mining is best described as the process of
A) identifying patterns in data.
B) deducing relationships in data.
C) representing data.
D) simulating trends in data.
58) Which of the following is another name for an algorithm's output?
A) Predictive variable
B) Independent variable
C) Estimated variable
D) Target variable
59) KDD describes the _______.
A) whole process of extraction of knowledge from data
B) extraction of data
C) extraction of information
D) extraction of rules
60) An algorithm that is controlled by a human during its execution is a(n) _______ algorithm.
A) unsupervised learning
B) supervised learning
C) batch learning
D) incremental
61) SQL stands for _______.
A) simple query language
B) structured query language
C) strong query language
D) simple language
62) _______ analysis divides data into groups that are meaningful, useful, or both.
A) A clustering
B) An association rule mining
C) A classification
D) A relational