Forecasting and Predictive Analytics with Forecast X, 7e (Keating)
Chapter 10 Ensemble Models and Clustering
1) Which of the following is a fundamental difference between bagging and boosting?
A) Bagging is used for supervised learning. Boosting is used with unsupervised clustering.
B) Bagging gives varying weights to training instances. Boosting gives equal weight to all training instances.
C) Bagging does not take the performance of previously built models into account when building a new model. With boosting each new model is built based upon the results of previous models.
D) Boosting is used for supervised learning. Bagging is used with unsupervised clustering.
2) Bagging
A) concerns finding decision boundaries that can be used to separate out different classes.
B) is useful for visualizing data.
C) is a procedure that uses each training data point once.
D) is a form of ensemble model.
3) Boosting
A) is a technique used solely to improve classification results of Naïve Bayes algorithms.
B) is a classification technique similar to kNN.
C) involves the "creation" of data by replicating each instance in the training data multiple times.
D) uses weak learners to create an ensemble model.
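The bagging/boosting contrast tested in the questions above can be sketched in a few lines of Python (a schematic illustration of my own; the 2.0/0.5 reweighting factors are made up for clarity and are not AdaBoost's exact update rule):

```python
import random

def bagging_samples(data, n_models, seed=0):
    """Bagging: each model gets an independent bootstrap sample;
    no model's sample depends on any other model's results."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_models)]

def boosting_weights(errors, weights):
    """Boosting (schematic): after a model is built, the instances it
    got wrong are up-weighted so the NEXT model focuses on them."""
    new = [w * (2.0 if err else 0.5) for w, err in zip(weights, errors)]
    total = sum(new)
    return [w / total for w in new]   # renormalize to sum to 1

data = [1, 2, 3, 4, 5]
samples = bagging_samples(data, n_models=3)   # 3 independent resamples
w = [0.2] * 5                                 # boosting starts with equal weights
w = boosting_weights([True, False, False, False, False], w)
# the misclassified first instance now carries more weight than the rest
```

Note that each bagging sample is drawn without looking at any model's results, while the boosting reweighting can only run after a model has been evaluated.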
4) Movie Recommendation systems are an example of
A) Classification.
B) Clustering.
C) Regression.
D) All of the options are correct.
5) What is the minimum number of attributes/features required to perform clustering?
A) 0
B) 1
C) 2
D) 3
6) For two runs of K-Means clustering (i.e., two estimations, each with its own centroid initialization), is it expected that you will get the same clustering results?
A) Yes.
B) No.
C) Unable to tell.
7) Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
A) Yes.
B) No.
C) Unable to determine.
D) None of the options are correct.
8) Which of the following can act as possible termination conditions in K-Means?
A) A fixed number of iterations has been reached.
B) Centroids do not change with successive iterations.
C) Assignments of observations do not change with successive iterations.
D) All of the options are correct.
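The three termination conditions listed above can all be seen in a minimal 1-D K-Means sketch (my own illustration on a tiny toy data set, not code from the text):

```python
import random

def kmeans_1d(points, k, max_iter=100, seed=0):
    """Minimal 1-D K-Means illustrating the termination conditions:
    stop after a fixed number of iterations, or earlier when the
    assignments or the centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = None
    for _ in range(max_iter):                       # condition: fixed iterations
        new_assign = [min(range(k), key=lambda j: abs(p - centroids[j]))
                      for p in points]
        if new_assign == assign:                    # condition: assignments stable
            break
        assign = new_assign
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[j])
        if new_centroids == centroids:              # condition: centroids stable
            break
        centroids = new_centroids
    return centroids, assign

# Two well-separated 1-D clusters converge to centroids near 1.0 and 10.0.
cents, labels = kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
```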
9) After performing K-Means clustering analysis on a data set, you observed this dendrogram. Which of the following conclusions can be drawn from the dendrogram?
A) There were 28 data points in the clustering analysis.
B) The best number of clusters for the analyzed data points is 4.
C) These dendrogram interpretations are not possible for K-Means clustering.
10) What could be the possible reason(s) for producing two different dendrograms using the agglomerative clustering algorithm on the same data set?
A) The number of data points used
B) The number of attributes used
C) Both "the number of data points used" and "the number of attributes used" could be a possible reason.
D) None of the options are correct.
11) In the figure shown, if you draw a horizontal line at y = 2, what will be the number of clusters formed?
A) 1
B) 2
C) 3
D) 4
12) What is the most appropriate number of clusters for the data points represented by this dendrogram?
A) 2
B) 4
C) 6
D) 8
13) Which of the following metrics can be used to measure dissimilarity between two clusters in hierarchical clustering?
1) Single-link
2) Complete-link
3) Average-link
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1, 2, and 3
14) Attribute scaling is an important step before applying the K-Means clustering algorithm. What is the reason for this?
A) It gives all attributes the same weight in distance calculations.
B) You always get the same clusters whether or not you use attribute scaling.
C) It is an important step for Manhattan distance, but not for Euclidean distance.
D) None of the options are correct.
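A brief sketch (my own illustration with hypothetical income/age attributes) of why scaling matters in distance-based clustering:

```python
def min_max_scale(values):
    """Rescale one attribute to [0, 1] so no attribute dominates the
    distance calculation purely because of its units."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Two customers: (income in dollars, age in years).
incomes = [30_000, 90_000]
ages = [25, 60]

# Unscaled, the income difference swamps the age difference.
raw_dist = ((incomes[0] - incomes[1]) ** 2 + (ages[0] - ages[1]) ** 2) ** 0.5

# Scaled, both attributes contribute comparably to the distance.
s_inc, s_age = min_max_scale(incomes), min_max_scale(ages)
scaled_dist = ((s_inc[0] - s_inc[1]) ** 2 + (s_age[0] - s_age[1]) ** 2) ** 0.5
```

Before scaling the distance is roughly 60,000, driven almost entirely by income; after scaling both attributes carry equal weight.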
15) Which of the following can be applied to get good results for the K-Means clustering algorithm (i.e., results corresponding to the global minimum)?
A) Run the algorithm with different centroid initializations.
B) Adjust the number of iterations.
C) Both "Run the algorithm with different centroid initializations" and "Adjust the number of iterations" can be applied to get good results.
D) None of the options are correct.
16) Which of the following algorithms is not an example of an ensemble method?
A) Bagging
B) Random Forest
C) Gradient Boosting
D) Decision Tree
17) What is true about an ensemble classifier?
A) Classifiers that are more "sure" can vote with more conviction.
B) Most of the time, ensemble classifiers perform better than a single classifier.
C) Both "Classifiers that are more 'sure' can vote with more conviction" and "Most of the time, ensemble classifiers perform better than a single classifier" are true.
D) None of the options are correct.
18) Which of the following can be true for selecting base learners for an ensemble?
1. Different learners can come from the same algorithm with different parameters.
2. Different learners can come from different algorithms.
3. Different learners can come from different training spaces.
A) 1
B) 2
C) 1 and 2
D) 1, 2, and 3
19) Ensembles will yield bad results when there is significant diversity among the individual models (assuming all individual models make meaningful and good predictions).
Choose the best answer.
A) The statement is generally true.
B) The statement is false; the diversity of the models is the reason for creating an ensemble.
20) If you use an ensemble of different base algorithms, is it necessary to tune the parameters of all base models to improve the ensemble performance?
A) Yes.
B) No.
C) There is not enough information to answer the question.
21) Generally, an ensemble method works better if the individual base models have ________.
Assume each individual base model has accuracy greater than 50%.
A) Less correlation among predictions
B) High correlation among predictions
C) Correlation does not have any impact on ensemble output.
D) None of the options are correct.
22) In an election, N candidates are competing against each other, and each voter votes for one of the candidates. Voters don't communicate with each other while casting their votes.
Which of the following ensemble methods works similar to above-discussed election procedure?
Hint: Individual voters are like base models of ensemble method.
A) Bagging
B) Boosting
C) Either "Bagging" or "Boosting" is correct.
D) None of the options are correct.
23) Suppose you are working on a binary classification problem, and you have 3 models, each with 70% accuracy.
If you want to ensemble these models using the majority voting method, what is the maximum accuracy you can get?
A) 100%
B) 78.38%
C) 44%
D) 70%
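As a worked aside (my addition, not from the text): the 78.38% figure in option B is approximately what majority voting yields if the three models' errors are assumed independent; the theoretical maximum is higher when errors are arranged so they never overlap on a majority. The independent case:

```python
from math import comb

p = 0.70   # accuracy of each of the 3 base models

# Majority vote is correct when at least 2 of the 3 models are correct,
# assuming the models' errors are independent (an idealized case).
ensemble_acc = sum(comb(3, k) * p ** k * (1 - p) ** (3 - k) for k in (2, 3))
# 3 * 0.49 * 0.3 + 0.343 = 0.441 + 0.343 = 0.784
```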
24) Which of the following is true about an averaging ensemble?
A) It can only be used in classification problems.
B) It can only be used in regression problems.
C) It can be used in both classification as well as regression problems.
D) None of the options are correct.
25) Which of the following hyperparameters, when increased, may cause a random forest to overfit the data?
1) Number of Trees
2) Depth of Tree
3) Learning Rate
A) Only #1
B) Only #2
C) Only #3
D) All 3 (i.e., 1, 2, and 3)
26) A "dendrogram" is used with which analytics algorithms?
A) Text mining
B) Clustering
C) Ensemble models
D) All of the options are correct.
27) What is the bootstrap?
A) It is a procedure that allows the data scientist to reduce the dimensions of the training data set.
B) It is one of many classification-type algorithms.
C) It is a procedure for aggregating many attributes into a few attributes.
D) It is based on repeatedly and systematically sampling with replacement from the data.
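A minimal sketch (my own illustration) of bootstrap resampling, in which pseudo-data sets are drawn with replacement from the original data:

```python
import random

def bootstrap_sample(data, rng):
    """Draw a pseudo-data set of the same size as the original by
    sampling WITH replacement: some points repeat, others are left out."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)
data = list(range(10))
samples = [bootstrap_sample(data, rng) for _ in range(3)]
# Each pseudo-data set has len(data) points, typically with duplicates;
# on average about 63.2% of the original points appear in each sample.
```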
28) What is clustering?
A) Clustering is an ensemble algorithm for improving the accuracy of classification models.
B) It could be thought of as a set of nested algorithms whose purpose is to choose weak learners.
C) It is the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another.
D) None of the options are correct.
29) Which of the following is not a type of clustering?
A) K-means
B) Hierarchical
C) Agglomerative
D) Splitting
30) The computationally intensive method called the bootstrap involves
A) using regression models as independent variables.
B) using pseudo-data sets.
C) using the Gaussian distribution.
D) using only classical statistical techniques.
E) using the assumption of normality.
31) The bootstrapping process involves
A) diagnostic measures accurate to 109 degrees of freedom.
B) the use of least squares regression to create multiple data sets.
C) data sets that are proportional to one another.
D) creating artificial samples.
E) None of the options are correct.