Machine Learning with Python (IBM, Coursera): Quiz Answers, Week 5
Practice Quiz: Clustering
1. Which of the following is an application of clustering?
- Customer churn prediction
- Price estimation
- Sales prediction
- Customer segmentation
2. Which approach can be used to calculate dissimilarity of objects in clustering?
- Cosine similarity
- Minkowski distance
- Euclidean distance
- All of the above
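The measures named in question 2 can be compared side by side. The following is a minimal sketch in plain Python, not course material: the function names and the two sample vectors are assumptions chosen for illustration, and cosine similarity is turned into a dissimilarity by taking 1 minus the similarity.

```python
import math

def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_dissimilarity(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

# Two illustrative vectors (hypothetical data).
a = [0.0, 3.0]
b = [4.0, 0.0]

print(euclidean(a, b))
print(minkowski(a, b, p=1))
print(cosine_dissimilarity(a, b))
```

Euclidean distance is the special case of Minkowski distance with p = 2, which is why the two functions agree for that order.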
3. How is a center point (centroid) picked for each cluster in k-means upon initialization? (select two)
- We can randomly choose some observations out of the data set and use these observations as the initial means.
- We can select it through correlation analysis.
- We select the k points closest to the mean/median of the entire dataset.
- We can create some random points as centroids of the clusters.
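Two of the initialization strategies listed in question 3 are easy to sketch: sampling existing observations, and generating random points. The code below is an illustrative sketch only; the helper names, the sample data, and the choice to draw random points from the bounding box of the data are all assumptions, not something the quiz specifies.

```python
import random

def init_from_observations(data, k, rng):
    """Pick k distinct observations from the data set as initial centroids."""
    return rng.sample(data, k)

def init_random_points(data, k, rng):
    """Create k random points inside the bounding box of the data (an assumption)."""
    dims = range(len(data[0]))
    lows = [min(p[d] for p in data) for d in dims]
    highs = [max(p[d] for p in data) for d in dims]
    return [tuple(rng.uniform(lows[d], highs[d]) for d in dims) for _ in range(k)]

# Hypothetical 2-D data set.
data = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (5.0, 5.0)]
rng = random.Random(42)  # fixed seed so repeated runs give the same centroids

from_observations = init_from_observations(data, 2, rng)
random_points = init_random_points(data, 2, rng)
print(from_observations)
print(random_points)
```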
Graded Quiz: Clustering
4. The objective of k-means clustering is:
- Yield the highest out-of-sample accuracy
- Maximize the number of correctly classified data points
- Separate dissimilar samples and group similar ones
- Minimize the cost function via gradient descent
5. Which option correctly orders the steps of k-means clustering?
1 Re-cluster the data points
2 Choose k random observations to calculate each cluster’s mean
3 Update centroid to take cluster mean
4 Repeat until centroids are constant
5 Calculate data point distance to centroids
- 2, 5, 3, 1, 4
- 2, 3, 4, 5, 1
- 3, 5, 1, 4, 2
- 2, 1, 4, 5, 3
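The steps listed in question 5 correspond to the loop of a basic k-means implementation. The sketch below is one plain-Python rendering under stated assumptions: the function name `kmeans`, the seed, the iteration cap, the rule of keeping an old centroid when a cluster ends up empty, and the sample data are all hypothetical choices rather than part of the course.

```python
import math
import random

def kmeans(data, k, seed=0, max_iter=100):
    """Basic k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Choose k random observations as the initial centroids.
    centroids = rng.sample(data, k)
    clusters = []
    for _ in range(max_iter):
        # Calculate each point's distance to the centroids and
        # put the point in the cluster of the nearest one.
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid: an assumption).
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])
        # Stop once the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

# Two well-separated groups of points (hypothetical data).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
print(clusters)
```

Because the initial centroids are random, different seeds can converge to different solutions on less cleanly separated data, which is why k-means is commonly run several times.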
6. How can we gauge the performance of a k-means clustering model when ground truth is not available?
- Calculate the number of incorrectly classified observations in the training set.
- Determine the prediction accuracy on the test set.
- Take the average of the distance between data points and their cluster centroids.
- Calculate the R-squared value to measure model fit.
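The average distance between data points and their cluster centroids, mentioned in question 6, needs no labels to compute. A minimal sketch follows; the function name and the tiny hand-made clustering are assumptions used only to show the arithmetic.

```python
import math

def mean_distance_to_centroid(clusters, centroids):
    """Average distance from every point to the centroid of its own cluster."""
    total = 0.0
    count = 0
    for members, centroid in zip(clusters, centroids):
        for p in members:
            total += math.dist(p, centroid)
            count += 1
    return total / count

# Hypothetical clustering result: two clusters and their centroids.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0)]]
centroids = [(1.0, 0.0), (10.0, 0.0)]

# Distances are 1, 1 and 0, so the average is 2/3.
score = mean_distance_to_centroid(clusters, centroids)
print(score)
```

Lower values indicate denser clusters, but the number is only meaningful when comparing runs on the same data.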
7. When the parameter K for k-means clustering increases, what happens to the error?
- It will decrease because the data points are less likely to be in the wrong cluster.
- It might increase or decrease depending on whether data points are closer to the centroid.
- It will decrease because the distance between data points and their centroids will decrease.
- It will increase because incorrectly classified points are further from the correct centroid.
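The relationship between K and the error in question 7 can be checked empirically. The sketch below runs a basic k-means for a few values of K and records the mean point-to-centroid distance; the function name `kmeans_error`, the seed, and the sample data are assumptions. A single run can settle in a local optimum, so this illustrates the trend on easy data rather than proving it in general.

```python
import math
import random

def kmeans_error(data, k, seed=0, max_iter=100):
    """Run a basic k-means and return the mean point-to-centroid distance."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]  # empty cluster keeps its centroid
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Each point is scored against its nearest final centroid.
    return sum(min(math.dist(p, c) for c in centroids) for p in data) / len(data)

# Two well-separated groups of points (hypothetical data).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

errors = {k: kmeans_error(data, k) for k in (1, 2, len(data))}
for k, err in errors.items():
    print(k, err)
```

When K equals the number of observations, every point becomes its own centroid and the error reaches zero, so a low error alone does not identify a good K. The value of K is usually chosen where the error curve stops dropping sharply (the elbow point).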
8. Which of the following is true for partition-based clustering but not for hierarchical or density-based clustering algorithms?
- Partition-based clustering produces arbitrarily shaped clusters.
- Partition-based clustering can handle spatial clusters and noisy data.
- Partition-based clustering is a type of unsupervised learning algorithm.
- Partition-based clustering produces sphere-like clusters.