Machine Learning with Python (IBM / Coursera) Quiz Answers: Week 5

Practice Quiz: Clustering

1. Which of the following is an application of clustering?

  • Customer churn prediction
  • Price estimation
  • Sales prediction
  • Customer segmentation
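
Customer segmentation is the textbook clustering application from this module. Below is a minimal, illustrative sketch with scikit-learn's KMeans on made-up customer features (the feature names and numbers are assumptions for illustration, not course data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative, made-up customer features: [annual_income, purchase_frequency]
customers = np.array([
    [15_000, 2], [16_500, 3], [90_000, 40],
    [88_000, 38], [40_000, 12], [42_000, 15],
])

# Standardize so both features contribute comparably to the distance measure
X = StandardScaler().fit_transform(customers)

# Group customers into k = 3 segments
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)  # segment assignment for each customer
```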

2. Which approach can be used to calculate dissimilarity of objects in clustering?

  • Cosine similarity
  • Minkowski distance
  • Euclidean distance
  • All of the above
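
Cosine similarity, Minkowski distance, and Euclidean distance can all be used to quantify how (dis)similar two objects are. A small sketch with SciPy's distance functions (my choice of library, not the course's code); note that Euclidean distance is the Minkowski distance with p = 2:

```python
import numpy as np
from scipy.spatial.distance import euclidean, minkowski, cosine

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(euclidean(a, b))       # sqrt(27) ≈ 5.196
print(minkowski(a, b, p=2))  # identical to Euclidean when p = 2
print(minkowski(a, b, p=1))  # Manhattan distance = 9.0
print(cosine(a, b))          # cosine *distance* = 1 - cosine similarity
```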

3. How is a center point (centroid) picked for each cluster in k-means upon initialization? (select two)

  • We can randomly choose some observations out of the data set and use these observations as the initial means.
  • We can select it through correlation analysis.
  • We select the k points closest to the mean/median of the entire dataset.
  • We can create some random points as centroids of the clusters.
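
The two initialization strategies named in this question can be sketched in NumPy: pick k existing observations at random, or generate k random points within the range of the data (an illustrative sketch, not the course's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # toy dataset: 100 observations, 2 features
k = 3

# Option 1: use k randomly chosen observations as the initial centroids
idx = rng.choice(len(X), size=k, replace=False)
centroids_from_data = X[idx]

# Option 2: create k random points inside the data's bounding box
low, high = X.min(axis=0), X.max(axis=0)
centroids_random = rng.uniform(low, high, size=(k, 2))

print(centroids_from_data)
print(centroids_random)
```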

Graded Quiz: Clustering

4. The objective of k-means clustering is to:

  • Yield the highest out-of-sample accuracy
  • Maximize the number of correctly classified data points
  • Separate dissimilar samples and group similar ones
  • Minimize the cost function via gradient descent
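
Concretely, grouping similar samples and separating dissimilar ones is what minimizing the within-cluster sum of squared distances achieves; scikit-learn exposes that quantity as `inertia_`. A short sketch (synthetic data, my own check, not course code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Within-cluster sum of squared distances to each assigned centroid
sse = sum(np.sum((X[km.labels_ == c] - center) ** 2)
          for c, center in enumerate(km.cluster_centers_))
print(sse, km.inertia_)  # the two values agree up to floating-point error
```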

5. Which option correctly orders the steps of k-means clustering?

1. Re-cluster the data points
2. Choose k random observations to calculate each cluster’s mean
3. Update centroid to take cluster mean
4. Repeat until centroids are constant
5. Calculate data point distance to centroids

  • 2, 5, 3, 1, 4
  • 2, 3, 4, 5, 1
  • 3, 5, 1, 4, 2
  • 2, 1, 4, 5, 3
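
The five steps listed in question 5 map directly onto a from-scratch k-means loop. Here is a minimal NumPy sketch with the quiz's step numbers noted in the comments (illustrative only, not the course's code):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: choose k random observations as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 5: distance from every data point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 1: re-cluster - assign each point to its nearest centroid
        labels = dists.argmin(axis=1)
        # Step 3: update each centroid to the mean of its cluster
        new_centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Step 4: repeat until the centroids stop changing
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```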

6. How can we gauge the performance of a k-means clustering model when ground truth is not available?

  • Calculate the number of incorrectly classified observations in the training set.
  • Determine the prediction accuracy on the test set.
  • Take the average of the distance between data points and their cluster centroids.
  • Calculate the R-squared value to measure model fit.
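
Without ground-truth labels, internal measures replace accuracy: the average distance between points and their cluster centroid (closely related to the inertia k-means minimizes), or a score such as the silhouette coefficient. A sketch using scikit-learn on synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=1)
km = KMeans(n_clusters=4, n_init=10, random_state=1).fit(X)

# Average distance between each point and its assigned centroid
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
print("mean distance to centroid:", dists.mean())

# Silhouette score: higher means tighter, better-separated clusters
print("silhouette:", silhouette_score(X, km.labels_))
```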

7. When the parameter K for k-means clustering increases, what happens to the error?

  • It will decrease because the data points are less likely to be in the wrong cluster.
  • It might increase or decrease depending on if data points are closer to the centroid.
  • It will decrease because distance between data points and centroid will decrease.
  • It will increase because incorrectly classified points are further from the correct centroid.
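
As K grows, every point tends to end up closer to some centroid, so the within-cluster distance (the "error" in this question) keeps shrinking; plotting it against K is the elbow method for choosing K. A quick sketch on synthetic data (my own example, not course code):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=2)

# Inertia (sum of squared distances to centroids) decreases as K increases
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=2).fit(X)
    print(k, round(km.inertia_, 1))
```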

8. Which of the following is true for partition-based clustering but not for hierarchical or density-based clustering algorithms?

  • Partition-based clustering produces arbitrary shaped clusters.
  • Partition-based clustering can handle spatial clusters and noisy data.
  • Partition-based clustering is a type of unsupervised learning algorithm.
  • Partition-based clustering produces sphere-like clusters.
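
The shape contrast behind question 8 is easy to see by comparing k-means (partition-based, sphere-like clusters around centroids) with DBSCAN (density-based, arbitrary shapes) on non-convex data. A sketch using scikit-learn's two-moons toy set; the `eps` value is an assumption that works for this noise level:

```python
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-circles: clusters that are not sphere-like
X, _ = make_moons(n_samples=300, noise=0.05, random_state=3)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=3).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3).fit_predict(X)

# k-means imposes roughly convex partitions that cut across the moons;
# DBSCAN (with a suitable eps) typically recovers each curved moon as one cluster.
print(kmeans_labels[:10], dbscan_labels[:10])
```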
