Machine Learning with Python (IBM / Coursera) Quiz Answers: Week 5

Practice Quiz: Clustering

1. Which of the following is an application of clustering?

  • Customer churn prediction
  • Price estimation
  • Sales prediction
  • Customer segmentation
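
Customer segmentation is the textbook clustering application from this module. Below is a minimal, illustrative sketch with scikit-learn's KMeans on made-up customer features (the feature names and numbers are assumptions for illustration, not course data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative, made-up customer features: [annual_income, purchase_frequency]
customers = np.array([
    [15_000, 2], [16_500, 3], [90_000, 40],
    [88_000, 38], [40_000, 12], [42_000, 15],
])

# Standardize so both features contribute comparably to the distance measure
X = StandardScaler().fit_transform(customers)

# Group customers into k = 3 segments
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)  # segment assignment for each customer
```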

2. Which approach can be used to calculate dissimilarity of objects in clustering?

  • Cosine similarity
  • Minkowski distance
  • Euclidean distance
  • All of the above
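
Cosine similarity, Minkowski distance, and Euclidean distance can all be used to quantify how (dis)similar two objects are. A small sketch with SciPy's distance functions (my choice of library, not the course's code); note that Euclidean distance is the Minkowski distance with p = 2:

```python
import numpy as np
from scipy.spatial.distance import euclidean, minkowski, cosine

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(euclidean(a, b))       # sqrt(27) ≈ 5.196
print(minkowski(a, b, p=2))  # identical to Euclidean when p = 2
print(minkowski(a, b, p=1))  # Manhattan distance = 9.0
print(cosine(a, b))          # cosine *distance* = 1 - cosine similarity
```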

3. How is a center point (centroid) picked for each cluster in k-means upon initialization? (select two)

  • We can randomly choose some observations out of the data set and use these observations as the initial means.
  • We can select it through correlation analysis.
  • We select the k points closest to the mean/median of the entire dataset.
  • We can create some random points as centroids of the clusters.
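
The two initialization strategies named in this question can be sketched in NumPy: pick k existing observations at random, or generate k random points within the range of the data (an illustrative sketch, not the course's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # toy dataset: 100 observations, 2 features
k = 3

# Option 1: use k randomly chosen observations as the initial centroids
idx = rng.choice(len(X), size=k, replace=False)
centroids_from_data = X[idx]

# Option 2: create k random points inside the data's bounding box
low, high = X.min(axis=0), X.max(axis=0)
centroids_random = rng.uniform(low, high, size=(k, 2))

print(centroids_from_data)
print(centroids_random)
```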

Graded Quiz: Clustering

4. The objective of k-means clustering is to:

  • Yield the highest out-of-sample accuracy
  • Maximize the number of correctly classified data points
  • Separate dissimilar samples and group similar ones
  • Minimize the cost function via gradient descent
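
Concretely, grouping similar samples and separating dissimilar ones is what minimizing the within-cluster sum of squared distances achieves; scikit-learn exposes that quantity as `inertia_`. A short sketch (synthetic data, my own check, not course code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Within-cluster sum of squared distances to each assigned centroid
sse = sum(np.sum((X[km.labels_ == c] - center) ** 2)
          for c, center in enumerate(km.cluster_centers_))
print(sse, km.inertia_)  # the two values agree up to floating-point error
```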

5. Which option correctly orders the steps of k-means clustering?

1. Re-cluster the data points
2. Choose k random observations to calculate each cluster’s mean
3. Update centroid to take cluster mean
4. Repeat until centroids are constant
5. Calculate data point distance to centroids

  • 2, 5, 3, 1, 4
  • 2, 3, 4, 5, 1
  • 3, 5, 1, 4, 2
  • 2, 1, 4, 5, 3
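
The five steps listed in question 5 map directly onto a from-scratch k-means loop. Here is a minimal NumPy sketch with the quiz's step numbers noted in the comments (illustrative only, not the course's code):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: choose k random observations as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 5: distance from every data point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 1: re-cluster - assign each point to its nearest centroid
        labels = dists.argmin(axis=1)
        # Step 3: update each centroid to the mean of its cluster
        new_centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Step 4: repeat until the centroids stop changing
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```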

6. How can we gauge the performance of a k-means clustering model when ground truth is not available?

  • Calculate the number of incorrectly classified observations in the training set.
  • Determine the prediction accuracy on the test set.
  • Take the average of the distance between data points and their cluster centroids.
  • Calculate the R-squared value to measure model fit.
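
Without ground-truth labels, internal measures replace accuracy: the average distance between points and their cluster centroid (closely related to the inertia k-means minimizes), or a score such as the silhouette coefficient. A sketch using scikit-learn on synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=1)
km = KMeans(n_clusters=4, n_init=10, random_state=1).fit(X)

# Average distance between each point and its assigned centroid
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
print("mean distance to centroid:", dists.mean())

# Silhouette score: higher means tighter, better-separated clusters
print("silhouette:", silhouette_score(X, km.labels_))
```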

7. When the parameter K for k-means clustering increases, what happens to the error?

  • It will decrease because the data points are less likely to be in the wrong cluster.
  • It might increase or decrease depending on if data points are closer to the centroid.
  • It will decrease because distance between data points and centroid will decrease.
  • It will increase because incorrectly classified points are further from the correct centroid.
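
As K grows, every point tends to end up closer to some centroid, so the within-cluster distance (the "error" in this question) keeps shrinking; plotting it against K is the elbow method for choosing K. A quick sketch on synthetic data (my own example, not course code):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=2)

# Inertia (sum of squared distances to centroids) decreases as K increases
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=2).fit(X)
    print(k, round(km.inertia_, 1))
```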

8. Which of the following is true for partition-based clustering but not for hierarchical or density-based clustering algorithms?

  • Partition-based clustering produces arbitrary shaped clusters.
  • Partition-based clustering can handle spatial clusters and noisy data.
  • Partition-based clustering is a type of unsupervised learning algorithm.
  • Partition-based clustering produces sphere-like clusters.
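
The shape contrast behind question 8 is easy to see by comparing k-means (partition-based, sphere-like clusters around centroids) with DBSCAN (density-based, arbitrary shapes) on non-convex data. A sketch using scikit-learn's two-moons toy set; the `eps` value is an assumption that works for this noise level:

```python
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-circles: clusters that are not sphere-like
X, _ = make_moons(n_samples=300, noise=0.05, random_state=3)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=3).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3).fit_predict(X)

# k-means imposes roughly convex partitions that cut across the moons;
# DBSCAN (with a suitable eps) typically recovers each curved moon as one cluster.
print(kmeans_labels[:10], dbscan_labels[:10])
```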
