Introduction to Clustering (K-Means)

Clustering is an unsupervised learning method where the goal is to group similar data points into clusters without using labels.

One of the most popular algorithms for clustering is K-Means.

How K-Means Works

K-Means proceeds as follows:

1. Choose k, the number of clusters.
2. Initialize k cluster centers randomly.
3. Assign each point to its nearest center.
4. Update each center to the mean of its assigned points.
5. Repeat steps 3–4 until the cluster assignments stop changing.
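
The steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `kmeans_sketch`, its parameters, and the six toy points are all made up for this example, and it assumes Euclidean distance and initial centers drawn from the data points themselves.

```python
import numpy as np


def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not a library API)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest center (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 5: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers


# Two small, well-separated groups (made-up data for illustration).
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
labels, centers = kmeans_sketch(X, k=2)
print(labels)
print(centers)
```

Which group receives label 0 depends on the random starting centers, so only the grouping itself is meaningful, not the label numbers.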

K-Means tries to minimize the total squared distance between each point and the center of its assigned cluster (the within-cluster sum of squares).
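
As a rough check of what is being minimized, the sketch below computes the sum of squared distances from each point to its assigned center by hand and compares it with the `inertia_` attribute that scikit-learn stores on a fitted model. The variable names `assigned_centers` and `manual_inertia` are chosen for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Look up, for every point, the center of the cluster it was assigned to.
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared distances from each point to that center.
manual_inertia = np.sum((X - assigned_centers) ** 2)

print(manual_inertia, kmeans.inertia_)
```

The two printed values should agree up to floating-point rounding.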

When to Use K-Means

K-Means is a good fit when:

You want to group data by similarity without predefined labels.
Your dataset has numerical features and a moderate number of dimensions.
You suspect there are clear groups in the data.

Example: Clustering Iris Data

The following example shows how to use K-Means to cluster the Iris dataset.

K-Means Example
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Load data (only first two features for visualization)
iris = load_iris()
X = iris.data[:, :2]

# Apply K-Means (n_init is set explicitly; its default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Plot clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', edgecolor='k')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X', label='Centers')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("K-Means Clustering (Iris)")
plt.legend()
plt.show()
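
Beyond the plot, the fitted model can be inspected directly. The following sketch, which refits the same model so it runs on its own, counts the points in each cluster and assigns a new measurement to the nearest learned center. The values in `new_point` are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Number of points assigned to each of the three clusters.
cluster_sizes = np.bincount(labels, minlength=3)
print(cluster_sizes)

# Assign a new, made-up measurement (sepal length, sepal width in cm)
# to the nearest learned center.
new_point = np.array([[5.0, 3.5]])
predicted = kmeans.predict(new_point)
print(predicted)
```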

Key Takeaways

Unsupervised learning means no labels are provided during training.
K-Means groups data into k clusters by minimizing distances within each cluster.
Choosing the right value of k is essential; a common approach is the elbow method.
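
One way to apply the elbow method is to fit K-Means for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply. The sketch below assumes the same two Iris features as the example above; the range of k values is an arbitrary choice for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]

k_values = list(range(1, 9))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is the within-cluster sum of squared distances for this k.
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 2))
```

Plotting `inertias` against `k_values` usually makes the bend easier to spot than reading the numbers.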

