Clustering

A trained SOM already organizes data into a topology-preserving grid. Clustering goes one step further and groups the neurons themselves into a small number of regions, turning the map into an explicit segmentation. TorchSOM exposes this through a single method, cluster(), with three algorithms and built-in diagnostics for choosing among them.

The cluster method

result = som.cluster(
    method="kmeans",          # "kmeans", "gmm", or "hdbscan"
    n_clusters=4,             # ignored by HDBSCAN, which finds k itself
    feature_space="weights",  # "weights", "positions", or "combined"
)

It returns a dictionary describing the clustering, including labels (one cluster id per neuron), method, n_clusters, feature_space, and a metrics block (silhouette, Davies–Bouldin, Calinski–Harabasz). Pass the whole dictionary to the visualizer to draw it.
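As a rough illustration of working with the returned dictionary, the sketch below counts how many neurons fall in each cluster. The dictionary is a hand-built stand-in, not real output of cluster(); the container type of labels (list, array, or tensor) is an assumption, and cluster_sizes is a hypothetical helper, not part of TorchSOM.

```python
from collections import Counter

# Hand-built stand-in for the dictionary returned by som.cluster().
# Real output may hold arrays or tensors; plain lists keep the sketch runnable.
result = {
    "labels": [0, 0, 1, 1, 1, 2, 2, 0, 1],  # one cluster id per neuron
    "method": "kmeans",
    "n_clusters": 3,
    "feature_space": "weights",
}


def cluster_sizes(cluster_result):
    """Return {cluster id: neuron count}, sorted by cluster id."""
    counts = Counter(int(label) for label in cluster_result["labels"])
    return dict(sorted(counts.items()))


sizes = cluster_sizes(result)
for cluster_id, n_neurons in sizes.items():
    print(f"cluster {cluster_id}: {n_neurons} neurons")
```

A very small cluster next to very large ones is often a hint to revisit n_clusters or the feature space before interpreting the map.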

Choosing the feature space

| feature_space | Clusters neurons by… |
|---------------|----------------------|
| "weights"     | their codebook vectors: groups neurons that encode similar inputs. The usual choice. |
| "positions"   | their grid coordinates: groups neurons that are spatially close. |
| "combined"    | both, balancing feature similarity with spatial contiguity. |
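To make the three options concrete, the sketch below builds one feature vector per neuron for each feature space on a toy 2x3 map. It is a plain-Python illustration only: build_features is a hypothetical helper, and both the rescaling of grid coordinates to [0, 1] and the simple concatenation used for "combined" are assumptions, not a description of how TorchSOM weighs the two parts.

```python
# Toy 2x3 map with 2-dimensional codebook vectors, stored row by row.
rows, cols = 2, 3
weights = [
    [0.1, 0.9], [0.2, 0.8], [0.9, 0.1],
    [0.1, 0.8], [0.8, 0.2], [0.9, 0.2],
]


def build_features(weights, rows, cols, feature_space="weights"):
    """Return one feature vector per neuron for the chosen feature space."""
    # Grid coordinates (row, col), rescaled to [0, 1] so they are comparable
    # in magnitude to the toy weights. This rescaling is an assumption.
    positions = [
        [r / max(rows - 1, 1), c / max(cols - 1, 1)]
        for r in range(rows)
        for c in range(cols)
    ]
    if feature_space == "weights":
        return [list(w) for w in weights]
    if feature_space == "positions":
        return positions
    if feature_space == "combined":
        # Concatenate codebook vector and coordinates for each neuron.
        return [list(w) + p for w, p in zip(weights, positions)]
    raise ValueError(f"unknown feature_space: {feature_space!r}")


for space in ("weights", "positions", "combined"):
    features = build_features(weights, rows, cols, space)
    print(space, len(features), "neurons x", len(features[0]), "features")
```

With "combined", two neurons with similar codebook vectors but far apart on the grid end up further apart in feature space, which is what pushes the clustering toward contiguous regions.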

Choosing an algorithm

| Method    | Needs n_clusters? | Best for |
|-----------|-------------------|----------|
| "kmeans"  | Yes               | Compact, roughly spherical clusters; fast baseline. |
| "gmm"     | Yes               | Elliptical clusters and soft assignments. |
| "hdbscan" | No                | Arbitrary shapes and noise; density-based, finds k automatically. |

HDBSCAN labels low-density neurons as noise (cluster id -1), which the cluster map renders as an “Uncertain” category.
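Because -1 is a noise marker and not a real cluster, it should be excluded when counting clusters. The sketch below shows one way to summarize a label list; the labels are made up for illustration and summarize_labels is a hypothetical helper, not a TorchSOM function.

```python
# Made-up labels for a 9-neuron map; -1 marks noise neurons.
labels = [0, 0, 1, -1, 1, 2, -1, 2, 2]


def summarize_labels(labels, noise_label=-1):
    """Count real clusters and noise neurons in a label sequence."""
    if not labels:
        raise ValueError("labels must not be empty")
    clustered = [label for label in labels if label != noise_label]
    n_noise = len(labels) - len(clustered)
    return {
        "n_clusters": len(set(clustered)),  # noise id is not a cluster
        "n_noise": n_noise,
        "noise_fraction": n_noise / len(labels),
    }


summary = summarize_labels(labels)
print(summary)
```

Computing the cluster count as len(set(labels)) would count the noise id as an extra cluster, which is the mistake this helper avoids.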

Choosing the number of clusters

For K-Means and GMM, use the elbow and silhouette diagnostics rather than guessing.

from torchsom import SOMVisualizer

viz = SOMVisualizer(som=som)

# Elbow: within-cluster dispersion vs k; look for the bend
viz.plot_elbow_analysis(max_k=10, feature_space="weights")

# Silhouette: how cleanly points sit in their cluster (higher is better)
result = som.cluster(method="kmeans", n_clusters=4, feature_space="weights")
viz.plot_silhouette_analysis(cluster_result=result)
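For intuition about what the silhouette plot measures, here is a from-scratch sketch of the mean silhouette coefficient on toy 2-D points. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). This is a plain-Python illustration of the standard definition, not the code TorchSOM runs internally.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two tight, well-separated groups of toy points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # matches the groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # mixes the groups
print(f"good labelling: {good:.2f}, mixed labelling: {bad:.2f}")
```

A labelling that matches the two groups scores close to 1, while one that mixes them scores much lower, which is the pattern to look for when comparing candidate values of k.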

Comparing algorithms objectively

Instead of picking by eye, score several configurations side by side:

results = [
    # K-Means and GMM need n_clusters; HDBSCAN ignores it
    som.cluster(method=m, n_clusters=4, feature_space="weights")
    for m in ("kmeans", "gmm", "hdbscan")
]
viz.plot_cluster_quality_comparison(results_list=results)

The comparison reports silhouette, Davies–Bouldin, and Calinski–Harabasz scores for each method, so the final choice is driven by metrics. Higher is better for silhouette and Calinski–Harabasz; lower is better for Davies–Bouldin.
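When picking a winner programmatically, the direction of each score matters: silhouette and Calinski–Harabasz improve as they grow, while Davies–Bouldin improves as it shrinks. The sketch below selects the best result per metric. The key names inside metrics ("silhouette", "davies_bouldin", "calinski_harabasz") are assumptions, best_by_metric is a hypothetical helper, and the numbers are placeholders for illustration, not measurements from any real map.

```python
# Direction of each score: True means higher is better.
HIGHER_IS_BETTER = {
    "silhouette": True,
    "davies_bouldin": False,
    "calinski_harabasz": True,
}


def best_by_metric(results, metric):
    """Return the result whose `metric` is best, respecting its direction."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(results, key=lambda r: r["metrics"][metric])


# Placeholder numbers for illustration only; metric key names are assumed.
results = [
    {"method": "kmeans",
     "metrics": {"silhouette": 0.50, "davies_bouldin": 0.80, "calinski_harabasz": 120.0}},
    {"method": "gmm",
     "metrics": {"silhouette": 0.45, "davies_bouldin": 0.70, "calinski_harabasz": 110.0}},
    {"method": "hdbscan",
     "metrics": {"silhouette": 0.55, "davies_bouldin": 0.90, "calinski_harabasz": 90.0}},
]

winners = {m: best_by_metric(results, m)["method"] for m in HIGHER_IS_BETTER}
for metric, method in winners.items():
    print(f"{metric}: {method}")
```

The three scores need not agree, as the placeholder values show, so it is worth deciding up front which metric carries the most weight for the task.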

Visualizing the result

result = som.cluster(method="hdbscan", feature_space="weights")
viz.plot_cluster_map(cluster_result=result)

Figure: Cluster assignment overlaid on the SOM grid.

Read the cluster map together with the U-matrix: cluster boundaries should fall along the U-matrix ridges (regions of large inter-neuron distance).
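To show why ridges mark cluster boundaries, the sketch below computes a simple U-matrix by hand: each cell holds the mean distance between a neuron's codebook vector and those of its grid neighbours. It assumes a rectangular grid with up/down/left/right neighbours, which is simpler than a hexagonal map, and it is not the library's implementation; u_matrix is a hypothetical helper.

```python
from math import dist


def u_matrix(weights):
    """Mean distance from each neuron to its up/down/left/right neighbours."""
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                weights[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            if neighbours:  # a 1x1 map has no neighbours
                total = sum(dist(weights[r][c], n) for n in neighbours)
                umat[r][c] = total / len(neighbours)
    return umat


# Toy 2x4 map: the two left columns encode one group, the two right another.
weights = [
    [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)],
    [(0.0, 0.1), (0.1, 0.1), (1.0, 1.1), (1.1, 1.1)],
]

umat = u_matrix(weights)
for row in umat:
    print(" ".join(f"{value:.2f}" for value in row))
```

The two middle columns, where the groups meet, get the largest values. On a real map, a cluster boundary that cuts through a low-valued region instead is a sign that the clustering may be splitting a homogeneous area.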

End-to-end example

import torch
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

from torchsom import SOM, SOMVisualizer

X, _ = make_blobs(n_samples=1000, centers=4, n_features=5, random_state=42)
data = torch.tensor(StandardScaler().fit_transform(X), dtype=torch.float32)

som = SOM(x=25, y=15, num_features=5, epochs=100, batch_size=16,
          topology="hexagonal", initialization_mode="pca", random_seed=42)
som.initialize_weights(data=data, mode=som.initialization_mode)
som.fit(data=data)

viz = SOMVisualizer(som=som)
viz.plot_elbow_analysis(max_k=10, feature_space="weights")     # pick k
result = som.cluster(method="kmeans", n_clusters=4, feature_space="weights")
viz.plot_cluster_map(cluster_result=result)

Next steps