Core API

The core module contains the main SOM classes and implementations.

Base Classes

Abstract base class for all SOM variants.

class torchsom.core.base_som.BaseSOM(*args, **kwargs)

Bases: Module, ABC

abstract build_distance_map(scaling='sum')

Build a distance map (U-matrix) showing neuron similarities.

Parameters:

scaling (str, optional) – Scaling method for distances. Defaults to “sum”.

Returns:

Distance map [row_neurons, col_neurons]

Return type:

torch.Tensor

abstract build_hit_map(data)

Build a hit map showing neuron activation frequencies.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Hit map [row_neurons, col_neurons]

Return type:

torch.Tensor

abstract fit(data)

Train the SOM on the given data.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]

Returns:

Quantization and topographic errors, one value per epoch

Return type:

Tuple[List[float], List[float]]

abstract identify_bmus(data)

Find best matching units for input data.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

For a single sample: tensor of shape [2] with [row, col]. For a batch: tensor of shape [batch_size, 2] with [row, col] pairs.

Return type:

torch.Tensor

abstract initialize_weights(data, mode=None)

Initialize the SOM weights.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]
mode (str, optional) – Weight initialization method. Defaults to None.

Return type:

None

abstract quantization_error(data)

Calculate quantization error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Average quantization error value

Return type:

float

abstract topographic_error(data)

Calculate topographic error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Topographic error ratio

Return type:

float

Classical SOM Implementation

PyTorch implementation of classic Self Organizing Maps using batch learning.

class torchsom.core.som.SOM(x, y, num_features, epochs=10, batch_size=5, sigma=1.0, learning_rate=0.5, neighborhood_order=1, topology='rectangular', lr_decay_function='asymptotic_decay', sigma_decay_function='asymptotic_decay', neighborhood_function='gaussian', distance_function='euclidean', initialization_mode='random', device='cpu', random_seed=42)

Bases: BaseSOM

PyTorch implementation of Self Organizing Maps using batch learning.

Parameters:

x (int)
y (int)
num_features (int)
epochs (int)
batch_size (int)
sigma (float)
learning_rate (float)
neighborhood_order (int)
topology (str)
lr_decay_function (str)
sigma_decay_function (str)
neighborhood_function (str)
distance_function (str)
initialization_mode (str)
device (str)
random_seed (int)

build_bmus_data_map(data, return_indices=False, batch_size=1024)

Create a mapping of winning neurons to their corresponding data points.

The data is processed in batches to save memory. The mapping is built on the CPU, while the BMU calculations run on the GPU if available.

Parameters:

data (torch.Tensor) – Input data tensor [num_samples, num_features] or [num_features]
return_indices (bool, optional) – If True, return indices instead of data points. Defaults to False.
batch_size (int, optional) – Size of batches to process. Defaults to 1024.

Returns:

Dictionary mapping bmus to data samples or indices

Return type:

Dict[Tuple[int, int], Any]
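
The shape of the returned mapping can be pictured with a dependency-free sketch. This is not torchsom's implementation: `find_bmu` and `bmus_index_map` are hypothetical helpers, the map is a plain nested list rather than a tensor, and Euclidean distance is assumed. It mirrors the `return_indices=True` case, where each BMU maps to the indices of its samples.

```python
import math
from collections import defaultdict

def find_bmu(weights, sample):
    """Hypothetical helper: (row, col) of the closest neuron (Euclidean)."""
    positions = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    return min(positions, key=lambda rc: math.dist(weights[rc[0]][rc[1]], sample))

def bmus_index_map(weights, data):
    """Group sample indices by the (row, col) of their best matching unit."""
    mapping = defaultdict(list)
    for idx, sample in enumerate(data):
        mapping[find_bmu(weights, sample)].append(idx)
    return dict(mapping)

# Toy 2x2 map with 2 features per neuron (values chosen for illustration).
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
data = [[0.1, 0.1], [0.0, 0.2], [0.9, 0.8], [0.2, 0.9], [0.1, 0.0]]
index_map = bmus_index_map(weights, data)
print(index_map)
```

In this sketch, neurons that never win are simply absent from the dictionary; whether torchsom does the same is not stated here.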

build_classification_map(data, target, neighborhood_order=1)

Build a classification map where each neuron is assigned the most frequent label.

In case of a tie, labels from neighboring neurons are considered. If there are no neighboring neurons, or if a second tie occurs, one of the top labels is selected at random.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]. Labels are assumed to be encoded with values > 1 for a readable visualization.
neighborhood_order (int, optional) – Neighborhood order to consider for tie-breaking. Defaults to 1.

Returns:

Classification map with the most frequent label for each neuron

Return type:

torch.Tensor
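
The tie-breaking rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: `top_labels` and `classify_neuron` are hypothetical names, neighbors are taken as the 4-neighborhood of a rectangular grid (order 1), only the tied labels are counted among neighbors, and a neuron without samples is marked `None`.

```python
import random
from collections import Counter

def top_labels(counter):
    """Return the labels sharing the highest count, sorted for determinism."""
    if not counter:
        return []
    best = max(counter.values())
    return sorted(label for label, n in counter.items() if n == best)

def classify_neuron(pos, labels_by_neuron, rng):
    """Majority label for one neuron, with neighbor-based tie-breaking."""
    candidates = top_labels(Counter(labels_by_neuron.get(pos, [])))
    if len(candidates) <= 1:
        return candidates[0] if candidates else None  # None: no samples (assumption)
    # Tie: count the tied labels among order-1 neighbors (4-neighborhood assumed).
    r, c = pos
    votes = Counter()
    for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        votes.update(l for l in labels_by_neuron.get(nb, []) if l in candidates)
    winners = top_labels(votes)
    if len(winners) == 1:
        return winners[0]
    # No neighbor votes or a second tie: pick one of the tied labels at random.
    return rng.choice(winners or candidates)

# Labels of the samples mapped to each neuron of a toy 2x2 map.
labels_by_neuron = {(0, 0): [2, 2, 3, 3], (0, 1): [3, 3, 2], (1, 0): [4]}
rng = random.Random(42)
class_map = [[classify_neuron((r, c), labels_by_neuron, rng) for c in range(2)]
             for r in range(2)]
print(class_map)
```

Here neuron (0, 0) is tied between labels 2 and 3, and its neighbor (0, 1) settles the tie in favor of 3.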

build_distance_map(distance_metric=None, neighborhood_order=None, scaling='sum')

Computes the distance map of each neuron with its neighbors.

The distance map represents the normalized sum or mean of distances between a neuron’s weight vector and its neighboring neurons.

Parameters:

scaling (str, optional) – Defaults to “sum”. If ‘mean’, each cell is normalized by the average neighbor distance. If ‘sum’, normalization is done by the sum of distances.
distance_metric (str, optional) – Name of the method to calculate the distance. Defaults to None.
neighborhood_order (int, optional) – Indicate the neighbors to consider for the distance calculation. Defaults to None.

Raises:

ValueError – If an invalid scaling option is provided.
ValueError – If an invalid distance metric is provided.

Returns:

Normalized distance map [row_neurons, col_neurons]

Return type:

torch.Tensor
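
As a rough illustration of the two scaling options, the sketch below computes a U-matrix over a nested-list map. It rests on several assumptions that the text above does not confirm: Euclidean distance, a rectangular 4-neighborhood (order 1), and a final division by the maximum value so the map lies in [0, 1]. The name `distance_map` is hypothetical.

```python
import math

def distance_map(weights, scaling="sum"):
    """Sum or mean of distances to grid neighbors, scaled by the map maximum."""
    if scaling not in ("sum", "mean"):
        raise ValueError(f"invalid scaling: {scaling!r}")
    rows, cols = len(weights), len(weights[0])
    umatrix = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(weights[r][c], weights[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            total = sum(dists)
            if scaling == "sum":
                umatrix[r][c] = total
            else:
                umatrix[r][c] = total / len(dists) if dists else 0.0
    peak = max(max(row) for row in umatrix)
    if peak == 0:
        return umatrix  # all neurons identical: nothing to scale
    return [[value / peak for value in row] for row in umatrix]

# Toy 1x3 map with a single feature per neuron.
weights = [[[0.0], [1.0], [3.0]]]
sum_map = distance_map(weights, scaling="sum")
mean_map = distance_map(weights, scaling="mean")
print(sum_map, mean_map)
```

High values mark neurons that sit far from their neighbors in feature space, which is how cluster boundaries typically show up on a U-matrix.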

build_hit_map(data, batch_size=1024)

Returns a matrix where element i,j is the number of times that neuron i,j has been the winner.

It processes the data in batches to save memory. The hit map is built on CPU, but the calculations are done on GPU if available.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]
batch_size (int, optional) – Size of batches to process. Defaults to 1024.

Returns:

Matrix indicating the number of times each neuron has been identified as bmu.

Return type:

torch.Tensor
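
The counting logic can be sketched without any dependency. The helper names `find_bmu` and `hit_map` are hypothetical, the map is a nested list, and Euclidean distance is assumed; the batching loop only mimics the idea of processing the data in chunks.

```python
import math

def find_bmu(weights, sample):
    """Hypothetical helper: (row, col) of the closest neuron (Euclidean)."""
    positions = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    return min(positions, key=lambda rc: math.dist(weights[rc[0]][rc[1]], sample))

def hit_map(weights, data, batch_size=2):
    """Count how often each neuron is the best matching unit."""
    rows, cols = len(weights), len(weights[0])
    hits = [[0] * cols for _ in range(rows)]
    for start in range(0, len(data), batch_size):
        for sample in data[start:start + batch_size]:  # one batch at a time
            r, c = find_bmu(weights, sample)
            hits[r][c] += 1
    return hits

# Toy 2x2 map with 2 features per neuron.
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
data = [[0.1, 0.1], [0.0, 0.2], [0.9, 0.8], [0.2, 0.9], [0.1, 0.0]]
hits = hit_map(weights, data)
print(hits)
```

The entries of the hit map always sum to the number of samples, which makes a convenient sanity check.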

build_metric_map(data, target, reduction_parameter)

Calculate neurons’ metrics based on target values.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]
reduction_parameter (str) – Decide the calculation to apply to each neuron, ‘mean’ or ‘std’.

Returns:

Metric map based on the reduction parameter.

Return type:

torch.Tensor

build_rank_map(data, target)

Build a map of neuron ranks based on their target value standard deviations.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]

Returns:

Rank map where each neuron’s value is its rank (1 = lowest std = best)

Return type:

torch.Tensor
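
The ranking rule (1 = lowest standard deviation) can be sketched as follows. This is an illustration only: `rank_neurons` is a hypothetical name, the population standard deviation is assumed, neurons without samples are skipped, and ties keep their input order. How torchsom handles those cases is not stated above.

```python
import statistics

def rank_neurons(targets_by_neuron):
    """Rank neurons by the std of their target values (1 = lowest std)."""
    stds = {
        pos: statistics.pstdev(values)  # population std: an assumption
        for pos, values in targets_by_neuron.items()
        if values  # skip neurons that received no sample
    }
    ordered = sorted(stds, key=stds.get)
    return {pos: rank for rank, pos in enumerate(ordered, start=1)}

# Target values of the samples mapped to each neuron of a toy map.
targets_by_neuron = {
    (0, 0): [1.0, 1.0, 1.0],
    (0, 1): [0.0, 4.0],
    (1, 0): [1.0, 2.0],
    (1, 1): [],
}
ranks = rank_neurons(targets_by_neuron)
print(ranks)
```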

build_score_map(data, target)

Calculate neurons’ score based on target values.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]

Returns:

Score map based on a chosen score function: std_neuron / sqrt(n_neuron) * log(N_data/n_neuron). The score combines the standard error with a term penalizing uneven sample distribution across neurons. Lower scores indicate better neuron representativeness.

Return type:

torch.Tensor
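
The score formula above can be evaluated for a single neuron with a few lines of Python. The function name `neuron_score` is hypothetical, the population standard deviation is an assumption, and the sketch expects at least one sample per neuron.

```python
import math
import statistics

def neuron_score(targets, n_total):
    """std_neuron / sqrt(n_neuron) * log(N_data / n_neuron) for one neuron."""
    n_neuron = len(targets)
    std_neuron = statistics.pstdev(targets)  # population std: an assumption
    return std_neuron / math.sqrt(n_neuron) * math.log(n_total / n_neuron)

# Two neurons of a map trained on 8 samples in total.
spread_score = neuron_score([1.0, 3.0], n_total=8)   # targets disagree
tight_score = neuron_score([2.0, 2.0], n_total=8)    # targets agree
print(spread_score, tight_score)
```

A neuron whose targets all agree scores 0, consistent with lower scores indicating better representativeness. Note that the log term also vanishes when a single neuron holds every sample.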

cluster(method='kmeans', n_clusters=None, feature_space='weights', **kwargs)

Cluster SOM neurons using various clustering algorithms.

Parameters:

method (str) – Clustering method. Options: “kmeans”, “gmm”, “hdbscan”
n_clusters (Optional[int]) – Number of clusters. If None, uses automatic selection
feature_space (str) – Feature space for clustering. Options: “weights” (cluster based on neuron weight vectors), “positions” (cluster based on 2D neuron coordinates), “combined” (use both weights and positions).
**kwargs – Additional arguments for clustering algorithms

Returns:

Clustering results containing:

labels: Cluster assignments for neurons [n_neurons]
centers: Cluster centers [n_clusters, n_features]
n_clusters: Number of clusters found
method: Clustering method used
metrics: Dictionary of clustering quality metrics
feature_space: Feature space used for clustering
original_data: Features used for clustering

Return type:

dict[str, Any]

Raises:

ValueError – If invalid method or feature_space is specified

collect_samples(query_sample, historical_samples, historical_outputs, bmus_idx_map, min_buffer_threshold=50)

Collect historical samples similar to the query sample using SOM projection.

Parameters:

query_sample (torch.Tensor) – The query data point [num_features]
historical_samples (torch.Tensor) – Historical input data [num_samples, num_features]
historical_outputs (torch.Tensor) – Historical output values [num_samples]
min_buffer_threshold (int, optional) – Minimum number of samples to collect. Defaults to 50.
bmus_idx_map (dict[tuple[int, int], list[int]] | None)

Returns:

(historical_data_buffer, historical_output_buffer)

Return type:

Tuple[torch.Tensor, torch.Tensor]

fit(data)

Train the SOM using batches and track errors.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]

Returns:

Quantization and topographic errors, one value per epoch

Return type:

Tuple[List[float], List[float]]

identify_bmus(data)

Find BMUs for input data. Handles both single samples and batches.

The data must be on the GPU, if available, because the calculations use the SOM’s weights, which are on the GPU too.

Parameters:

data (torch.Tensor) – Input tensor of shape [num_features] or [batch_size, num_features]

Returns:

For a single sample: tensor of shape [2] with [row, col]. For a batch: tensor of shape [batch_size, 2] with [row, col] pairs.

Return type:

torch.Tensor
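
The two return shapes can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the function below is a hypothetical stand-in that works on nested lists, assumes Euclidean distance, and resolves ties by taking the first neuron in row-major order.

```python
import math

def identify_bmus(weights, data):
    """Return [row, col] for one sample, or a list of [row, col] for a batch."""
    positions = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]

    def bmu(sample):
        r, c = min(positions, key=lambda rc: math.dist(weights[rc[0]][rc[1]], sample))
        return [r, c]

    # A flat list of numbers is one sample; a list of lists is a batch.
    is_single = not isinstance(data[0], (list, tuple))
    return bmu(data) if is_single else [bmu(sample) for sample in data]

# Toy 2x2 map with 2 features per neuron.
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
single = identify_bmus(weights, [0.9, 0.1])                 # shape [2]
batch = identify_bmus(weights, [[0.1, 0.2], [0.8, 0.9]])    # shape [batch_size, 2]
print(single, batch)
```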

initialize_weights(data, mode=None)

Data should be normalized before initialization.

Initialize weights using:

Random samples from input data.

PCA components to make the training process converge faster.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]
mode (str, optional) – Method used to initialize the weights. Defaults to None.

Raises:

ValueError – If the neurons’ weights and the input data do not have the same number of features
RuntimeError – If random initialization takes too long
ValueError – If PCA initialization is requested with fewer than 2 features
ValueError – If PCA initialization is requested with only one sample
ValueError – If an unsupported initialization method is given

Return type:

None

quantization_error(data)

Calculate quantization error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Average quantization error value

Return type:

float
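
Quantization error is commonly defined as the average distance between each sample and the weight vector of its best matching unit. The sketch below follows that common definition with Euclidean distance on a nested-list map; it is an assumption that torchsom computes the same quantity, and the function here is a hypothetical stand-in.

```python
import math

def quantization_error(weights, data):
    """Average distance from each sample to its closest neuron."""
    flat = [w for row in weights for w in row]
    return sum(min(math.dist(w, sample) for w in flat) for sample in data) / len(data)

# Toy 2x2 map with 2 features per neuron.
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
# The first sample sits on a neuron; the second lies 0.5 away from its BMU.
qe = quantization_error(weights, [[0.0, 0.0], [1.0, 0.5]])
print(qe)
```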

topographic_error(data)

Calculate topographic error with batch support.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Topographic error ratio

Return type:

float
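
Topographic error is usually defined as the fraction of samples whose best and second-best matching units are not adjacent on the grid. The sketch below uses that usual definition with Euclidean distance and a rectangular 4-neighborhood; both choices are assumptions, since the adjacency rule torchsom applies (for example with other topologies or neighborhood orders) is not described above. The function is a hypothetical stand-in and needs a map with at least two neurons.

```python
import math

def topographic_error(weights, data):
    """Fraction of samples whose two closest neurons are not grid neighbors."""
    positions = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    errors = 0
    for sample in data:
        ranked = sorted(positions, key=lambda rc: math.dist(weights[rc[0]][rc[1]], sample))
        (r1, c1), (r2, c2) = ranked[0], ranked[1]
        if abs(r1 - r2) + abs(c1 - c2) > 1:  # not adjacent in the 4-neighborhood
            errors += 1
    return errors / len(data)

# Toy 1x3 map whose middle neuron is far from its grid neighbors in feature space.
weights = [[[0.0], [5.0], [1.0]]]
# [0.4]: closest neurons are columns 0 and 2 (not adjacent) -> counted as an error.
# [4.0]: closest neurons are columns 1 and 2 (adjacent)     -> no error.
te = topographic_error(weights, [[0.4], [4.0]])
print(te)
```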

SOM Variants

Growing SOM

Growing SOM module for torchsom.

Components for growing SOMs.

Hierarchical SOM

Hierarchical SOM module for torchsom.

Components for hierarchical SOMs.
