Core API

The core module contains the main SOM classes and implementations.

Base Classes

Abstract base class for all SOM variants.

class torchsom.core.base_som.BaseSOM(*args, **kwargs)[source]

Bases: Module, ABC

Abstract base class for all SOM variants.

abstract fit(data)[source]

Train the SOM on the given data.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]

Returns:

Quantization errors and topographic errors, one value per epoch

Return type:

Tuple[List[float], List[float]]

abstract identify_bmus(data)[source]

Find best matching units for input data.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

For single sample: Tensor of shape [2] with [row, col].

For batch: Tensor of shape [batch_size, 2] with [row, col] pairs

Return type:

torch.Tensor

abstract initialize_weights(data, mode=None)[source]

Initialize the SOM weights.

Parameters:
  • data (torch.Tensor) – Input data tensor [batch_size, num_features]

  • mode (str, optional) – Weight initialization method. Defaults to None.

Return type:

None

abstract quantization_error(data)[source]

Calculate quantization error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Average quantization error value

Return type:

float

abstract topographic_error(data)[source]

Calculate topographic error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Topographic error ratio

Return type:

float

Classical SOM Implementation

PyTorch implementation of classic Self Organizing Maps using batch learning.

class torchsom.core.som.SOM(x, y, num_features, epochs=10, batch_size=5, sigma=1.0, learning_rate=0.5, neighborhood_order=1, topology='rectangular', lr_decay_function='asymptotic_decay', sigma_decay_function='asymptotic_decay', neighborhood_function='gaussian', distance_function='euclidean', initialization_mode='random', device='cpu', random_seed=42)[source]

Bases: BaseSOM

PyTorch implementation of Self Organizing Maps using batch learning.

Parameters:
  • x (int)

  • y (int)

  • num_features (int)

  • epochs (int)

  • batch_size (int)

  • sigma (float)

  • learning_rate (float)

  • neighborhood_order (int)

  • topology (str)

  • lr_decay_function (str)

  • sigma_decay_function (str)

  • neighborhood_function (str)

  • distance_function (str)

  • initialization_mode (str)

  • device (str)

  • random_seed (int)
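
The sigma and neighborhood_function parameters determine how strongly neurons near the BMU are pulled toward a sample. As a rough illustration, the plain-Python sketch below computes a Gaussian neighborhood on a rectangular grid. It assumes the common form exp(-d² / (2·sigma²)), with d the grid distance to the BMU, and assumes x counts rows and y counts columns. The helper gaussian_neighborhood is hypothetical and not part of torchsom, whose implementation may differ in details.

```python
import math

def gaussian_neighborhood(x, y, bmu, sigma):
    """Return an x-by-y grid of neighborhood weights centred on bmu = (row, col).

    Hypothetical helper for illustration only, not part of torchsom.
    """
    bmu_row, bmu_col = bmu
    grid = []
    for row in range(x):
        grid.append([
            # Squared grid distance to the BMU, scaled by the neighborhood width.
            math.exp(-((row - bmu_row) ** 2 + (col - bmu_col) ** 2) / (2 * sigma ** 2))
            for col in range(y)
        ])
    return grid

h = gaussian_neighborhood(x=3, y=3, bmu=(1, 1), sigma=1.0)
```

The BMU itself receives weight 1, and the weight falls off with grid distance. A smaller sigma narrows the region that is updated.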

build_map(map_type, data=None, target=None, bmus_data_map=None, **kwargs)[source]

Unified method to build various types of maps.

Parameters:
  • map_type (str) – Type of map to build. Options:
      - "hit": Hit map showing neuron activation frequencies
      - "distance": Distance map showing neuron-to-neighbor distances
      - "bmus_data": Mapping of BMUs to their corresponding data points
      - "metric": Metric map based on target values (requires target)
      - "score": Score map combining standard error with distribution penalty (requires target)
      - "rank": Rank map based on neuron standard deviations (requires target)
      - "classification": Classification map with most frequent labels (requires target)

  • data (Optional[torch.Tensor]) – Input data tensor [batch_size, num_features]. Required if bmus_data_map is not provided.

  • target (Optional[torch.Tensor]) – Target values/labels (required for some map types)

  • bmus_data_map (Optional[dict[tuple[int, int], list[int]]]) – Pre-computed BMU to data indices mapping. If provided, avoids recomputing BMUs for dependent maps.

  • **kwargs – Additional arguments specific to each map type:
      - batch_size (int): Batch processing size (default: 1024)
      - distance_metric (str): Distance function for distance maps
      - neighborhood_order (int): Neighborhood order for distance/classification maps
      - scaling (str): 'sum' or 'mean' for distance maps
      - reduction_parameter (str): 'mean' or 'std' for metric maps
      - return_indices (bool): Return indices instead of data for bmus_data maps

Returns:

Map result (type depends on map_type)

Return type:

torch.Tensor or Dict

Raises:
  • ValueError – If invalid map_type is specified

  • ValueError – If target is required but not provided

  • ValueError – If neither data nor bmus_data_map is provided
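
The bmus_data_map argument has type dict[tuple[int, int], list[int]]: each key is a BMU (row, col) and each value lists the indices of the samples mapped to that neuron. The plain-Python sketch below builds such a mapping and derives hit counts from it. Both helper names are hypothetical, and the BMU coordinates are hard-coded for illustration rather than taken from a trained SOM.

```python
from collections import defaultdict

def group_indices_by_bmu(bmus):
    """Map each (row, col) BMU to the indices of the samples assigned to it."""
    mapping = defaultdict(list)
    for index, (row, col) in enumerate(bmus):
        mapping[(row, col)].append(index)
    return dict(mapping)

def hit_counts(bmus_data_map, x, y):
    """Count samples per neuron on an x-by-y grid (zero where no sample lands)."""
    counts = [[0] * y for _ in range(x)]
    for (row, col), indices in bmus_data_map.items():
        counts[row][col] = len(indices)
    return counts

# Illustrative BMU coordinates for five samples on a 2x2 map.
bmus = [(0, 0), (1, 1), (0, 0), (0, 1), (1, 1)]
bmus_data_map = group_indices_by_bmu(bmus)
hits = hit_counts(bmus_data_map, x=2, y=2)
```

Once the mapping exists, dependent maps can be derived from it without repeating the BMU search, which is the purpose of passing bmus_data_map.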

build_multiple_maps(map_configs, data, target=None, batch_size=1024)[source]

Efficiently build multiple maps by reusing BMUs computation.

Parameters:
  • map_configs (list[dict]) – List of map configurations

  • data (torch.Tensor) – Input data tensor

  • target (Optional[torch.Tensor]) – Target values (if needed by any map)

  • batch_size (int) – Batch size for BMUs computation

Returns:

Dictionary mapping map names to their results

Return type:

dict[str, torch.Tensor]

Example

configs = [
    {"type": "hit"},
    {"type": "metric", "kwargs": {"reduction_parameter": "std"}},
    {"type": "rank"},
    {"type": "classification", "kwargs": {"neighborhood_order": 2}},
]
results = som.build_multiple_maps(configs, data, target)

cluster(method='kmeans', n_clusters=None, feature_space='weights', **kwargs)[source]

Cluster SOM neurons using various clustering algorithms.

Parameters:
  • method (str) – Clustering method. Options: “kmeans”, “gmm”, “hdbscan”

  • n_clusters (Optional[int]) – Number of clusters. If None, uses automatic selection

  • feature_space (str) – Feature space for clustering. Options:
      - "weights": Cluster based on neuron weight vectors
      - "positions": Cluster based on 2D neuron coordinates
      - "combined": Use both weights and positions

  • **kwargs – Additional arguments for clustering algorithms

Returns:

Clustering results containing:
  • labels: Cluster assignments for neurons [n_neurons]

  • centers: Cluster centers [n_clusters, n_features]

  • n_clusters: Number of clusters found

  • method: Clustering method used

  • metrics: Dictionary of clustering quality metrics

  • feature_space: Feature space used for clustering

  • original_data: Features used for clustering

Return type:

dict[str, Any]

Raises:

ValueError – If invalid method or feature_space is specified

collect_samples(query_sample, historical_samples, historical_outputs, bmus_idx_map, min_buffer_threshold=50)[source]

Collect historical samples similar to the query sample using SOM projection.

Parameters:
  • query_sample (torch.Tensor) – Query sample tensor [num_features]

  • historical_samples (torch.Tensor) – Historical samples tensor [num_samples, num_features]

  • historical_outputs (torch.Tensor) – Historical outputs tensor [num_samples]

  • bmus_idx_map (dict[tuple[int, int], list[int]]) – BMU to data indices mapping

  • min_buffer_threshold (int) – Minimum buffer threshold

Return type:

tuple[Tensor, Tensor]

fit(data, verbose=True)[source]

Train the SOM using batches and track errors.

Parameters:
  • data (torch.Tensor) – Input data tensor [batch_size, num_features]

  • verbose (bool, optional) – Whether to print progress. Defaults to True.

Returns:

Quantization errors and topographic errors, one value per epoch

Return type:

Tuple[List[float], List[float]]

identify_bmus(data)[source]

Find BMUs for input data.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features]

Returns:

BMU coordinates as tensor [batch_size, 2]

Return type:

torch.Tensor
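
The BMU of a sample is the neuron whose weight vector is closest to it. The plain-Python sketch below shows the idea with Euclidean distance, the default distance_function in the constructor signature. It stores the weights as a nested list indexed [row][col][feature]. The helper find_bmu is hypothetical and is not the torchsom implementation, which works on tensors.

```python
import math

def find_bmu(sample, weights):
    """Return (row, col) of the neuron closest to sample in Euclidean distance."""
    best, best_dist = None, math.inf
    for row, neurons in enumerate(weights):
        for col, w in enumerate(neurons):
            dist = math.dist(sample, w)
            if dist < best_dist:
                best, best_dist = (row, col), dist
    return best

# A 2x2 map with 2 features per neuron.
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
batch = [[0.1, 0.9], [0.8, 0.2]]
bmus = [find_bmu(sample, weights) for sample in batch]
```

Each entry of bmus is a (row, col) pair, matching the [batch_size, 2] layout described above.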

initialize_weights(data, mode=None)[source]

Data should be normalized before initialization.

Initialize weights using one of two methods:

  1. Random samples drawn from the input data.

  2. PCA components, which helps training converge faster.

Parameters:
  • data (torch.Tensor) – Input data tensor [batch_size, num_features]

  • mode (str, optional) – Weight initialization method. Defaults to None.

Raises:
  • ValueError – If the input data and the neuron weights do not have the same number of features

  • RuntimeError – If random initialization takes too long

  • ValueError – If PCA initialization is requested with fewer than 2 features

  • ValueError – If PCA initialization is requested with fewer than 2 samples

  • ValueError – If mode is not a supported initialization method

Return type:

None

quantization_error(data)[source]

Calculate quantization error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Average quantization error value

Return type:

float
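
Quantization error is commonly defined as the mean distance between each sample and the weight vector of its BMU. The plain-Python sketch below follows that definition with Euclidean distance. It is an illustration under that assumption, not the torchsom implementation, and the function defined here is a standalone helper rather than the method documented above.

```python
import math

def quantization_error(data, weights):
    """Mean Euclidean distance from each sample to its closest neuron."""
    neurons = [w for row in weights for w in row]  # flatten the grid
    total = sum(min(math.dist(sample, w) for w in neurons) for sample in data)
    return total / len(data)

# A 2x2 map with 2 features per neuron.
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
data = [[0.0, 0.5], [1.0, 1.0]]
qe = quantization_error(data, weights)
```

Lower values mean the neuron weights represent the data more closely.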

topographic_error(data)[source]

Calculate topographic error with batch support.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Topographic error ratio

Return type:

float
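
Topographic error is commonly defined as the fraction of samples whose best and second-best matching units are not adjacent on the grid. The plain-Python sketch below assumes a rectangular grid and treats the eight surrounding cells as adjacent. How torchsom defines adjacency (for example through neighborhood_order or the topology setting) is not stated here, so this is an illustration only.

```python
import math

def topographic_error(data, weights):
    """Fraction of samples whose two closest neurons are not grid neighbors."""
    errors = 0
    for sample in data:
        # Rank every neuron by its distance to the sample.
        ranked = sorted(
            (math.dist(sample, w), row, col)
            for row, neurons in enumerate(weights)
            for col, w in enumerate(neurons)
        )
        (_, r1, c1), (_, r2, c2) = ranked[0], ranked[1]
        # Assumption: neighbors are the 8 surrounding cells (Chebyshev distance 1).
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:
            errors += 1
    return errors / len(data)

# A 1x3 map with 1 feature per neuron; the middle neuron is far from the others.
weights = [[[0.0], [10.0], [1.0]]]
data = [[0.2], [9.0]]
te = topographic_error(data, weights)
```

For the first sample the two closest neurons sit at columns 0 and 2, which are not adjacent, so it counts as an error. The second sample's two closest neurons are adjacent.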

weights

Pre-compute:

  1. Coordinate distance matrices for efficient distance calculations

  2. Neighbor offsets for topology operations

  3. Decay schedules for all epochs at once
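
Pre-computing the decay schedules means the learning rate and sigma for every epoch are known before training starts. The sketch below assumes the asymptotic decay has the MiniSom-style form value / (1 + t / (T / 2)) for epoch t out of T. The formula torchsom actually uses is not stated in this reference, and asymptotic_schedule is a hypothetical helper.

```python
def asymptotic_schedule(initial, epochs):
    """Return one decayed value per epoch: initial / (1 + t / (epochs / 2)).

    Hypothetical helper; the decay formula is an assumption, not torchsom's API.
    """
    return [initial / (1 + t / (epochs / 2)) for t in range(epochs)]

# Schedules for the constructor defaults learning_rate=0.5, sigma=1.0, epochs=10.
lr_schedule = asymptotic_schedule(initial=0.5, epochs=10)
sigma_schedule = asymptotic_schedule(initial=1.0, epochs=10)
```

During training, epoch t can then read lr_schedule[t] and sigma_schedule[t] instead of re-evaluating the decay function.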

Example usage

import torch
from torchsom import SOM

X = torch.randn(1000, 4)
som = SOM(x=10, y=10, num_features=4, epochs=20)
som.initialize_weights(data=X, mode="pca")
q_errors, t_errors = som.fit(X)

# Build maps via unified API
distance_map = som.build_map("distance")
hit_map = som.build_map("hit", data=X)

# Efficiently build multiple maps with shared BMUs
results = som.build_multiple_maps(
    map_configs=[
        {"type": "hit"},
        {"type": "distance"},
    ],
    data=X,
)

SOM Variants (WORK IN PROGRESS)

Periodic Boundary Conditioned SOM

Growing SOM

Hierarchical SOM