Core API

The core module contains the main SOM classes and implementations.

Base Classes

Abstract base class for all SOM variants.

class torchsom.core.base_som.BaseSOM(*args, **kwargs)

Bases: Module, ABC

Abstract base class for all SOM variants.

abstractmethod fit(data)

Train the SOM on the given data.

Parameters:

data (torch.Tensor) – Input data tensor [num_samples, num_features]

Returns:

Quantization and topographic errors, one value per epoch

Return type:

Tuple[List[float], List[float]]

abstractmethod identify_bmus(data)

Find best matching units for input data.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

For a single sample: tensor of shape [2] holding [row, col].

For a batch: tensor of shape [batch_size, 2] holding one [row, col] pair per sample.

Return type:

torch.Tensor
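To illustrate the [row, col] convention, here is a plain-Python brute-force sketch of BMU search. It is not torchsom code: it assumes a grid with x rows and y columns whose weight vectors are stored in a flat, row-major list, it uses Euclidean distance, and find_bmu / find_bmus are hypothetical helpers.

```python
import math


def find_bmu(sample, weights, y):
    """Return the (row, col) of the neuron whose weight vector is closest to `sample`.

    Assumption (illustrative layout): `weights` is a flat, row-major list of
    x * y weight vectors and `y` is the number of columns in the grid.
    """
    best = min(range(len(weights)), key=lambda i: math.dist(sample, weights[i]))
    return divmod(best, y)  # flat index -> (row, col)


def find_bmus(batch, weights, y):
    """Batched variant: one (row, col) pair per sample."""
    return [find_bmu(sample, weights, y) for sample in batch]


# A 2 x 3 grid of 2-D weight vectors
weights = [[0, 0], [0, 1], [0, 2],
           [1, 0], [1, 1], [1, 2]]
print(find_bmu([0.9, 1.8], weights, y=3))         # (1, 2)
print(find_bmus([[0, 0], [1, 1]], weights, y=3))  # [(0, 0), (1, 1)]
```

A real backend would vectorize the distance computation instead of looping; the sketch only pins down the output convention.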

abstractmethod initialize_weights(data, mode=None)

Initialize the SOM weights.

Parameters:
  • data (torch.Tensor) – Input data tensor [batch_size, num_features]

  • mode (str, optional) – Weight initialization method. Defaults to None.

Return type:

None

abstractmethod quantization_error(data)

Calculate quantization error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Average quantization error value

Return type:

float
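Quantization error is conventionally the mean distance between each sample and the weight vector of its BMU. A minimal plain-Python sketch of that definition follows, assuming Euclidean distance and a flat list of weight vectors; torchsom's implementation may differ in details such as the configured distance function.

```python
import math


def quantization_error(data, weights):
    """Mean Euclidean distance from each sample to its closest (BMU) weight vector.

    Assumption: Euclidean distance; `weights` is a flat list of weight vectors.
    """
    if not isinstance(data[0], (list, tuple)):
        data = [data]  # treat a single [num_features] sample as a batch of one
    total = sum(min(math.dist(x, w) for w in weights) for x in data)
    return total / len(data)


weights = [[0.0, 0.0], [1.0, 1.0]]
print(quantization_error([[0.0, 1.0], [1.0, 1.0]], weights))  # (1.0 + 0.0) / 2 = 0.5
print(quantization_error([3.0, 4.0], [[0.0, 0.0]]))           # single sample: 5.0
```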

abstractmethod topographic_error(data)

Calculate topographic error.

Parameters:

data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]

Returns:

Topographic error ratio

Return type:

float
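Topographic error is usually defined as the fraction of samples whose best and second-best matching units are not adjacent on the grid. The sketch below follows that textbook definition for a rectangular grid without periodic boundaries. Treating the 8 surrounding cells as adjacent is an assumption (conventions vary), and the flat row-major weight layout is illustrative only.

```python
import math


def topographic_error(data, weights, y):
    """Fraction of samples whose first and second BMUs are not grid neighbors.

    Assumptions: rectangular grid with `y` columns, weights stored row-major in a
    flat list, and the 8 surrounding cells count as adjacent.
    """
    errors = 0
    for x in data:
        ranked = sorted(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
        r1, c1 = divmod(ranked[0], y)
        r2, c2 = divmod(ranked[1], y)
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:  # Chebyshev distance on the grid
            errors += 1
    return errors / len(data)


# 1 x 3 grid whose weights are "folded": cells (0, 0) and (0, 2) are close in input space
weights = [[0.0], [5.0], [1.0]]
print(topographic_error([[0.4], [4.9]], weights, y=3))  # 0.5
```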

Classical SOM Implementation

PyTorch implementation of classic Self-Organizing Maps using batch learning.

class torchsom.core.som.SOM(x, y, num_features, epochs=10, batch_size=5, sigma=1.0, learning_rate=0.5, neighborhood_order=1, topology='rectangular', lr_decay_function='asymptotic_decay', sigma_decay_function='asymptotic_decay', neighborhood_function='gaussian', distance_function='euclidean', initialization_mode='random', pbc=False, search_backend='auto', device='cpu', random_seed=42)

Bases: BaseSOM

PyTorch implementation of Self-Organizing Maps using batch learning.

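Parameters:
  • x (int) – Size of the first grid dimension; the map has x * y neurons.

  • y (int) – Size of the second grid dimension.

  • num_features (int) – Dimensionality of the input vectors; must match the number of features in the data.

  • epochs (int) – Number of training epochs. Defaults to 10.

  • batch_size (int) – Number of samples per training mini-batch. Defaults to 5.

  • sigma (float) – Initial neighborhood width, decayed by sigma_decay_function. Defaults to 1.0.

  • learning_rate (float) – Initial learning rate, decayed by lr_decay_function. Defaults to 0.5.

  • neighborhood_order (int) – Number of topological hops treated as neighbors (used by collect_samples; see set_neighborhood_order). Defaults to 1.

  • topology (str) – Grid topology, “rectangular” or “hexagonal”. Defaults to “rectangular”.

  • lr_decay_function (str) – Learning-rate decay schedule. Defaults to “asymptotic_decay”.

  • sigma_decay_function (str) – Neighborhood-width decay schedule. Defaults to “asymptotic_decay”.

  • neighborhood_function (str) – Neighborhood kernel applied around each BMU. Defaults to “gaussian”.

  • distance_function (str) – Distance between input samples and neuron weights. Defaults to “euclidean”.

  • initialization_mode (str) – Weight-initialization method, e.g. “random” or “pca”. Defaults to “random”.

  • pbc (bool) – Enable periodic boundary conditions (toroidal grid). Defaults to False.

  • search_backend (str) – BMU search backend (PyTorch brute force or FAISS). Defaults to “auto”.

  • device (str) – Torch device on which the map is stored. Defaults to “cpu”.

  • random_seed (int) – Random seed for reproducibility. Defaults to 42.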
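The sigma, learning_rate and decay-function parameters follow the usual SOM schedule: both values shrink during training, and the neighborhood kernel weights each neuron by its grid distance to the BMU. The sketch below assumes MiniSom-style definitions, an asymptotic decay value / (1 + t / (T / 2)) and a Gaussian kernel exp(-d² / (2σ²)); torchsom's exact formulas and decay granularity are not confirmed here.

```python
import math


def asymptotic_decay(initial, t, max_iter):
    """Assumed MiniSom-style schedule: halves the initial value at t = max_iter / 2."""
    return initial / (1 + t / (max_iter / 2))


def gaussian_neighborhood(bmu, neuron, sigma):
    """Gaussian weight of `neuron` given its squared grid distance to `bmu`."""
    d2 = (bmu[0] - neuron[0]) ** 2 + (bmu[1] - neuron[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))


epochs = 20  # assumption: one decay step per epoch
for t in (0, 10, 20):
    lr = asymptotic_decay(0.5, t, epochs)
    sigma = asymptotic_decay(1.0, t, epochs)
    h = gaussian_neighborhood((0, 0), (0, 1), sigma)  # weight of a direct neighbor
    print(f"t={t}: lr={lr:.3f}, sigma={sigma:.3f}, neighbor weight={h:.3f}")
```

As sigma shrinks, the update concentrates on the BMU itself, which is what lets the map first organize globally and then fine-tune locally.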

build_map(map_type, data=None, target=None, bmus_data_map=None, **kwargs)

Unified method to build various types of maps.

Parameters:
  • map_type (str) – Type of map to build. Options:
      - “hit”: Hit map showing neuron activation frequencies
      - “distance”: Distance map showing neuron-to-neighbor distances
      - “bmus_data”: Mapping of BMUs to their corresponding data points
      - “metric”: Metric map based on target values (requires target)
      - “score”: Score map combining standard error with a distribution penalty (requires target)
      - “rank”: Rank map based on neuron standard deviations (requires target)
      - “classification”: Classification map with the most frequent labels (requires target)

  • data (Optional[torch.Tensor]) – Input data tensor [batch_size, num_features]. Required for data-dependent maps unless bmus_data_map is provided; the “distance” map needs only the trained weights.

  • target (Optional[torch.Tensor]) – Target values/labels (required for some map types)

  • bmus_data_map (Optional[dict[tuple[int, int], list[int]]]) – Pre-computed BMU to data indices mapping. If provided, avoids recomputing BMUs for dependent maps.

  • **kwargs – Additional arguments specific to each map type:
      - batch_size (int): Batch processing size (default: 1024)
      - distance_metric (str): Distance function for distance maps
      - neighborhood_order (int): Neighborhood order for distance/classification maps
      - scaling (str): ‘sum’ or ‘mean’ for distance maps
      - reduction_parameter (str): ‘mean’ or ‘std’ for metric maps
      - return_indices (bool): Return indices instead of data for bmus_data maps

Returns:

Map result (type depends on map_type)

Return type:

torch.Tensor or dict

Raises:
  • ValueError – If invalid map_type is specified

  • ValueError – If target is required but not provided

  • ValueError – If a data-dependent map is requested and neither data nor bmus_data_map is provided
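The “distance” map depends only on the weights, which is why it can be built without data. A plain-Python, U-matrix-style sketch: for each neuron it aggregates the Euclidean distances to its grid neighbors with “sum” or “mean” scaling. Restricting neighbors to the 4 directly adjacent cells (first order, rectangular, no periodic boundaries) and the nested [rows][cols][num_features] weight layout are assumptions, not torchsom internals.

```python
import math


def distance_map(weights, scaling="mean"):
    """U-matrix-style map of distances between each neuron and its 4-connected neighbors.

    Assumption: `weights` is a nested list of shape [rows][cols][num_features].
    """
    if scaling not in ("sum", "mean"):
        raise ValueError("scaling must be 'sum' or 'mean'")
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(weights[r][c], weights[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            if dists:
                result[r][c] = sum(dists) if scaling == "sum" else sum(dists) / len(dists)
    return result


weights = [[[0.0], [1.0], [3.0]]]    # a 1 x 3 map with 1-D weights
print(distance_map(weights))         # [[1.0, 1.5, 2.0]]
print(distance_map(weights, "sum"))  # [[1.0, 3.0, 2.0]]
```

High values mark neurons that sit on a boundary between regions of the input space.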

build_multiple_maps(map_configs, data, target=None, batch_size=1024)

Build several maps efficiently by computing BMUs once and reusing them across maps.

Parameters:
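  • map_configs (list[dict]) – List of map configurations. Each dict has a “type” key (a map_type accepted by build_map) and an optional “kwargs” dict of map-specific arguments.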

  • data (torch.Tensor) – Input data tensor

  • target (Optional[torch.Tensor]) – Target values (if needed by any map)

  • batch_size (int) – Batch size for the BMU computation

Returns:

Dictionary mapping map names to their results

Return type:

dict[str, torch.Tensor]

Example

# Assumes a trained som and matching data and target tensors
configs = [
    {"type": "hit"},
    {"type": "metric", "kwargs": {"reduction_parameter": "std"}},
    {"type": "rank"},
    {"type": "classification", "kwargs": {"neighborhood_order": 2}},
]
results = som.build_multiple_maps(configs, data, target)
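The saving comes from computing BMUs once and deriving every map from the same BMU-to-indices mapping (the bmus_data_map structure that build_map also accepts). A plain-Python sketch of that idea for a hit map and a majority-label classification map follows. The helper names are hypothetical, and this classification map ignores neighborhood_order, which torchsom's version may take into account.

```python
from collections import Counter, defaultdict


def bmus_to_data_map(bmus):
    """Group sample indices by their BMU: {(row, col): [sample indices]}."""
    mapping = defaultdict(list)
    for idx, bmu in enumerate(bmus):
        mapping[tuple(bmu)].append(idx)
    return dict(mapping)


def hit_map(bmus_data_map, rows, cols):
    """Number of samples won by each neuron."""
    hits = [[0] * cols for _ in range(rows)]
    for (r, c), indices in bmus_data_map.items():
        hits[r][c] = len(indices)
    return hits


def classification_map(bmus_data_map, labels, rows, cols):
    """Most frequent label per neuron; None for neurons that won no sample."""
    result = [[None] * cols for _ in range(rows)]
    for (r, c), indices in bmus_data_map.items():
        result[r][c] = Counter(labels[i] for i in indices).most_common(1)[0][0]
    return result


# BMUs are computed once (e.g. by identify_bmus) and shared by every map.
bmus = [(0, 0), (0, 0), (1, 1), (0, 1)]
labels = ["a", "a", "b", "b"]
shared = bmus_to_data_map(bmus)
print(hit_map(shared, 2, 2))                     # [[2, 1], [0, 1]]
print(classification_map(shared, labels, 2, 2))  # [['a', 'b'], [None, 'b']]
```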

cluster(method='kmeans', n_clusters=None, feature_space='weights', **kwargs)

Cluster SOM neurons using various clustering algorithms.

Parameters:
  • method (str) – Clustering method. Options: “kmeans”, “gmm”, “hdbscan”

  • n_clusters (Optional[int]) – Number of clusters. If None, uses automatic selection

  • feature_space (str) – Feature space for clustering. Options:
      - “weights”: Cluster based on neuron weight vectors
      - “positions”: Cluster based on 2D neuron coordinates
      - “combined”: Use both weights and positions

  • **kwargs – Additional arguments for clustering algorithms

Returns:

Clustering results containing:
  • labels: Cluster assignments for neurons [n_neurons]

  • centers: Cluster centers [n_clusters, n_features]

  • n_clusters: Number of clusters found

  • method: Clustering method used

  • metrics: Dictionary of clustering quality metrics

  • feature_space: Feature space used for clustering

  • original_data: Features used for clustering

Return type:

dict[str, Any]

Raises:

ValueError – If invalid method or feature_space is specified
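With feature_space="weights" and method="kmeans", each neuron contributes its weight vector as one point to cluster. The self-contained Lloyd's k-means sketch below illustrates this in plain Python. Seeding with the first n_clusters points is a simplification for determinism (production implementations such as scikit-learn's use k-means++), and the nested weight layout is an assumption. For feature_space="positions" the points would be the (row, col) coordinates instead.

```python
import math


def kmeans(points, n_clusters, n_iter=100):
    """Lloyd's k-means; the first `n_clusters` points seed the centers (simplification)."""
    centers = [list(p) for p in points[:n_clusters]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(n_clusters), key=lambda k: math.dist(p, centers[k])) for p in points]
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = []
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers


# 2 x 2 map with 1-D weights (assumed layout [rows][cols][num_features]); one point per neuron
weights = [[[0.0], [0.1]], [[5.0], [5.2]]]
points = [weights[r][c] for r in range(2) for c in range(2)]
labels, centers = kmeans(points, n_clusters=2)
print(labels)  # [0, 0, 1, 1]
```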

collect_samples(query_sample, historical_samples, historical_outputs, bmus_idx_map, min_buffer_threshold=50, return_indices=False, retrieval_mode='bmu_neighborhood_knn')

Collect historical samples similar to the query sample using SOM projection.

Three retrieval modes control the collection strategy:

  • "bmu_only": Collect samples mapped to the query’s BMU cell only.

  • "bmu_neighborhood": Collect from BMU + topological neighbors (up to neighborhood_order hops). No KNN fallback.

  • "bmu_neighborhood_knn" (default): Same as bmu_neighborhood, plus KNN fallback in weight space when the buffer is below min_buffer_threshold.

Parameters:
  • query_sample (torch.Tensor) – Query sample tensor [num_features].

  • historical_samples (torch.Tensor) – Historical samples tensor [num_samples, num_features].

  • historical_outputs (torch.Tensor) – Historical outputs tensor [num_samples].

  • bmus_idx_map (dict[tuple[int, int], list[int]]) – BMU to data indices mapping.

  • min_buffer_threshold (int) – Minimum buffer size before KNN fallback triggers. Only used when retrieval_mode="bmu_neighborhood_knn".

  • return_indices (bool) – If True, also return the indices of collected samples.

  • retrieval_mode (str) – Retrieval strategy. One of "bmu_only", "bmu_neighborhood", or "bmu_neighborhood_knn" (default).

Returns:

If return_indices is False: (historical_data_buffer, historical_output_buffer)

If return_indices is True: (historical_data_buffer, historical_output_buffer, indices_tensor)

Return type:

tuple of torch.Tensor (two or three elements, depending on return_indices)

Raises:

ValueError – If retrieval_mode is not one of the valid modes.
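The three retrieval modes can be sketched in plain Python over index lists rather than tensors. Several details are assumptions not confirmed by the API above: a rectangular grid without periodic boundaries, "hops" measured as Chebyshev distance (so order 1 means the 8 surrounding cells), and a fallback that visits the remaining neurons in order of their weight-space distance to the query until min_buffer_threshold is reached. collect_indices and neighbor_cells are hypothetical helpers; the real method also returns the matching samples and outputs.

```python
import math


def neighbor_cells(bmu, order, rows, cols):
    """Cells within `order` hops of `bmu` (assumed: Chebyshev distance, no wrapping)."""
    r0, c0 = bmu
    return [
        (r, c)
        for r in range(max(0, r0 - order), min(rows, r0 + order + 1))
        for c in range(max(0, c0 - order), min(cols, c0 + order + 1))
    ]


def collect_indices(query, weights, bmus_idx_map, retrieval_mode="bmu_neighborhood_knn",
                    neighborhood_order=1, min_buffer_threshold=50):
    """Indices of historical samples retrieved for `query`.

    `weights` maps (row, col) -> weight vector; `bmus_idx_map` maps (row, col) -> sample indices.
    """
    modes = ("bmu_only", "bmu_neighborhood", "bmu_neighborhood_knn")
    if retrieval_mode not in modes:
        raise ValueError(f"retrieval_mode must be one of {modes}")
    rows = 1 + max(r for r, _ in weights)
    cols = 1 + max(c for _, c in weights)
    by_distance = sorted(weights, key=lambda cell: math.dist(query, weights[cell]))
    bmu = by_distance[0]
    if retrieval_mode == "bmu_only":
        cells = [bmu]
    else:
        cells = neighbor_cells(bmu, neighborhood_order, rows, cols)
    indices = [i for cell in cells for i in bmus_idx_map.get(cell, [])]
    if retrieval_mode == "bmu_neighborhood_knn":
        # Assumed fallback: add samples from the next-closest neurons in weight space.
        for cell in by_distance:
            if len(indices) >= min_buffer_threshold:
                break
            if cell not in cells:
                indices.extend(bmus_idx_map.get(cell, []))
    return indices


# 1 x 3 map with 1-D weights and five historical samples
weights = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
bmus_idx_map = {(0, 0): [0, 1], (0, 1): [2], (0, 2): [3, 4]}
for mode in ("bmu_only", "bmu_neighborhood", "bmu_neighborhood_knn"):
    print(mode, collect_indices([0.1], weights, bmus_idx_map, mode, min_buffer_threshold=4))
```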

fit(data, verbose=True)

Train the SOM using batches and track errors.

Parameters:
  • data (torch.Tensor) – Input data tensor [num_samples, num_features]

  • verbose (bool, optional) – Whether to print progress. Defaults to True.

Returns:

Quantization and topographic errors, one value per epoch

Return type:

Tuple[List[float], List[float]]
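Each training epoch iterates over mini-batches of batch_size samples. As a rough picture of one update, the sketch below applies the classic Kohonen rule w += lr * h(bmu, neuron) * (x - w), averaged over a mini-batch with a Gaussian neighborhood. This averaging scheme is an assumption and may not match torchsom's batch update exactly; the {(row, col): vector} weight dict and train_step are hypothetical.

```python
import math


def train_step(weights, batch, lr, sigma):
    """One mini-batch update (assumed rule): the Kohonen update averaged over the batch.

    `weights` is a dict {(row, col): weight vector}; the vectors are updated in place.
    """
    # BMU of every sample, computed with the weights as they were before this step.
    bmus = [min(weights, key=lambda cell: math.dist(x, weights[cell])) for x in batch]
    deltas = {cell: [0.0] * len(w) for cell, w in weights.items()}
    for x, (br, bc) in zip(batch, bmus):
        for (r, c), w in weights.items():
            # Gaussian neighborhood weight based on grid distance to the BMU.
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
            for f in range(len(w)):
                deltas[(r, c)][f] += h * (x[f] - w[f]) / len(batch)
    for cell, w in weights.items():
        for f in range(len(w)):
            w[f] += lr * deltas[cell][f]
    return weights


weights = {(0, 0): [0.0], (0, 1): [1.0]}
train_step(weights, [[0.0]], lr=0.5, sigma=1.0)
print(weights)  # the BMU (0, 0) stays put; its neighbor (0, 1) moves toward 0.0
```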

identify_bmus(data)

Find BMUs for input data.

Uses the configured search strategy (PyTorch brute-force or FAISS).

Parameters:

data (torch.Tensor) – Input tensor of shape [batch_size, num_features] or [num_features]

Returns:

BMU coordinates as tensor [batch_size, 2] or [2]

Return type:

torch.Tensor

initialize_weights(data, mode=None)

Data should be normalized before initialization.

Initialize weights using one of two methods:

  1. “random”: random samples drawn from the input data.

  2. “pca”: weights derived from the data's principal components, which helps training converge faster.

Parameters:
  • data (torch.Tensor) – input data tensor [batch_size, num_features]

  • mode (str, optional) – Initialization method, “random” or “pca”. Defaults to None.

Raises:
  • ValueError – If the number of input features does not match the neurons’ weight dimension

  • RuntimeError – If random initialization takes too long

  • ValueError – If PCA is requested with fewer than 2 features

  • ValueError – If PCA is requested with only one sample

  • ValueError – If mode is not a supported initialization method

Return type:

None

quantization_error(data)

Calculate quantization error.

Parameters:

data (torch.Tensor) – input data tensor [batch_size, num_features] or [num_features]

Returns:

Average quantization error value

Return type:

float

set_neighborhood_order(neighborhood_order)

Update the neighborhood order and recompute neighbor offsets.

This only affects retrieval (collect_samples); trained weights are untouched.

Parameters:

neighborhood_order (int) – New neighborhood order (>= 1).

Raises:

ValueError – If neighborhood_order < 1.

Return type:

None

topographic_error(data)

Calculate topographic error with batch support.

Parameters:

data (torch.Tensor) – input data tensor [batch_size, num_features] or [num_features]

Returns:

Topographic error ratio

Return type:

float

Example usage

import torch
from torchsom import SOM

X = torch.randn(1000, 4)
som = SOM(x=10, y=10, num_features=4, epochs=20)
som.initialize_weights(data=X, mode="pca")
q_errors, t_errors = som.fit(X)

# Build maps via unified API
distance_map = som.build_map("distance")
hit_map = som.build_map("hit", data=X)

# Efficiently build multiple maps with shared BMUs
results = som.build_multiple_maps(
    map_configs=[
        {"type": "hit"},
        {"type": "distance"},
    ],
    data=X,
)

Periodic boundary conditions

Periodic boundary conditions are not a separate class. They are enabled with the pbc=True argument of SOM, which wraps the grid into a torus for both rectangular and hexagonal topologies, removing edge effects. See Topologies & Boundary Conditions for when and how to use them.
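On a torus, the grid distance between two neurons wraps around the edges, so neurons on opposite borders become neighbors. A plain-Python sketch for the rectangular case follows (hexagonal wrapping is not shown); grid_distance is a hypothetical helper, not part of torchsom.

```python
import math


def grid_distance(p, q, rows, cols, pbc=False):
    """Euclidean distance between neurons p and q on a rectangular grid, optionally toroidal."""
    dr = abs(p[0] - q[0])
    dc = abs(p[1] - q[1])
    if pbc:
        # Going the other way around the torus may be shorter.
        dr = min(dr, rows - dr)
        dc = min(dc, cols - dc)
    return math.hypot(dr, dc)


# Opposite corners of a 10 x 10 grid
print(grid_distance((0, 0), (9, 9), 10, 10))            # ~12.73: far apart without wrapping
print(grid_distance((0, 0), (9, 9), 10, 10, pbc=True))  # ~1.41: diagonal neighbors on the torus
```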

Roadmap

Growing and Hierarchical SOM variants are planned (see the paper’s Conclusion). They live under torchsom.core.growing and torchsom.core.hierarchical as work-in-progress modules, are not yet part of the public API, and are therefore not documented here. Track progress in the Changelog.