Core API
The core module contains the main SOM classes and implementations.
Base Classes
Abstract base class for all SOM variants.
- class torchsom.core.base_som.BaseSOM(*args, **kwargs)
Abstract base class for all SOM variants.
- abstractmethod fit(data)
Train the SOM on the given data.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
- Returns:
Quantization and topographic errors [epoch]
- Return type:
- abstractmethod identify_bmus(data)
Find best matching units for input data.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
- For single sample: Tensor of shape [2] with [row, col].
For batch: Tensor of shape [batch_size, 2] with [row, col] pairs
- Return type:
- abstractmethod initialize_weights(data, mode=None)
Initialize the SOM weights.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
mode (str, optional) – Weight initialization method. Defaults to None.
- Return type:
None
- abstractmethod quantization_error(data)
Calculate quantization error.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
Average quantization error value
- Return type:
- abstractmethod topographic_error(data)
Calculate topographic error.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
Topographic error ratio
- Return type:
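The three quantities behind `identify_bmus`, `quantization_error`, and `topographic_error` can be illustrated with a minimal pure-Python sketch. This is not torchsom's implementation: it assumes Euclidean distance, a rectangular grid stored as nested lists, and grid neighbors defined as the 8 surrounding cells; the function bodies are illustrative only.

```python
import math


def identify_bmu(weights, sample):
    """Return (row, col) of the neuron whose weight vector is closest to sample."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, sample)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def quantization_error(weights, data):
    """Mean Euclidean distance between each sample and its BMU's weight vector."""
    total = 0.0
    for x in data:
        r, c = identify_bmu(weights, x)
        total += math.dist(weights[r][c], x)
    return total / len(data)


def topographic_error(weights, data):
    """Fraction of samples whose two closest neurons are not grid neighbors."""
    errors = 0
    for x in data:
        ranked = sorted(
            (math.dist(w, x), (r, c))
            for r, row in enumerate(weights)
            for c, w in enumerate(row)
        )
        (r1, c1), (r2, c2) = ranked[0][1], ranked[1][1]
        # Assumption: neighbors are the 8 surrounding cells (Chebyshev distance 1).
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:
            errors += 1
    return errors / len(data)


# A 2x2 map with 2 features per neuron, and two samples.
weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
data = [[0.1, 0.0], [0.9, 1.0]]

bmus = [identify_bmu(weights, x) for x in data]
q_error = quantization_error(weights, data)
t_error = topographic_error(weights, data)
```

On a 2x2 map every pair of neurons is adjacent under the 8-neighbor assumption, so the topographic error here is zero; larger or poorly organized maps are where the ratio becomes informative.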
Classical SOM Implementation
PyTorch implementation of classic Self Organizing Maps using batch learning.
- class torchsom.core.som.SOM(x, y, num_features, epochs=10, batch_size=5, sigma=1.0, learning_rate=0.5, neighborhood_order=1, topology='rectangular', lr_decay_function='asymptotic_decay', sigma_decay_function='asymptotic_decay', neighborhood_function='gaussian', distance_function='euclidean', initialization_mode='random', pbc=False, search_backend='auto', device='cpu', random_seed=42)
Bases: BaseSOM
PyTorch implementation of Self Organizing Maps using batch learning.
- Parameters:
BaseSOM – Abstract base class for SOM variants
x (int)
y (int)
num_features (int)
epochs (int)
batch_size (int)
sigma (float)
learning_rate (float)
neighborhood_order (int)
topology (str)
lr_decay_function (str)
sigma_decay_function (str)
neighborhood_function (str)
distance_function (str)
initialization_mode (str)
pbc (bool)
search_backend (str)
device (str)
random_seed (int)
- build_map(map_type, data=None, target=None, bmus_data_map=None, **kwargs)
Unified method to build various types of maps.
- Parameters:
map_type (str) – Type of map to build. Options:
  - “hit”: Hit map showing neuron activation frequencies
  - “distance”: Distance map showing neuron-to-neighbor distances
  - “bmus_data”: Mapping of BMUs to their corresponding data points
  - “metric”: Metric map based on target values (requires target)
  - “score”: Score map combining standard error with distribution penalty (requires target)
  - “rank”: Rank map based on neuron standard deviations (requires target)
  - “classification”: Classification map with most frequent labels (requires target)
data (Optional[torch.Tensor]) – Input data tensor [batch_size, num_features]. Required if bmus_data_map is not provided.
target (Optional[torch.Tensor]) – Target values/labels (required for some map types)
bmus_data_map (Optional[dict[tuple[int, int], list[int]]]) – Pre-computed BMU to data indices mapping. If provided, avoids recomputing BMUs for dependent maps.
**kwargs – Additional arguments specific to each map type:
  - batch_size (int): Batch processing size (default: 1024)
  - distance_metric (str): Distance function for distance maps
  - neighborhood_order (int): Neighborhood order for distance/classification maps
  - scaling (str): ‘sum’ or ‘mean’ for distance maps
  - reduction_parameter (str): ‘mean’ or ‘std’ for metric maps
  - return_indices (bool): Return indices instead of data for bmus_data maps
- Returns:
Map result (type depends on map_type)
- Return type:
torch.Tensor or Dict
- Raises:
ValueError – If invalid map_type is specified
ValueError – If target is required but not provided
ValueError – If neither data nor bmus_data_map is provided
- build_multiple_maps(map_configs, data, target=None, batch_size=1024)
Efficiently build multiple maps by reusing BMUs computation.
- Parameters:
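map_configs – List of map configurations. As in the example below, each entry is a dict with a "type" key and an optional "kwargs" dict of map-specific arguments.
data (torch.Tensor) – Input data tensor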
target (Optional[torch.Tensor]) – Target values (if needed by any map)
batch_size (int) – Batch size for BMUs computation
- Returns:
Dictionary mapping map names to their results
- Return type:
Example
configs = [
    {"type": "hit"},
    {"type": "metric", "kwargs": {"reduction_parameter": "std"}},
    {"type": "rank"},
    {"type": "classification", "kwargs": {"neighborhood_order": 2}},
]
results = som.build_multiple_maps(configs, data, target)
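The idea of reusing one BMU computation across several maps can be sketched in pure Python. The dictionary below follows the documented `bmus_data_map` shape (`dict[tuple[int, int], list[int]]`); the helper names `build_bmus_data_map`, `hit_map`, and `mean_target_map` are hypothetical and are not part of torchsom, and Euclidean distance on nested lists is an assumption made for brevity.

```python
import math
from collections import defaultdict


def build_bmus_data_map(weights, data):
    """Map each BMU coordinate (row, col) to the indices of the samples it wins."""
    cells = [(r, c) for r, row in enumerate(weights) for c, _ in enumerate(row)]
    bmus_data_map = defaultdict(list)
    for idx, x in enumerate(data):
        bmu = min(cells, key=lambda rc: math.dist(weights[rc[0]][rc[1]], x))
        bmus_data_map[bmu].append(idx)
    return dict(bmus_data_map)


def hit_map(bmus_data_map, n_rows, n_cols):
    """Count how many samples landed on each neuron."""
    return [
        [len(bmus_data_map.get((r, c), [])) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def mean_target_map(bmus_data_map, target, n_rows, n_cols):
    """Per-neuron mean of the target values; None where no sample landed."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for (r, c), idxs in bmus_data_map.items():
        grid[r][c] = sum(target[i] for i in idxs) / len(idxs)
    return grid


weights = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
data = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]]
target = [1.0, 3.0, 10.0]

# BMUs are computed once, then both maps are derived from the same mapping.
mapping = build_bmus_data_map(weights, data)
hits = hit_map(mapping, 2, 2)
means = mean_target_map(mapping, target, 2, 2)
```

The expensive step is the BMU search; once the mapping exists, each dependent map is a cheap aggregation over it.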
- cluster(method='kmeans', n_clusters=None, feature_space='weights', **kwargs)
Cluster SOM neurons using various clustering algorithms.
- Parameters:
method (str) – Clustering method. Options: “kmeans”, “gmm”, “hdbscan”
n_clusters (Optional[int]) – Number of clusters. If None, uses automatic selection
feature_space (str) – Feature space for clustering. Options: - “weights”: Cluster based on neuron weight vectors - “positions”: Cluster based on 2D neuron coordinates - “combined”: Use both weights and positions
**kwargs – Additional arguments for clustering algorithms
- Returns:
- Clustering results containing:
labels: Cluster assignments for neurons [n_neurons]
centers: Cluster centers [n_clusters, n_features]
n_clusters: Number of clusters found
method: Clustering method used
metrics: Dictionary of clustering quality metrics
feature_space: Feature space used for clustering
original_data: Features used for clustering
- Return type:
- Raises:
ValueError – If invalid method or feature_space is specified
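To make the three `feature_space` options concrete, here is a minimal sketch of how a per-neuron feature matrix could be assembled before it is handed to a clustering algorithm. The function `neuron_features` is hypothetical, and the row-major flattening and unscaled concatenation for "combined" are assumptions; torchsom may order or scale the features differently.

```python
def neuron_features(weights, feature_space="weights"):
    """Build one feature row per neuron for the chosen feature space."""
    if feature_space not in ("weights", "positions", "combined"):
        raise ValueError(f"Invalid feature_space: {feature_space!r}")
    rows = []
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            position = [float(r), float(c)]
            if feature_space == "weights":
                rows.append(list(w))
            elif feature_space == "positions":
                rows.append(position)
            else:
                # "combined": weight vector followed by the 2D grid coordinates.
                rows.append(list(w) + position)
    return rows


# A 2x2 map with 3 features per neuron.
weights = [
    [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]],
    [[0.6, 0.7, 0.8], [0.9, 1.0, 1.1]],
]

by_weights = neuron_features(weights, "weights")      # 4 rows of 3 values
by_positions = neuron_features(weights, "positions")  # 4 rows of 2 values
combined = neuron_features(weights, "combined")       # 4 rows of 5 values
```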
- collect_samples(query_sample, historical_samples, historical_outputs, bmus_idx_map, min_buffer_threshold=50, return_indices=False, retrieval_mode='bmu_neighborhood_knn')
Collect historical samples similar to the query sample using SOM projection.
Three retrieval modes control the collection strategy:
"bmu_only": Collect samples mapped to the query’s BMU cell only.
"bmu_neighborhood": Collect from BMU + topological neighbors (up to neighborhood_order hops). No KNN fallback.
"bmu_neighborhood_knn" (default): Same as bmu_neighborhood, plus KNN fallback in weight space when the buffer is below min_buffer_threshold.
- Parameters:
query_sample (torch.Tensor) – Query sample tensor [num_features].
historical_samples (torch.Tensor) – Historical samples tensor [num_samples, num_features].
historical_outputs (torch.Tensor) – Historical outputs tensor [num_samples].
bmus_idx_map (dict[tuple[int, int], list[int]]) – BMU to data indices mapping.
min_buffer_threshold (int) – Minimum buffer size before KNN fallback triggers. Only used when retrieval_mode="bmu_neighborhood_knn".
return_indices (bool) – If True, also return the indices of collected samples.
retrieval_mode (str) – Retrieval strategy. One of "bmu_only", "bmu_neighborhood", or "bmu_neighborhood_knn" (default).
- Returns:
If return_indices is False: (historical_data_buffer, historical_output_buffer)
If return_indices is True: (historical_data_buffer, historical_output_buffer, indices_tensor)
- Raises:
ValueError – If retrieval_mode is not one of the valid modes.
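The three retrieval modes can be sketched over a `bmus_idx_map`-style dictionary in pure Python. The function `collect_indices` is hypothetical and only returns sample indices. It assumes a rectangular grid without periodic boundaries, treats a "hop" as a move to any of the 8 surrounding cells, and implements the KNN fallback as visiting the remaining neurons in order of weight-space distance to the query; torchsom's actual fallback may work differently.

```python
import math

VALID_MODES = ("bmu_only", "bmu_neighborhood", "bmu_neighborhood_knn")


def collect_indices(bmu, query, weights, bmus_idx_map, neighborhood_order=1,
                    min_buffer_threshold=50,
                    retrieval_mode="bmu_neighborhood_knn"):
    """Return indices of historical samples retrieved for a query (sketch)."""
    if retrieval_mode not in VALID_MODES:
        raise ValueError(f"retrieval_mode must be one of {VALID_MODES}")

    # Mode 1: samples mapped to the query's BMU cell only.
    indices = list(bmus_idx_map.get(bmu, []))
    if retrieval_mode == "bmu_only":
        return indices

    # Mode 2: add cells within neighborhood_order hops of the BMU.
    visited = {bmu}
    for r in range(len(weights)):
        for c in range(len(weights[0])):
            hops = max(abs(r - bmu[0]), abs(c - bmu[1]))  # assumed hop metric
            if 0 < hops <= neighborhood_order:
                visited.add((r, c))
                indices.extend(bmus_idx_map.get((r, c), []))
    if retrieval_mode == "bmu_neighborhood":
        return indices

    # Mode 3: if the buffer is still too small, fall back to the neurons
    # closest to the query in weight space (assumed form of the KNN fallback).
    remaining = sorted(
        (cell for cell in bmus_idx_map if cell not in visited),
        key=lambda cell: math.dist(weights[cell[0]][cell[1]], query),
    )
    for cell in remaining:
        if len(indices) >= min_buffer_threshold:
            break
        indices.extend(bmus_idx_map[cell])
    return indices


# A 1x4 map with one feature per neuron; the query's BMU is cell (0, 0).
weights = [[[0.0], [1.0], [2.0], [3.0]]]
bmus_idx_map = {(0, 0): [0], (0, 1): [1, 2], (0, 3): [3]}
query, bmu = [0.1], (0, 0)

only_bmu = collect_indices(bmu, query, weights, bmus_idx_map,
                           retrieval_mode="bmu_only")
with_neighbors = collect_indices(bmu, query, weights, bmus_idx_map,
                                 retrieval_mode="bmu_neighborhood")
with_fallback = collect_indices(bmu, query, weights, bmus_idx_map,
                                min_buffer_threshold=4)
```

With a threshold of 4, the neighborhood alone yields three samples, so the fallback pulls in the sample from the distant cell `(0, 3)`.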
- fit(data, verbose=True)
Train the SOM using batches and track errors.
- Parameters:
data (torch.Tensor) – input data tensor [batch_size, num_features]
verbose (bool, optional) – Whether to print progress. Defaults to True.
- Returns:
Quantization and topographic errors [epoch]
- Return type:
- identify_bmus(data)
Find BMUs for input data.
Uses the configured search strategy (PyTorch brute-force or FAISS).
- Parameters:
data (torch.Tensor) – Input tensor of shape [batch_size, features] or [features]
- Returns:
BMU coordinates as tensor [batch_size, 2] or [2]
- Return type:
- initialize_weights(data, mode=None)
Data should be normalized before initialization.
Initialize weights using:
Random samples from input data.
PCA components to make the training process converge faster.
- Parameters:
data (torch.Tensor) – input data tensor [batch_size, num_features]
mode (str, optional) – selection of the method to init the weights. Defaults to None.
- Raises:
ValueError – Ensure neurons’ weights and input data have the same number of features
RuntimeError – If random initialization takes too long
ValueError – Requires at least 2 features for PCA
ValueError – Requires more than one sample to perform PCA
ValueError – Ensure an appropriate method for initialization
- Return type:
None
- quantization_error(data)
Calculate quantization error.
- Parameters:
data (torch.Tensor) – input data tensor [batch_size, num_features] or [num_features]
- Returns:
Average quantization error value
- Return type:
- set_neighborhood_order(neighborhood_order)
Update the neighborhood order and recompute neighbor offsets.
This only affects retrieval (collect_samples); trained weights are untouched.
- Parameters:
neighborhood_order (int) – New neighborhood order (>= 1).
- Raises:
ValueError – If neighborhood_order < 1.
- Return type:
None
- topographic_error(data)
Calculate topographic error with batch support.
- Parameters:
data (torch.Tensor) – input data tensor [batch_size, num_features] or [num_features]
- Returns:
Topographic error ratio
- Return type:
Example usage
import torch
from torchsom import SOM

X = torch.randn(1000, 4)
som = SOM(x=10, y=10, num_features=4, epochs=20)
som.initialize_weights(data=X, mode="pca")
q_errors, t_errors = som.fit(X)

# Build maps via unified API
distance_map = som.build_map("distance")
hit_map = som.build_map("hit", data=X)

# Efficiently build multiple maps with shared BMUs
results = som.build_multiple_maps(
    map_configs=[
        {"type": "hit"},
        {"type": "distance"},
    ],
    data=X,
)
Periodic boundary conditions
Periodic boundary conditions are not a separate class. They are enabled with the
pbc=True argument of SOM, which wraps the grid into a
torus for both rectangular and hexagonal topologies, removing edge effects. See
Topologies & Boundary Conditions for when and how to use them.
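The effect of wrapping the grid into a torus can be illustrated with a short sketch of grid distance on a rectangular map. The helpers `grid_offset` and `grid_distance` are hypothetical and cover only the rectangular case; how torchsom computes wrapped distances internally, and how it handles hexagonal grids, is not shown here.

```python
def grid_offset(a, b, size, pbc=False):
    """Absolute offset between two indices along one axis of length size."""
    d = abs(a - b)
    # With periodic boundaries the shorter way around the axis is used.
    return min(d, size - d) if pbc else d


def grid_distance(p, q, shape, pbc=False):
    """Euclidean distance between two (row, col) cells on a rectangular grid."""
    dr = grid_offset(p[0], q[0], shape[0], pbc)
    dc = grid_offset(p[1], q[1], shape[1], pbc)
    return (dr ** 2 + dc ** 2) ** 0.5


shape = (10, 10)
flat = grid_distance((0, 0), (0, 9), shape)               # opposite edges
wrapped = grid_distance((0, 0), (0, 9), shape, pbc=True)  # adjacent on torus
```

Without wrapping, cells on opposite edges are nine columns apart; with `pbc=True` they become direct neighbors, which is why edge neurons stop behaving differently from interior ones.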
Roadmap
Growing and Hierarchical SOM variants are planned (see the paper’s Conclusion). They
live under torchsom.core.growing and torchsom.core.hierarchical as
work-in-progress modules, are not yet part of the public API, and are therefore not
documented here. Track progress in the Changelog.