Core API
The core module contains the main SOM classes and implementations.
Base Classes
Abstract base class for all SOM variants.
- class torchsom.core.base_som.BaseSOM(*args, **kwargs)[source]
Bases: Module, ABC
Abstract base class for all SOM variants.
- abstract fit(data)[source]
Train the SOM on the given data.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
- Returns:
Quantization and topographic errors, one value per epoch
- Return type:
Tuple[List[float], List[float]]
- abstract identify_bmus(data)[source]
Find best matching units for input data.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
- For a single sample: Tensor of shape [2] with [row, col].
For a batch: Tensor of shape [batch_size, 2] with [row, col] pairs.
- Return type:
torch.Tensor
- abstract initialize_weights(data, mode=None)[source]
Initialize the SOM weights.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
mode (str, optional) – Weight initialization method. Defaults to None.
- Return type:
None
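Custom SOM variants subclass BaseSOM and implement the three abstract methods above. The following is only a sketch to illustrate the required interface; ConstantSOM is a made-up toy class, not part of the library.
import torch
from torchsom.core.base_som import BaseSOM

class ConstantSOM(BaseSOM):
    """Toy 1x1 map whose single neuron tracks the data mean (illustration only)."""

    def __init__(self, num_features):
        super().__init__()
        self.weights = torch.zeros(1, 1, num_features)

    def initialize_weights(self, data, mode=None):
        # Place the single neuron at the mean of the (normalized) data
        self.weights = data.mean(dim=0).view(1, 1, -1)

    def identify_bmus(self, data):
        # Every sample maps to the only neuron, at position (0, 0)
        if data.dim() == 1:
            return torch.zeros(2, dtype=torch.long)
        return torch.zeros(data.shape[0], 2, dtype=torch.long)

    def fit(self, data):
        self.initialize_weights(data)
        # One "epoch": report the quantization error, no topographic error
        qe = float((data - self.weights.view(1, -1)).norm(dim=1).mean())
        return [qe], [0.0]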
Classical SOM Implementation
PyTorch implementation of the classical Self-Organizing Map using batch learning.
- class torchsom.core.som.SOM(x, y, num_features, epochs=10, batch_size=5, sigma=1.0, learning_rate=0.5, neighborhood_order=1, topology='rectangular', lr_decay_function='asymptotic_decay', sigma_decay_function='asymptotic_decay', neighborhood_function='gaussian', distance_function='euclidean', initialization_mode='random', device='cpu', random_seed=42)[source]
Bases: BaseSOM
PyTorch implementation of Self-Organizing Maps using batch learning.
- Parameters:
x (int)
y (int)
num_features (int)
epochs (int)
batch_size (int)
sigma (float)
learning_rate (float)
neighborhood_order (int)
topology (str)
lr_decay_function (str)
sigma_decay_function (str)
neighborhood_function (str)
distance_function (str)
initialization_mode (str)
device (str)
random_seed (int)
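Only x, y, and num_features are required; the remaining arguments are keyword options with the defaults shown in the signature. A construction sketch with illustrative values (initialization_mode is assumed here to accept the same "random"/"pca" strings used by initialize_weights):
from torchsom import SOM

som = SOM(
    x=20,
    y=15,
    num_features=8,
    epochs=50,
    batch_size=64,
    sigma=2.0,
    learning_rate=0.5,
    topology="rectangular",
    initialization_mode="pca",  # assumed to mirror the modes of initialize_weights
    device="cpu",
    random_seed=0,
)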
- build_map(map_type, data=None, target=None, bmus_data_map=None, **kwargs)[source]
Unified method to build various types of maps.
- Parameters:
map_type (str) – Type of map to build. Options:
"hit": Hit map showing neuron activation frequencies
"distance": Distance map showing neuron-to-neighbor distances
"bmus_data": Mapping of BMUs to their corresponding data points
"metric": Metric map based on target values (requires target)
"score": Score map combining standard error with distribution penalty (requires target)
"rank": Rank map based on neuron standard deviations (requires target)
"classification": Classification map with most frequent labels (requires target)
data (Optional[torch.Tensor]) – Input data tensor [batch_size, num_features]. Required if bmus_data_map is not provided.
target (Optional[torch.Tensor]) – Target values/labels (required for some map types)
bmus_data_map (Optional[dict[tuple[int, int], list[int]]]) – Pre-computed BMU to data indices mapping. If provided, avoids recomputing BMUs for dependent maps.
**kwargs – Additional arguments specific to each map type:
batch_size (int): Batch processing size (default: 1024)
distance_metric (str): Distance function for distance maps
neighborhood_order (int): Neighborhood order for distance/classification maps
scaling (str): "sum" or "mean" for distance maps
reduction_parameter (str): "mean" or "std" for metric maps
return_indices (bool): Return indices instead of data for bmus_data maps
- Returns:
Map result (type depends on map_type)
- Return type:
torch.Tensor or Dict
- Raises:
ValueError – If invalid map_type is specified
ValueError – If target is required but not provided
ValueError – If neither data nor bmus_data_map is provided
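For instance, a metric map and a classification map could be built as follows; X, y, and labels are placeholder tensors, and the keyword arguments mirror the kwargs listed above:
metric_map = som.build_map(
    "metric",
    data=X,            # [batch_size, num_features]
    target=y,          # continuous target values
    reduction_parameter="mean",
)
label_map = som.build_map(
    "classification",
    data=X,
    target=labels,     # class labels
    neighborhood_order=2,
)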
- build_multiple_maps(map_configs, data, target=None, batch_size=1024)[source]
Efficiently build multiple maps by reusing BMUs computation.
- Parameters:
map_configs (list[dict]) – List of map configurations
data (torch.Tensor) – Input data tensor
target (Optional[torch.Tensor]) – Target values (if needed by any map)
batch_size (int) – Batch size for BMUs computation
- Returns:
Dictionary mapping map names to their results
- Return type:
dict[str, torch.Tensor]
Example
configs = [
    {"type": "hit"},
    {"type": "metric", "kwargs": {"reduction_parameter": "std"}},
    {"type": "rank"},
    {"type": "classification", "kwargs": {"neighborhood_order": 2}},
]
results = som.build_multiple_maps(configs, data, target)
- cluster(method='kmeans', n_clusters=None, feature_space='weights', **kwargs)[source]
Cluster SOM neurons using various clustering algorithms.
- Parameters:
method (str) – Clustering method. Options: "kmeans", "gmm", "hdbscan"
n_clusters (Optional[int]) – Number of clusters. If None, uses automatic selection
feature_space (str) – Feature space for clustering. Options:
"weights": Cluster based on neuron weight vectors
"positions": Cluster based on 2D neuron coordinates
"combined": Use both weights and positions
**kwargs – Additional arguments for clustering algorithms
- Returns:
- Clustering results containing:
labels: Cluster assignments for neurons [n_neurons]
centers: Cluster centers [n_clusters, n_features]
n_clusters: Number of clusters found
method: Clustering method used
metrics: Dictionary of clustering quality metrics
feature_space: Feature space used for clustering
original_data: Features used for clustering
- Return type:
dict[str, Any]
- Raises:
ValueError – If invalid method or feature_space is specified
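A minimal clustering sketch using the documented options; the printed keys come from the result dictionary described above:
clustering = som.cluster(method="kmeans", n_clusters=5, feature_space="weights")
print(clustering["n_clusters"])    # number of clusters found
print(clustering["labels"].shape)  # one cluster assignment per neuron
print(clustering["metrics"])       # clustering quality metrics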
- collect_samples(query_sample, historical_samples, historical_outputs, bmus_idx_map, min_buffer_threshold=50)[source]
Collect historical samples similar to the query sample using SOM projection.
- Parameters:
query_sample (torch.Tensor) – Query sample tensor [num_features]
historical_samples (torch.Tensor) – Historical samples tensor [num_samples, num_features]
historical_outputs (torch.Tensor) – Historical outputs tensor [num_samples]
bmus_idx_map (dict[tuple[int, int], list[int]]) – BMU to data indices mapping
min_buffer_threshold (int) – Minimum buffer threshold
- Return type:
tuple[Tensor, Tensor]
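A usage sketch, assuming the BMU-to-indices mapping is obtained from build_map("bmus_data", ..., return_indices=True) and that x_query, X_hist, and y_hist are placeholder tensors:
bmus_idx_map = som.build_map("bmus_data", data=X_hist, return_indices=True)
X_local, y_local = som.collect_samples(
    query_sample=x_query,       # [num_features]
    historical_samples=X_hist,  # [num_samples, num_features]
    historical_outputs=y_hist,  # [num_samples]
    bmus_idx_map=bmus_idx_map,
    min_buffer_threshold=50,
)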
- fit(data, verbose=True)[source]
Train the SOM using batches and track errors.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
verbose (bool, optional) – Whether to print progress. Defaults to True.
- Returns:
Quantization and topographic errors, one value per epoch
- Return type:
Tuple[List[float], List[float]]
- identify_bmus(data)[source]
Find BMUs for input data.
- Parameters:
data (torch.Tensor) – Input tensor of shape [batch_size, features]
- Returns:
BMU coordinates as tensor [batch_size, 2]
- Return type:
torch.Tensor
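For example, with X a placeholder batch of inputs:
bmus = som.identify_bmus(X)  # [batch_size, 2], each row holding the [row, col] of a neuron
first_bmu = bmus[0]          # BMU coordinates of the first sample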
- initialize_weights(data, mode=None)[source]
Initialize the SOM weights; data should be normalized beforehand. Two methods are available:
"random": random samples drawn from the input data
"pca": PCA components, which make the training process converge faster
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
mode (str, optional) – Method used to initialize the weights ("random" or "pca"). Defaults to None.
- Raises:
ValueError – If the neurons' weights and the input data do not have the same number of features
RuntimeError – If random initialization takes too long
ValueError – If fewer than 2 features are available for PCA
ValueError – If fewer than 2 samples are available for PCA
ValueError – If an invalid initialization method is specified
- Return type:
None
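Both documented modes can be called explicitly; "pca" matches the initialization used in the example at the end of this page:
som.initialize_weights(data=X, mode="random")  # random samples from the input data
# or
som.initialize_weights(data=X, mode="pca")     # PCA components for faster convergence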
- quantization_error(data)[source]
Calculate quantization error.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
Average quantization error value
- Return type:
float
- topographic_error(data)[source]
Calculate topographic error with batch support.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
Topographic error ratio
- Return type:
float
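Both errors can be evaluated on any (normalized) dataset after training, for example:
qe = som.quantization_error(X)  # average distance between samples and their BMU weights
te = som.topographic_error(X)   # fraction of samples whose two best units are not neighbors
print(f"QE={qe:.4f}, TE={te:.4f}")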
- weights
Pre-compute:
1. Coordinate distance matrices for efficient distance calculations
2. Neighbor offsets for topology operations
3. Decay schedules for all epochs at once
Example usage
import torch
from torchsom import SOM
X = torch.randn(1000, 4)
som = SOM(x=10, y=10, num_features=4, epochs=20)
som.initialize_weights(data=X, mode="pca")
q_errors, t_errors = som.fit(X)
# Build maps via unified API
distance_map = som.build_map("distance")
hit_map = som.build_map("hit", data=X)
# Efficiently build multiple maps with shared BMUs
results = som.build_multiple_maps(
map_configs=[
{"type": "hit"},
{"type": "distance"},
],
data=X,
)