Core API
The core module contains the main SOM classes and implementations.
Base Classes
- class torchsom.core.base_som.BaseSOM(*args, **kwargs)
Bases: Module, ABC
Abstract base class for all SOM variants.
- abstract build_distance_map(scaling='sum')
Build a distance map (U-matrix) showing how far each neuron’s weights are from those of its neighbors.
- Parameters:
scaling (str, optional) – Scaling method for distances. Defaults to “sum”.
- Returns:
Distance map [row_neurons, col_neurons]
- Return type:
torch.Tensor
- abstract build_hit_map(data)
Build a hit map showing neuron activation frequencies.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
Hit map [row_neurons, col_neurons]
- Return type:
torch.Tensor
- abstract fit(data)
Train the SOM on the given data.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
- Returns:
Quantization and topographic errors [epoch]
- Return type:
Tuple[List[float], List[float]]
- abstract identify_bmus(data)
Find best matching units for input data.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features] or [num_features]
- Returns:
For a single sample: tensor of shape [2] with [row, col].
For a batch: tensor of shape [batch_size, 2] with [row, col] pairs.
- Return type:
torch.Tensor
- abstract initialize_weights(data, mode=None)
Initialize the SOM weights.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
mode (str, optional) – Weight initialization method. Defaults to None.
- Return type:
None
Classical SOM Implementation
- class torchsom.core.som.SOM(x, y, num_features, epochs=10, batch_size=5, sigma=1.0, learning_rate=0.5, neighborhood_order=1, topology='rectangular', lr_decay_function='asymptotic_decay', sigma_decay_function='asymptotic_decay', neighborhood_function='gaussian', distance_function='euclidean', initialization_mode='random', device='cpu', random_seed=42)
Bases: BaseSOM
PyTorch implementation of a Self-Organizing Map (SOM) trained with batch learning.
- Parameters:
x (int)
y (int)
num_features (int)
epochs (int)
batch_size (int)
sigma (float)
learning_rate (float)
neighborhood_order (int)
topology (str)
lr_decay_function (str)
sigma_decay_function (str)
neighborhood_function (str)
distance_function (str)
initialization_mode (str)
device (str)
random_seed (int)
- build_bmus_data_map(data, return_indices=False, batch_size=1024)
Create a mapping from winning neurons (BMUs) to their corresponding data points. The data is processed in batches to save memory. The mapping is built on the CPU, but the calculations run on the GPU if available.
- Parameters:
data (torch.Tensor) – Input data tensor [num_samples, num_features] or [num_features]
return_indices (bool, optional) – If True, return indices instead of data points. Defaults to False.
batch_size (int, optional) – Size of batches to process. Defaults to 1024.
- Returns:
Dictionary mapping each BMU (row, col) to its data samples or to their indices
- Return type:
Dict[Tuple[int, int], Any]
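The grouping described above can be sketched without the library. This is a minimal illustration, not torchsom's implementation: the helper name build_bmus_index_map is hypothetical, and the BMU coordinates are assumed to be already computed and given as plain (row, col) tuples.

```python
from collections import defaultdict

def build_bmus_index_map(bmus, batch_size=1024):
    """Group sample indices by their BMU coordinates, one batch at a time."""
    mapping = defaultdict(list)
    for start in range(0, len(bmus), batch_size):
        batch = bmus[start:start + batch_size]
        for offset, bmu in enumerate(batch):
            # start + offset is the index of the sample in the full dataset
            mapping[bmu].append(start + offset)
    return dict(mapping)

# Toy BMU assignments for five samples (illustrative values only)
bmus = [(0, 0), (0, 1), (0, 1), (1, 1), (0, 1)]
index_map = build_bmus_index_map(bmus, batch_size=2)

# Equivalent of returning data points instead of indices
samples = ["a", "b", "c", "d", "e"]
data_map = {bmu: [samples[i] for i in idx] for bmu, idx in index_map.items()}
```

Neurons that never win simply have no key in the dictionary in this sketch.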
- build_classification_map(data, target, neighborhood_order=1)
Build a classification map where each neuron is assigned its most frequent label. In case of a tie, labels from neighboring neurons are considered. If there are no neighboring neurons, or if the tie persists, one of the top labels is selected at random.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]. Labels are assumed to be encoded with values > 1 so that the map visualizes well.
neighborhood_order (int, optional) – Neighborhood order to consider for tie-breaking. Defaults to 1.
- Returns:
Classification map with the most frequent label for each neuron
- Return type:
torch.Tensor
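The tie-breaking rule can be sketched for a single neuron as follows. The helper names are hypothetical, and how neighbor labels are weighed is an assumption: here only the tied labels are counted among the neighbors' samples, which may differ from what the library does.

```python
import random
from collections import Counter

def top_labels(labels):
    """Return every label that reaches the highest count, in sorted order."""
    counts = Counter(labels)
    best = max(counts.values())
    return sorted(label for label, n in counts.items() if n == best)

def classify_neuron(own_labels, neighbor_labels, rng):
    """Majority vote with neighbor-based, then random, tie-breaking."""
    winners = top_labels(own_labels)
    if len(winners) == 1:
        return winners[0]
    # Tie: look only at the tied labels among the neighbors' samples (assumption)
    pooled = [label for label in neighbor_labels if label in winners]
    if pooled:
        winners = top_labels(pooled)
    if len(winners) == 1:
        return winners[0]
    # No neighbors or a second tie: pick one of the remaining top labels at random
    return rng.choice(winners)

rng = random.Random(42)
clear_majority = classify_neuron([1, 1, 2], [], rng)
neighbor_break = classify_neuron([1, 2], [2, 2, 1, 3], rng)
random_break = classify_neuron([1, 2], [], rng)
```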
- build_distance_map(scaling='sum', distance_metric=None, neighborhood_order=None)
Computes the distance map of each neuron with its neighbors.
The distance map represents the normalized sum or mean of distances between a neuron’s weight vector and its neighboring neurons.
- Parameters:
scaling (str, optional) – Defaults to “sum”. If ‘mean’, each cell is normalized by the average neighbor distance. If ‘sum’, normalization is done by the sum of distances.
distance_metric (str, optional) – Name of the method to calculate the distance. Defaults to None.
neighborhood_order (int, optional) – Indicate the neighbors to consider for the distance calculation. Defaults to None.
- Raises:
ValueError – If an invalid scaling option is provided.
ValueError – If an invalid distance metric is provided.
- Returns:
Normalized distance map [row_neurons, col_neurons]
- Return type:
torch.Tensor
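A library-independent sketch of the idea follows. Several choices here are assumptions rather than documented behavior: a rectangular grid with the four direct neighbors, Euclidean distance, and a final division by the largest cell value as the normalization step. The function name u_matrix is hypothetical.

```python
import math

def u_matrix(weights, scaling="sum"):
    """Sum or mean of distances from each neuron to its grid neighbors."""
    if scaling not in ("sum", "mean"):
        raise ValueError(f"invalid scaling: {scaling!r}")
    rows, cols = len(weights), len(weights[0])
    dist_map = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Distances to the up/down/left/right neighbors that exist
            dists = [
                math.dist(weights[r][c], weights[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            total = sum(dists)
            if scaling == "sum":
                dist_map[r][c] = total
            else:
                dist_map[r][c] = total / len(dists) if dists else 0.0
    # Normalize by the largest value (assumed normalization)
    peak = max(max(row) for row in dist_map)
    if peak == 0:
        return dist_map
    return [[value / peak for value in row] for row in dist_map]

# A 1 x 3 map with one-dimensional weights
weights = [[[0.0], [1.0], [3.0]]]
u_sum = u_matrix(weights, scaling="sum")
u_mean = u_matrix(weights, scaling="mean")
```

With "sum", interior neurons tend to score higher than border neurons because they have more neighbors; "mean" removes that effect.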
- build_hit_map(data, batch_size=1024)
Return a matrix whose element (i, j) is the number of times neuron (i, j) has been the winner (BMU). The data is processed in batches to save memory. The hit map is built on the CPU, but the calculations run on the GPU if available.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
batch_size (int, optional) – Size of batches to process. Defaults to 1024.
- Returns:
Matrix indicating the number of times each neuron has been identified as the BMU.
- Return type:
torch.Tensor
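Counting wins per neuron can be sketched in a few lines. The helper name count_hits is hypothetical, and the BMU coordinates are assumed to be already computed as (row, col) tuples; the library itself works on tensors.

```python
from collections import Counter

def count_hits(bmus, rows, cols):
    """Return a rows x cols grid counting how often each neuron was the BMU."""
    counts = Counter(bmus)
    # Counter returns 0 for neurons that never won
    return [[counts[(r, c)] for c in range(cols)] for r in range(rows)]

# Toy BMU assignments for five samples on a 2 x 2 map
bmus = [(0, 0), (0, 1), (0, 1), (1, 1), (0, 1)]
hit_map = count_hits(bmus, rows=2, cols=2)
```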
- build_metric_map(data, target, reduction_parameter)
Calculate neurons’ metrics based on target values.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]
reduction_parameter (str) – Reduction applied to the target values of each neuron: ‘mean’ or ‘std’.
- Returns:
Metric map based on the reduction parameter.
- Return type:
torch.Tensor
- build_rank_map(data, target)
Build a map of neuron ranks based on their target value standard deviations.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]
- Returns:
Rank map where each neuron’s value is its rank (1 = lowest std = best)
- Return type:
torch.Tensor
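The ranking rule (1 = lowest standard deviation) can be illustrated as below. This is a sketch under assumptions: the helper name rank_neurons is hypothetical, target values are assumed to be already grouped per neuron, the population standard deviation is used, and ties keep their input order. Neurons without samples are not handled.

```python
import statistics

def rank_neurons(targets_per_neuron):
    """Rank neurons by the std of their target values; rank 1 is the lowest std."""
    stds = {
        bmu: statistics.pstdev(values)
        for bmu, values in targets_per_neuron.items()
    }
    ordered = sorted(stds, key=stds.get)
    return {bmu: rank for rank, bmu in enumerate(ordered, start=1)}

# Target values grouped by the neuron that won each sample (toy data)
targets_per_neuron = {
    (0, 0): [1.0, 1.0, 1.0],  # std 0.0
    (0, 1): [0.0, 4.0],       # std 2.0
    (1, 0): [2.0, 3.0],       # std 0.5
}
ranks = rank_neurons(targets_per_neuron)
```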
- build_score_map(data, target)
Calculate neurons’ score based on target values.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
target (torch.Tensor) – Labels tensor for data points [batch_size]
- Returns:
Score map based on a chosen score function: std_neuron / sqrt(n_neuron) * log(N_data/n_neuron). The score combines the standard error with a term penalizing uneven sample distribution across neurons. Lower scores indicate better neuron representativeness.
- Return type:
torch.Tensor
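The score formula above can be evaluated by hand for one neuron. In this sketch the function name neuron_score is hypothetical, and the use of the population standard deviation is an assumption (the formula does not say whether the sample or population estimator is meant).

```python
import math
import statistics

def neuron_score(targets, n_total):
    """std_neuron / sqrt(n_neuron) * log(N_data / n_neuron) for one neuron."""
    n_neuron = len(targets)
    std_neuron = statistics.pstdev(targets)  # population std (assumption)
    return std_neuron / math.sqrt(n_neuron) * math.log(n_total / n_neuron)

# A neuron that won 2 of 8 samples, with target values 1.0 and 3.0
score = neuron_score([1.0, 3.0], n_total=8)

# A neuron that won every sample: the log term, and so the score, is zero
score_all = neuron_score([1.0, 3.0], n_total=2)
```

For the first neuron the standard deviation is 1.0, so the score reduces to log(4) / sqrt(2), roughly 0.98.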
- collect_samples(query_sample, historical_samples, historical_outputs, min_buffer_threshold=50, bmus_idx_map=None)
Collect historical samples similar to the query sample using SOM projection.
- Parameters:
query_sample (torch.Tensor) – The query data point [num_features]
historical_samples (torch.Tensor) – Historical input data [num_samples, num_features]
historical_outputs (torch.Tensor) – Historical output values [num_samples]
min_buffer_threshold (int, optional) – Minimum number of samples to collect. Defaults to 50.
bmus_idx_map (Dict[Tuple[int, int], List[int]] | None)
- Returns:
(historical_data_buffer, historical_output_buffer)
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- fit(data)
Train the SOM using batches and track errors.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
- Returns:
Quantization and topographic errors [epoch]
- Return type:
Tuple[List[float], List[float]]
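For readers unfamiliar with the first of the two returned metrics: the quantization error is usually defined as the mean distance between each sample and the weight vector of its BMU. The sketch below follows that common definition with Euclidean distance; whether the library computes exactly this is an assumption, and the function name is hypothetical.

```python
import math

def quantization_error(weights, data):
    """Mean Euclidean distance between each sample and its closest neuron."""
    flat = [w for row in weights for w in row]
    return sum(min(math.dist(w, x) for w in flat) for x in data) / len(data)

# 2 x 2 map with two-dimensional weights (toy values)
weights = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0]],
]
data = [[0.0, 0.0], [1.0, 0.5]]
qe = quantization_error(weights, data)
```

The first sample sits exactly on a neuron (distance 0) and the second is 0.5 away from its closest neurons, giving an error of 0.25.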
- identify_bmus(data)
Find the BMUs for the input data. Handles both single samples and batches. The data must be on the same device as the SOM’s weights (the GPU, if one is used) so that the distance calculations can run there.
- Parameters:
data (torch.Tensor) – Input tensor of shape [num_features] or [batch_size, num_features]
- Returns:
For a single sample: tensor of shape [2] with [row, col].
For a batch: tensor of shape [batch_size, 2] with [row, col] pairs.
- Return type:
torch.Tensor
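A BMU search amounts to an argmin over the distances between a sample and every neuron's weight vector. The sketch below shows this in plain Python with Euclidean distance; the function name find_bmu and the nested-list weight layout are illustrative assumptions, not the library's tensor-based code.

```python
import math

def find_bmu(weights, sample):
    """Return (row, col) of the neuron whose weights are closest to the sample."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, sample)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

# 2 x 2 map with two-dimensional weights (toy values)
weights = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0]],
]

# Single sample: one (row, col) pair
bmu_single = find_bmu(weights, [0.9, 0.1])

# Batch: one (row, col) pair per sample
bmus_batch = [find_bmu(weights, s) for s in [[0.1, 0.2], [0.8, 0.9]]]
```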
- initialize_weights(data, mode=None)
Initialize the weights either from random samples of the input data or from PCA components, which makes the training process converge faster. Data should be normalized before initialization.
- Parameters:
data (torch.Tensor) – Input data tensor [batch_size, num_features]
mode (str, optional) – Weight initialization method. Defaults to None.
- Raises:
ValueError – If the neurons’ weights and the input data do not have the same number of features
RuntimeError – If random initialization takes too long
ValueError – If PCA initialization is requested with fewer than 2 features
ValueError – If PCA initialization is requested with only one sample
ValueError – If the initialization method is not a supported one
- Return type:
None