Frequently Asked Questions
General Questions
What is TorchSOM?
TorchSOM is a modern PyTorch-based implementation of Self-Organizing Maps (SOMs), designed for efficient training and comprehensive visualization of high-dimensional data clustering and analysis.
How does TorchSOM differ from other SOM implementations?
TorchSOM offers several advantages:
GPU acceleration through PyTorch
Modern Python practices with type hints and Pydantic validation
Comprehensive visualization suite with matplotlib integration
Flexible architecture supporting multiple SOM variants
Installation and Setup
Which Python versions are supported?
We recommend using Python 3.9+.
Do I need a GPU to use TorchSOM?
No, TorchSOM works on both CPU and GPU. However, GPU acceleration significantly improves training speed for large datasets and maps. We recommend using a GPU for training.
Data Preprocessing
Should I always normalize my data?
Yes, normalization is crucial because:
Features with larger scales dominate the distance calculation
SOM learning is sensitive to feature magnitudes
StandardScaler or MinMaxScaler from scikit-learn both work well
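As a minimal sketch of the scaling step, the snippet below standardizes each feature to zero mean and unit variance in plain PyTorch, which is the same transformation StandardScaler applies to non-constant features. The helper name standardize, the eps guard, and the sample values are illustrative assumptions, not part of TorchSOM.

```python
import torch


def standardize(data: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale each column (feature) to zero mean and unit variance."""
    mean = data.mean(dim=0, keepdim=True)
    # Population standard deviation, as StandardScaler uses by default.
    std = data.std(dim=0, unbiased=False, keepdim=True)
    # eps guards against division by zero for constant features.
    return (data - mean) / (std + eps)


# Two features on very different scales (hypothetical values).
raw = torch.tensor([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]])
scaled = standardize(raw)
print(scaled)
```

After scaling, both columns contribute comparably to the distance calculation, so the second feature no longer dominates because of its larger magnitude.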
What about categorical features?
SOMs operate exclusively on numerical data. Therefore, it is essential to convert any categorical features into a numerical format before using them with TorchSOM. Common strategies include:
One-hot encoding for nominal (unordered) categories
Ordinal encoding for ordered categories
Target or frequency encoding for high-cardinality categories
If your dataset contains a mix of numerical and categorical features, ensure all features are numerically encoded prior to training.
Similarly, when visualizing classification or label maps, assign numerical levels to each class or category to enable proper mapping and interpretation in the visualization outputs.
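The one-hot strategy can be sketched with PyTorch alone. The column names and values below are hypothetical, and the same result could be obtained with pandas or scikit-learn encoders.

```python
import torch
import torch.nn.functional as F

# Hypothetical mixed dataset: one numerical column and one nominal column.
sizes = torch.tensor([[1.2], [0.7], [3.4], [2.1]])
colors = ["red", "green", "red", "blue"]

# Map each category to an integer index (sorted for a reproducible order).
categories = sorted(set(colors))  # ['blue', 'green', 'red']
index = {name: i for i, name in enumerate(categories)}
color_ids = torch.tensor([index[c] for c in colors])

# One-hot encode the nominal column and append it to the numerical features.
color_onehot = F.one_hot(color_ids, num_classes=len(categories)).float()
features = torch.cat([sizes, color_onehot], dim=1)  # shape [4, 1 + 3]
print(features.shape)
```

The resulting features tensor is fully numerical and has the [n_samples, n_features] layout a SOM expects. Numerical columns would typically still be normalized before training.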
Performance and Optimization
My training is very slow. How can I speed it up?
Try these optimizations:
Enable GPU: Use device="cuda" if available
Increase batch size: Try 64, 128, or 256
Reduce map size: Start smaller and scale up
Use PCA initialization: initialization_mode="pca"
Reduce epochs: Monitor convergence and stop early
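A small sketch of the device choice follows. It only uses standard PyTorch calls. The commented constructor call combines parameters that appear elsewhere in this FAQ (x, y, num_features, device, initialization_mode), and that exact combination is an assumption to check against the API reference.

```python
import torch

# Fall back to the CPU when no CUDA device is visible.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Training on: {device}")

# The device would then be passed to the SOM, for example (not executed here):
# som = SOM(x=10, y=10, num_features=4, device=device, initialization_mode="pca")
```

Selecting the device this way keeps the same script usable on machines with and without a GPU.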
How much memory does TorchSOM use?
Memory usage depends on:
Map size: O(x × y × num_features)
Batch size: Larger batches use more memory
Data size: A dataset loaded all at once must fit in memory alongside the map
For large datasets, consider:
Processing in batches
Using CPU instead of GPU when GPU memory is the limiting factor
Reducing precision (float32 instead of float64)
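To make the map-size term concrete, here is a back-of-the-envelope estimate of the weight storage alone. The helper som_weight_bytes is hypothetical, and the estimate ignores intermediate tensors created during training, which depend on batch size and implementation details.

```python
def som_weight_bytes(x: int, y: int, num_features: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store an x-by-y map with num_features weights per neuron."""
    return x * y * num_features * bytes_per_value


# Hypothetical 100 x 100 map trained on 50 features.
float32_bytes = som_weight_bytes(100, 100, 50, bytes_per_value=4)
float64_bytes = som_weight_bytes(100, 100, 50, bytes_per_value=8)

print(f"float32: {float32_bytes / 1e6:.1f} MB")  # float32: 2.0 MB
print(f"float64: {float64_bytes / 1e6:.1f} MB")  # float64: 4.0 MB
```

Halving the precision halves the storage for the weights, and the same ratio applies to any tensor kept in that dtype.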
Visualization Issues
Why are some neurons white in my visualizations?
White neurons typically indicate:
Unactivated neurons: No data points assigned as BMU
Zero values: In some visualizations, zero values appear white
NaN values: Missing or invalid calculations
This is normal for sparse data or oversized maps.
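To check how many neurons were never activated, one option is to count BMU hits per neuron. The sketch below assumes you already have the BMU coordinates of each sample as a [n_samples, 2] integer tensor. How you obtain them from TorchSOM is not covered here, and the values are made up.

```python
import torch

rows, cols = 3, 3
# Hypothetical BMU coordinates (row, col) for six samples.
bmus = torch.tensor([[0, 0], [0, 0], [1, 2], [2, 1], [2, 1], [2, 1]])

# Flatten (row, col) into a single index and count hits per neuron.
flat = bmus[:, 0] * cols + bmus[:, 1]
hits = torch.bincount(flat, minlength=rows * cols).reshape(rows, cols)

unactivated = int((hits == 0).sum())
print(f"{unactivated} of {rows * cols} neurons were never selected as BMU")
```

A large share of unactivated neurons suggests the map is oversized for the amount of data.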
How do I interpret the distance map (D-Matrix)?
In the D-Matrix:
Light areas: High distances between neighboring neurons (cluster boundaries)
Dark areas: Low distances (within clusters)
Patterns: Reveal cluster structure and boundaries
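The idea behind the distance map can be illustrated with a simplified version that averages the Euclidean distance from each neuron to its up, down, left, and right neighbors on a rectangular grid. This is a sketch for intuition only. TorchSOM's own computation may use a different neighborhood (for example hexagonal) or a different normalization, and distance_map is a hypothetical helper.

```python
import torch


def distance_map(weights: torch.Tensor) -> torch.Tensor:
    """Mean Euclidean distance from each neuron to its 4-connected neighbors.

    weights: [rows, cols, n_features] -> returns [rows, cols]
    """
    rows, cols, _ = weights.shape
    total = torch.zeros(rows, cols, dtype=weights.dtype, device=weights.device)
    count = torch.zeros(rows, cols, dtype=weights.dtype, device=weights.device)

    # Distances between vertically adjacent neurons: [rows - 1, cols]
    vertical = torch.linalg.norm(weights[1:] - weights[:-1], dim=-1)
    total[1:] += vertical
    total[:-1] += vertical
    count[1:] += 1
    count[:-1] += 1

    # Distances between horizontally adjacent neurons: [rows, cols - 1]
    horizontal = torch.linalg.norm(weights[:, 1:] - weights[:, :-1], dim=-1)
    total[:, 1:] += horizontal
    total[:, :-1] += horizontal
    count[:, 1:] += 1
    count[:, :-1] += 1

    # clamp avoids 0/0 on a 1 x 1 map, where no neuron has a neighbor.
    return total / count.clamp(min=1)


# Toy 3 x 4 map: the two left columns and the two right columns form two clusters.
weights = torch.zeros(3, 4, 2)
weights[:, 2:] = 1.0
dmap = distance_map(weights)
print(dmap)
```

In this toy map the outer columns have a value of zero (inside a cluster), while the two middle columns have positive values because they sit on the boundary between the clusters.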
Can I customize the visualization colors?
Yes! Use the VisualizationConfig:
from torchsom.visualization.config import VisualizationConfig

config = VisualizationConfig(
    cmap="plasma",     # Use a different colormap
    figsize=(15, 10),  # Set a larger figure size
    dpi=300,           # Set a higher resolution
)
Advanced Topics
Can I use TorchSOM for time series data?
TorchSOM works with tabular data. Any data type (time series, images, or text) can be used as long as it is represented as a 2D array in which each sample is a fixed-length feature vector.
For time series or other complex data types, you can preprocess your data to obtain such representations. Common approaches include extracting statistical features, flattening fixed-length windows, or generating embeddings (e.g., using autoencoders or other neural networks) before projecting them onto the SOM map. As long as your data can be converted into a matrix of shape [n_samples, n_features], it can be used with TorchSOM.
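Two of these approaches, flattening fixed-length windows and extracting statistical features, can be sketched with standard PyTorch operations. The series, window length, and step below are arbitrary illustrative choices.

```python
import torch

# Hypothetical univariate series with 10 time steps.
series = torch.arange(10, dtype=torch.float32)

# Slide a window of 4 steps forward 2 steps at a time.
# Each row is one fixed-length sample: [n_samples, n_features] = [4, 4].
windows = series.unfold(0, 4, 2)

# Alternative representation: a few summary statistics per window.
stats = torch.stack(
    [
        windows.mean(dim=1),
        windows.std(dim=1),
        windows.min(dim=1).values,
        windows.max(dim=1).values,
    ],
    dim=1,
)  # shape [4, 4]: one row per window, one column per statistic

print(windows.shape, stats.shape)
```

Either windows or stats has the [n_samples, n_features] shape described above and could be normalized and passed on for training.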
How do I implement custom distance functions?
Create a function following the signature:
def custom_distance(data, weights):
    """
    Args:
        data: [batch_size, 1, 1, n_features]
        weights: [1, row_neurons, col_neurons, n_features]
    Returns:
        distances: [batch_size, row_neurons, col_neurons]
    """
    distances = ...  # Replace with your custom distance calculation
    return distances
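As one concrete instance of this signature, the sketch below computes a Manhattan (L1) distance by relying on broadcasting between the two input shapes. The function name manhattan_distance is illustrative, and how a custom function is registered with the SOM is not shown in this FAQ, so check the API reference for that step.

```python
import torch


def manhattan_distance(data: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """L1 distance between every sample and every neuron.

    data:    [batch_size, 1, 1, n_features]
    weights: [1, row_neurons, col_neurons, n_features]
    returns: [batch_size, row_neurons, col_neurons]
    """
    # Broadcasting expands both inputs to [batch, rows, cols, features];
    # summing over the last axis collapses the feature dimension.
    return (data - weights).abs().sum(dim=-1)


# Shape check with random inputs: 5 samples, a 3 x 4 map, 2 features.
data = torch.rand(5, 1, 1, 2)
weights = torch.rand(1, 3, 4, 2)
distances = manhattan_distance(data, weights)
print(distances.shape)
```

Note that the broadcasted difference temporarily holds batch_size × row_neurons × col_neurons × n_features values, which is worth keeping in mind for large maps and batches.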
Can I save and load trained SOMs?
Yes, use PyTorch’s standard mechanisms:
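import torch

# Save the trained weights
torch.save(som.state_dict(), "som_weights.pth")

# Load: recreate the SOM with the same configuration, then restore the weights
som = SOM(x=10, y=10, num_features=4)
som.load_state_dict(torch.load("som_weights.pth"))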
Integration Questions
How do I cite TorchSOM in my research?
Please cite TorchSOM as:
# GitHub Repository
@software{Berthier_TorchSOM_The_Reference_2025,
author={Berthier, Louis},
title={TorchSOM: The Reference PyTorch Library for Self-Organizing Maps},
url={https://github.com/michelin/TorchSOM},
version={1.0.0},
year={2025}
}
# Conference Paper
@inproceedings{Berthier2025TorchSOM,
title={TorchSOM: A Scalable PyTorch-Compatible Library for Self-Organizing Maps},
author={Berthier, Louis},
booktitle={Conference Name},
year={2025}
}
Getting Help
Where can I get more help?
`Documentation <https://opensource.michelin.io/TorchSOM/>`_: Check our comprehensive guides
`GitHub Issues <https://github.com/michelin/TorchSOM/issues>`_: Report bugs and request features
`Notebooks <https://github.com/michelin/TorchSOM/tree/main/notebooks>`_: See our tutorial notebooks.
How do I report a bug?
Please include:
TorchSOM version: torchsom.__version__
Python version: python --version
PyTorch version: torch.__version__
Operating system: Linux/macOS/Windows
Minimal reproduction example
Full error traceback
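The version details can be gathered in one step with the standard library. This is a convenience sketch: it assumes the installed distribution names are torchsom and torch, and it reports "not installed" for any package it cannot find.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


report = "\n".join(
    [
        f"TorchSOM version: {package_version('torchsom')}",
        f"Python version: {platform.python_version()}",
        f"PyTorch version: {package_version('torch')}",
        f"Operating system: {platform.system()} {platform.release()}",
    ]
)
print(report)
```

The printed block can be pasted into the issue together with the reproduction example and the traceback.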
Can I contribute to TorchSOM?
Yes! We welcome contributions:
Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request
Follow our coding standards
See our contributing guide for details.