Frequently Asked Questions

General Questions

What is TorchSOM?

TorchSOM is a modern PyTorch-based implementation of Self-Organizing Maps (SOMs), designed for efficient training and for visualizing, clustering, and analyzing high-dimensional data.

How does TorchSOM differ from other SOM implementations?

TorchSOM offers several advantages:

GPU acceleration through PyTorch
Modern Python practices with type hints and Pydantic validation
Comprehensive visualization suite with matplotlib integration
Flexible architecture supporting multiple SOM variants

Installation and Setup

Which Python versions are supported?

We recommend using Python 3.9+.

Do I need a GPU to use TorchSOM?

No, TorchSOM works on both CPU and GPU. However, GPU acceleration significantly improves training speed for large datasets and large maps, so we recommend using a GPU in those cases.
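A common PyTorch pattern for picking the device at runtime is sketched below. It relies only on torch.cuda.is_available(); how the resulting value is handed to TorchSOM (for example through a device argument) is an assumption to check against the API reference.

```python
import torch

# Fall back to the CPU when no CUDA-capable GPU is visible to PyTorch.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Tensors created on this device keep their computations there.
sample = torch.rand(8, 4, device=device)
print(device, sample.device.type)
```

The same script then runs unchanged on machines with or without a GPU.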

Data Preprocessing

Should I always normalize my data?

Yes, normalization is crucial because:

Features with larger scales dominate the distance calculation
SOM learning is sensitive to feature magnitudes

StandardScaler or MinMaxScaler from scikit-learn both work well.
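As an illustration, the sketch below standardizes a small feature matrix with plain NumPy. It is meant to mirror what scikit-learn's StandardScaler does with its default settings; the array values and variable names are made up for the example.

```python
import numpy as np

# Two features on very different scales (made-up values).
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])

# Z-score standardization: zero mean and unit variance per feature.
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0  # avoid dividing by zero for constant features
X_scaled = (X - mean) / std

print(X_scaled)
```

Reuse the mean and std computed on the training data when scaling new samples, so that both are mapped onto the same feature space.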

What about categorical features?

SOMs operate exclusively on numerical data. Therefore, it is essential to convert any categorical features into a numerical format before using them with TorchSOM. Common strategies include:

One-hot encoding for nominal (unordered) categories
Ordinal encoding for ordered categories
Target or frequency encoding for high-cardinality categories

If your dataset contains a mix of numerical and categorical features, ensure all features are numerically encoded prior to training.

Similarly, when visualizing classification or label maps, assign numerical levels to each class or category to enable proper mapping and interpretation in the visualization outputs.
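The encoding strategies listed above can be sketched with NumPy alone. The feature names, category values, and rank mapping below are hypothetical; in practice, scikit-learn or pandas encoders do the same job.

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green"])     # nominal feature
sizes = np.array(["small", "large", "medium", "small"])  # ordered feature

# One-hot encoding: one binary column per category (categories sorted alphabetically).
categories, codes = np.unique(colors, return_inverse=True)
one_hot = np.eye(len(categories))[codes]

# Ordinal encoding: map each level to its rank.
size_rank = {"small": 0, "medium": 1, "large": 2}
ordinal = np.array([size_rank[s] for s in sizes], dtype=float)

# Combine everything into one numeric matrix of shape [n_samples, n_features].
X = np.column_stack([one_hot, ordinal])
print(X.shape)
```

Encoded columns should go through the same normalization step as the other features, so that no single column dominates the distance calculation.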

Performance and Optimization

My training is very slow. How can I speed it up?

Try these optimizations:

Enable GPU: Use device="cuda" if available
Increase batch size: Try 64, 128, or 256
Reduce map size: Start smaller and scale up
Use PCA initialization: initialization_mode="pca"
Reduce epochs: Monitor convergence and stop early
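For the last point, a simple patience rule is one way to decide when to stop. The sketch below is not part of TorchSOM: should_stop, patience, and min_delta are hypothetical names, and it assumes you record one error value (for example the quantization error) per epoch.

```python
def should_stop(errors, patience=3, min_delta=1e-4):
    """Return True when the last `patience` epochs did not improve
    on the best earlier error by at least `min_delta`."""
    if len(errors) <= patience:
        return False  # not enough history yet
    best_before = min(errors[:-patience])
    recent_best = min(errors[-patience:])
    return best_before - recent_best < min_delta


# Error has plateaued at 0.4 for the last three epochs.
print(should_stop([1.0, 0.5, 0.4, 0.4, 0.4, 0.4]))
# Error is still decreasing steadily.
print(should_stop([1.0, 0.8, 0.6, 0.5, 0.4, 0.3]))
```

A larger patience tolerates noisy error curves at the cost of a few extra epochs.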

How much memory does TorchSOM use?

Memory usage depends on:

Map size: O(x × y × num_features)
Batch size: Larger batches use more memory
Data size: Memory grows with the number of samples held in memory at once

For large datasets, consider:

Processing in batches
Using CPU instead of GPU when GPU memory is the bottleneck
Reducing precision (float32 instead of float64)
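A rough back-of-the-envelope estimate helps when choosing map and batch sizes. The helper below is a hypothetical sketch that only counts dense tensor sizes; whether the full pairwise difference tensor is actually materialized depends on the distance function and is an assumption here.

```python
def tensor_megabytes(*shape, bytes_per_value=4):
    """Size in MB (10**6 bytes) of a dense tensor with the given shape."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value / 1e6


x, y, num_features, batch_size = 100, 100, 50, 256

# float32 uses 4 bytes per value; pass bytes_per_value=8 for float64.
weights_mb = tensor_megabytes(x, y, num_features)                # map weights
distances_mb = tensor_megabytes(batch_size, x, y)                # one distance per sample and neuron
pairwise_mb = tensor_megabytes(batch_size, x, y, num_features)   # broadcast difference, if materialized

print(weights_mb, distances_mb, pairwise_mb)
```

In this example the weights take about 2 MB, while a fully materialized pairwise tensor for one batch takes about 512 MB, which is why batch size often matters more than map size alone.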

Visualization Issues

Why are some neurons white in my visualizations?

White neurons typically indicate:

Unactivated neurons: No data points assigned as BMU
Zero values: In some visualizations, zero values appear white
NaN values: Missing or invalid calculations

This is normal for sparse data or oversized maps.
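To check how many neurons are unactivated, you can count BMU hits per neuron. The sketch below uses NumPy with made-up BMU coordinates; obtaining the BMU of each sample from a trained map is left to the TorchSOM API and is not shown.

```python
import numpy as np

rows, cols = 4, 4
# Hypothetical BMU coordinates (row, col), one pair per data sample.
bmus = np.array([[0, 0], [0, 0], [1, 2], [3, 3], [1, 2], [2, 1]])

# Count how many samples select each neuron as their BMU.
hit_map = np.zeros((rows, cols), dtype=int)
np.add.at(hit_map, (bmus[:, 0], bmus[:, 1]), 1)

unactivated = int((hit_map == 0).sum())
print(f"{unactivated} of {rows * cols} neurons were never selected as BMU")
```

A high share of unactivated neurons suggests that the map is larger than the data requires.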

How do I interpret the distance map (D-Matrix)?

In the D-Matrix:

Light areas: High distances between neighboring neurons (cluster boundaries)
Dark areas: Low distances (within clusters)
Patterns: Reveal cluster structure and boundaries
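To make the idea concrete, the sketch below computes a distance map with NumPy as the mean Euclidean distance between each neuron and its direct grid neighbors. It assumes a rectangular grid with at least two neurons and is not TorchSOM's own implementation, which may differ (for example in topology or scaling).

```python
import numpy as np

def distance_map(weights):
    """Mean distance from each neuron to its up/down/left/right neighbors.

    weights: array of shape [rows, cols, n_features]
    returns: array of shape [rows, cols]
    """
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))

    # Distances between vertically adjacent neurons, shape [rows - 1, cols].
    d_v = np.linalg.norm(weights[1:] - weights[:-1], axis=-1)
    total[1:] += d_v
    count[1:] += 1
    total[:-1] += d_v
    count[:-1] += 1

    # Distances between horizontally adjacent neurons, shape [rows, cols - 1].
    d_h = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=-1)
    total[:, 1:] += d_h
    count[:, 1:] += 1
    total[:, :-1] += d_h
    count[:, :-1] += 1

    return total / count


# Toy map: the left half and the right half hold two different prototypes.
weights = np.zeros((3, 4, 2))
weights[:, 2:] = 1.0
dmap = distance_map(weights)
print(np.round(dmap, 3))
```

The two middle columns get the highest values, marking the boundary between the two groups, while the outer columns stay at zero.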

Can I customize the visualization colors?

Yes! Use the VisualizationConfig:

from torchsom.visualization.config import VisualizationConfig

config = VisualizationConfig(
    cmap="plasma",        # Use a different colormap
    figsize=(15, 10),     # Set larger figure size
    dpi=300               # Set higher resolution
)

Advanced Topics

Can I use TorchSOM for time series data?

TorchSOM is designed to work with tabular data. Any data type, such as time series, images, or text, can be used as long as it is represented in a tabular (2D array) format, which typically means that each sample is a fixed-length feature vector.

For time series or other complex data types, you can preprocess your data to obtain such representations. Common approaches include extracting statistical features, flattening fixed-length windows, or generating embeddings (e.g., using autoencoders or other neural networks) before projecting them onto the SOM map. As long as your data can be converted into a matrix of shape [n_samples, n_features], it can be used with TorchSOM.
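Two of these approaches, flattening fixed-length windows and extracting statistical features, can be sketched with NumPy as follows. The toy series, window length, and step are arbitrary choices for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10, dtype=float)  # toy univariate time series
window, step = 4, 2

# Option 1: fixed-length windows, one row per window.
windows = sliding_window_view(series, window)[::step]

# Option 2: summarize each window with a few statistical features.
stats = np.column_stack([
    windows.mean(axis=1),
    windows.std(axis=1),
    windows.min(axis=1),
    windows.max(axis=1),
])

print(windows.shape, stats.shape)
```

Both results have the shape [n_samples, n_features]. Note that sliding_window_view returns a read-only view, so copy the array before modifying it in place.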

How do I implement custom distance functions?

Create a function following the signature:

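import torch

def custom_distance(data, weights):
    """Manhattan (L1) distance, shown as an example.

    Args:
        data: [batch_size, 1, 1, n_features]
        weights: [1, row_neurons, col_neurons, n_features]
    Returns:
        distances: [batch_size, row_neurons, col_neurons]
    """
    # Broadcasting yields [batch_size, row_neurons, col_neurons, n_features];
    # summing over the last axis removes the feature dimension.
    distances = torch.sum(torch.abs(data - weights), dim=-1)
    return distances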
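As an illustration of this signature, a cosine distance could be written as shown below, followed by a quick shape check on random tensors. The function name and tensor sizes are arbitrary, and how the function is registered with the SOM is not shown here; check the API reference for that step.

```python
import torch
import torch.nn.functional as F

def cosine_distance(data, weights):
    """Cosine distance between each sample and each neuron.

    data:    [batch_size, 1, 1, n_features]
    weights: [1, row_neurons, col_neurons, n_features]
    returns: [batch_size, row_neurons, col_neurons]
    """
    # cosine_similarity broadcasts both inputs and reduces the feature axis.
    return 1.0 - F.cosine_similarity(data, weights, dim=-1)


# Shape check with random tensors (sizes chosen arbitrarily).
data = torch.rand(8, 1, 1, 4)
weights = torch.rand(1, 5, 6, 4)
distances = cosine_distance(data, weights)
print(distances.shape)
```

Checking the output shape on random inputs like this is a cheap way to catch broadcasting mistakes before training.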

Can I save and load trained SOMs?

Yes, use PyTorch’s standard mechanisms:

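import torch

# Save only the learned parameters (state dict), not the whole object
torch.save(som.state_dict(), 'som_weights.pth')

# Load: rebuild the SOM with the same constructor arguments, then restore the weights
som = SOM(x=10, y=10, num_features=4)
som.load_state_dict(torch.load('som_weights.pth'))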

Integration Questions

How do I cite TorchSOM in my research?

Please cite TorchSOM as:

# GitHub Repository
@software{Berthier_TorchSOM_The_Reference_2025,
    author={Berthier, Louis},
    title={TorchSOM: The Reference PyTorch Library for Self-Organizing Maps},
    url={https://github.com/michelin/TorchSOM},
    version={1.0.0},
    year={2025}
}

# Conference Paper
@inproceedings{Berthier2025TorchSOM,
    title={TorchSOM: A Scalable PyTorch-Compatible Library for Self-Organizing Maps},
    author={Berthier, Louis},
    booktitle={Conference Name},
    year={2025}
}

Getting Help

Where can I get more help?

`Documentation <https://opensource.michelin.io/TorchSOM/>`_: Check our comprehensive guides
`GitHub Issues <https://github.com/michelin/TorchSOM/issues>`_: Report bugs and request features
`Notebooks <https://github.com/michelin/TorchSOM/tree/main/notebooks>`_: See our tutorial notebooks

How do I report a bug?

Please include:

TorchSOM version: torchsom.__version__
Python version: python --version
PyTorch version: torch.__version__
Operating system: Linux/macOS/Windows
Minimal reproduction example
Full error traceback
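The version details can be collected in one go with a small standard-library script. This is only a convenience sketch: environment_report is a hypothetical helper, and the distribution names "torchsom" and "torch" are assumed to match how the packages were installed.

```python
import platform
import sys
from importlib import metadata

def environment_report(packages=("torchsom", "torch")):
    """Collect Python, OS, and package versions for a bug report."""
    report = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Paste the printed lines into the issue together with the reproduction example and the traceback.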

Can I contribute to TorchSOM?

Yes! We welcome contributions:

Fork the repository
Create a feature branch
Follow our coding standards
Add tests for new functionality
Submit a pull request

See our contributing guide for details.