Troubleshooting
This guide helps you resolve common issues when using TorchSOM.
Installation Issues
Package Issues
ImportError: No module named 'torchsom'
Problem: TorchSOM is not installed or not in Python path.
Solutions:
Install TorchSOM:
pip install torchsom
If using conda environment, make sure it’s activated:
conda activate your_environment
pip install torchsom
Check installation:
import torchsom
print(torchsom.__version__)
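If the import still fails after installing, it can help to confirm which interpreter is running and whether the package is visible to it. The following is a minimal sketch using only the standard library; `describe_package` is a hypothetical helper name, not part of TorchSOM.

```python
import importlib.metadata
import importlib.util
import sys


def describe_package(name):
    """Report whether the top-level package `name` is visible to this interpreter."""
    spec = importlib.util.find_spec(name)
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # Not installed as a distribution under this name
    return {
        "interpreter": sys.executable,
        "importable": spec is not None,
        "version": version,
    }


print(describe_package("torchsom"))
```

If the interpreter path does not belong to the environment you installed into, the package was most likely installed for a different Python.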
CUDA/GPU Issues
RuntimeError: CUDA out of memory
Problem: GPU memory is exhausted during training.
Solutions:
Reduce batch size:
som = SOM(x=10, y=10, num_features=4, batch_size=16) # Smaller batch
Use CPU instead:
som = SOM(x=10, y=10, num_features=4, device="cpu")
Clear GPU cache:
import torch
torch.cuda.empty_cache()
Reduce map size:
som = SOM(x=8, y=8, num_features=4) # Smaller SOM
CUDA not available
Problem: torch.cuda.is_available() returns False.
Diagnostic steps:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"PyTorch version: {torch.__version__}")
Solutions:
Install CUDA-enabled PyTorch:
# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Check CUDA installation:
nvidia-smi
nvcc --version
Use CPU if no GPU available:
device = "cuda" if torch.cuda.is_available() else "cpu"
som = SOM(x=10, y=10, num_features=4, device=device)
Training Problems
Training doesn’t converge
Symptoms: Quantization error doesn’t decrease or fluctuates wildly.
Diagnostic:
# Monitor training progress
q_errors, t_errors = som.fit(data)
import matplotlib.pyplot as plt
plt.plot(q_errors)
plt.title('Quantization Error')
plt.show()
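A plot can be complemented with a quick numeric summary. The sketch below assumes the quantization errors form a plain sequence of numbers, one per epoch; `summarize_error_curve` and its `window` parameter are hypothetical names introduced for illustration.

```python
from statistics import mean, pstdev


def summarize_error_curve(errors, window=10):
    """Compare the start and end of an error curve.

    Returns the relative improvement between the first and last `window`
    values, and the spread of the last `window` values.
    """
    values = [float(e) for e in errors]
    if len(values) < 2 * window:
        raise ValueError("need at least 2 * window error values")
    head = mean(values[:window])
    tail = mean(values[-window:])
    improvement = (head - tail) / head if head != 0 else 0.0
    return {
        "relative_improvement": improvement,
        "tail_std": pstdev(values[-window:]),
    }


# Synthetic, steadily decreasing curve used only for illustration
example = [1.0 - 0.01 * i for i in range(50)]
print(summarize_error_curve(example))
```

A relative improvement near zero suggests the error is not decreasing, and a large spread in the final window points to the wild fluctuations described above.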
Common causes and solutions:
Learning rate too high:
som = SOM(x=10, y=10, num_features=4, learning_rate=0.1) # Lower LR
Data not normalized:
import torch
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
data_tensor = torch.tensor(data_scaled, dtype=torch.float32)
Poor initialization:
som = SOM(x=10, y=10, num_features=4, initialization_mode="pca")
Map too large:
# Rule of thumb: 5-10x fewer neurons than data points
import numpy as np
data_size = len(data)
map_size = int(np.sqrt(data_size / 7))
som = SOM(x=map_size, y=map_size, num_features=4)
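The rule of thumb above can be wrapped in a small helper that also guards against very small datasets. This is a sketch only; `suggest_map_side`, `samples_per_neuron`, and `min_side` are hypothetical names, and the lower bound of 2 is an assumption made here.

```python
import math


def suggest_map_side(n_samples, samples_per_neuron=7, min_side=2):
    """Side of a square map with roughly n_samples / samples_per_neuron neurons."""
    if n_samples <= 0 or samples_per_neuron <= 0:
        raise ValueError("n_samples and samples_per_neuron must be positive")
    side = int(math.sqrt(n_samples / samples_per_neuron))
    # Assumed lower bound so that tiny datasets still get a usable grid
    return max(side, min_side)


for n in (50, 700, 10_000):
    side = suggest_map_side(n)
    print(f"{n} samples -> {side}x{side} map ({side * side} neurons)")
```

Varying `samples_per_neuron` between 5 and 10 covers the range mentioned in the rule of thumb.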
Very slow training
Problem: Training takes much longer than expected.
Performance optimization:
Enable GPU acceleration:
som = SOM(x=10, y=10, num_features=4, device="cuda")
Increase batch size:
som = SOM(x=10, y=10, num_features=4, batch_size=128)
Use PCA initialization:
som = SOM(x=10, y=10, num_features=4, initialization_mode="pca")
Reduce epochs if acceptable:
som = SOM(x=10, y=10, num_features=4, epochs=50)
Profile your code:
import time
start_time = time.time()
som.fit(data)
print(f"Training time: {time.time() - start_time:.2f} seconds")
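When several stages need timing (preprocessing, training, visualization), a reusable context manager avoids repeating the start and stop bookkeeping. The sketch below uses time.perf_counter instead of time.time because it is monotonic; `timed` is a hypothetical helper name.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print the wall-clock time of the enclosed block; optionally record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} seconds")


timings = {}
with timed("toy workload", timings):
    total = sum(i * i for i in range(100_000))  # Stand-in for som.fit(data)
```

Wrapping each stage in its own `with timed(...)` block shows where most of the time is spent before any parameter is changed.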
NaN values in results
Problem: Getting NaN values in errors or visualizations.
Diagnostic:
# Check for NaN in data
print(f"NaN in data: {torch.isnan(data).any()}")
# Check SOM weights
print(f"NaN in weights: {torch.isnan(som.weights).any()}")
Solutions:
Check input data:
# Remove NaN values
data_clean = data[~torch.isnan(data).any(dim=1)]
# Or impute missing values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
data_imputed = imputer.fit_transform(data.numpy())
data_clean = torch.tensor(data_imputed, dtype=torch.float32)
Reduce learning rate:
som = SOM(x=10, y=10, num_features=4, learning_rate=0.1)
Clip infinite and extreme values:
data = torch.clamp(data, min=-1e6, max=1e6) # Clip extreme values
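NaN and inf values can also be handled in a single pass by keeping only rows whose values are all finite. The sketch below works on a NumPy array (for example the result of `data.numpy()`); `drop_nonfinite_rows` is a hypothetical helper name, and the small array is made up for illustration.

```python
import numpy as np


def drop_nonfinite_rows(array):
    """Return the rows containing only finite values and the number of rows dropped."""
    array = np.asarray(array, dtype=np.float32)
    finite_rows = np.isfinite(array).all(axis=1)
    return array[finite_rows], int((~finite_rows).sum())


raw = np.array([[1.0, 2.0], [np.nan, 0.5], [3.0, np.inf], [4.0, 5.0]])
clean, dropped = drop_nonfinite_rows(raw)
print(f"kept {clean.shape[0]} rows, dropped {dropped}")
```

Reporting the number of dropped rows makes it easier to decide whether removing them is acceptable or whether imputation is the better choice.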
Visualization Issues
Empty or white visualizations
Problem: Visualizations appear blank or mostly white.
Possible causes:
No data passed to visualization:
# Make sure to pass data to hit map
viz.plot_hit_map(data=data_tensor)
All neurons have same values:
# Check weight variance
import numpy as np
weights = som.weights.detach().cpu().numpy()
print(f"Weight std: {np.std(weights)}")
Colormap issues:
# Try different colormap
from torchsom.visualization import VisualizationConfig
config = VisualizationConfig(cmap="viridis")
viz = SOMVisualizer(som, config=config)
Figures not displaying
Problem: Plots don’t show up in Jupyter notebooks or scripts.
Solutions:
For Jupyter notebooks:
%matplotlib inline
import matplotlib.pyplot as plt
For scripts:
import matplotlib.pyplot as plt
# ... create plots ...
plt.show()  # Don't forget this
Save figures instead:
viz.plot_distance_map(save_path="results", fig_name="distance_map")
Poor visualization quality
Problem: Plots look pixelated or unclear.
Solutions:
Increase resolution:
config = VisualizationConfig(dpi=300)
viz = SOMVisualizer(som, config=config)
Larger figure size:
config = VisualizationConfig(figsize=(12, 10))
viz = SOMVisualizer(som, config=config)
Better colormap:
config = VisualizationConfig(cmap="plasma")
viz = SOMVisualizer(som, config=config)
Data Issues
Poor clustering results
Problem: SOM doesn’t find meaningful clusters.
Diagnostic steps:
Visualize raw data:
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
# PCA visualization
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data.numpy())
plt.scatter(data_pca[:, 0], data_pca[:, 1])
plt.title('Data in PCA space')
plt.show()
Check data distribution:
print(f"Data shape: {data.shape}")
print(f"Data mean: {data.mean(dim=0)}")
print(f"Data std: {data.std(dim=0)}")
Compare with K-means:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans_labels = kmeans.fit_predict(data.numpy())
Solutions:
Better preprocessing:
# Scale with statistics that are robust to outliers (median and IQR)
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
data_scaled = scaler.fit_transform(data.numpy())
Feature selection:
# Remove highly correlated features
import pandas as pd
df = pd.DataFrame(data.numpy())
corr_matrix = df.corr().abs()
# Remove features with correlation > 0.95
Adjust SOM parameters:
som = SOM(
    x=15, y=15,  # Larger map
    num_features=data.shape[1],
    epochs=200,  # More training
    learning_rate=0.2,
    sigma=3.0  # Larger neighborhood
)
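The feature selection step above stops at computing the correlation matrix. One common way to finish it, sketched here with NumPy, is to scan the upper triangle and drop the later feature of every highly correlated pair. `drop_correlated_features` is a hypothetical helper name, and the small array is made up for illustration.

```python
import numpy as np


def drop_correlated_features(data, threshold=0.95):
    """Drop each column whose absolute correlation with an earlier column exceeds threshold."""
    data = np.asarray(data, dtype=float)
    corr = np.abs(np.corrcoef(data, rowvar=False))
    # Keep only pairs (i, j) with i < j so each pair is considered once
    upper = np.triu(corr, k=1)
    n_features = corr.shape[1]
    to_drop = [j for j in range(n_features) if np.any(upper[:, j] > threshold)]
    keep = [j for j in range(n_features) if j not in to_drop]
    return data[:, keep], to_drop


base = np.arange(5.0)
features = np.column_stack([base, 2 * base, [1, -1, 2, -2, 0]])
reduced, dropped = drop_correlated_features(features)
print(f"dropped columns: {dropped}, remaining shape: {reduced.shape}")
```

Remember to pass the reduced feature count as num_features when building the SOM afterwards. Constant columns produce NaN correlations and are left in place by this sketch, so remove them separately.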
Configuration Errors
ValidationError from Pydantic
Problem: Configuration validation fails.
Example error:
ValidationError: 1 validation error for SOMConfig
learning_rate
ensure this value is greater than 0 (type=value_error.number.not_gt)
Solution:
from torchsom.configs import SOMConfig
from pydantic import ValidationError
try:
    config = SOMConfig(
        x=10, y=10,
        learning_rate=0.3,  # Must be > 0
        sigma=1.0,  # Must be > 0
        epochs=100  # Must be >= 1
    )
except ValidationError as e:
    print("Configuration errors:")
    for error in e.errors():
        print(f"- {error['loc'][0]}: {error['msg']}")
Parameter compatibility issues
Problem: Certain parameter combinations don’t work.
Common incompatibilities:
Sigma too large for map size:
# Problem: sigma=10 on 5x5 map
som = SOM(x=5, y=5, num_features=4, sigma=2.0)  # Better
Batch size larger than dataset:
batch_size = min(64, len(data))
som = SOM(x=10, y=10, num_features=4, batch_size=batch_size)
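Both incompatibilities can be caught before a SOM is constructed. The sketch below is not part of TorchSOM: `check_parameters` is a hypothetical helper, and the cutoff of half the longer map side for sigma is an assumed heuristic rather than a limit documented by the library.

```python
def check_parameters(x, y, sigma, batch_size, n_samples):
    """Return a list of human-readable warnings for suspicious settings."""
    warnings = []
    # Assumed heuristic: a neighborhood radius above half the longer map side
    # spans most of the map and leaves little room for local structure.
    if sigma > max(x, y) / 2:
        warnings.append(f"sigma={sigma} is large for a {x}x{y} map")
    if batch_size > n_samples:
        warnings.append(f"batch_size={batch_size} exceeds dataset size {n_samples}")
    return warnings


for message in check_parameters(x=5, y=5, sigma=10.0, batch_size=64, n_samples=30):
    print(f"Warning: {message}")
```

An empty list means neither check fired; it does not guarantee that the configuration trains well.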
Memory Issues
Memory usage too high
Problem: TorchSOM uses too much RAM or GPU memory.
Memory usage breakdown:
- SOM weights: x * y * num_features * 4 bytes (float32)
- Batch data: batch_size * num_features * 4 bytes
- Distance calculations: batch_size * x * y * 4 bytes
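The breakdown above translates directly into a quick estimate. This sketch only evaluates the three formulas listed; actual usage is typically higher because of intermediate tensors and framework overhead. `estimate_som_memory` is a hypothetical helper name.

```python
def estimate_som_memory(x, y, num_features, batch_size, bytes_per_value=4):
    """Estimate memory in bytes from the three terms of the breakdown (float32 by default)."""
    weights = x * y * num_features * bytes_per_value
    batch = batch_size * num_features * bytes_per_value
    distances = batch_size * x * y * bytes_per_value
    return {
        "weights": weights,
        "batch": batch,
        "distances": distances,
        "total": weights + batch + distances,
    }


estimate = estimate_som_memory(x=50, y=50, num_features=4, batch_size=128)
for name, size in estimate.items():
    print(f"{name}: {size / 1024:.1f} KiB")
```

The distance term grows with both batch size and map area, so it usually dominates for large maps.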
Solutions:
Reduce map size:
som = SOM(x=10, y=10, num_features=4) # Instead of 20x20
Smaller batch size:
som = SOM(x=10, y=10, num_features=4, batch_size=32)
Use CPU for large maps:
som = SOM(x=50, y=50, num_features=4, device="cpu")
Process data in chunks:
# For very large datasets
chunk_size = 1000
for i in range(0, len(data), chunk_size):
    chunk = data[i:i+chunk_size]
    som.fit(chunk)  # Incremental training
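The slicing logic can be factored into a small generator that works on anything supporting len() and slicing, including lists, NumPy arrays, and tensors. `iter_chunks` is a hypothetical helper name. Whether repeated fit() calls continue training from the current weights is an assumption carried over from the snippet above and is worth verifying for your version.

```python
def iter_chunks(data, chunk_size):
    """Yield consecutive slices of `data` containing at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


samples = list(range(10))  # Stand-in for a large dataset
for chunk in iter_chunks(samples, chunk_size=4):
    print(len(chunk), chunk)
```

The final chunk may be shorter than `chunk_size`, so any batch size passed to the SOM should not exceed the length of the smallest chunk.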
Memory leaks
Problem: Memory usage increases over time.
Solutions:
Clear GPU cache periodically:
import torch
torch.cuda.empty_cache()
Use context managers:
with torch.no_grad():
    # Inference operations
    bmus = som.identify_bmus(data)
Delete large variables:
del large_data_tensor
torch.cuda.empty_cache()
Getting Help
Diagnostic Information
When reporting issues, please include:
import torchsom
import torch
import sys
import platform
print("=== Diagnostic Information ===")
print(f"TorchSOM version: {torchsom.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
Creating Minimal Examples
For bug reports, create minimal reproducible examples:
import torch
from torchsom import SOM
# Minimal data
data = torch.randn(100, 4)
# Minimal SOM
som = SOM(x=5, y=5, num_features=4, epochs=10)
# Show the problem
try:
    som.fit(data)
except Exception as e:
    print(f"Error: {e}")
    raise
Where to Get Help
Documentation: Check our comprehensive guides first
FAQ: Review the Frequently Asked Questions for answers to common questions
GitHub Issues: Report bugs with minimal examples
GitHub Discussions: Ask questions and share experiences
Stack Overflow: Tag questions with torchsom and pytorch
Debug Mode
Enable debug logging for more detailed information:
import logging
logging.basicConfig(level=logging.DEBUG)
# Your TorchSOM code here
som = SOM(x=10, y=10, num_features=4)
som.fit(data)
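Setting the root logger to DEBUG also enables debug output from other libraries, and matplotlib in particular can be very verbose. The sketch below adds timestamps and logger names and raises the level of a few loggers that tend to be noisy. `configure_debug_logging` is a hypothetical helper, and the default list of loggers is an assumption to adapt to your setup.

```python
import logging


def configure_debug_logging(noisy_loggers=("matplotlib", "PIL")):
    """Enable DEBUG output globally while quieting selected third-party loggers."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
        force=True,  # Replace handlers installed by earlier basicConfig calls
    )
    for name in noisy_loggers:
        logging.getLogger(name).setLevel(logging.WARNING)


configure_debug_logging()
logging.getLogger(__name__).debug("debug logging enabled")
```

The logger name printed with each message shows which package produced it, which helps when attaching logs to a bug report.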