Troubleshooting

This guide helps you resolve common issues when using TorchSOM.

Installation Issues

Package Issues

ImportError: No module named 'torchsom'

Problem: TorchSOM is not installed, or it is installed in an environment other than the one your Python interpreter is using.

Solutions:

Install TorchSOM:
```
pip install torchsom
```

If you use a conda environment, make sure it is activated before installing:

conda activate your_environment
pip install torchsom

Check installation:

import torchsom
print(torchsom.__version__)
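
If the import still fails after installation, the package may have been installed for a different interpreter than the one you are running. A minimal sketch for checking this with the standard library only; the helper name `is_installed` is illustrative and not part of TorchSOM:

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if the active interpreter can locate the top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Show which interpreter is running and whether it can see torchsom
print(f"Interpreter: {sys.executable}")
print(f"torchsom importable: {is_installed('torchsom')}")
```

If the interpreter path is not the one inside your environment, run `python -m pip install torchsom` with that same interpreter so the package is installed where it will be imported.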

CUDA/GPU Issues

RuntimeError: CUDA out of memory

Problem: GPU memory is exhausted during training.

Solutions:

Reduce batch size:

from torchsom import SOM
som = SOM(x=10, y=10, num_features=4, batch_size=16)  # Smaller batch

Use CPU instead:

som = SOM(x=10, y=10, num_features=4, device="cpu")

Clear GPU cache:
```
import torch
torch.cuda.empty_cache()
```

Reduce map size:

som = SOM(x=8, y=8, num_features=4)  # Smaller SOM

CUDA not available

Problem: torch.cuda.is_available() returns False.

Diagnostic steps:

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"PyTorch version: {torch.__version__}")

Solutions:

Install CUDA-enabled PyTorch:

# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Check CUDA installation:
```
nvidia-smi
nvcc --version
```

Use CPU if no GPU available:

device = "cuda" if torch.cuda.is_available() else "cpu"
som = SOM(x=10, y=10, num_features=4, device=device)
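
The `nvidia-smi` and `nvcc` check above can also be scripted, which is convenient on remote machines or in CI. The sketch below only reports whether the tools are on `PATH`; the helper name `find_cuda_tools` is hypothetical. A missing `nvidia-smi` usually points to a missing driver, while a missing `nvcc` is often harmless because the PyTorch pip wheels ship their own CUDA runtime.

```python
import shutil


def find_cuda_tools(tools=("nvidia-smi", "nvcc")) -> dict:
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_cuda_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```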

Training Problems

Training doesn’t converge

Symptoms: Quantization error doesn’t decrease or fluctuates wildly.

Diagnostic:

# Monitor training progress
import matplotlib.pyplot as plt

# fit() returns the quantization and topographic errors recorded during training
q_errors, t_errors = som.fit(data)

plt.plot(q_errors)
plt.title('Quantization Error')
plt.show()
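
A plot is easy to misread when the curve is noisy, so a numeric summary can help. The following sketch compares the mean error at the start and end of training. It assumes the error history is a flat sequence of numbers (as the plot above implies); `error_trend` and its `window` parameter are illustrative names, not TorchSOM API.

```python
def error_trend(errors, window=10):
    """Compare the mean error over the first and last `window` values.

    Returns the relative change: a negative value means the error decreased.
    """
    errors = [float(e) for e in errors]
    if len(errors) < 2 * window:
        raise ValueError("need at least 2 * window error values")
    start = sum(errors[:window]) / window
    end = sum(errors[-window:]) / window
    if start == 0:
        raise ValueError("initial error is zero; relative change is undefined")
    return (end - start) / start


# Synthetic example: an error curve that decays steadily
decaying = [1.0 / (1 + 0.1 * step) for step in range(100)]
print(f"Relative change: {error_trend(decaying):+.1%}")
```

A value close to zero or positive suggests the map is not improving and that one of the causes below may apply.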

Common causes and solutions:

Learning rate too high:

som = SOM(x=10, y=10, num_features=4, learning_rate=0.1)  # Lower LR

Data not normalized:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
data_tensor = torch.tensor(data_scaled, dtype=torch.float32)

Poor initialization:

som = SOM(x=10, y=10, num_features=4, initialization_mode="pca")

Map too large:

# Rule of thumb: 5-10x fewer neurons than data points (7x used here)
import numpy as np
data_size = len(data)
map_size = int(np.sqrt(data_size / 7))
som = SOM(x=map_size, y=map_size, num_features=4)
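
To see the whole range implied by the 5-10x rule of thumb rather than a single value, the calculation can be wrapped in a small helper. This is a sketch: `suggest_map_side` is a hypothetical name, and the lower bound of 2 on the side length is an assumption added to avoid degenerate maps on tiny datasets.

```python
import math


def suggest_map_side(n_samples: int, samples_per_neuron: float = 7.0) -> int:
    """Side of a square map with roughly n_samples / samples_per_neuron neurons."""
    if n_samples <= 0 or samples_per_neuron <= 0:
        raise ValueError("n_samples and samples_per_neuron must be positive")
    # Assumed floor of 2 so that very small datasets still get a usable grid
    return max(2, int(math.sqrt(n_samples / samples_per_neuron)))


n_samples = 1000
smallest = suggest_map_side(n_samples, samples_per_neuron=10)
largest = suggest_map_side(n_samples, samples_per_neuron=5)
print(f"Try square maps between {smallest}x{smallest} and {largest}x{largest}")
```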

Very slow training

Problem: Training takes much longer than expected.

Performance optimization:

Enable GPU acceleration:

som = SOM(x=10, y=10, num_features=4, device="cuda")

Increase batch size:

som = SOM(x=10, y=10, num_features=4, batch_size=128)

Use PCA initialization:

som = SOM(x=10, y=10, num_features=4, initialization_mode="pca")

Reduce epochs if acceptable:

som = SOM(x=10, y=10, num_features=4, epochs=50)

Time the training run:

import time
start_time = time.perf_counter()  # Monotonic clock suited to measuring durations
som.fit(data)
print(f"Training time: {time.perf_counter() - start_time:.2f} seconds")
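
When comparing several settings (device, batch size, epochs), a reusable timer keeps the measurements consistent. The sketch below uses only the standard library; `timed` and its `sync` argument are illustrative names. CUDA kernels run asynchronously, so for GPU runs you would pass `torch.cuda.synchronize` as `sync` to make sure the work has finished before the clock is read.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sync=None):
    """Time a block of code; call `sync` before reading the clock if given."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        if sync is not None:
            sync()  # e.g. torch.cuda.synchronize for GPU work
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.3f} seconds")


# Stand-in workload; replace with som.fit(data)
with timed("Training") as timing:
    total = sum(i * i for i in range(100_000))
```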

NaN values in results

Problem: Getting NaN values in errors or visualizations.

Diagnostic:

# Check for NaN in data
print(f"NaN in data: {torch.isnan(data).any()}")

# Check SOM weights
print(f"NaN in weights: {torch.isnan(som.weights).any()}")

Solutions:

Check input data:

# Remove NaN values
data_clean = data[~torch.isnan(data).any(dim=1)]

# Or impute missing values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
data_imputed = imputer.fit_transform(data.numpy())
data_clean = torch.tensor(data_imputed, dtype=torch.float32)

Reduce learning rate:

som = SOM(x=10, y=10, num_features=4, learning_rate=0.1)

Clip inf and extreme values:

data = torch.clamp(data, min=-1e6, max=1e6)  # Clips +/-inf and extreme values; NaN entries stay NaN
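
`torch.isnan` does not flag infinite values, and clamping leaves NaN entries untouched, so it can be simpler to filter on `torch.isfinite`, which rejects both. A minimal sketch, assuming the data is a 2D tensor of shape (samples, features); `drop_non_finite_rows` is an illustrative helper, not part of TorchSOM:

```python
import torch


def drop_non_finite_rows(data: torch.Tensor) -> torch.Tensor:
    """Return only the rows of a 2D tensor in which every value is finite."""
    finite_rows = torch.isfinite(data).all(dim=1)
    return data[finite_rows]


data = torch.tensor([
    [1.0, 2.0],
    [float("nan"), 3.0],
    [4.0, float("inf")],
    [5.0, 6.0],
])
clean = drop_non_finite_rows(data)
print(f"Kept {clean.shape[0]} of {data.shape[0]} rows")
```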

Visualization Issues

Empty or white visualizations

Problem: Visualizations appear blank or mostly white.

Possible causes:

No data passed to visualization:

# Make sure to pass data to hit map
viz.plot_hit_map(data=data_tensor)

All neurons have same values:

# Check weight variance
import numpy as np
weights = som.weights.detach().cpu().numpy()
print(f"Weight std: {np.std(weights)}")

Colormap issues:

# Try a different colormap
from torchsom.visualization import SOMVisualizer, VisualizationConfig
config = VisualizationConfig(cmap="viridis")
viz = SOMVisualizer(som, config=config)

Figures not displaying

Problem: Plots don’t show up in Jupyter notebooks or scripts.

Solutions:

For Jupyter notebooks:

%matplotlib inline
import matplotlib.pyplot as plt

For scripts:

import matplotlib.pyplot as plt
# ... create plots ...
plt.show()  # Don't forget this

Save figures instead:

viz.plot_distance_map(save_path="results", fig_name="distance_map")

Poor visualization quality

Problem: Plots look pixelated or unclear.

Solutions:

Increase resolution:

config = VisualizationConfig(dpi=300)
viz = SOMVisualizer(som, config=config)

Larger figure size:

config = VisualizationConfig(figsize=(12, 10))
viz = SOMVisualizer(som, config=config)

Better colormap:

config = VisualizationConfig(cmap="plasma")
viz = SOMVisualizer(som, config=config)

Data Issues

Poor clustering results

Problem: SOM doesn’t find meaningful clusters.

Diagnostic steps:

Visualize raw data:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project the data onto its first two principal components
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data.numpy())
plt.scatter(data_pca[:, 0], data_pca[:, 1])
plt.title('Data in PCA space')
plt.show()

Check data distribution:

print(f"Data shape: {data.shape}")
print(f"Data mean: {data.mean(dim=0)}")
print(f"Data std: {data.std(dim=0)}")

Compare with K-means:

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans_labels = kmeans.fit_predict(data.numpy())

Solutions:

Better preprocessing:

# Scale with the median and IQR so that outliers have less influence
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
data_scaled = scaler.fit_transform(data.numpy())

Feature selection:

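# Remove highly correlated features
import numpy as np
import pandas as pd
df = pd.DataFrame(data.numpy())
corr_matrix = df.corr().abs()
# Keep only the upper triangle so each feature pair is checked once
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1))
# Drop every feature whose correlation with an earlier feature exceeds 0.95
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df.drop(columns=to_drop)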

Adjust SOM parameters:

som = SOM(
    x=15, y=15,  # Larger map
    num_features=data.shape[1],
    epochs=200,  # More training
    learning_rate=0.2,
    sigma=3.0  # Larger neighborhood
)

Configuration Errors

ValidationError from Pydantic

Problem: Configuration validation fails.

Example error:

ValidationError: 1 validation error for SOMConfig
learning_rate
  ensure this value is greater than 0 (type=value_error.number.not_gt)

Solution:

from torchsom.configs import SOMConfig
from pydantic import ValidationError

try:
    config = SOMConfig(
        x=10, y=10,
        learning_rate=0.3,  # Must be > 0
        sigma=1.0,          # Must be > 0
        epochs=100          # Must be >= 1
    )
except ValidationError as e:
    print("Configuration errors:")
    for error in e.errors():
        print(f"- {error['loc'][0]}: {error['msg']}")

Parameter compatibility issues

Problem: Certain parameter combinations don’t work.

Common incompatibilities:

Sigma too large for map size:

# Problem: sigma=10 on a 5x5 map makes the neighborhood cover the whole grid
som = SOM(x=5, y=5, num_features=4, sigma=2.0)  # Better: sigma well below the map dimensions

Batch size larger than dataset:

batch_size = min(64, len(data))
som = SOM(x=10, y=10, num_features=4, batch_size=batch_size)
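
Both checks can be run up front, before the SOM is constructed. The helper below is a sketch with a hypothetical name (`check_som_params`), and the threshold of half the larger map side for sigma is an assumed heuristic, not a limit enforced by TorchSOM.

```python
def check_som_params(x, y, sigma, batch_size, n_samples):
    """Return a list of warnings for parameter combinations likely to cause trouble."""
    warnings = []
    # Assumed heuristic: sigma above half the larger map side updates most of the grid
    if sigma > max(x, y) / 2:
        warnings.append(f"sigma={sigma} is large for a {x}x{y} map")
    if batch_size > n_samples:
        warnings.append(f"batch_size={batch_size} exceeds dataset size {n_samples}")
    return warnings


for message in check_som_params(x=5, y=5, sigma=10, batch_size=64, n_samples=50):
    print(f"Warning: {message}")
```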

Memory Issues

Memory usage too high

Problem: TorchSOM uses too much RAM or GPU memory.

Memory usage breakdown:

- SOM weights: x * y * num_features * 4 bytes (float32)
- Batch data: batch_size * num_features * 4 bytes
- Distance calculations: batch_size * x * y * 4 bytes
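
The three terms of the breakdown can be turned into a quick estimate. This is a sketch with an illustrative helper name (`estimate_som_memory`); it counts only those three terms, so treat the result as a lower bound that ignores intermediate tensors and framework overhead.

```python
def estimate_som_memory(x, y, num_features, batch_size, bytes_per_value=4):
    """Estimate memory in bytes for weights, batch data, and distances (float32 by default)."""
    weights = x * y * num_features * bytes_per_value
    batch = batch_size * num_features * bytes_per_value
    distances = batch_size * x * y * bytes_per_value
    return {
        "weights": weights,
        "batch": batch,
        "distances": distances,
        "total": weights + batch + distances,
    }


estimate = estimate_som_memory(x=50, y=50, num_features=4, batch_size=128)
for name, size in estimate.items():
    print(f"{name}: {size / 1024:.1f} KiB")
```

The distance term grows with both the batch size and the number of neurons, which is why reducing either one is the first thing to try.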

Solutions:

Reduce map size:

som = SOM(x=10, y=10, num_features=4)  # Instead of 20x20

Smaller batch size:

som = SOM(x=10, y=10, num_features=4, batch_size=32)

Use CPU for large maps:

som = SOM(x=50, y=50, num_features=4, device="cpu")

Process data in chunks:

# For very large datasets
chunk_size = 1000
for i in range(0, len(data), chunk_size):
    chunk = data[i:i+chunk_size]
    som.fit(chunk)  # Incremental training
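
When the dataset size is not a multiple of the chunk size, the final chunk is shorter than the others and may even be smaller than the configured batch size. The sketch below (with an illustrative helper name, `chunk_bounds`) prints the chunk boundaries so you can check this before training.

```python
def chunk_bounds(n_samples, chunk_size):
    """Yield (start, stop) index pairs that cover range(n_samples) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_samples, chunk_size):
        yield start, min(start + chunk_size, n_samples)


# With 2500 samples and chunk_size=1000, the last chunk is shorter
for start, stop in chunk_bounds(2500, 1000):
    print(f"rows {start}:{stop} ({stop - start} samples)")
```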

Memory leaks

Problem: Memory usage increases over time.

Solutions:

Clear GPU cache periodically:
```
import torch
torch.cuda.empty_cache()
```

Use context managers:

with torch.no_grad():
    # Inference operations
    bmus = som.identify_bmus(data)

Delete large variables:

del large_data_tensor
torch.cuda.empty_cache()
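
To confirm that memory really grows over time, it helps to log GPU usage between iterations. The sketch below uses `torch.cuda.memory_allocated` and `torch.cuda.memory_reserved`; the wrapper name `gpu_memory_mib` is illustrative. Steady growth in allocated memory suggests that tensors are still referenced somewhere, which `torch.cuda.empty_cache()` cannot fix because it only releases cached memory that is no longer in use.

```python
import torch


def gpu_memory_mib(device=0):
    """Return (allocated, reserved) GPU memory in MiB, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated(device) / 1024**2
    reserved = torch.cuda.memory_reserved(device) / 1024**2
    return allocated, reserved


usage = gpu_memory_mib()
if usage is None:
    print("CUDA not available; nothing to report")
else:
    print(f"allocated: {usage[0]:.1f} MiB, reserved: {usage[1]:.1f} MiB")
```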

Getting Help

Diagnostic Information

When reporting issues, please include:

import torchsom
import torch
import sys
import platform

print("=== Diagnostic Information ===")
print(f"TorchSOM version: {torchsom.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

Creating Minimal Examples

For bug reports, create minimal reproducible examples:

import torch
from torchsom import SOM

# Minimal data
data = torch.randn(100, 4)

# Minimal SOM
som = SOM(x=5, y=5, num_features=4, epochs=10)

# Show the problem
try:
    som.fit(data)
except Exception as e:
    print(f"Error: {e}")
    raise

Where to Get Help

Documentation: Check our comprehensive guides first
FAQ: Review the Frequently Asked Questions for answers to recurring problems
GitHub Issues: Report bugs with minimal examples
GitHub Discussions: Ask questions and share experiences
Stack Overflow: Tag questions with torchsom and pytorch

Debug Mode

Enable debug logging for more detailed information:

import logging
logging.basicConfig(level=logging.DEBUG)

# Your TorchSOM code here
som = SOM(x=10, y=10, num_features=4)
som.fit(data)
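
Setting the root logger to DEBUG also enables debug output from every other library in the process, and matplotlib in particular is verbose at that level. A sketch that limits debug output to one named logger follows. The logger name "torchsom" is an assumption (libraries conventionally log under their package name), and `enable_debug` is an illustrative helper.

```python
import logging


def enable_debug(logger_name: str) -> logging.Logger:
    """Set one named logger to DEBUG; the root logger is configured at WARNING if not yet set up."""
    logging.basicConfig(
        level=logging.WARNING,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger


# "torchsom" is an assumed logger name; adjust it to the module you are debugging
logger = enable_debug("torchsom")
logger.debug("debug logging enabled")
```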