Basic Concepts
This page introduces the fundamental concepts behind Self-Organizing Maps (SOMs) and how they work.
What is a Self-Organizing Map?
A Self-Organizing Map (SOM), also known as a Kohonen map, is an unsupervised neural network algorithm that:
- Clusters similar data points together
- Reduces dimensionality by mapping high-dimensional data to a lower-dimensional grid, usually 2D
- Preserves topology by keeping similar data points close together on the map
- Visualizes patterns in complex, high-dimensional datasets
Key Characteristics:
- Unsupervised: No labeled data required
- Competitive learning: Neurons compete to represent input data
- Topology preservation: Maintains neighborhood relationships
- Dimensionality reduction: Maps N-dimensional data to a 2D grid
How SOMs Work
The SOM Algorithm
1. Initialize weight vectors randomly for each neuron
2. Present input data to the network
3. Find the Best Matching Unit (BMU) - the neuron most similar to the input
4. Update the BMU and its neighbors to be more similar to the input
5. Repeat until convergence or the maximum number of iterations is reached
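The steps above can be sketched in NumPy. This is a minimal illustration, not the API of any particular SOM library; the grid size, decay schedules, and random data are arbitrary choices for the example:

```python
import numpy as np

def train_som(data, grid_h=5, grid_w=5, n_iters=500, alpha0=0.5, sigma0=2.0):
    """Minimal SOM training loop: BMU search + Gaussian neighborhood update."""
    rng = np.random.default_rng(0)
    n_features = data.shape[1]
    # 1. Initialize weight vectors randomly for each neuron
    weights = rng.random((grid_h, grid_w, n_features))
    # Grid coordinates of every neuron, used for neighborhood distances
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1)
    for t in range(n_iters):
        # 2. Present one input vector to the network
        x = data[rng.integers(len(data))]
        # 3. Find the Best Matching Unit (smallest Euclidean distance)
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Decay the learning rate and neighborhood radius over time
        alpha = alpha0 * (1 - t / n_iters)
        sigma = sigma0 * (1 - t / n_iters) + 1e-3
        # 4. Update the BMU and its neighbors toward the input,
        #    weighted by a Gaussian over grid distance to the BMU
        grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2 * sigma**2))
        weights += alpha * h[..., None] * (x - weights)
    # 5. Stop when the iteration budget is exhausted
    return weights

data = np.random.default_rng(1).random((100, 3))
weights = train_som(data)
```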
Mathematical Foundation
Distance Calculation
The similarity between input \(x\) and the weight vector \(w_i\) of neuron \(i\) is typically measured using Euclidean distance:
\[d(x, w_i) = \lVert x - w_i \rVert = \sqrt{\sum_{j} (x_j - w_{ij})^2}\]
Weight Update Rule
The weight update follows:
\[w_i(t+1) = w_i(t) + \alpha(t)\, h_{BMU,i}(t)\, \big(x(t) - w_i(t)\big)\]
Where:
- \(\alpha(t)\) is the learning rate at time \(t\)
- \(h_{BMU,i}(t)\) is the neighborhood function centered on the BMU
- \(x(t)\) is the input vector at time \(t\)
- \(w_i(t)\) is the weight vector of neuron \(i\) at time \(t\)
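As a quick numeric check of the weight update rule, here is a single update step with made-up values (at the BMU itself the neighborhood function is 1):

```python
import numpy as np

# One update step: w_i(t+1) = w_i(t) + alpha * h * (x - w_i(t))
w = np.array([0.2, 0.4])      # current weight vector w_i(t)
x = np.array([1.0, 0.0])      # input vector x(t)
alpha = 0.5                   # learning rate alpha(t)
h = 1.0                       # neighborhood value (1.0 at the BMU)

w_new = w + alpha * h * (x - w)
print(w_new)  # -> [0.6 0.2]
```

Each weight moves halfway toward the input, since alpha * h = 0.5.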
Core Components
1. Grid Topology
SOMs organize neurons in a grid structure:
- Rectangular Grid
Each neuron has up to 8 neighbors
Simple, intuitive visualization
Good for most applications
- Hexagonal Grid
Each neuron has up to 6 neighbors
More uniform neighborhood distances
Better for circular/radial patterns
2. Neighborhood Function
Determines how much each neuron is affected by the BMU:
- Gaussian (Most Common)
- \[h_{BMU,i}(t) = \exp\left(-\frac{d_{BMU,i}^2}{2\sigma(t)^2}\right)\]
- Bubble
Step function: neurons within the radius are updated equally; those outside are not updated at all
- Triangle
Linear decay from BMU to neighborhood boundary
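The three neighborhood functions can be written compactly as follows. This is a sketch; `d` is the grid distance from the BMU and `sigma` the current neighborhood radius:

```python
import numpy as np

def gaussian(d, sigma):
    """Smooth decay: exp(-d^2 / (2 sigma^2)), 1.0 at the BMU."""
    return np.exp(-d**2 / (2 * sigma**2))

def bubble(d, sigma):
    """Step function: 1 inside the radius, 0 outside."""
    return (d <= sigma).astype(float)

def triangle(d, sigma):
    """Linear decay from 1 at the BMU to 0 at the neighborhood boundary."""
    return np.clip(1 - d / sigma, 0.0, 1.0)

d = np.array([0.0, 1.0, 2.0, 3.0])
print(gaussian(d, sigma=2.0))  # smooth decay from 1.0
print(bubble(d, sigma=2.0))    # 1, 1, 1, 0
print(triangle(d, sigma=2.0))  # 1.0, 0.5, 0.0, 0.0
```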
3. Learning Rate Decay
Controls how much weights change during training:
- Asymptotic Decay
- \[\alpha(t) = \frac{\alpha_0}{1 + t/T}\]
- Linear Decay
- \[\alpha(t) = \alpha_0 \cdot (1 - t/T)\]
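Both decay schedules are one-liners. In this sketch, \(\alpha_0 = 0.5\) and T is the total iteration count (values chosen purely for illustration):

```python
def asymptotic_decay(t, alpha0=0.5, T=100):
    # alpha(t) = alpha0 / (1 + t/T): decays slowly, never reaches zero
    return alpha0 / (1 + t / T)

def linear_decay(t, alpha0=0.5, T=100):
    # alpha(t) = alpha0 * (1 - t/T): reaches exactly zero at t = T
    return alpha0 * (1 - t / T)

print(asymptotic_decay(100))  # -> 0.25
print(linear_decay(100))      # -> 0.0
```

Asymptotic decay keeps adapting (weakly) forever, while linear decay freezes the map at t = T.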
4. Distance Functions
Different ways to measure similarity:
- Euclidean: Standard geometric distance
- Cosine: Measures the angle between vectors
- Manhattan: Sum of absolute differences
- Chebyshev: Maximum absolute difference
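All four distances are a line of NumPy each (the vectors here are arbitrary example values):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([2.0, 0.0, 3.0])

euclidean = np.linalg.norm(x - w)          # sqrt(1 + 4 + 0) = sqrt(5)
manhattan = np.sum(np.abs(x - w))          # 1 + 2 + 0 = 3
chebyshev = np.max(np.abs(x - w))          # max(1, 2, 0) = 2
# Cosine distance = 1 - cosine similarity (angle-based, ignores magnitude)
cosine = 1 - np.dot(x, w) / (np.linalg.norm(x) * np.linalg.norm(w))
```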
5. Quality Metrics
Quantization Error: Average distance between each data point and its BMU. Lower is better; it measures how well the map represents the data.
Topographic Error: Percentage of data points whose BMU and second-best matching unit are not adjacent on the grid. Lower is better; it measures how well topology is preserved.
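Both metrics are straightforward to compute from the trained map. A sketch, assuming `weights` is a grid_h x grid_w x n_features array and `data` is n_samples x n_features:

```python
import numpy as np

def quantization_error(data, weights):
    """Average distance between each sample and its BMU's weight vector."""
    flat = weights.reshape(-1, weights.shape[-1])
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

def topographic_error(data, weights):
    """Fraction of samples whose BMU and second BMU are not grid neighbors."""
    grid_h, grid_w, _ = weights.shape
    flat = weights.reshape(-1, weights.shape[-1])
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    best2 = np.argsort(dists, axis=1)[:, :2]
    r1, c1 = np.unravel_index(best2[:, 0], (grid_h, grid_w))
    r2, c2 = np.unravel_index(best2[:, 1], (grid_h, grid_w))
    # "Neighbors" here means adjacent (incl. diagonals) on a rectangular grid
    adjacent = (np.abs(r1 - r2) <= 1) & (np.abs(c1 - c2) <= 1)
    return 1.0 - adjacent.mean()

data = np.random.default_rng(0).random((50, 3))
weights = np.random.default_rng(1).random((4, 4, 3))
qe = quantization_error(data, weights)
te = topographic_error(data, weights)
```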
Strengths and Weaknesses
Advantages
- No assumptions about data distribution
- Topology preservation maintains relationships
- Intuitive visualization of complex data
- Unsupervised learning - no labels needed
Limitations
- Computationally expensive for large datasets
- Parameter sensitive - requires tuning
- Interpretation challenges for very high dimensions
Best Practices
Data Preparation
- Normalize features to similar scales
- Remove highly correlated features
- Handle missing values appropriately
- Consider dimensionality reduction for very high dimensions
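Feature scaling deserves special attention, since Euclidean distance lets large-scale features dominate the BMU search. A minimal min-max normalization sketch (the example values are made up):

```python
import numpy as np

def minmax_normalize(data):
    """Scale each feature to [0, 1] so no single feature dominates distances."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # Guard against zero-range (constant) features to avoid division by zero
    return (data - lo) / np.where(hi > lo, hi - lo, 1.0)

# Features on wildly different scales (e.g. age in years, income in dollars)
raw = np.array([[25.0, 40_000.0],
                [35.0, 90_000.0],
                [45.0, 65_000.0]])
scaled = minmax_normalize(raw)
print(scaled[:, 0])  # ages map to 0.0, 0.5, 1.0
```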
Parameter Selection
- Experiment with different topologies and neighborhood functions
- Monitor training progress with error curves to guide parameter choices
Interpretation
- Use multiple visualizations to understand the map
- Combine with domain knowledge for meaningful insights
- Validate findings with other analysis methods
- Document parameter choices for reproducibility
Next Steps
Now that you understand the basics, explore:
- SOM Visualization Guide - Visualization techniques