Boston Housing — Regression¶
The Boston Housing dataset (506 samples, 13 numeric features, one continuous target)
is a standard regression benchmark. The target MEDV is the median home value. This
tutorial trains a map, checks convergence, and reads the structure through the
U-matrix and hit map, then turns to the regression-specific views: the mean and std
target maps, the score map, and the rank map.
Note
Full runnable notebook: notebooks/boston_housing.ipynb. The figures below are its outputs.
1. Load and standardize the data¶
The dataset ships in the repo as a CSV. The features are every column except the last;
MEDV is the continuous target. The BMU search compares raw feature distances, so
standardizing the features is essential.
import torch
import pandas as pd
from sklearn.preprocessing import StandardScaler
# boston_housing.csv ships in the repo under data/notebooks/
df = pd.read_csv("data/notebooks/boston_housing.csv")
feature_df = df.iloc[:, :-1] # all columns except the last
target_series = df.iloc[:, -1] # last column: MEDV
features = torch.tensor(
StandardScaler().fit_transform(feature_df), dtype=torch.float32
)
targets = torch.tensor(target_series.values, dtype=torch.float32) # continuous
feature_names = list(feature_df.columns)
2. Train the SOM¶
from torchsom import SOM
som = SOM(
x=25,
y=15,
num_features=features.shape[1],
epochs=100,
batch_size=16,
sigma=1.45,
learning_rate=0.95,
neighborhood_order=3,
topology="rectangular",
initialization_mode="pca",
random_seed=42,
)
som.initialize_weights(data=features, mode=som.initialization_mode)
q_errors, t_errors = som.fit(data=features)
3. Check convergence¶
from torchsom import SOMVisualizer
viz = SOMVisualizer(som=som)
viz.plot_training_errors(
quantization_errors=q_errors, topographic_errors=t_errors
)
Both errors fall and flatten — training is long enough.
4. Map structure¶
The U-matrix exposes cluster boundaries; the hit map shows where the data lands.
viz.plot_distance_map(
distance_metric=som.distance_fn_name,
neighborhood_order=som.neighborhood_order,
)
viz.plot_hit_map(data=features)
5. Target landscape¶
Build the BMU→sample map once, then summarize the target over each neuron. The mean map is a smooth regression surface over the topology: neighboring neurons hold similar predicted values. The std map flags neurons whose mapped samples disagree on the target, marking regions where a single prediction is less trustworthy.
bmus_map = som.build_map("bmus_data", data=features)
viz.plot_metric_map(
bmus_data_map=bmus_map,
data=features,
target=targets,
reduction_parameter="mean",
)
viz.plot_metric_map(
bmus_data_map=bmus_map,
data=features,
target=targets,
reduction_parameter="std",
)
6. Per-neuron reliability¶
The score map combines target variance, sample count, and statistical significance into a single value where lower is better. The rank map orders neurons by std, so rank 1 is the lowest-std, most reliable neuron. Together they pinpoint which neurons give trustworthy regression estimates.
viz.plot_score_map(
bmus_data_map=bmus_map,
target=targets,
total_samples=features.shape[0],
)
viz.plot_rank_map(bmus_data_map=bmus_map, target=targets)
Next steps¶
Energy Efficiency — Multi-target Regression — Another regression example
Visualization Gallery — Every plot explained
Clustering — Group neurons into clusters