Just-in-Time Learning¶
Just-in-time learning (JITL) builds a small, local model on demand: for each new
query, it retrieves the most relevant historical samples and trains (or predicts) on
just those. A SOM is a natural retrieval index for this — its topology already places
similar samples on neighboring neurons. TorchSOM exposes this through
collect_samples(), which gathers the historical samples that
land on the query’s BMU and its neighborhood.
This pattern is common in industrial soft sensing and adaptive monitoring, where the process drifts and a single global model goes stale.
The retrieval index¶
collect_samples needs a BMU→sample-index map of the historical data — the same
"bmus_data" map used by the visualizations. Build it once after training:
som.initialize_weights(data=historical_samples, mode="pca")
som.fit(data=historical_samples)
bmus_idx_map = som.build_map("bmus_data", data=historical_samples)
bmus_idx_map maps each grid cell (i, j) to the list of historical-sample
indices whose BMU is that cell.
Retrieving samples for a query¶
data_buffer, output_buffer = som.collect_samples(
query_sample=query, # tensor [num_features]
historical_samples=historical_samples, # [N, num_features]
historical_outputs=historical_outputs, # [N]
bmus_idx_map=bmus_idx_map,
retrieval_mode="bmu_neighborhood_knn",
min_buffer_threshold=50,
)
It returns the matched inputs and their outputs (data_buffer, output_buffer),
ready to fit a local regressor. Pass return_indices=True to also get the indices
of the retrieved samples.
Retrieval modes¶
The retrieval_mode argument trades recall against locality. All modes start from
the query’s BMU cell; they differ in how far they expand:
|
Expands? |
Strategy |
|---|---|---|
|
No |
Only samples mapped to the query’s BMU cell. Tightest, smallest buffer. |
|
Topological |
BMU plus its grid neighbors up to |
|
Topological + KNN |
Same as above, then a nearest-neighbor fallback in weight space when the
buffer is still below |
The KNN fallback in the default mode guarantees a usable buffer size even in sparse
regions of the map: if the BMU and its neighbors hold too few samples, the nearest
remaining neurons (by codebook distance) are pulled in until
min_buffer_threshold is exceeded. The neighborhood extent is the SOM’s
neighborhood_order, and under periodic boundary conditions
the neighborhood wraps across edges.
Typical workflow¶
from sklearn.linear_model import LinearRegression
# 1. Train the SOM once on the historical buffer and index it
som.initialize_weights(data=historical_samples, mode="pca")
som.fit(data=historical_samples)
bmus_idx_map = som.build_map("bmus_data", data=historical_samples)
# 2. For each incoming query, retrieve a local set and fit a local model
for query in stream:
X_local, y_local = som.collect_samples(
query_sample=query,
historical_samples=historical_samples,
historical_outputs=historical_outputs,
bmus_idx_map=bmus_idx_map,
retrieval_mode="bmu_neighborhood_knn",
)
local_model = LinearRegression().fit(
X_local.cpu().numpy(), y_local.cpu().numpy()
)
prediction = local_model.predict(query.reshape(1, -1).cpu().numpy())
Choosing a mode¶
Start with
"bmu_neighborhood_knn"(the default) — it adapts the buffer size to local data density.Use
"bmu_neighborhood"when you want strictly local samples and accept a variable, possibly small, buffer.Use
"bmu_only"for the most local model, or to inspect exactly which samples a single neuron represents.Tune
min_buffer_thresholdto the minimum sample count your local model needs.
Next steps¶
Topologies & Boundary Conditions — How
neighborhood_orderand PBC shape retrievalBasic Concepts — The latent representation behind retrieval
Core API —
collect_samplesandbuild_mapreference