Quantum · ibm_fez · 156 qubits · 106-day window

Seventeen shared-resource clusters discovered on IBM's ibm_fez. Fifteen validated across multiple coordinated-motion events.

Nervous Machine ran on ibm_fez for 106 days from 2026-02-12 to 2026-05-28, registering 61,708 predictions — each one timestamped before the next-day calibration telemetry it targets — and scoring them against actuals as they arrived. The clusters were discovered from raw telemetry alone, against neutral priors, with no coupling-map input and no vendor-side hints. The scorecard, the benchmark against standard tools, and the calibration breakdown are below.


Cluster scorecard

Seventeen clusters. Two have only a single event on record, near the start of the data window; the other fifteen recurred.

A cluster is registered when a group of qubits shows persistent within-group correlation above a noise-floor null distribution. Confirmation requires at least two distinct coordinated-motion events at least 14 days apart, each with within-cluster |r| above 0.5 in a 7-day rolling window. The two single-event clusters at the bottom of the table were registered close enough to the start of the collection window that earlier events could be present in the archive but outside what we pulled.

Cluster	Type	Size	Members	Events	T1 co-moves	Peak |r|
78e15a79	chain	2	Q14, Q85	4	1/4	0.999
1cee4ce7	chain	2	Q20, Q153	3	2/3	0.997
3625b1c4	chain	2	Q58, Q109	3	1/3	0.984
49b87d14	chain	2	Q81, Q142	3	0/3	0.975
8ef82ebd★	chain	9	Q17, Q24, Q36, Q52, Q53, Q55, Q61, Q73, Q91	3	2/3	0.811
d7dd4ee8	regional	2	Q115, Q132	2	0/2	0.999
f441a7fb	regional	2	Q6, Q8	2	0/2	0.994
72cad12f	chain	3	Q1, Q7, Q135	2	0/2	0.993
1fa2d19a	chain	2	Q22, Q136	2	1/2	0.988
6786bc2e	chain	2	Q103, Q113	2	0/2	0.982
51e064db	chain	2	Q9, Q146	2	0/2	0.982
ca1cdf2a	chain	4	Q119, Q124, Q131, Q143	2	0/2	0.981
2f82883c	chain	2	Q31, Q75	2	0/2	0.974
4c96bbb9	chain	2	Q15, Q116	2	1/2	0.953
fe542648	chain	2	Q66, Q117	2	1/2	0.939
16594770	chain	2	Q40, Q100	1	0/1	0.992
6e26122e	regional	2	Q84, Q107	1	0/1	0.990

★ The 9-qubit cluster is the one we wrote up in detail on Substack: "How uncertainty evolved around a 9-qubit cluster." T1 co-moves counts the events in which the members' T1 also moved together; the remaining events were readout-only.

Benchmark against standard tools

How the substrate compares to the five tools an engineer already has.

The substrate's event-window detector was benchmarked against five standard methods on the same 9-qubit telemetry: 106 days of T1 and readout error, pulled from MongoDB. Event-window AUC measures how cleanly each method separates inside-event days from outside-event days during the 12-day March cluster event.

Method	What it natively produces	Event-window AUC
Nervous Machine (rolling bipartite)	Per-edge causal certainty + structural signal	0.718
Spectral clustering	Static partition once you choose k	diagnostic
Hierarchical clustering	Dendrogram + cluster threshold	diagnostic
PC algorithm (causal-learn)	Static DAG over variables	diagnostic
Per-qubit isolation forest	Per-(qubit, day) outlier flag	0.369
Per-day z-score	Per-(qubit, day) anomaly flag	0.344

Spectral, hierarchical, and PC are diagnostic methods. They produce partitions or graphs, not real-time event detections, so they are not directly AUC-comparable. The per-qubit detectors are. Both score below 0.5 on this event, worse than a random ranking, because the event manifests as correlated structure across the cluster. Per-qubit detectors observe each qubit independently and miss the correlation by construction.
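For the two per-qubit detectors, event-window AUC can be computed as the probability that a randomly chosen inside-event day scores higher than a randomly chosen outside-event day. This is the Mann-Whitney formulation of ROC AUC. The sketch assumes each method has already been reduced to one score per day, for example the largest per-qubit anomaly score across the nine members. That reduction is an assumption, because the page does not say how per-(qubit, day) flags were aggregated. The function name is hypothetical.

```python
def event_window_auc(scores, in_event):
    """P(random inside-event day outscores random outside-event day); ties count half.

    0.5 is a random ranking. Below 0.5 means the detector ranks
    outside-event days as more anomalous than inside-event days.
    """
    pos = [s for s, inside in zip(scores, in_event) if inside]
    neg = [s for s, inside in zip(scores, in_event) if not inside]
    if not pos or not neg:
        raise ValueError("need at least one inside-event and one outside-event day")
    wins = 0.0
    for p in pos:          # O(n^2) is fine for a 106-day window
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```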

On the A/B bipartite split inside the 9-qubit cluster, spectral and hierarchical clustering recovered the partition exactly (Adjusted Rand Index = 1.000) when given the correct 9-qubit correlation matrix, so that finding isn't the substrate's differentiator. The differentiator is that the substrate had already flagged the two qubits the bipartite analysis would later split on (Q55 and Q61): their certainty Z stayed stalled in the 0.40–0.65 band before any bipartite analysis ran.
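Adjusted Rand Index compares two partitions of the same items while ignoring label names. It equals 1.0 only when they agree exactly, and random labelings score near 0. A from-scratch Python sketch follows; scikit-learn's `adjusted_rand_score` computes the same quantity. The 4/5 split in the usage example is hypothetical, since the page does not list which members fall on each side of the A/B partition.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions (1.0 = identical)."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two equal-length labelings of at least two items")
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:  # degenerate case, e.g. both put everything in one cluster
        return 1.0
    return (pairs_both - expected) / (max_index - expected)

# Hypothetical A/B reference split versus a clustering that uses different label names.
side = ["A"] * 4 + ["B"] * 5
spectral = [1] * 4 + [0] * 5
print(adjusted_rand_index(side, spectral))  # prints 1.0
```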

Calibration breakdown

PERSISTENT_FAIL holds 71% precision at both the 1-day and 7-day horizons, a roughly 9× lift over base rate (9.5× at 1d, 8.7× at 7d), with precision that doesn't decay as the prediction window grows.

Per-qubit failure flags typically lose precision as you extrapolate further out. The substrate's PERSISTENT_FAIL flag doesn't. Below is the breakdown of verdict precision against the base rate of each actual state at horizon, across 59,192 scored predictions on all 156 qubits. Lift = precision divided by the base rate at horizon — it's the right way to compare alerting signals, and it's what separates a real alert from a re-statement of the prior.

Predicted verdict	1d precision	1d base rate	1d lift	7d precision	7d base rate	7d lift
PERSISTENT_FAIL	71%	7.5%	9.5×	71%	8.2%	8.7×
DEGRADING	43%	11.4%	3.8×	11%	11.7%	0.9×

Three operational tiers fall out of this. PERSISTENT_FAIL is a real alert at both horizons: 9.5× lift at 1d and 8.7× lift at 7d on the same 71% precision is genuinely unusual for a per-qubit failure flag. DEGRADING is a usable short-horizon signal at 1d (3.8× lift), useful for next-day scheduling decisions such as deprioritizing flagged qubits for sensitive work, but its lift collapses to 0.9× at 7d, where the flag stops carrying signal. TRANSIENT_ANOMALY, and DEGRADING at 7d, remain on the roadmap: the substrate produces them, but their precision needs recalibration before they enter a paid alerting product.
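Lift is simple to reproduce. The sketch below computes a verdict's precision and the base rate of the matching actual state from paired (predicted, actual) labels, then divides. The helper names are hypothetical. The final loop only re-derives the table's lift column from its rounded percentages.

```python
def precision_and_base_rate(predicted, actual, verdict):
    """Precision of `verdict`, and the base rate of that state among actuals at horizon."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    flagged_actuals = [a for p, a in zip(predicted, actual) if p == verdict]
    if not flagged_actuals:
        raise ValueError(f"{verdict} was never predicted")
    precision = sum(a == verdict for a in flagged_actuals) / len(flagged_actuals)
    base_rate = sum(a == verdict for a in actual) / len(actual)
    return precision, base_rate

def lift(precision, base_rate):
    """Precision divided by base rate; 1.0 means the flag adds nothing over the prior."""
    if base_rate == 0:
        raise ValueError("state never occurs at horizon; lift is undefined")
    return precision / base_rate

# Re-deriving the lift column from the table's rounded precision and base rates.
for row, p, b in [("PERSISTENT_FAIL 1d", 0.71, 0.075), ("PERSISTENT_FAIL 7d", 0.71, 0.082),
                  ("DEGRADING 1d", 0.43, 0.114), ("DEGRADING 7d", 0.11, 0.117)]:
    print(row, round(lift(p, b), 1))  # prints 9.5, 8.7, 3.8 and 0.9 in turn
```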

WAIT_ENVIRONMENTAL is a different signal class entirely. It's rare by design — it fires when multiple qubits across the chip show signs of an environmental event simultaneously, and chip-wide events are uncommon. The May 8–16 window, in which 85 of 156 qubits lost ≥30% of T1 over a week, is the canonical example. Per-qubit horizon precision on the flag is noisy by construction because environmental events recover stochastically; the chip-wide co-fire pattern is the right diagnostic. The operational action it implies is wait, not reroute. Some of the apparent DEGRADING → HEALTHY → PERSISTENT_FAIL flapping observed in individual qubits during environmental windows is attributable to those qubits transiting an environmental event that WAIT_ENVIRONMENTAL was already tracking at the chip level.
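A chip-wide co-fire check of the kind described here can be sketched as follows. The ≥30% weekly T1 loss threshold comes from the May 8–16 example. The 50% share of qubits required to call an event chip-wide is a hypothetical parameter, because the page does not state the trigger the substrate actually uses. The function name is illustrative.

```python
T1_LOSS_THRESHOLD = 0.30  # ">=30% of T1 over a week", from the May example
CHIP_WIDE_SHARE = 0.50    # hypothetical: share of qubits that makes an event chip-wide

def chip_wide_t1_event(t1_week_ago, t1_now, loss=T1_LOSS_THRESHOLD, share=CHIP_WIDE_SHARE):
    """Return (qubits that lost >= `loss` of T1 over the week, whether that is chip-wide).

    Both arguments map qubit name -> T1 in any consistent unit.
    """
    if not t1_week_ago:
        raise ValueError("no qubits supplied")
    dropped = sorted(
        q for q, before in t1_week_ago.items()
        if before > 0 and q in t1_now and (before - t1_now[q]) / before >= loss
    )
    return dropped, len(dropped) / len(t1_week_ago) >= share
```

Under these assumptions, the May window's 85 of 156 qubits (about 54%) would clear the chip-wide bar, while a handful of independently degrading qubits would not.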

The dashboard tracks two evaluation modes separately, and this page names both on purpose. State forecasts, which predict the verdict label a qubit will land in at horizon, score 57% overall state-match across all five states at 1d. That's the operational product the alerts are built on. Value forecasts, which predict the exact T1 or readout-error number, score 51% overall value-accuracy. The value bar is harder; it serves as a residual diagnostic (Q53's +11.82σ z_residual on May 12 is the canonical example) rather than as the basis of the alerting product. Leading with state-match without naming the value-accuracy bar would obscure how the product is actually evaluated.
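The two evaluation modes can be made concrete with two small helpers. State-match is the share of forecasts whose predicted verdict label equals the realized one. On the value side, the sketch assumes z_residual is the realized value minus the forecast mean, divided by the forecast's standard deviation. The page does not spell out that formula or the tolerance behind "value-accuracy", so the latter is not sketched. Function names are hypothetical.

```python
def state_match_rate(predicted_states, actual_states):
    """Share of forecasts whose predicted verdict label equals the realized label."""
    if len(predicted_states) != len(actual_states) or not actual_states:
        raise ValueError("need equal-length, non-empty label lists")
    hits = sum(p == a for p, a in zip(predicted_states, actual_states))
    return hits / len(actual_states)

def z_residual(actual, predicted_mean, predicted_std):
    """Signed forecast miss in units of the forecast's own spread (assumed definition)."""
    if predicted_std <= 0:
        raise ValueError("predicted_std must be positive")
    return (actual - predicted_mean) / predicted_std
```

Under this assumed definition, a large positive z_residual marks a realized value far above what the forecast expected. That is why it works as a diagnostic even when the state label was right.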

The 9-qubit cluster

The first cluster we wrote up — and the first one where the substrate told us, in advance, where to look.

The 9-qubit cluster 8ef82ebd (Q17, Q24, Q36, Q52, Q53, Q55, Q61, Q73, Q91) was the original case. It had a 12-day March event window, a bipartite A/B split in the readout-error correlation structure, and a mechanism that localized to shared readout electronics rather than coherence. A second, larger bipartite event in May was visible in the substrate's signal before the calibration verdicts caught up to it.

The substrate's certainty for two of the nine members (Q55 at 0.413, Q61 at 0.522) sat in the stalled band 0.40–0.65 despite 50+ updates. Those are exactly the two qubits that straddle the bipartite A/B partition the cluster later split on. The framework raised its hand about the right partition before the partition was identified by other means. The full piece is on Substack: How uncertainty evolved around a 9-qubit cluster.
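The stalled-band signal reduces to a simple filter over each member's certainty history. The sketch flags members whose latest certainty Z lies in the 0.40–0.65 band after at least 50 updates. Checking only the latest value is a simplification; a stricter check might require Z to stay in the band over a trailing window. The function name is hypothetical. In the test data, only the final values 0.413 and 0.522 come from this page, and the rest of each history is made up for illustration.

```python
STALL_LOW, STALL_HIGH = 0.40, 0.65  # stalled certainty band, from the text
MIN_UPDATES = 50                    # "despite 50+ updates", from the text

def stalled_members(certainty_history, low=STALL_LOW, high=STALL_HIGH,
                    min_updates=MIN_UPDATES):
    """Members whose latest certainty Z sits in the stalled band after enough updates.

    `certainty_history` maps qubit name -> list of certainty values, oldest first.
    """
    return sorted(
        q for q, zs in certainty_history.items()
        if len(zs) >= min_updates and low <= zs[-1] <= high
    )
```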

For chip and platform teams

A method that transfers off the quantum chip.

ibm_fez is the public proof. The substrate is built to run on any device that exposes a per-element calibration feed — quantum hardware, classical silicon, HPC cooling fleets, or model-based forecasting systems. The methodology stays the same. The integration is what's bespoke.

Discovery without priors

Shared-resource clusters surface from telemetry alone, against neutral priors. No coupling map, no vendor hints, no topology assumptions.

Per-edge certainty, not chip-wide scores

Every causal relationship carries its own evolving certainty. The substrate tells you which conclusions it stands behind and which it doesn't.

Correlation-aware signal

Structural signals are first-class outputs. The kinds of events per-qubit detectors are blind to are exactly what the substrate is built to surface.

Deployable on your telemetry

Raw data stays on-site. Calibrated findings cross organizational boundaries. The production implementation is what we license.