Case study -- substrate in the wild

What a Claude Code session looks like with the substrate wired in.

Real session. A small, self-contained algorithm written, debugged, and shipped -- then naturally deposited into the substrate at the boundaries where context would otherwise be lost. Zero forced calls. Verbatim transcript with the tool fires annotated.

Inner-loop calls: 0 -- the write -> run -> fix loop
Seam calls: 10 -- naturally fired at boundaries
Cross-modal recall hit: 0.31 -- paraphrase -> design retrieved
Bugs surfaced: 3 -- all root-caused, all shipped fixed

The task

A small probe, no prior context.

Pick something a Claude Code session might write on a Tuesday: a textbook Bloom filter -- probabilistic set-membership test, m bits, k independent hashes, insert sets the k bit positions, query checks they're all set. The choice of probe is incidental. The point is to watch which substrate tools fire during a normal coding loop.

def hash_seeded(seed, x):
    # Mix the seed into the input before hashing (the post-fix version;
    # the first draft dropped the seed -- see the inner-loop table below).
    return hash(f"{seed}:{x}")

class Bloom:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _hashes(self, x):
        # k independent hashes over m bits
        return [hash_seeded(i, x) % self.m for i in range(self.k)]

    def add(self, x):
        for h in self._hashes(x): self.bits[h] = 1

    def contains(self, x):
        return all(self.bits[h] for h in self._hashes(x))

# Hypothesized properties:
#   H1 deterministic    same insert sequence -> identical bit array
#   H2 no false negatives   every inserted item returns True
#   H3 fp rate near theory   (1 - e^(-kn/m))^k +/- noise
#   H4 monotone         more inserts -> higher fp rate, smoothly
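
The H3 theory line is easy to sanity-check. The parameters below are illustrative -- the transcript doesn't record the probe's actual m, k, and n -- but they show the formula landing near the 0.05 the session expected:

import math

# Illustrative parameters (the probe's real m, k, n aren't shown above),
# chosen so the theoretical rate lands near 0.05.
m, k, n = 6240, 4, 1000
fp_theory = (1 - math.exp(-k * n / m)) ** k
print(round(fp_theory, 3))  # 0.05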

No prior context exists for this filter. Nothing to recall on session start. A clean test of whether the substrate inserts itself unnaturally into the inner loop -- it shouldn't.

The inner loop

Zero substrate calls during the write -> run -> fix loop.

The iteration was tight enough that the substrate had nothing to contribute. One Write, two Bash runs, one Edit. No prior probe to recall, no persistent state to deposit yet. The substrate's job is at the seams, not in the typing.

Step                                                                     | Tool        | Substrate fired?
Write probe_bloom.py (~90 lines)                                         | Write       | no
First run -- H3 fails (fp rate ≈ 1.0, expected ≈ 0.05)                   | Bash        | no
Diagnose: Python hash() ignores the seed arg, so all k positions collide | (reasoning) | no
Fix: prepend seed to the bytes -- hash(f"{i}:{x}")                       | Edit        | no
Re-run -- all four hypotheses pass; fp rate 0.048 vs theory 0.050        | Bash        | no
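
The diagnosis is easy to reproduce in isolation. A minimal sketch, assuming the first draft dropped the seed entirely rather than mixing it in:

x, m = "hello", 1024

buggy = [hash(x) % m for i in range(4)]           # seed never mixed in
fixed = [hash(f"{i}:{x}") % m for i in range(4)]  # seed prepended to the input

print(buggy)  # four identical positions -- every insert sets a single bit
print(fixed)  # four (almost certainly) distinct positions

With the seed ignored, the k positions collapse to one and the filter degenerates to a single effective hash -- the measured fp rate blows well past the H3 prediction, which is what the first run's ≈ 1.0 reading flagged.
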
The principle

We are not smart autocomplete or an inner-loop accelerant. The substrate adds value at handoff boundaries -- between probes, between sessions, between concepts -- where context would otherwise be lost. The quality bar for an inner-loop call is "the agent already knows; don't interrupt." The quality bar for a seam call is "the next session will need this." They are different bars.

The seams

Four seams. Ten calls. Each doing real work.

When the probe passed, the agent reached for the substrate without being prompted -- because the boundary was real. Four kinds of deposits, each compounding differently.

Decision log -- records_ingest x2 (kind=decision, kind=mistake)
The decision ("Bloom filter passes all four hypotheses; fp rate 0.048 vs theory 0.050") and the mistake ("Python hash() doesn't accept a seed arg -- prepend the seed to the bytes instead") deposited as separate typed events. Mistake records are the highest-ROI shape -- one deposit can save 15-60 minutes of next-session re-debugging.
Future recall -- memory_upsert_text (design statement, embedded)
The Bloom filter design -- what it is, what it costs, the false-positive rate formula -- embedded as text. A future session asking "probabilistic membership data structure" (different wording) still retrieves it at score 0.31, beating an unrelated note at -0.008. Paraphrase tolerance compounds across sessions.
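
A sketch of that round trip; only the tool name and the scores come from the session, while the parameter names and stored wording are assumed:

# Hypothetical call shape -- tool name and scores are from the session,
# everything else is assumed.
memory_upsert_text(id="bloom-design",
                   text="Bloom filter: m-bit array, k seeded hashes, no false "
                        "negatives; fp rate ~ (1 - e^(-kn/m))^k")

# A later query for "probabilistic membership data structure" -- a paraphrase,
# not the stored wording -- scores 0.31 against this record, vs -0.008 for an
# unrelated note.
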
Relational query -- hd_kg_add_fact x3, e.g. (bloom_filter, is_a, probabilistic_data_structure)
Three triples bound into the KG. Now hd_kg_query(subject=bloom_filter, relation=has_property) returns no_false_negatives at confidence 1.0. As more facts land, structural queries grow faster than storage -- N facts give M·M·R 2-hop reachability over M atoms.
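
Reconstructing the deposits from what the text states -- the query syntax and the first triple are given, the second is implied by the query result, and the third is a hypothetical stand-in:

# First triple stated, second implied by the query result, third assumed.
hd_kg_add_fact("bloom_filter", "is_a", "probabilistic_data_structure")
hd_kg_add_fact("bloom_filter", "has_property", "no_false_negatives")
hd_kg_add_fact("bloom_filter", "has_property", "tunable_fp_rate")  # assumed

hd_kg_query(subject="bloom_filter", relation="has_property")
# -> no_false_negatives at confidence 1.0
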
Cross-modal recall -- records_query (text + tagsAny), hybrid semantic + typed filter
The just-written decision came back at score 0.26 against an unrelated note at -0.008. The hybrid filter (semantic + typed) ranked the right record without the agent needing to remember the exact tag.
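
The hybrid call, sketched with an assumed argument shape -- the text + tagsAny combination is from the session; the query wording and tag values are not:

# Hypothetical argument values -- text + tagsAny is the documented combination.
records_query(text="did the bloom probe pass its hypotheses?",
              tagsAny=["decision"])
# -> the just-written decision at 0.26; an unrelated note at -0.008
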
What this gives back

The asymmetry: a deposit costs ≈ 1 call; the recall it enables pays back in tens of minutes.

A single high-quality mistake-record can save tens of minutes of re-debugging weeks later. That's the whole pitch. The substrate isn't a smart agent -- it's a discipline that turns each finished probe into a deposit that future probes earn interest on.

Hours -- next-session recall
Yesterday's mistake-record warns the agent about a footgun it's about to step on today. The 'int8 overflow in cos_bipolar' record costs <$0.000004 to deposit and saves the next agent 15-60 min of re-debugging the identical bug.

Weeks (5-20 sessions) -- a 'have we tried this?' index
The namespace becomes a paraphrase-tolerant catalog of every approach attempted. The KG fills with concept relations -- structural queries start returning real graph paths, not just nearest neighbors.

Months -- audit + replay + compaction
The records timeline IS the audit log of decisions and pivots. records_compact summarizes old slices into one summary-record while preserving original_ids[] for traceability, so the namespace stays dense without drowning in noise.
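
What a compaction pass might look like as a call; only the tool name and the original_ids[] field appear in the text, so every parameter here is an assumption:

# Hypothetical call shape -- only records_compact and original_ids[] are
# documented above.
records_compact(namespace="my-probes", before="2025-01-01")
# -> one summary-record replacing the old slice, with
#    original_ids=[...] preserved for traceability
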
The honest part

What the session caught -- and what we shipped.

The same session uncovered three real bugs in the Pinecone-compat memory layer, separate from the substrate path. We root-caused all three, regression-tested them, and shipped patches the same day. The records substrate path was clean throughout.

Bug                                      | Root cause                                                       | Status
memory_fetch multi-id returned only last | Server signature parsed multi-param query as single value        | fixed -- accepts both wire shapes
memory_update silently no-op on metadata | Wrapper sent metadata; server expects setMetadata                | fixed in MCP wrappers 0.6.1
memory_export leaked tombstoned rows     | Tombstones not pruned before dumps_nmm -- GDPR Art. 17 violation | fixed -- compact-before-dump on all paths
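
For the first bug, the two wire shapes are the two standard encodings of a multi-value query parameter. A minimal illustration (the parameter name ids is an assumption):

from urllib.parse import parse_qs

# Repeated-param wire shape: a naive dict(...) parse keeps only the last id.
qs = "ids=a&ids=b&ids=c"
naive = dict(pair.split("=") for pair in qs.split("&"))  # {'ids': 'c'} -- the bug
correct = parse_qs(qs)["ids"]                            # ['a', 'b', 'c']

# The comma-delimited shape also has to be accepted:
assert "a,b,c".split(",") == ["a", "b", "c"]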

The records substrate (records_ingest / records_query / records_timeline / records_forget / records_compact) handled every deposit and recall correctly, with user_id auto-tagging and .neruva portable export working end-to-end. That's the typed-event substrate doing what it's designed for.

Reproduce it

Run this exact session yourself.

One command installs the hooks + MCP + auto-recall into your existing Claude Code. Then write any small probe and watch which tools fire at the boundaries.

pip install -U neruva-record
export NERUVA_API_KEY=nv_...
neruva-record-install --yes --namespace my-probes

# Restart Claude Code, then in a fresh window:
#   1. Write a small probe (any concept, any 100 lines)
#   2. Iterate until it passes
#   3. Type: "deposit the result, the design, and any mistakes"
#   4. Next session: "what have I tried that involves <paraphrase>?"
#
# The substrate fires at steps 3 + 4. Not before.

Stop discarding context at every session boundary.

Deposit at every seam. Recall at every cold-start. Compact when noise creeps in. The substrate is one MCP install away.