Case study -- substrate in the wild

What a Claude Code session looks like with the substrate wired in.

Real session. A small, self-contained algorithm written, debugged, and shipped -- then naturally deposited into the substrate at the boundaries where context would otherwise be lost. Zero forced calls. Verbatim transcript with the tool fires annotated.

Inner-loop calls: 0 -- the write -> run -> fix loop
Seam calls: 10 -- naturally fired at boundaries
Cross-modal recall hit: 0.31 -- paraphrase -> design retrieved
Bugs surfaced: 3 -- all root-caused, all shipped fixed

The task

A small probe, no prior context.

Pick something a Claude Code session might write on a Tuesday: a textbook Bloom filter -- probabilistic set-membership test, m bits, k independent hashes, insert sets the k bit positions, query checks they're all set. The choice of probe is incidental. The point is to watch which substrate tools fire during a normal coding loop.

def hash_seeded(seed, x):
    # Mix the seed into the input before hashing (the post-fix version;
    # the first draft dropped the seed -- see the inner-loop table below).
    return hash(f"{seed}:{x}")

class Bloom:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _hashes(self, x):
        # k independent hashes over m bits
        return [hash_seeded(i, x) % self.m for i in range(self.k)]

    def add(self, x):
        for h in self._hashes(x): self.bits[h] = 1

    def contains(self, x):
        return all(self.bits[h] for h in self._hashes(x))

# Hypothesized properties:
#   H1 deterministic    same insert sequence -> identical bit array
#   H2 no false negatives   every inserted item returns True
#   H3 fp rate near theory   (1 - e^(-kn/m))^k +/- noise
#   H4 monotone         more inserts -> higher fp rate, smoothly
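
The H3 theory line is easy to sanity-check. The parameters below are illustrative -- the transcript doesn't record the probe's actual m, k, and n -- but they show the formula landing near the 0.05 the session expected:

import math

# Illustrative parameters (the probe's real m, k, n aren't shown above),
# chosen so the theoretical rate lands near 0.05.
m, k, n = 6240, 4, 1000
fp_theory = (1 - math.exp(-k * n / m)) ** k
print(round(fp_theory, 3))  # 0.05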

No prior context exists for this filter. Nothing to recall on session start. A clean test of whether the substrate inserts itself unnaturally into the inner loop -- it shouldn't.

The inner loop

Zero substrate calls during the write -> run -> fix loop.

The iteration was tight enough that the substrate had nothing to contribute. One Write, two Bash runs, one Edit. No prior probe to recall, no persistent state to deposit yet. The substrate's job is at the seams, not in the typing.

Step                                                                     | Tool        | Substrate fired?
Write probe_bloom.py (~90 lines)                                         | Write       | no
First run -- H3 fails (fp rate ≈ 1.0, expected ≈ 0.05)                   | Bash        | no
Diagnose: Python hash() ignores the seed arg, so all k positions collide | (reasoning) | no
Fix: prepend seed to the bytes -- hash(f"{i}:{x}")                       | Edit        | no
Re-run -- all four hypotheses pass; fp rate 0.048 vs theory 0.050        | Bash        | no
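
The diagnosis is easy to reproduce in isolation. A minimal sketch, assuming the first draft dropped the seed entirely rather than mixing it in:

x, m = "hello", 1024

buggy = [hash(x) % m for i in range(4)]           # seed never mixed in
fixed = [hash(f"{i}:{x}") % m for i in range(4)]  # seed prepended to the input

print(buggy)  # four identical positions -- every insert sets a single bit
print(fixed)  # four (almost certainly) distinct positions

With the seed ignored, the k positions collapse to one and the filter degenerates to a single effective hash -- the measured fp rate blows well past the H3 prediction, which is what the first run's ≈ 1.0 reading flagged.
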
The principle

We are not smart autocomplete or an inner-loop accelerant. The substrate adds value at handoff boundaries -- between probes, between sessions, between concepts -- where context would otherwise be lost. The quality bar for an inner-loop call is "the agent already knows; don't interrupt." The quality bar for a seam call is "the next session will need this." They are different bars.

The seams

Four seams. Ten calls. Each doing real work.

When the probe passed, the agent reached for the substrate without being prompted -- because the boundary was real. Four kinds of deposits, each compounding differently.

Decision log -- records_ingest x2 (kind=decision, kind=mistake)
The decision ("Bloom filter passes all four hypotheses; fp rate 0.048 vs theory 0.050") and the mistake ("Python hash() doesn't accept a seed arg -- prepend the seed to the bytes instead") deposited as separate typed events. Mistake records are the highest-ROI shape -- one deposit can save 15-60 minutes of next-session re-debugging.
Future recall -- memory_upsert_text (design statement, embedded)
The Bloom filter design -- what it is, what it costs, the false-positive rate formula -- embedded as text. A future session asking "probabilistic membership data structure" (different wording) still retrieves it at score 0.31, beating an unrelated note at -0.008. Paraphrase tolerance compounds across sessions.
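
A sketch of that round trip; only the tool name and the scores come from the session, while the parameter names and stored wording are assumed:

# Hypothetical call shape -- tool name and scores are from the session,
# everything else is assumed.
memory_upsert_text(id="bloom-design",
                   text="Bloom filter: m-bit array, k seeded hashes, no false "
                        "negatives; fp rate ~ (1 - e^(-kn/m))^k")

# A later query for "probabilistic membership data structure" -- a paraphrase,
# not the stored wording -- scores 0.31 against this record, vs -0.008 for an
# unrelated note.
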
Relational query -- hd_kg_add_fact x3, e.g. (bloom_filter, is_a, probabilistic_data_structure)
Three triples bound into the KG. Now hd_kg_query(subject=bloom_filter, relation=has_property) returns no_false_negatives at confidence 1.0. As more facts land, structural queries grow faster than storage -- N facts give M·M·R 2-hop reachability over M atoms.
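
Reconstructing the deposits from what the text states -- the query syntax and the first triple are given, the second is implied by the query result, and the third is a hypothetical stand-in:

# First triple stated, second implied by the query result, third assumed.
hd_kg_add_fact("bloom_filter", "is_a", "probabilistic_data_structure")
hd_kg_add_fact("bloom_filter", "has_property", "no_false_negatives")
hd_kg_add_fact("bloom_filter", "has_property", "tunable_fp_rate")  # assumed

hd_kg_query(subject="bloom_filter", relation="has_property")
# -> no_false_negatives at confidence 1.0
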
Cross-modal recall -- records_query (text + tagsAny), hybrid semantic + typed filter
The just-written decision came back at score 0.26 against an unrelated note at -0.008. The hybrid filter (semantic + typed) ranked the right record without the agent needing to remember the exact tag.
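
The hybrid call, sketched with an assumed argument shape -- the text + tagsAny combination is from the session; the query wording and tag values are not:

# Hypothetical argument values -- text + tagsAny is the documented combination.
records_query(text="did the bloom probe pass its hypotheses?",
              tagsAny=["decision"])
# -> the just-written decision at 0.26; an unrelated note at -0.008
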
What this gives back

The asymmetry: a deposit costs ≈ 1 call; the recall it enables pays back in tens of minutes.

A single high-quality mistake-record can save tens of minutes of re-debugging weeks later. That's the whole pitch. The substrate isn't a smart agent -- it's a discipline that turns each finished probe into a deposit that future probes earn interest on.

Hours -- next-session recall
Yesterday's mistake-record warns the agent about a footgun it's about to step on today. The 'int8 overflow in cos_bipolar' record costs <$0.000004 to deposit and saves the next agent 15-60 min of re-debugging the identical bug.

Weeks (5-20 sessions) -- a 'have we tried this?' index
The namespace becomes a paraphrase-tolerant catalog of every approach attempted. The KG fills with concept relations -- structural queries start returning real graph paths, not just nearest neighbors.

Months -- audit + replay + compaction
The records timeline IS the audit log of decisions and pivots. records_compact summarizes old slices into one summary-record while preserving original_ids[] for traceability, so the namespace stays dense without drowning in noise.
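
What a compaction pass might look like as a call; only the tool name and the original_ids[] field appear in the text, so every parameter here is an assumption:

# Hypothetical call shape -- only records_compact and original_ids[] are
# documented above.
records_compact(namespace="my-probes", before="2025-01-01")
# -> one summary-record replacing the old slice, with
#    original_ids=[...] preserved for traceability
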
The honest part

What the session caught -- and what we shipped.

The same session uncovered three real bugs in the Pinecone-compat memory layer, separate from the substrate path. We root-caused all three, regression-tested them, and shipped patches the same day. The records substrate path was clean throughout.

Bug                                      | Root cause                                                       | Status
memory_fetch multi-id returned only last | Server signature parsed multi-param query as single value        | fixed -- accepts both wire shapes
memory_update silently no-op on metadata | Wrapper sent metadata; server expects setMetadata                | fixed in MCP wrappers 0.6.1
memory_export leaked tombstoned rows     | Tombstones not pruned before dumps_nmm -- GDPR Art. 17 violation | fixed -- compact-before-dump on all paths
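
For the first bug, the two wire shapes are the two standard encodings of a multi-value query parameter. A minimal illustration (the parameter name ids is an assumption):

from urllib.parse import parse_qs

# Repeated-param wire shape: a naive dict(...) parse keeps only the last id.
qs = "ids=a&ids=b&ids=c"
naive = dict(pair.split("=") for pair in qs.split("&"))  # {'ids': 'c'} -- the bug
correct = parse_qs(qs)["ids"]                            # ['a', 'b', 'c']

# The comma-delimited shape also has to be accepted:
assert "a,b,c".split(",") == ["a", "b", "c"]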

The records substrate (records_ingest / records_query / records_timeline / records_forget / records_compact) handled every deposit and recall correctly, with user_id auto-tagging and .neruva portable export working end-to-end. That's the typed-event substrate doing what it's designed for.

Reproduce it

Run this exact session yourself.

One command installs the hooks + MCP + auto-recall into your existing Claude Code. Then write any small probe and watch which tools fire at the boundaries.

pip install -U neruva-record
export NERUVA_API_KEY=nv_...
neruva-record-install --yes --namespace my-probes

# Restart Claude Code, then in a fresh window:
#   1. Write a small probe (any concept, any 100 lines)
#   2. Iterate until it passes
#   3. Type: "deposit the result, the design, and any mistakes"
#   4. Next session: "what have I tried that involves <paraphrase>?"
#
# The substrate fires at steps 3 + 4. Not before.

Stop discarding context at every session boundary.

Deposit at every seam. Recall at every cold-start. Compact when noise creeps in. The substrate is one MCP install away.