/v1/hd/* -- the substrate underneath your LLM

Operations your LLM doesn't have to do.

HD-native endpoints that answer knowledge-graph queries, analogies, and causal interventions over HTTP in under 100 ms end to end (server compute is sub-millisecond) -- deterministically. Pairs with whatever LLM you're already running; offloads the substrate-answerable slice so you stop paying token rates for cosine math.

Accuracy
100%
across knowledge-graph, analogy, and causal tasks on synthetic ground truth at n=200. Measured on the live API; raw numbers on /benchmarks.
Latency
~95ms p50
end-to-end from Cloud Run to caller per substrate call. Server compute is sub-ms; the floor is network round-trip.
Cost
~3,000x
cheaper per recall than re-prepending context to your LLM: $0.000002 per op ($2/M ops) vs ~$0.00625 per turn at Opus 4.7 input rates. Other models vary.
Determinism
20/20
bit-identical reruns of the same query. No temperature, no sampling, no stochasticity. Audit it and you can prove it.
The surface

Four operations. One substrate. No prompts.

Every endpoint is a function on bound HD vectors at D=8192 -- bind, unbind, bundle, cosine cleanup. The codebook is deterministic given a seed. The math is integer multiply, sign, and dot product. There is no model in the call path.
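
A minimal sketch of that arithmetic in NumPy, assuming bipolar (+1/-1) vectors and elementwise-multiply binding; the function names below are illustrative, not the service's internals.

import numpy as np

D = 8192  # the substrate's dimensionality

def codebook(symbols, seed):
    # Deterministic codebook: the same seed always yields bit-identical vectors.
    rng = np.random.default_rng(seed)
    return {s: rng.choice([-1, 1], size=D) for s in symbols}

def bind(a, b):
    # Elementwise integer multiply. Bipolar vectors are their own inverse,
    # so unbinding is the same multiply again.
    return a * b

def bundle(vectors):
    # Per-dimension majority vote: sign of the sum, ties broken to +1.
    return np.where(np.sum(vectors, axis=0) >= 0, 1, -1)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def cleanup(noisy, book):
    # Cosine cleanup: snap a noisy vector back to the nearest codebook entry.
    return max(book.items(), key=lambda kv: cosine(noisy, kv[1]))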

/v1/hd/kg/*
Knowledge graphs
Bind (subject, relation, object) triples into one compact vector per graph. Query by (subject, relation) -- unbind returns the answer with calibrated confidence. Sharded for capacity; thousands of facts per graph.
POST /v1/hd/kg/people/facts
{"facts":[
  {"subject":"alice","relation":"lives_in","object":"toronto"},
  {"subject":"alice","relation":"works_at","object":"acme"}
]}

POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}
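
Under the hood, that example reduces to the primitives above. A sketch of the math with multiply binding, reusing the vocabulary from the example (the seed is arbitrary and nothing here is the service's actual code):

import numpy as np

D = 8192
rng = np.random.default_rng(42)  # illustrative seed; any fixed seed reproduces bit-identically
book = {s: rng.choice([-1, 1], size=D)
        for s in ["alice", "lives_in", "works_at", "toronto", "acme"]}

def bind(*vs):
    out = np.ones(D, dtype=np.int64)
    for v in vs:
        out = out * v
    return out

# The whole graph is one compact vector: a bundle of bound triples.
facts = [bind(book["alice"], book["lives_in"], book["toronto"]),
         bind(book["alice"], book["works_at"], book["acme"])]
graph = np.where(np.sum(facts, axis=0) >= 0, 1, -1)

# Query (alice, lives_in): unbinding is just another bind, then cosine cleanup
# against the object vocabulary gives the answer plus a score you can threshold.
noisy = bind(graph, book["alice"], book["lives_in"])
def cosine(a, b): return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
answer = max(("toronto", "acme"), key=lambda o: cosine(noisy, book[o]))
print(answer, round(cosine(noisy, book[answer]), 2))  # toronto, well above the runner-up
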
/v1/hd/analogy
Analogy by algebra
A:B::C:? Parallelogram completion over factored items. The substrate finds D by XOR-style algebra, not by prompting a model. Returns candidate, cosine, and an ambiguity gap you can threshold on.
POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}
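
A sketch of what "XOR-style algebra" means with self-inverse multiply binding, using factored items built from two features each; the color/shape feature names are invented for illustration, and this is not the endpoint's implementation:

import numpy as np

D = 8192
rng = np.random.default_rng(4301)  # seed from the example; the item construction is illustrative
def rand_vec(): return rng.choice([-1, 1], size=D)

# Factored items: each item binds a color feature with a shape feature.
red, blue, square, circle = rand_vec(), rand_vec(), rand_vec(), rand_vec()
items = {"red_square": red * square, "blue_square": blue * square,
         "red_circle": red * circle, "blue_circle": blue * circle}

# A:B :: C:?  With self-inverse binding, d_hat = C * A * B:
# A*B cancels the shared 'square' factor, leaving red*blue, which flips C's color.
a, b, c = items["red_square"], items["blue_square"], items["red_circle"]
d_hat = c * a * b

def cosine(x, y): return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
ranked = sorted(items, key=lambda k: cosine(d_hat, items[k]), reverse=True)
best, runner_up = ranked[0], ranked[1]
gap = cosine(d_hat, items[best]) - cosine(d_hat, items[runner_up])
print(best, round(cosine(d_hat, items[best]), 3), round(gap, 3))  # blue_circle, ~1.0, large gap
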
/v1/hd/causal/*
Pearl's do-operator
Distinguish observation from intervention. P(Y|X=1) is what you observed; P(Y|do(X=1)) is what would happen if you forced it. Same logged worlds, two arithmetically distinct queries. No other vector DB exposes this.
POST /v1/hd/causal/scm1/query
{"query_type":"observation",
 "condition_var":1, "condition_value":1,
 "query_var":2,     "query_value":1}

POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}
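
The distinction in miniature, as a sketch of Pearl's backdoor adjustment over a handful of toy logged worlds. This only illustrates why the two queries return different numbers; it is not the endpoint's SCM evaluator:

# Logged worlds (z, x, y): a confounder z drives both x and y,
# so plain conditioning overstates what forcing x would do.
worlds = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1),
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1),
]

def p(pred):
    # Empirical probability of a predicate over the logged worlds.
    return sum(pred(w) for w in worlds) / len(worlds)

# Observation: P(Y=1 | X=1) is plain filtering on the logs.
observation = p(lambda w: w[1] == 1 and w[2] == 1) / p(lambda w: w[1] == 1)

# Intervention: P(Y=1 | do(X=1)) re-weights by the confounder (backdoor adjustment):
# sum over z of P(Y=1 | X=1, Z=z) * P(Z=z).
intervention = sum(
    (p(lambda w, z=z: w[0] == z and w[1] == 1 and w[2] == 1)
     / p(lambda w, z=z: w[0] == z and w[1] == 1)) * p(lambda w, z=z: w[0] == z)
    for z in (0, 1)
)
print(round(observation, 3), round(intervention, 3))  # 0.5 vs 0.333 -- same logs, different answers
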
What the substrate handles for you

Stop paying token rates for cosine math.

Your LLM is great at language and open-ended thinking. It is wildly overpriced and slow when used as a key-value store, a confidence-thresholded fact lookup, or a counterfactual calculator. The substrate handles those slices natively so the LLM can do what it's good at.

Recall via context-stuffing burns input tokens.

Today: Re-prepending 5 KB of memory to every LLM turn costs ~$0.00625 per call at Opus 4.7 input rates.

With neruva: One records_query at $0.000002 returns the same content. ~3,000x cheaper per recall, sub-100ms. (The arithmetic is sketched after this list.)

Confidence-thresholded fact lookup is a lookup, not a generation.

Today: Asking the LLM 'what does Alice know about Bob?' costs tokens AND can hallucinate when the fact isn't there.

With neruva: hd_kg_query returns the bound object plus calibrated confidence -- or null below the threshold. No phantom answers.

Causal vs observational is arithmetically distinct.

Today: LLMs blur 'X co-occurs with Y' with 'X causes Y'. The do-operator is different math, not a different prompt.

With neruva: P(Y|X=x) and P(Y|do(X=x)) are two endpoints with two different SCM evaluations on the same logged worlds.

Hot loops can't afford LLM round-trips.

Today: 500ms-30s LLM round-trip latency rules out using the model as the inner loop of anything real-time.

With neruva: Sub-100ms p50 substrate calls fit in inner loops. The LLM handles outer-loop reasoning at its own pace.

Auditability requires reproducibility.

Today: LLM responses drift across deployments, model versions, sampling temps. You can't prove a decision a year later.

With neruva: Substrate is deterministic from a seed. Bit-identical reruns. Compliance gets a reproduction artifact, not a vibe.
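
The recall-cost card above, spelled out. The ~4 bytes/token conversion is a rule-of-thumb assumption, and the per-token rate is the one implied by the $0.00625 figure on this page, not a quoted price list:

# Context-stuffing: ~5 KB of memory re-prepended to every LLM turn.
bytes_per_turn = 5_000
tokens_per_turn = bytes_per_turn / 4                 # ~4 bytes per token (assumption) -> 1,250 tokens
input_price_per_million_tokens = 5.00                # implied by ~$0.00625/turn above
llm_cost_per_turn = tokens_per_turn / 1e6 * input_price_per_million_tokens   # $0.00625

# Substrate recall: one op at $2 per million ops.
substrate_cost_per_op = 2.00 / 1e6                   # $0.000002

print(llm_cost_per_turn, round(llm_cost_per_turn / substrate_cost_per_op))   # 0.00625, 3125 (~3,000x)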

Real-world use cases

Where teams reach for the substrate.

Live agent decisioning at scale

Scenario: Customer-service agents that need to recall 'this user prefers refunds over store credit; their last 3 interactions were about returns' before each LLM call.

With substrate: KG of (user, preference, value). Query per turn. 51ms p50. Zero LLM tokens to keep the agent grounded.

Counterfactual safety in agentic systems

Scenario: Before executing an action, the agent needs to know: 'if I push this commit, what's the historical conditional probability of a rollback?'

With substrate: Build an SCM over (action, context, outcome) logged worlds. Query observation vs intervention at decision time. 90ms p50.

Knowledge-graph-grounded chatbots

Scenario: Chatbot needs to answer 'what does Alice know about Bob?' from a freshly-ingested CRM dump, then write the answer in natural language via the existing LLM.

With substrate: Substrate handles the (subject, relation) recall in 51ms; LLM only formats the answer. 100x cheaper than asking the LLM to recall and format.

Concept-drift-resistant retrieval

Scenario: Embedding-based retrieval breaks when the model version changes. You re-embed everything and ranks shift in production.

With substrate: The HD codebook is deterministic given a seed. Re-encode tomorrow with the same seed; vectors are bit-identical. Migration is free.

Auditable AI for regulated industries

Scenario: Finance, healthcare, legal -- need to prove a substrate-answered decision is reproducible months later for an audit.

With substrate: Every substrate response is reproducible byte-for-byte from input + seed. Determinism is a compliance feature, not a footnote.

Division of labor

What the LLM does. What the substrate does. Together.

The substrate doesn't replace your LLM -- it absorbs the slice the LLM is overpriced and slow at. Your model handles language, open-ended reasoning, and tool orchestration. The substrate handles recall, KG facts, causal arithmetic, analogy, and audit-grade reproduction.

Capability | LLM (where it shines) | Substrate (where it shines)
Open-ended reasoning | Yes -- this is what frontier LMs are built for | No -- the substrate is not a model
Memory recall (5 KB context per turn) | Costs ~$0.00625/turn at Opus 4.7 input rates | ~$0.000002/turn -- 3,000x cheaper
Confidence-thresholded fact lookup | Tokens + can hallucinate | hd_kg_query returns null below threshold
Causal vs observational | Hand-wavy, prompt-dependent | Two distinct endpoints, arithmetically separated
Determinism / replay | Drifts across deployments and model versions | Bit-identical reruns from a seed
Inner-loop latency budget | 500ms-30s p50 round-trip | ~95ms p50 from caller to substrate
Hot-loop friendly | No -- too slow for game/robotics inner loops | Yes -- warm calls fit a 16ms frame budget
Cold-start tax | Multi-second autoscale spin-up per request | Sub-second; warm calls in microseconds
Why it compounds

Deposit cost ≈ 1 call. Recall cost ≈ 1 call.

The substrate doesn't insert itself into the inner write-run-fix loop, and shouldn't. It fires at the seams -- between probes, between sessions, between concepts -- where you'd otherwise lose context. One mistake-record can save 15-60 minutes the next time you hit the same root cause. The math compounds three ways across three horizons.

Hours
Next-session recall
Semantic recall of decisions / mistakes / handoffs without re-reading any file. A mistake-record from yesterday warns the agent about a footgun it's about to step on today.
Weeks (5-20 sessions)
"Have we tried this?" index
The namespace becomes a paraphrase-tolerant catalog of every approach attempted. KG fills with concept relations -- structural queries start returning real graph paths, not just nearest neighbors.
Months
Audit + replay
The records timeline is the audit log of every decision and pivot. records_compact is the regulator -- summarizes old slices while preserving original_ids for traceability, so the namespace doesn't drown in noise.
Deposit | Cost to write | Compounding shape | When it pays off
mistake record | 1 call | Highest ROI per entry -- saves 15-60 min next time you hit the same root cause | Anytime you write code that resembles a past failure
decision record | 1 call | Linear log; queryable by tag / kind / time | "What shipped this month?", audit, handoffs
memory_upsert_text (design / spec) | 1 call | Semantic -- paraphrased queries still hit (different wording, same hit) | "Have we tried X?" before reinventing
hd_kg_add_fact (triple) | 1 call | Super-linear -- N facts → M·M·R 2-hop reachability over M atoms | Structural queries grow faster than storage
.neruva export | free | Substrate-independence -- survives any provider switch | Migration, backup, sharing across agent instances

The asymmetry that matters: deposit cost ≈ 1 call, recall cost ≈ 1 call, and a single high-quality mistake-record can save tens of minutes of next-session re-debugging. The discipline is one line: deposit at every seam, compact when noise creeps in.
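
What depositing at every seam might look like from an agent's session loop. Everything below is a hypothetical sketch: the Deposit record, the payload fields, and the way the ops are named as strings are stand-ins for the operations listed above (records, memory_upsert_text, hd_kg_add_fact, records_query), not the actual client API.

from dataclasses import dataclass

@dataclass
class Deposit:
    op: str        # which op the agent would call: "records", "memory_upsert_text", "hd_kg_add_fact"
    payload: dict  # payload shape here is assumed, not the real API schema

def deposits_at_session_end(decisions, mistakes, design_notes, new_facts):
    # Each entry is ~1 call; recall next session is also ~1 call (e.g. a records_query by kind).
    out = [Deposit("records", {"kind": "decision", "text": d}) for d in decisions]
    out += [Deposit("records", {"kind": "mistake", "text": m}) for m in mistakes]
    if design_notes:
        out.append(Deposit("memory_upsert_text", {"text": design_notes}))
    out += [Deposit("hd_kg_add_fact", {"subject": s, "relation": r, "object": o})
            for s, r, o in new_facts]
    return out

seams = deposits_at_session_end(
    decisions=["ship retry-queue v2"],
    mistakes=["race condition when retry fires during shutdown"],
    design_notes="retry queue drains before the SIGTERM handler returns",
    new_facts=[("retry_queue", "part_of", "ingest_service")])
print(len(seams), "deposits, one call each")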

Stop paying for tokens to think.
Pay for the answer.

$5 in credits on signup. Substrate ops are $1-5 per million. At the low end, that's roughly five million KG queries before you spend a dollar of your own.