Operations your LLM doesn't have to do.
HD-native endpoints that answer knowledge-graph queries, analogies, and causal interventions in single-digit milliseconds -- deterministically, over HTTP. Pairs with whatever LLM you're already running; offloads the substrate-answerable slice so you stop paying token rates for cosine math.
Four operations. One substrate. No prompts.
Every endpoint is a function on bound HD vectors at D=8192 -- bind, unbind, bundle, cosine cleanup. The codebook is deterministic in a seed. The math is integer multiply, sign, and dot product. There is no model in the call path.
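A minimal sketch of those primitives in NumPy, assuming bipolar ±1 vectors and a seeded codebook. This is illustrative only -- the names and codebook derivation are assumptions, not the service's internals:

```python
# Illustrative sketch of bind / unbind / bundle / cosine cleanup at D=8192.
import numpy as np

D = 8192
rng = np.random.default_rng(4301)  # codebook is deterministic in the seed
codebook = {name: rng.choice([-1, 1], size=D)
            for name in ["alice", "lives_in", "toronto", "works_at", "acme"]}

def bind(a, b):        # element-wise multiply; self-inverse, so unbind == bind
    return a * b

def bundle(vectors):   # majority vote via sign of the sum
    return np.sign(np.sum(vectors, axis=0))

def cosine(a, b):      # normalized dot product over the dimension
    return float(a @ b) / D

# Store two facts in one vector, then recover "where does alice live?".
memory = bundle([
    bind(bind(codebook["alice"], codebook["lives_in"]), codebook["toronto"]),
    bind(bind(codebook["alice"], codebook["works_at"]), codebook["acme"]),
])
noisy = bind(memory, bind(codebook["alice"], codebook["lives_in"]))   # unbind the key
best = max(codebook, key=lambda name: cosine(noisy, codebook[name]))  # cleanup
print(best)  # -> toronto
```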
```
POST /v1/hd/kg/people/facts
{"facts":[
  {"subject":"alice","relation":"lives_in","object":"toronto"},
  {"subject":"alice","relation":"works_at","object":"acme"}
]}
```

```
POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}
```

```
POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}
```

```
POST /v1/hd/causal/scm1/query
{"query_type":"observation",
 "condition_var":1, "condition_value":1,
 "query_var":2, "query_value":1}

POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}
```

Stop paying token rates for cosine math.
Your LLM is great at language and open-ended thinking. It is wildly overpriced and slow when used as a key-value store, a confidence-thresholded fact lookup, or a counterfactual calculator. The substrate handles those slices natively so the LLM can do what it's good at.
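A minimal client sketch of the offload pattern, assuming Python with requests. The host and auth header are placeholders; the endpoint and payload shapes follow the examples above:

```python
import requests

BASE = "https://substrate.example.com"           # placeholder host
HEADERS = {"Authorization": "Bearer <API_KEY>"}  # placeholder auth scheme

resp = requests.post(
    f"{BASE}/v1/hd/kg/people/query",
    json={"subject": "alice", "relation": "lives_in"},
    headers=HEADERS,
    timeout=1.0,  # sub-100ms p50 leaves plenty of headroom
)
answer = resp.json()
if answer["object"] is not None:  # null below the confidence threshold
    # hand only the recalled fact to the LLM, not 5 KB of context
    print(answer["object"], answer["confidence"])  # e.g. toronto 0.71
```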
Recall via context-stuffing burns input tokens.
Today: Re-prepending 5 KB of memory to every LLM turn costs ~$0.00625 per call at Opus 4.7 input rates.
With neruva: One records_query at $0.000002 returns the same content. ~3,000x cheaper per recall, sub-100ms.
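The arithmetic behind the ~3,000x figure, spelled out. The ~4 chars/token ratio and the per-token rate are assumptions backed out of the quoted numbers, not published pricing:

```python
# Back out the ~3,000x claim from the figures above.
context_tokens = 5_000 / 4            # 5 KB re-prepended each turn ≈ 1,250 tokens
llm_per_turn = context_tokens * 5e-6  # ≈ $0.00625 at ~$5 per million input tokens
substrate_per_call = 0.000002         # one records_query
print(llm_per_turn / substrate_per_call)  # ≈ 3,125 -> "~3,000x cheaper"
```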
Confidence-thresholded fact lookup is a lookup, not a generation.
Today: Asking the LLM 'what does Alice know about Bob?' costs tokens AND can hallucinate when the fact isn't there.
With neruva: hd_kg_query returns the bound object plus calibrated confidence -- or null below the threshold. No phantom answers.
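A sketch of what 'null below the threshold' means mechanically, reusing the codebook vectors and cleanup idea from the first sketch. The 0.3 threshold is illustrative:

```python
# Thresholded cleanup: return the best codebook match only if its cosine
# clears the bar; otherwise admit ignorance. Expects NumPy ±1 vectors
# like those in the first sketch.
def kg_lookup(noisy, codebook, threshold=0.3):
    D = len(noisy)
    scores = {name: float(noisy @ vec) / D for name, vec in codebook.items()}
    name = max(scores, key=scores.get)
    if scores[name] < threshold:
        return {"object": None, "confidence": scores[name]}  # no phantom answer
    return {"object": name, "confidence": scores[name]}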
Causal vs observational is arithmetically distinct.
Today: LLMs blur 'X co-occurs with Y' with 'X causes Y'. The do-operator is different math, not a different prompt.
With neruva: P(Y|X=x) and P(Y|do(X=x)) are two endpoints with two different SCM evaluations on the same logged worlds.
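A sketch of why the two queries are different arithmetic, assuming 'logged worlds' are rows of variable assignments and a hypothetical `scm.evaluate` stand-in that recomputes downstream variables from the structural equations:

```python
# Observational: filter worlds where X happened to equal x, then count Y.
def p_obs(worlds, x, x_val, y, y_val):
    matched = [w for w in worlds if w[x] == x_val]
    return sum(w[y] == y_val for w in matched) / len(matched)

# Interventional: force X=x in every world, re-run the structural equations
# downstream of X, then count Y. `scm.evaluate` is a hypothetical stand-in.
def p_do(worlds, scm, x, x_val, y, y_val):
    outcomes = [scm.evaluate({**w, x: x_val}) for w in worlds]
    return sum(o[y] == y_val for o in outcomes) / len(worlds)
```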
Hot loops can't afford LLM round-trips.
Today: 500ms-30s LLM round-trip latency rules out using the model as the inner loop of anything real-time.
With neruva: Sub-100ms p50 substrate calls fit in inner loops. The LLM handles outer-loop reasoning at its own pace.
Auditability requires reproducibility.
Today: LLM responses drift across deployments, model versions, and sampling temperatures. You can't prove a decision a year later.
With neruva: The substrate is deterministic from a seed. Bit-identical reruns. Compliance gets a reproduction artifact, not a vibe.
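A sketch of what 'deterministic from a seed' buys, assuming per-name codebook streams derived with a stable hash. The derivation scheme is an assumption; the point is that the vector depends on nothing but its inputs:

```python
import hashlib
import numpy as np

def codebook_vector(name, seed, D=8192):
    # Derive a per-name substream from (seed, name) with a stable hash, so
    # the result depends on nothing but its inputs -- not call order, not
    # the process, not the machine.
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    return np.random.default_rng(int.from_bytes(digest, "little")).choice([-1, 1], size=D)

a = codebook_vector("alice", seed=4301)
b = codebook_vector("alice", seed=4301)
assert (a == b).all()  # bit-identical rerun: (input, seed) is the reproduction artifact
```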
Where teams reach for the substrate.
Live agent decisioning at scale
Scenario: Customer-service agents that need to recall 'this user prefers refunds over store credit; their last 3 interactions were about returns' before each LLM call.
With substrate: KG of (user, preference, value). Query per turn. 51ms p50. Zero LLM tokens to keep the agent grounded.
Counterfactual safety in agentic systems
Scenario: Before executing an action, the agent needs to know: 'if I push this commit, what's the historical conditional probability of a rollback?'
With substrate: Build an SCM over (action, context, outcome) logged worlds. Query observation vs intervention at decision time. 90ms p50.
Knowledge-graph-grounded chatbots
Scenario: A chatbot needs to answer 'what does Alice know about Bob?' from a freshly ingested CRM dump, then write the answer in natural language via the existing LLM.
With substrate: The substrate handles the (subject, relation) recall in 51ms; the LLM only formats the answer. 100x cheaper than asking the LLM to recall and format.
Concept-drift-resistant retrieval
Scenario: Embedding-based retrieval breaks when the model version changes. You re-embed everything and ranks shift in production.
With substrate: The HD codebook is deterministic in a seed. Re-encode tomorrow with the same seed; vectors are bit-identical. Migration is free.
Auditable AI for regulated industries
Scenario: Finance, healthcare, legal -- all need to prove a substrate-answered decision is reproducible months later for an audit.
With substrate: Every substrate response is reproducible byte-for-byte from input + seed. Determinism is a compliance feature, not a footnote.
What the LLM does. What the substrate does. Together.
The substrate doesn't replace your LLM -- it absorbs the slice the LLM is overpriced and slow at. Your model handles language, open-ended reasoning, and tool orchestration. The substrate handles recall, KG facts, causal arithmetic, analogy, and audit-grade reproduction.
| Capability | LLM (where it shines) | Substrate (where it shines) |
|---|---|---|
| Open-ended reasoning | Yes -- this is what frontier LLMs are built for | No -- the substrate is not a model |
| Memory recall (5 KB context per turn) | Costs ~$0.00625/turn at Opus 4.7 input rates | ~$0.000002/turn -- 3,000x cheaper |
| Confidence-thresholded fact lookup | Tokens + can hallucinate | hd_kg_query returns null below threshold |
| Causal vs observational | Hand-wavy, prompt-dependent | Two distinct endpoints, arithmetically separated |
| Determinism / replay | Drifts across deployments and model versions | Bit-identical reruns from a seed |
| Inner-loop latency budget | 500ms-30s p50 round-trip | ~95ms p50 from caller to substrate |
| Hot-loop friendly | No -- too slow for game/robotics inner loops | Yes -- fits a 16ms frame budget |
| Cold-start tax | Multi-second autoscale spin-up per request | Sub-second; warm calls in microseconds |
Deposit cost ≈ 1 call. Recall cost ≈ 1 call.
The substrate doesn't insert itself into the inner write-run-fix loop, and it shouldn't. It fires at the seams -- between probes, between sessions, between concepts -- where you'd otherwise lose context. One mistake record can save 15-60 minutes the next time you hit the same root cause. The math compounds across deposit types and time horizons, as the table below shows.
records_compact is the regulator -- it summarizes old slices while preserving original_ids for traceability, so the namespace doesn't drown in noise.

| Deposit | Cost to write | Compounding shape | When it pays off |
|---|---|---|---|
| mistake record | 1 call | Highest ROI per entry -- saves 15-60 min next time you hit the same root cause | Anytime you write code that resembles a past failure |
| decision record | 1 call | Linear log; queryable by tag / kind / time | "What shipped this month?", audit, handoffs |
| memory_upsert_text (design / spec) | 1 call | Semantic -- paraphrased queries still hit (different wording, same hit) | "Have we tried X?" before reinventing |
| hd_kg_add_fact (triple) | 1 call | Super-linear -- N facts → M·M·R 2-hop reachability over M atoms and R relations | Structural queries grow faster than storage |
| .neruva export | free | Substrate-independence -- survives any provider switch | Migration, backup, sharing across agent instances |
The asymmetry that matters: deposit cost ≈ 1 call, recall cost ≈ 1 call, and a single high-quality mistake record can save tens of minutes of next-session re-debugging. The discipline is one line: deposit at every seam, compact when noise creeps in.
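A sketch of that one-line discipline as it might look in an agent loop. Every endpoint path here except the KG one is a hypothetical stand-in for the operations named above; the payload shapes are illustrative:

```python
import requests

BASE = "https://substrate.example.com"  # placeholder host

def deposit_at_seam(summary, mistakes, facts):
    # Mistake records first: highest ROI per entry.
    for m in mistakes:  # e.g. {"root_cause": ..., "fix": ...}
        requests.post(f"{BASE}/v1/records", json={"kind": "mistake", **m})  # hypothetical path
    # Design/spec prose goes in as semantic memory.
    requests.post(f"{BASE}/v1/memory/upsert_text", json={"text": summary})  # hypothetical path
    # Structural facts go in as KG triples (this endpoint is from the examples above).
    requests.post(f"{BASE}/v1/hd/kg/people/facts", json={"facts": facts})
    # Compact when noise creeps in; summaries keep original_ids for traceability.
    requests.post(f"{BASE}/v1/records/compact", json={})  # hypothetical path
```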
Stop paying for tokens to think. Pay for the answer.
$5 in credits on signup. Substrate ops are $1-5 per million. At the $1 end of that range, that's roughly five million KG queries before you spend a dollar of your own.