api.neruva.io · now in early access

The agent substrate underneath your LLM.

Four typed primitives -- recall, knowledge graph, causal do-operator, and analogy -- behind one API key. Deterministic from a seed, sub-millisecond. Pairs with whatever LLM you're already running.

Or one-shot capture every Claude Code session: pip install neruva-record && neruva-record-install
Recall
Typed events, not raw vectors.
Records carry kind, tags, and timestamp as first-class fields. Query by tag-any / tag-all / kind / time window without metadata gymnastics. The agent retrieves what it actually needs, by the shape of the question.
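A sketch of the shape -- the parameter names here (kind, tag_any, since) are illustrative, not the exact API:

# hypothetical recall query; kind / tag_any / since are illustrative names
hits = memory_query_text(
  index="brain", namespace="main",
  text="what did we decide about auth?",
  kind="decision", tag_any=["auth", "login"],
  since="2025-06-01T00:00:00Z", topK=5)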
Knowledge graph
Bind, query, get the truth.
(subject, relation, object) triples bound into a single ~32KB vector per relation. kg.query returns the answer with calibrated confidence -- or null below the floor. Mutable structured state your agent stops hallucinating about.
Causal
Pearl's do-operator over HTTP.
P(Y | X=1) is what you observed. P(Y | do(X=1)) is what would happen if you forced it. Same logged worlds, two arithmetically distinct queries. No other vector store exposes this; it's the killer feature for action selection.
Portable
Your substrate is files.
.nmm / .hdkg / .hdscm -- portable, inspectable, deterministic from a seed. Export, diff, migrate, replay. The agent's mind, on disk, in formats you can read.
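A minimal sketch of what deterministic-from-a-seed buys you -- the file names are hypothetical:

import hashlib, pathlib

# rebuild the same KG from the same seed and the exported bytes match exactly
sha = lambda p: hashlib.sha256(pathlib.Path(p).read_bytes()).hexdigest()
assert sha("people.hdkg") == sha("people.rebuilt.hdkg")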
Capture before you query

Auto-record every turn into the substrate.

Wrap one Anthropic client and every messages.create writes a typed {kind: "llm_turn", ...} record. Or run neruva-record-install and Claude Code itself drops every Bash, Edit, prompt, and subagent into the same namespace as typed events. Then recall, KG, causal, and analogy have something to operate on.

pip install neruva-record anthropic
export NERUVA_API_KEY=nv_...

from neruva_record import auto_record
import anthropic

client = auto_record(
    anthropic.Anthropic(),
    namespace="main", ttl_days=30,
)
client.messages.create(...)   # recorded as a typed llm_turn

# Or one command for full Claude Code capture:
neruva-record-install
Migrating from Pinecone?
Pinecone-shape REST is a compat surface, not the product. Swap from pinecone import Pinecone for from neruva import Pinecone and your existing upsert / query / delete works unchanged -- then graduate to typed records, KG, causal, and analogy when ready.
See compat layer →
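A sketch of the swap, assuming the compat layer mirrors the Pinecone v3 client shape:

import os
from neruva import Pinecone   # was: from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["NERUVA_API_KEY"])
index = pc.Index("brain")
embedding = [0.0] * 1536      # your existing vector, whatever dim you use
index.upsert(
  vectors=[{"id": "m1", "values": embedding,
            "metadata": {"kind": "decision"}}],
  namespace="main")
index.query(vector=embedding, top_k=3, namespace="main")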
Get started

Wire into the tool you already use.

One MCP server, every modern coding agent. Pick your client and paste the line.

claude mcp add neruva \
  --env NERUVA_API_KEY=$NERUVA_API_KEY \
  -- npx -y @neruva/mcp
The wedge -- HD-native substrate

Operations on memory, not just storage of it.

Every other vector DB is a dumb cabinet. You store, you retrieve by cosine, then you pay an LLM to think. Neruva's HD-native API lets the substrate itself answer the question -- knowledge-graph queries, analogies, causal inference -- all over HTTP, all deterministic, none of them burning model tokens.

/v1/hd/kg/*
Knowledge graphs
Bind (subject, relation, object) triples into one ~32KB vector. Query by (subject, relation) -- unbind returns the answer with calibrated confidence. Thousands of facts per shard. Sub-millisecond.
POST /v1/hd/kg/people/facts
{"facts":[
  {"subject":"alice","relation":"lives_in","object":"toronto"},
  {"subject":"alice","relation":"works_at","object":"acme"}
]}

POST /v1/hd/kg/people/query
{"subject":"alice","relation":"lives_in"}
-> {"object":"toronto","confidence":0.71}
/v1/hd/analogy
Analogy by algebra
A:B::C:? Parallelogram completion over factored items. The substrate finds D by XOR-style algebra, not by prompting a model. Returns candidate, cosine, and ambiguity gap.
POST /v1/hd/analogy
{"n_feat":6, "a":0, "b":1, "c":2, "seed":4301}
-> {"candidate":3, "cosine":0.999, "confidence":0.17}
/v1/hd/causal/*
Pearl's do-operator
Distinguish observation from intervention. P(Y|X=1) is what you observed; P(Y|do(X=1)) is what would happen if you forced it. Same logged worlds, two arithmetically distinct queries. No other vector DB exposes this.
POST /v1/hd/causal/scm1/query
{"query_type":"observation",
 "condition_var":1, "condition_value":1,
 "query_var":2,     "query_value":1}

POST /v1/hd/causal/scm1/query
{"query_type":"intervention", ...}
Cost
No tokens burned for substrate-answerable queries.
Latency
Sub-millisecond HD ops vs ~500ms model round-trips.
Determinism
Same input -> same output. Auditable. No temperature.
Safety
Causal vs observational queries are arithmetically distinct -- principled refusal.
How agents use it

Four primitives. One mental model.

Stop stuffing context windows with everything the agent might recall. Stop asking the LLM to do what a vector op can do for free.

Memory
Semantic recall
Past decisions, file summaries, retrieved docs -- store as vectors with metadata, retrieve by cosine. Pinecone-shape, drop-in. With memory_embed + memory_upsert_text the whole flow is text-in / text-out.
# on every turn
hits = memory_query_text(
  index="brain", namespace="handoffs",
  text=task_description, topK=3)
# pass the 3 hits to the LLM, not the whole history
KG
Mutable structured state
Project status, deploy state, refactor diaries -- facts the LLM should never hallucinate. Add facts on success, delete-fact on rollback. Confidence-thresholded: unknowns return null.
# refactor an old function
hd_kg_delete_fact(kg="code",
  subject="myfunc", relation="defined_in", object="file_a")
hd_kg_add_fact(kg="code",
  subject="myfunc", relation="defined_in", object="file_b")
Causal
Should I do X?
Distinguish 'I see X co-occur with Y' from 'forcing X causes Y'. Pearl's do-operator on logged worlds. The killer feature most stacks can't replicate -- principled action selection without an LLM.
# CI debug: is my PR the cause, or noise?
p = hd_causal_query(scm="ci",
  query_type="intervention",
  condition_var=my_change, condition_value=1,
  query_var=test_failed,  query_value=1)
# p > 0.6 means yes, my change broke it
Analogy
Pattern transfer
A:B::C:? in HD feature space. Sub-millisecond, deterministic. Useful when items are integer-encoded; pass on day 1 unless your domain maps cleanly.
# experimental, n_feat up to 20 (1M items)
ans = hd_analogy(n_feat=10,
  a=king, b=queen, c=man)
# returns: woman with cosine 1.0
Why agents save money

Every offloaded recall is tokens you don't pay for.

An agent that stuffs 5 KB of "memory" context into every turn burns input tokens at LLM rates. The same recall via Neruva is sub-100 ms and zero LLM tokens. KG queries replace whole paragraphs of "here's what you should know about project X". Causal queries replace multi-step prompts that ask the model to weigh interventions.

Context offload
before
5 KB / turn = ~1.2K input tokens
after
memory_query_text = 100 ms, $0 in tokens
At Opus 4.7 input rates ($5/M), an agent with 1k turns/day saves ~$6/day from this alone -- ~$2200/yr per agent.
Avoided hallucination
before
LLM picks file_a when myfunc moved to file_b
after
hd_kg_query returns the truth -- or null below the 0.05 confidence floor
Cost: one bad refactor turn from a stale memory is hours of rework. Worse than tokens.
Multi-step prompt collapsed
before
3-4 prompt rounds: 'consider X, then weigh Y, then decide'
after
one hd_causal_query (intervention) returns P(outcome | do(X))
At ~5K input + ~1K output tokens/round, save ~15K input + ~3K output -- about $0.15/decision on Opus 4.7.

Example numbers use Claude Opus 4.7 list pricing ($5/M input, $25/M output). Other models vary -- Sonnet is cheaper, Haiku cheaper still, and self-hosted is its own math. The shape of the savings is what matters: every recall via cosine is a recall you don't do via a 5 KB context-stuffing prompt. The arithmetic is sketched below.
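A back-of-envelope for the figures above, which you can rerun at your own model's rates:

# assumed list prices from above: $5/M input, $25/M output
IN_RATE, OUT_RATE = 5 / 1e6, 25 / 1e6

# context offload: ~1.2K stuffed tokens/turn at 1k turns/day
daily = 1_000 * 1_200 * IN_RATE        # $6.00/day
yearly = daily * 365                   # ~$2,190/yr per agent

# collapsed decision loop: ~15K input + ~3K output tokens avoided
per_decision = 15_000 * IN_RATE + 3_000 * OUT_RATE   # ~$0.15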

Cross-agent compatibility

Adopt the optional { kind: "decision" | "mistake" | "session_end" | "handoff" } metadata convention and any Neruva-aware harness (Claude Code, Cursor, Codex, custom agents) can recall and render your records the same way. Every query also accepts tags: ["foo"] sugar for filter-by-tag.
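A record following the convention might look like this -- the text and tag values are just examples:

record = {
  "kind": "handoff",                # one of the four conventional kinds
  "tags": ["auth", "sprint-12"],    # filter-by-tag sugar on every query
  "text": "Paused mid-refactor of auth middleware; resume at token check.",
}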

What we solve

Every agent team hits the same wall.

We've heard it from teams shipping agents into production. The shape of pain is the same every time.

Your agents forget between sessions.

today: Context resets the moment the loop ends. Users restart and the agent has no idea who they are.

with neruva: Append-only memory keyed by agent + user. Recall in a single call, no LLM round-trip.

The memory bill scales with every user.

today: Vector DBs charge per index, per pod, per read unit. Spin up 10,000 user namespaces and you're funding the wrong startup.

with neruva: One mmap segment per namespace. Marginal cost per agent rounds to zero.

LLM-as-memory hallucinates -- and bills you for it.

today: Calling a model to summarize, extract, and rerank on every write adds seconds of latency and per-token cost that compounds.

with neruva: No model in the retrieval path. Deterministic, auditable, cheap by construction.

Write-heavy workloads break read-optimized indexes.

today: HNSW rebuilds choke when agents write thousands of memories per minute. Throughput collapses; tail latency explodes.

with neruva: Append-only write-ahead log with async indexing. Writes don't wait for the index.

Nobody can audit what your agent remembers.

today: Compliance asks 'show me what this agent knows about user X' and the answer is a black-box embedding.

with neruva: Every memory is a first-class row with metadata, timestamp, and a content-addressable id. Inspect, export, delete.

Surgical forget is an afterthought.

today: A user revokes consent. Now you have to find their fingerprint scattered across embeddings, rebuilds, and caches.

with neruva: Soft-delete by id, by filter, or by predicate. Tombstones flush on the next compaction.
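Through the compat surface, that could look like the sketch below -- the delete-by-filter shape is assumed from the operators listed further down:

index.delete(ids=["mem_7f3a"], namespace="user_42")                   # by id
index.delete(filter={"user_id": {"$eq": "u_42"}}, namespace="main")   # by predicate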

Wiring observability is a side project.

today: To know what your agent did, you bolt on a tracing SDK, ship spans to a separate dashboard, and never recall any of it from the agent itself.

with neruva: auto_record(Anthropic()) and neruva-record-install land every turn, every tool call, every subagent into the same namespace your agent already queries. Observability and memory are the same store.

How it feels in production

Designed for the workload you actually run.

Writes
Hot-path safe.
Agents write memories inline without blocking. The index catches up in the background.
Tenancy
Namespace per agent, per user, per anything.
Spin up a million namespaces. We don't charge for them individually.
Filters
The operators you expect.
$eq, $ne, $in, $nin, $gt, $gte, $lt, $lte.
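For example (the filter parameter and metadata fields here are illustrative):

hits = memory_query_text(
  index="brain", namespace="handoffs",
  text="rollback plan",
  filter={"env": {"$eq": "prod"},
          "kind": {"$in": ["decision", "mistake"]},
          "severity": {"$gte": 3}},
  topK=5)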
Recency
Time-aware retrieval, first-class.
Decay weights and time-window filters built into the index, not bolted on with metadata.
Compliance
Surgical forget. Full audit.
Delete by id, by user, by predicate. Every memory is inspectable and removable.
Billing
Wallet model. No surprises.
Top up via PayPal. Operations deduct in real time. No subscription, no minimum, no overage trap.

Stop renting search infra.
Start owning agent memory.

$5 in credits on signup. No card. No subscription. No demo call. Wire it in and decide.