Get a memory.
Neruva Memory exposes a Pinecone-compatible REST API at https://api.neruva.io/v1. If you can call Pinecone, you can call us.
Authenticate
Issue an API key from the dashboard. Send it with every request as either an Api-Key header or a bearer token.
curl https://api.neruva.io/v1/health \
  -H "Api-Key: nv_..."
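The same check from Python, showing both accepted header styles -- a minimal sketch (the requests usage is ours; endpoint and headers are as documented):

import os
import requests

BASE = "https://api.neruva.io/v1"
key = os.environ["NERUVA_API_KEY"]

# Style 1: Api-Key header.
requests.get(f"{BASE}/health", headers={"Api-Key": key}).raise_for_status()

# Style 2: standard bearer token.
requests.get(f"{BASE}/health",
             headers={"Authorization": f"Bearer {key}"}).raise_for_status()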
Typed events, not raw vectors.
Records carry first-class kind, tags, ts, plus free-form meta. The server auto-embeds the text via the static-MRL D=1024 encoder and assigns an id. Querying is semantic + typed -- no Pinecone filter-dict gymnastics. Use this for anything an agent will recall later.
Ingest typed events
POST /v1/records/{namespace}
{
"items": [
{
"kind": "decision",
"text": "switch to substrate-first positioning",
"tags": ["positioning", "shipped"],
"ts": 1715680000000, // optional; server fills in now()
"meta": {"priority": "high"}, // free-form
"ttlDays": 30 // optional auto-expiry
},
{ "kind": "mistake", "text": "deploy script wiped env vars",
"tags": ["deploy", "shipped"] }
]
}
-> {"ids": ["rec_<hex>", "rec_<hex>"], "count": 2}Canonical kinds (free-form strings accepted, this list is the cross-vendor convention): decision, mistake, handoff, llm_turn, tool_call, tool_failure, user_prompt, assistant_turn, session_start, session_end, subagent_start, subagent_stop, task_created, task_completed, note.
Query: semantic + typed
Pass text for cosine ranking, or omit it for a pure typed-filter scan ordered by ts descending. Filters are first-class fields, not $in operator dicts.
POST /v1/records/{namespace}/query
{
"text": "what did I decide about positioning?",
"topK": 5,
"kind": ["decision", "mistake"],
"tagsAny": ["positioning"], // matches if any tag intersects
"tagsAll": ["shipped", "production"], // matches only if record contains all
"tsGte": 1715000000000,
"tsLt": 1715900000000,
"includeText": true,
"includeMeta": true
}
-> {
"records": [
{ "id": "...", "kind": "decision", "tags": [...],
"ts": 1715680000000, "text": "...", "meta": {...},
"score": 0.83 },
...
],
"namespace": "main"
}
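The same query from Python -- a sketch assuming the shapes above (requests usage is ours):

import os
import requests

BASE = "https://api.neruva.io/v1"
HEADERS = {"Api-Key": os.environ["NERUVA_API_KEY"]}

body = {
    "text": "what did I decide about positioning?",
    "topK": 5,
    "kind": ["decision"],
    "tagsAny": ["positioning"],
    "includeText": True,
}
r = requests.post(f"{BASE}/records/main/query", json=body, headers=HEADERS)
r.raise_for_status()
for rec in r.json()["records"]:
    print(f'{rec["score"]:.2f}  [{rec["kind"]}]  {rec["text"]}')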
Timeline (most-recent-first stream)
GET /v1/records/{namespace}/timeline
?since=1715000000000 # inclusive lower-bound ts
&until=1715900000000 # exclusive upper-bound ts (page back via nextCursor)
&kind=decision,mistake # comma-separated
&tagsAny=positioning,api
&tagsAll=shipped
&limit=50 # max 500
-> { "records": [...], "namespace": "main", "nextCursor": <oldest_ts> }
GET /v1/records/{namespace}/{id} -> single record
DELETE /v1/records/{namespace}/{id} -> {"deleted": true|false}
GET /v1/records/{namespace}/stats -> count, byKind, oldestTs, newestTs

GDPR forget
One semantic operation, typed predicates. Pass any combination of kind, tagsAny, tagsAll, tsGte, tsLt, ids.
POST /v1/records/{namespace}/forget
{
"kind": ["user_prompt"],
"tagsAny": ["user:abc-123"] // every record tagged for that user
}
-> {"forgottenCount": 47}.neruva portable file
.neruva portable file

One file per namespace. Atomic, point-in-time consistent, versioned. The container is a zip holding manifest.json + records.nmm, with reserved slots for kg/, scm/, and analogy.json sections. Forward-compatible: readers ignore unknown sections, so a V1 reader can open a V2 file.
# Export a namespace as one .neruva file
GET /v1/records/{namespace}/export
-> Content-Type: application/x-neruva
-> Content-Disposition: attachment; filename="{namespace}.neruva"
# Import (REPLACE semantics in V1; merge lands in V2)
POST /v1/records/{namespace}/import
Content-Type: multipart/form-data
file=<.neruva blob>
-> {"imported": 47, "manifest": {...}}
# Container layout:
# manifest.json {schema_version, exported_at_ms, namespace,
# sections, counts, exported_by}
# records.nmm typed records substrate
# kg/<name>.hdkg reserved (V2)
# scm/<name>.hdscm reserved (V2)
# analogy.json   reserved (V2)
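A backup/restore round trip in Python -- a sketch assuming the multipart field name file shown above:

import os
import requests

BASE = "https://api.neruva.io/v1"
HEADERS = {"Api-Key": os.environ["NERUVA_API_KEY"]}

# Export: write the container to disk.
r = requests.get(f"{BASE}/records/main/export", headers=HEADERS)
r.raise_for_status()
with open("main.neruva", "wb") as f:
    f.write(r.content)

# Import into another namespace (REPLACE semantics in V1).
with open("main.neruva", "rb") as f:
    r = requests.post(f"{BASE}/records/staging/import", headers=HEADERS,
                      files={"file": ("main.neruva", f, "application/x-neruva")})
r.raise_for_status()
print(r.json()["imported"])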
The compat layer.

If you're migrating from Pinecone, the /v1/indexes/* endpoints accept the same shapes you already use -- swap one import and your existing code works. For new agents, start with the typed Records API above.
Create an index
POST /v1/indexes
{
"name": "agent-memory",
"dimension": 1024,
"metric": "cosine",
"spec": {
"serverless": {"cloud": "gcp", "region": "us-central1"}
}
}

Upsert vectors
Submit float vectors. They are normalized, 1-bit encoded, and written to an append-only WAL. The index updates asynchronously and is queryable within milliseconds.
POST /v1/indexes/agent-memory/vectors/upsert
{
"namespace": "agent_42",
"vectors": [
{
"id": "mem_001",
"values": [0.1, -0.3, ...],
"metadata": {"role": "assistant", "ts": 1715533200}
}
]
}

Query
POST /v1/indexes/agent-memory/query
{
"namespace": "agent_42",
"vector": [0.1, -0.3, ...],
"topK": 8,
"includeMetadata": true,
"filter": {
"role": {"$eq": "assistant"},
"ts": {"$gte": 1715000000}
}
}

Supported operators: $eq, $ne, $in, $nin, $gt, $gte, $lt, $lte.
Drop-in Pinecone client
# Existing Pinecone code:
from pinecone import Pinecone
pc = Pinecone(api_key="pcsk_...")
# Switch to Neruva (zero changes below this line):
from neruva import Pinecone
pc = Pinecone(api_key="nv_...")
index = pc.Index("agent-memory")
index.upsert([("mem-1", vec, {"agent": "coder"})])
index.query(vector=vec, top_k=8)

Wrap one client. Every turn upserts.
neruva-record wraps an Anthropic Python client so that every messages.create call silently records the user message and assistant response into a Memory namespace as a side effect. Recording is fire-and-forget: failures are swallowed and your call never blocks.
pip install neruva-record anthropic
export NERUVA_API_KEY=nv_...
import anthropic
from neruva_record import auto_record
client = auto_record(
anthropic.Anthropic(),
index="brain", # one per user/account
namespace="main", # one per agent
ttl_days=30, # optional auto-expiry
)
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=200,
messages=[{"role": "user", "content": "Hi!"}],
)
# AsyncAnthropic supported the same way.
# Every turn becomes one record:
# {
# "id": "llm-<unix-ms>-<rand>",
# "text": "USER: ...\n\nASSISTANT: ...",
# "metadata": {
# "kind": "llm_turn", "vendor": "anthropic",
# "model": "claude-opus-4-7", "stop_reason": "end_turn",
# "input_tokens": 12, "output_tokens": 87,
# "latency_ms": 1240, "ts": <unix-ms>
# }
# }
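Recorded turns are ordinary records in the index, so reading them back is a filtered query through the compat layer. A sketch assuming the metadata shape above; embed() is a hypothetical stand-in for whatever encoder you use, since the compat layer takes raw vectors:

from neruva import Pinecone

pc = Pinecone(api_key="nv_...")
index = pc.Index("brain")

# Pull the most relevant past turns for a new user message.
hits = index.query(
    namespace="main",
    vector=embed("what did we ship last week?"),   # hypothetical encoder
    top_k=8,
    include_metadata=True,
    filter={"kind": {"$eq": "llm_turn"}},
)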
Auto-record -- Claude Code hooks

One command merges 10 lifecycle hooks into ~/.claude/settings.json and registers @neruva/mcp. After restart, every Bash, Read, Edit, Write, WebFetch, and MCP tool call -- plus every prompt, response, subagent, and task -- lands in your Memory namespace. Hooks run async (async: true), so they never slow the agent. Calls into Neruva's own MCP are auto-skipped to prevent recording the recording.
pip install neruva-record
NERUVA_API_KEY=nv_... neruva-record-install --yes

# Or interactive:
neruva-record-install

# Skip MCP registration:
neruva-record-install --no-mcp

# Custom namespace + TTL:
neruva-record-install --api-key nv_... \
  --namespace research-bot --ttl 7 --yes

# Remove later:
neruva-record-install --uninstall

# Captured event kinds (metadata.kind):
#   user_prompt, tool_call, tool_failure, assistant_turn,
#   session_start, session_end, subagent_start,
#   subagent_stop, task_created, task_completed
The installer backs up your existing settings with a timestamp before merging, and preserves any user hooks already wired.
The substrate reasons.
Every endpoint above operates on vectors by similarity. The endpoints below operate on a vector's algebra. Triples bind, queries unbind, analogies parallelogram, interventions substitute, plans minimize Expected Free Energy -- all in the substrate, none of it touching an LLM.
All HD endpoints accept the same Api-Key header. JSON in, JSON out. Sub-millisecond per call.
Knowledge graphs
Bind (subject, relation, object) triples into a single ~32KB vector per relation shard. Query by (subject, relation) -- unbind returns the most likely object with a calibrated cosine-based confidence. Thousands of facts per shard. No materialized triple table.
POST /v1/hd/kg/people/facts
{
"facts": [
{"subject": "alice", "relation": "lives_in", "object": "toronto"},
{"subject": "bob", "relation": "lives_in", "object": "vancouver"},
{"subject": "alice", "relation": "works_at", "object": "acme"}
]
}
-> {"added": 3, "relations": 2}
POST /v1/hd/kg/people/query
{"subject": "alice", "relation": "lives_in"}
-> {"object": "toronto", "confidence": 0.71}
GET /v1/hd/kg/people/stats
DELETE /v1/hd/kg/people
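Round trip from Python -- a sketch assuming the fact and query shapes above (the confidence threshold is ours to choose):

import os
import requests

BASE = "https://api.neruva.io/v1"
HEADERS = {"Api-Key": os.environ["NERUVA_API_KEY"]}

facts = [{"subject": "alice", "relation": "lives_in", "object": "toronto"},
         {"subject": "alice", "relation": "works_at", "object": "acme"}]
requests.post(f"{BASE}/hd/kg/people/facts",
              json={"facts": facts}, headers=HEADERS).raise_for_status()

r = requests.post(f"{BASE}/hd/kg/people/query",
                  json={"subject": "alice", "relation": "lives_in"},
                  headers=HEADERS)
r.raise_for_status()
ans = r.json()
if ans["confidence"] > 0.5:     # unbind noise grows with shard load; tune this
    print(ans["object"])        # -> "toronto"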
Analogy by algebra
Parallelogram completion: A:B::C:?. The substrate computes the answer D = C xor (A xor B) over factored binary items. Stateless -- the codebook is deterministic in (n_feat, seed).
POST /v1/hd/analogy
{"n_feat": 6, "a": 0, "b": 1, "c": 2, "seed": 4301}
-> {
"candidate": 3,
"candidate_bits": [1,1,0,0,0,0],
"cosine": 0.999,
"runner_up": 0.83,
"ambiguity": 0.83,
"confidence": 0.17
}
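To see the algebra concretely, here is a self-contained numpy sketch of the same operation under one plausible encoding: item i is the little-endian bit-vector of the integer i over n_feat features, each (feature, bit) pair gets a pseudorandom binary hypervector, an item binds its feature codes by XOR, and D = C xor (A xor B) in hypervector space. The dimension and similarity details are ours, not the service's:

import numpy as np

def analogy(n_feat, a, b, c, seed, dim=8192):
    rng = np.random.default_rng(seed)
    # codebook[f, bit] = hypervector for feature f taking value bit.
    codebook = rng.integers(0, 2, size=(n_feat, 2, dim), dtype=np.uint8)

    def encode(i):
        hv = np.zeros(dim, dtype=np.uint8)
        for f in range(n_feat):
            hv ^= codebook[f, (i >> f) & 1]   # bind feature codes by XOR
        return hv

    d_hv = encode(c) ^ encode(a) ^ encode(b)  # D = C xor (A xor B)
    # Decode by nearest neighbour over all 2**n_feat items, mapping Hamming
    # fraction h to the cosine of the bipolarized vectors, 1 - 2h.
    sims = sorted(((1 - 2 * np.mean(d_hv ^ encode(i)), i)
                   for i in range(2 ** n_feat)), reverse=True)
    return {"candidate": sims[0][1], "cosine": sims[0][0],
            "runner_up": sims[1][0]}

# a=0 (000000), b=1 (100000), c=2 (010000) -> candidate 3 (110000),
# matching the candidate_bits in the response above.
print(analogy(n_feat=6, a=0, b=1, c=2, seed=4301))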
Causal do-operator
Upload worlds (rows of categorical variables). Then query either an observation (conditional probability) or an intervention (Pearl's do-operator -- a forced assignment that cuts the confounder path). Same logged data, two arithmetically distinct queries.
POST /v1/hd/causal/scm1/worlds
{
"n_vars": 3,
"vocab_per_var": [2, 2, 2],
"worlds": [[0,1,1], [1,1,0], ...], # rows of int category indices
"seed": 4401
}
# What did we observe? P(Y=1 | X=1)
POST /v1/hd/causal/scm1/query
{
"query_type": "observation",
"condition_var": 1, "condition_value": 1,
"query_var": 2, "query_value": 1
}
# What WOULD happen if we forced X=1? P(Y=1 | do(X=1))
POST /v1/hd/causal/scm1/query
{"query_type": "intervention", ...}
DELETE /v1/hd/causal/scm1
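The observational and interventional answers diverge exactly when a confounder skews the logged worlds. A requests sketch contrasting the two calls, assuming the intervention query takes the same fields as the observation query and that the response carries a probability field (both assumptions; the toy worlds are ours):

import os
import requests

BASE = "https://api.neruva.io/v1"
HEADERS = {"Api-Key": os.environ["NERUVA_API_KEY"]}

# Our toy layout: var 0 = confounder Z, var 1 = treatment X, var 2 = outcome Y.
worlds = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0],
          [1, 1, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
requests.post(f"{BASE}/hd/causal/scm1/worlds",
              json={"n_vars": 3, "vocab_per_var": [2, 2, 2],
                    "worlds": worlds, "seed": 4401},
              headers=HEADERS).raise_for_status()

def ask(query_type):
    r = requests.post(f"{BASE}/hd/causal/scm1/query",
                      json={"query_type": query_type,
                            "condition_var": 1, "condition_value": 1,
                            "query_var": 2, "query_value": 1},
                      headers=HEADERS)
    r.raise_for_status()
    return r.json()["probability"]    # response field name assumed

print("P(Y=1 | X=1)     =", ask("observation"))
print("P(Y=1 | do(X=1)) =", ask("intervention"))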
Endpoint reference

| Method | Path | Purpose |
|---|---|---|
| GET | /v1/health | Liveness |
| POST | /v1/records/{ns} | Records -- ingest typed events |
| POST | /v1/records/{ns}/query | Records -- semantic + typed query |
| GET | /v1/records/{ns}/timeline | Records -- most-recent-first stream |
| GET | /v1/records/{ns}/{id} | Records -- fetch by id |
| DELETE | /v1/records/{ns}/{id} | Records -- soft delete |
| POST | /v1/records/{ns}/forget | Records -- typed-predicate forget |
| GET | /v1/records/{ns}/stats | Records -- count / byKind / ts range |
| GET | /v1/records/{ns}/export | Records -- .neruva container export |
| POST | /v1/records/{ns}/import | Records -- .neruva container import |
| POST | /v1/indexes | Create index |
| GET | /v1/indexes | List indexes |
| GET | /v1/indexes/{name} | Describe |
| DELETE | /v1/indexes/{name} | Delete index |
| POST | /v1/indexes/{name}/vectors/upsert | Write vectors |
| POST | /v1/indexes/{name}/query | Top-K query |
| POST | /v1/indexes/{name}/vectors/delete | Delete by id / filter |
| GET | /v1/indexes/{name}/vectors/fetch | Fetch by IDs |
| POST | /v1/indexes/{name}/vectors/update | Patch metadata |
| GET | /v1/indexes/{name}/describe_index_stats | Per-namespace counts |
| POST | /v1/hd/kg/{name}/facts | HD KG -- bind triples |
| POST | /v1/hd/kg/{name}/query | HD KG -- unbind (s,r) -> (o, conf) |
| GET | /v1/hd/kg/{name}/stats | HD KG -- shard stats |
| DELETE | /v1/hd/kg/{name} | HD KG -- drop |
| POST | /v1/hd/analogy | HD parallelogram analogy |
| POST | /v1/hd/causal/{name}/worlds | HD causal -- add SCM worlds |
| POST | /v1/hd/causal/{name}/query | HD causal -- observe vs intervene |
| DELETE | /v1/hd/causal/{name} | HD causal -- drop |