Docs

Get a memory.

Neruva Memory exposes a Pinecone-compatible REST API at https://api.neruva.io/v1. If you can call Pinecone, you can call us.

Authenticate

Issue an API key from the dashboard. Send it with every request, either as an Api-Key header or as an Authorization: Bearer token.

curl https://api.neruva.io/v1/health \
  -H "Api-Key: nv_..."
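Both header forms can be sketched with nothing but the Python standard library. `authed_request` below is an illustrative helper, not part of an official SDK; the network call itself is left commented out:

```python
import urllib.request

NERUVA_BASE = "https://api.neruva.io/v1"

def authed_request(path: str, api_key: str, bearer: bool = False) -> urllib.request.Request:
    # Either header form is accepted; pick one and use it consistently.
    headers = (
        {"Authorization": f"Bearer {api_key}"}
        if bearer
        else {"Api-Key": api_key}
    )
    return urllib.request.Request(NERUVA_BASE + path, headers=headers)

# urllib.request.urlopen(authed_request("/health", "nv_..."))  # liveness check
```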
Records -- the primary substrate

Typed events, not raw vectors.

Records carry first-class kind, tags, ts, plus free-form meta. The server auto-embeds the text via the static-MRL D=1024 encoder and assigns an id. Querying is semantic + typed -- no Pinecone filter-dict gymnastics. Use this for anything an agent will recall later.

Ingest typed events

POST /v1/records/{namespace}
{
  "items": [
    {
      "kind": "decision",
      "text": "switch to substrate-first positioning",
      "tags": ["positioning", "shipped"],
      "ts": 1715680000000,            // optional; server fills in now()
      "meta": {"priority": "high"},   // free-form
      "ttlDays": 30                   // optional auto-expiry
    },
    { "kind": "mistake", "text": "deploy script wiped env vars",
      "tags": ["deploy", "shipped"] }
  ]
}
-> {"ids": ["rec_<hex>", "rec_<hex>"], "count": 2}

Canonical kinds (free-form strings are accepted; this list is the cross-vendor convention): decision, mistake, handoff, llm_turn, tool_call, tool_failure, user_prompt, assistant_turn, session_start, session_end, subagent_start, subagent_stop, task_created, task_completed, note.
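A minimal ingest call can be sketched with the standard library. The payload mirrors the example above; `ingest` is an illustrative helper (error handling omitted), and the actual send is left commented out:

```python
import json
import urllib.request

def ingest(namespace: str, items: list[dict], api_key: str) -> urllib.request.Request:
    # Build the POST /v1/records/{namespace} request; caller passes it to urlopen().
    body = json.dumps({"items": items}).encode()
    return urllib.request.Request(
        f"https://api.neruva.io/v1/records/{namespace}",
        data=body,
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = ingest(
    "main",
    [{"kind": "decision", "text": "switch to substrate-first positioning",
      "tags": ["positioning", "shipped"]}],
    "nv_...",
)
# urllib.request.urlopen(req)  # -> {"ids": ["rec_<hex>"], "count": 1}
```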

Query: semantic + typed

Pass text for cosine ranking; omit it for a pure typed-filter scan ordered by ts descending. Filters are first-class (no $in dict gymnastics).

POST /v1/records/{namespace}/query
{
  "text": "what did I decide about positioning?",
  "topK": 5,
  "kind": ["decision", "mistake"],
  "tagsAny": ["positioning"],          // matches if any tag intersects
  "tagsAll": ["shipped", "production"], // matches only if record contains all
  "tsGte": 1715000000000,
  "tsLt":  1715900000000,
  "includeText": true,
  "includeMeta": true
}
-> {
  "records": [
    { "id": "...", "kind": "decision", "tags": [...],
      "ts": 1715680000000, "text": "...", "meta": {...},
      "score": 0.83 },
    ...
  ],
  "namespace": "main"
}
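The matching rules are easy to state precisely. A local sketch of the filter semantics described above (kind is an OR-list, tagsAny intersects, tagsAll requires a superset, ts bounds are [tsGte, tsLt)); parameter names deliberately mirror the API fields, and cosine ranking on text is the server's job, not shown here:

```python
def matches(record: dict, *, kind=None, tagsAny=None, tagsAll=None,
            tsGte=None, tsLt=None) -> bool:
    tags = set(record.get("tags", []))
    if kind is not None and record["kind"] not in kind:
        return False
    if tagsAny is not None and not tags & set(tagsAny):
        return False
    if tagsAll is not None and not tags >= set(tagsAll):
        return False
    if tsGte is not None and record["ts"] < tsGte:
        return False
    if tsLt is not None and record["ts"] >= tsLt:
        return False
    return True

rec = {"kind": "decision", "tags": ["positioning", "shipped"], "ts": 1715680000000}
matches(rec, kind=["decision", "mistake"], tagsAny=["positioning"])  # True
matches(rec, tagsAll=["shipped", "production"])                      # False
```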

Timeline (most-recent-first stream)

GET /v1/records/{namespace}/timeline
    ?since=1715000000000          # inclusive lower-bound ts
    &until=1715900000000          # exclusive upper-bound ts (page back via nextCursor)
    &kind=decision,mistake        # comma-separated
    &tagsAny=positioning,api
    &tagsAll=shipped
    &limit=50                     # max 500
-> { "records": [...], "namespace": "main", "nextCursor": <oldest_ts> }
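Paging backward works by feeding nextCursor in as the next until. A local simulation of that client loop, assuming (as documented above) that until is exclusive and nextCursor is the oldest ts on the page:

```python
def timeline_pages(all_ts: list[int], limit: int):
    """Yield pages newest-first, the way a client would walk the timeline."""
    until = None  # no upper bound on the first request
    while True:
        window = sorted(
            (t for t in all_ts if until is None or t < until), reverse=True
        )[:limit]
        if not window:
            return
        yield window
        until = window[-1]  # nextCursor: oldest ts on this page, exclusive next time

list(timeline_pages([5, 4, 3, 2, 1], limit=2))  # [[5, 4], [3, 2], [1]]
```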

GET    /v1/records/{namespace}/{id}     -> single record
DELETE /v1/records/{namespace}/{id}     -> {"deleted": true|false}
GET    /v1/records/{namespace}/stats    -> count, byKind, oldestTs, newestTs

GDPR forget

One semantic operation, typed predicates. Pass any combination of kind, tagsAny, tagsAll, tsGte, tsLt, ids.

POST /v1/records/{namespace}/forget
{
  "kind": ["user_prompt"],
  "tagsAny": ["user:abc-123"]   // every record tagged for that user
}
-> {"forgottenCount": 47}

.neruva portable file

One file per namespace. Atomic, point-in-time consistent, versioned. The container is a zip with a manifest.json + records.nmm + reserved slots for kg/, scm/, and analogy.json sections. Forward-compatible: readers ignore unknown sections, so a V1 reader can open files carrying later additions.

# Export a namespace as one .neruva file
GET /v1/records/{namespace}/export
-> Content-Type: application/x-neruva
-> Content-Disposition: attachment; filename="{namespace}.neruva"

# Import (REPLACE semantics in V1; merge is V2)
POST /v1/records/{namespace}/import
Content-Type: multipart/form-data
file=<.neruva blob>
-> {"imported": 47, "manifest": {...}}

# Container layout:
#   manifest.json    {schema_version, exported_at_ms, namespace,
#                     sections, counts, exported_by}
#   records.nmm      typed records substrate
#   kg/<name>.hdkg   reserved (V2)
#   scm/<name>.hdscm reserved (V2)
#   analogy.json     reserved (V2)
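Since the container is a zip, it can be inspected with standard tooling. A sketch that writes and reads a minimal V1 container in memory -- field names follow the manifest layout above, but the records.nmm payload here is a stand-in, since its internal format isn't specified:

```python
import io
import json
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("manifest.json", json.dumps({
        "schema_version": 1,
        "exported_at_ms": 1715680000000,
        "namespace": "main",
        "sections": ["records"],
        "counts": {"records": 2},
        "exported_by": "example",
    }))
    z.writestr("records.nmm", b"<typed records substrate>")  # opaque payload

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    manifest = json.loads(z.read("manifest.json"))
    # A forward-compatible reader keeps the names it knows and skips the rest.
    known = {n for n in z.namelist() if n in ("manifest.json", "records.nmm")}
```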
Pinecone-compat -- migration on-ramp

The compat layer.

If you're migrating from Pinecone, the /v1/indexes/* endpoints accept the same shapes you already use -- swap one import and your existing code works. For new agents, start with the typed Records API above.

Create an index

POST /v1/indexes
{
  "name": "agent-memory",
  "dimension": 1024,
  "metric": "cosine",
  "spec": {
    "serverless": {"cloud": "gcp", "region": "us-central1"}
  }
}

Upsert vectors

Submit float vectors. They are normalized, 1-bit-encoded, and written to an append-only WAL. The index updates asynchronously; new writes are queryable within milliseconds.

POST /v1/indexes/agent-memory/vectors/upsert
{
  "namespace": "agent_42",
  "vectors": [
    {
      "id": "mem_001",
      "values": [0.1, -0.3, ...],
      "metadata": {"role": "assistant", "ts": 1715533200}
    }
  ]
}
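The normalize-then-binarize step is simple to sketch in pure Python. This shows the idea -- L2 normalization followed by one sign bit per dimension -- though the server's actual encoding may differ in detail:

```python
import math

def normalize(values: list[float]) -> list[float]:
    # Scale to unit L2 norm (guard against the zero vector).
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]

def one_bit(values: list[float]) -> list[int]:
    # Keep only the sign of each (normalized) component.
    return [1 if v >= 0 else 0 for v in values]

normalize([3.0, 4.0])  # [0.6, 0.8]
one_bit([0.1, -0.3])   # [1, 0]
```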

Query

POST /v1/indexes/agent-memory/query
{
  "namespace": "agent_42",
  "vector": [0.1, -0.3, ...],
  "topK": 8,
  "includeMetadata": true,
  "filter": {
    "role": {"$eq": "assistant"},
    "ts":   {"$gte": 1715000000}
  }
}

Supported operators: $eq, $ne, $in, $nin, $gt, $gte, $lt, $lte.
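For reference, a local evaluator of all eight operators. This is a sketch of the Pinecone-style semantics (top-level fields AND together); the server is authoritative on edge cases such as missing metadata keys:

```python
OPS = {
    "$eq":  lambda v, x: v == x,
    "$ne":  lambda v, x: v != x,
    "$in":  lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
    "$gt":  lambda v, x: v > x,
    "$gte": lambda v, x: v >= x,
    "$lt":  lambda v, x: v < x,
    "$lte": lambda v, x: v <= x,
}

def passes(metadata: dict, filter_: dict) -> bool:
    # Every field condition must hold (implicit AND across fields).
    return all(
        OPS[op](metadata.get(field), arg)
        for field, cond in filter_.items()
        for op, arg in cond.items()
    )

passes({"role": "assistant", "ts": 1715533200},
       {"role": {"$eq": "assistant"}, "ts": {"$gte": 1715000000}})  # True
```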

Drop-in Pinecone client

# Existing Pinecone code:
from pinecone import Pinecone
pc = Pinecone(api_key="pcsk_...")

# Switch to Neruva (zero changes below this line):
from neruva import Pinecone
pc = Pinecone(api_key="nv_...")

index = pc.Index("agent-memory")
index.upsert([("mem-1", vec, {"agent": "coder"})])
index.query(vector=vec, top_k=8)
Auto-record -- Anthropic SDK

Wrap one client. Every turn upserts.

neruva-record wraps an Anthropic Python client so that every messages.create call silently records the user message and assistant response into a Memory namespace as a side-effect. Recording is fire-and-forget: failures are swallowed, your call never blocks.

pip install neruva-record anthropic
export NERUVA_API_KEY=nv_...

import anthropic
from neruva_record import auto_record

client = auto_record(
    anthropic.Anthropic(),
    index="brain",       # one per user/account
    namespace="main",    # one per agent
    ttl_days=30,         # optional auto-expiry
)

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=200,
    messages=[{"role": "user", "content": "Hi!"}],
)
# AsyncAnthropic supported the same way.

# Every turn becomes one record:
# {
#   "id": "llm-<unix-ms>-<rand>",
#   "text": "USER: ...\n\nASSISTANT: ...",
#   "metadata": {
#     "kind": "llm_turn", "vendor": "anthropic",
#     "model": "claude-opus-4-7", "stop_reason": "end_turn",
#     "input_tokens": 12, "output_tokens": 87,
#     "latency_ms": 1240, "ts": <unix-ms>
#   }
# }

Auto-record -- Claude Code hooks

One command merges 10 lifecycle hooks into ~/.claude/settings.json and registers @neruva/mcp. After restart, every Bash, Read, Edit, Write, WebFetch, and MCP tool call, plus every prompt, response, subagent, and task, lands in your Memory namespace. Hooks run async (async: true), so they never slow the agent. Calls into Neruva's own MCP are auto-skipped to prevent recording the recording.

pip install neruva-record
NERUVA_API_KEY=nv_... neruva-record-install --yes

# Or interactive:
neruva-record-install

# Skip MCP registration:
neruva-record-install --no-mcp

# Custom namespace + TTL:
neruva-record-install --api-key nv_... \
  --namespace research-bot --ttl 7 --yes

# Remove later:
neruva-record-install --uninstall

# Captured event kinds (metadata.kind):
#   user_prompt, tool_call, tool_failure, assistant_turn,
#   session_start, session_end, subagent_start,
#   subagent_stop, task_created, task_completed

The installer backs up your existing settings with a timestamp before merging, and preserves any user hooks already wired.

HD substrate -- operations on memory

The substrate reasons.

Every endpoint above operates on vectors by similarity. The endpoints below operate on the vector's algebra. Triples bind, queries unbind, analogies parallelogram, interventions substitute, plans minimize Expected Free Energy -- all in the substrate, none of it touching an LLM.

All HD endpoints accept the same Api-Key header. JSON in, JSON out. Sub-millisecond per call.

Knowledge graphs

Bind (subject, relation, object) triples into a single ~32KB vector per relation shard. Query by (subject, relation) -- unbind returns the most likely object with a calibrated cosine-based confidence. Thousands of facts per shard. No materialized triple table.

POST /v1/hd/kg/people/facts
{
  "facts": [
    {"subject": "alice", "relation": "lives_in", "object": "toronto"},
    {"subject": "bob",   "relation": "lives_in", "object": "vancouver"},
    {"subject": "alice", "relation": "works_at", "object": "acme"}
  ]
}
-> {"added": 3, "relations": 2}

POST /v1/hd/kg/people/query
{"subject": "alice", "relation": "lives_in"}
-> {"object": "toronto", "confidence": 0.71}

GET    /v1/hd/kg/people/stats
DELETE /v1/hd/kg/people
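The bind/bundle/unbind cycle behind a KG shard can be demonstrated at toy scale: random ±1 hypervectors, elementwise multiply as bind, sum as bundle. The production encoder, dimensionality, and confidence calibration differ; this sketch only shows why unbinding recovers the object:

```python
import random

D = 2048  # toy dimensionality
rng = random.Random(7)

def hv() -> list[int]:
    return [rng.choice((-1, 1)) for _ in range(D)]

atoms = {name: hv() for name in
         ("alice", "bob", "lives_in", "works_at", "toronto", "vancouver", "acme")}

def bind(a: list[int], b: list[int]) -> list[int]:
    return [x * y for x, y in zip(a, b)]

# Bundle the three bound triples from the example into one shard vector.
shard = [0] * D
for s, r, o in [("alice", "lives_in", "toronto"),
                ("bob", "lives_in", "vancouver"),
                ("alice", "works_at", "acme")]:
    for i, v in enumerate(bind(bind(atoms[s], atoms[r]), atoms[o])):
        shard[i] += v

def query(subject: str, relation: str) -> str:
    # Unbinding with subject*relation leaves the object vector plus noise,
    # so a dot-product nearest neighbor over the object atoms recovers it.
    probe = bind(shard, bind(atoms[subject], atoms[relation]))
    def score(o: str) -> int:
        return sum(p * q for p, q in zip(probe, atoms[o]))
    return max(("toronto", "vancouver", "acme"), key=score)

query("alice", "lives_in")  # 'toronto'
```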

Analogy by algebra

Parallelogram completion: A:B::C:?. The substrate computes the answer D = C xor (A xor B) over factored binary items. Stateless -- the codebook is deterministic in (n_feat, seed).

POST /v1/hd/analogy
{"n_feat": 6, "a": 0, "b": 1, "c": 2, "seed": 4301}
-> {
     "candidate": 3,
     "candidate_bits": [1,1,0,0,0,0],
     "cosine": 0.999,
     "runner_up": 0.83,
     "ambiguity": 0.83,
     "confidence": 0.17
   }
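With items encoded as binary feature vectors, the parallelogram reduces to three XORs. A sketch that reproduces the example above (bits are LSB-first, so item 3 with n_feat=6 is [1,1,0,0,0,0]); the server's codebook seeding and cosine/ambiguity calibration are not modeled here:

```python
def bits(item: int, n_feat: int) -> list[int]:
    # LSB-first binary code for the item index.
    return [(item >> i) & 1 for i in range(n_feat)]

def analogy(n_feat: int, a: int, b: int, c: int) -> tuple[int, list[int]]:
    # D = C xor (A xor B), computed per feature.
    d_bits = [cc ^ (aa ^ bb)
              for aa, bb, cc in zip(bits(a, n_feat), bits(b, n_feat), bits(c, n_feat))]
    d = sum(bit << i for i, bit in enumerate(d_bits))
    return d, d_bits

analogy(6, a=0, b=1, c=2)  # (3, [1, 1, 0, 0, 0, 0])
```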

Causal do-operator

Upload worlds (rows of categorical variables). Then query either observation (conditional probability) or intervention (Pearl's do-operator -- forced assignment that cuts the confounder path). Same logged data, two arithmetically distinct queries.

POST /v1/hd/causal/scm1/worlds
{
  "n_vars": 3,
  "vocab_per_var": [2, 2, 2],
  "worlds": [[0,1,1], [1,1,0], ...],   // rows of int category indices
  "seed": 4401
}

# What did we observe? P(Y=1 | X=1)
POST /v1/hd/causal/scm1/query
{
  "query_type": "observation",
  "condition_var": 1, "condition_value": 1,
  "query_var": 2,     "query_value": 1
}

# What WOULD happen if we forced X=1? P(Y=1 | do(X=1))
POST /v1/hd/causal/scm1/query
{"query_type": "intervention", ...}

DELETE /v1/hd/causal/scm1
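The arithmetic distinction between the two query types can be shown with plain frequency counts. Here worlds are (Z, X, Y) rows with Z a confounder of X and Y; the intervention uses the classical backdoor adjustment, not the server's HD implementation:

```python
from collections import Counter

# Columns: Z (confounder), X (treatment), Y (outcome).
worlds = [(0, 0, 0)] * 4 + [(0, 1, 0)] * 1 + [(1, 1, 1)] * 4 + [(1, 0, 1)] * 1

def p_obs(x: int, y: int) -> float:
    """Observation: P(Y=y | X=x) straight from the logged rows."""
    rows = [w for w in worlds if w[1] == x]
    return sum(w[2] == y for w in rows) / len(rows)

def p_do(x: int, y: int) -> float:
    """Intervention: P(Y=y | do(X=x)) via backdoor adjustment over Z."""
    pz = Counter(w[0] for w in worlds)
    n = len(worlds)
    total = 0.0
    for z, cnt in pz.items():
        rows = [w for w in worlds if w[0] == z and w[1] == x]
        total += (sum(w[2] == y for w in rows) / len(rows)) * (cnt / n)
    return total

p_obs(1, 1)  # 0.8 -- correlation inflated by the confounder
p_do(1, 1)   # 0.5 -- forcing X cuts the Z -> X path
```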

Endpoint reference

Method  Path                                       Purpose
GET     /v1/health                                 Liveness
POST    /v1/records/{ns}                           Records -- ingest typed events
POST    /v1/records/{ns}/query                     Records -- semantic + typed query
GET     /v1/records/{ns}/timeline                  Records -- most-recent-first stream
GET     /v1/records/{ns}/{id}                      Records -- fetch by id
DELETE  /v1/records/{ns}/{id}                      Records -- soft delete
POST    /v1/records/{ns}/forget                    Records -- typed-predicate forget
GET     /v1/records/{ns}/stats                     Records -- count / by_kind / ts range
GET     /v1/records/{ns}/export                    Records -- .neruva container export
POST    /v1/records/{ns}/import                    Records -- .neruva container import
POST    /v1/indexes                                Create index
GET     /v1/indexes                                List indexes
GET     /v1/indexes/{name}                         Describe
DELETE  /v1/indexes/{name}                         Delete index
POST    /v1/indexes/{name}/vectors/upsert          Write vectors
POST    /v1/indexes/{name}/query                   Top-K query
POST    /v1/indexes/{name}/vectors/delete          Delete by id / filter
GET     /v1/indexes/{name}/vectors/fetch           Fetch by IDs
POST    /v1/indexes/{name}/vectors/update          Patch metadata
GET     /v1/indexes/{name}/describe_index_stats    Per-namespace counts
POST    /v1/hd/kg/{name}/facts                     HD KG -- bind triples
POST    /v1/hd/kg/{name}/query                     HD KG -- unbind (s,r) -> (o, conf)
GET     /v1/hd/kg/{name}/stats                     HD KG -- shard stats
DELETE  /v1/hd/kg/{name}                           HD KG -- drop
POST    /v1/hd/analogy                             HD parallelogram analogy
POST    /v1/hd/causal/{name}/worlds                HD causal -- add SCM worlds
POST    /v1/hd/causal/{name}/query                 HD causal -- observe vs intervene
DELETE  /v1/hd/causal/{name}                       HD causal -- drop