Benchmarks

Numbers, not claims.

All measurements below run against the live api.neruva.io from a client-side harness over public TLS -- no server-side shortcuts, no localhost, no preferential paths. Raw JSON is embedded at the bottom of this page.

Last measured: 2026-05-14 18:59 UTC
Warmup / measure: 10 + 100 per op
Region: us-central1 (Cloud Run)
Transport: HTTPS, TLS 1.2+

Latency

Sub-second end-to-end. Sub-100ms substrate.

Latency is measured end-to-end from the client, including TLS, Cloud Run routing, server-side compute, and the round-trip from the caller's machine. Substrate ops (HD KG, analogy, causal) typically account for a fraction of the network round-trip; the rest is packet flight time.

Operation                                   ID                      p50       p95       p99       mean
records_ingest (write + auto-embed)         records_ingest          97.5 ms   107.6 ms  142.3 ms  98.8 ms
records_query (semantic, topK=5)            records_query_semantic  96.8 ms   101.1 ms  186.4 ms  97.7 ms
records_query (typed-only filter)           records_query_typed     95.1 ms   104.5 ms  176.6 ms  96.5 ms
records_timeline (limit=20)                 records_timeline        97.1 ms   104.6 ms  453.4 ms  101.4 ms
hd_kg_query (single-fact unbind)            hd_kg_query             94.7 ms   103.8 ms  145.7 ms  95.5 ms
hd_analogy (n_feat=8)                       hd_analogy              98.6 ms   109.1 ms  263.5 ms  101.2 ms
hd_causal_query (observation P(Y|X))        hd_causal_observe       94.9 ms   102.9 ms  151.4 ms  95.9 ms
hd_causal_query (intervention P(Y|do(X)))   hd_causal_intervene     96.8 ms   107.6 ms  167.2 ms  98.2 ms
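The p50/p95/p99 columns are standard nearest-rank percentiles over the 100 measured samples per op. A minimal sketch of how such a summary can be computed (the sample values here are illustrative, not taken from the run above):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ranked = sorted(samples)
    # Nearest-rank definition: take element ceil(p/100 * n), 1-indexed.
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]

latencies_ms = [97.2, 95.1, 101.4, 96.8, 142.3, 98.0, 96.1, 99.3, 97.7, 95.9]
summary = {
    "p50_ms": percentile(latencies_ms, 50),
    "p95_ms": percentile(latencies_ms, 95),
    "p99_ms": percentile(latencies_ms, 99),
    "mean_ms": sum(latencies_ms) / len(latencies_ms),
}
print(summary)
```

With only 100 samples, p99 is effectively the second-worst observation, which is why the p99 column is noticeably noisier than p50.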
Determinism

Same seed, same answer. Every time.

The substrate is deterministic from a seed -- a property no model-in-the-loop architecture can claim. We verify by issuing identical analogy queries 20 times against the live API and comparing outputs.

Reruns: 20
Unique outputs: 1 (all bit-identical)
Sample candidate: 15 (cosine 1.000)
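The check itself is simple: canonically serialize each response and count distinct outputs. A sketch with a stub in place of the live hd_analogy call (the stub's response shape is illustrative, not the actual API schema):

```python
import json

def count_unique_outputs(run_query, reruns=20):
    """Re-issue the same query and count distinct canonical serializations."""
    return len({json.dumps(run_query(), sort_keys=True) for _ in range(reruns)})

# Stand-in for the live hd_analogy call.
def analogy_stub():
    return {"candidate": 15, "cosine": 1.0}

print(count_unique_outputs(analogy_stub))  # 1 → bit-identical across reruns
```

A deterministic substrate collapses all 20 reruns to a single unique output; any model-in-the-loop step would show up immediately as a count greater than 1.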
Knowledge graph accuracy

Calibrated confidence with a known SNR ceiling.

HD knowledge-graph queries return the bound object plus a calibrated confidence score. We seed 200 facts of shape (person, relation, city) spread across 8 relations, 25 facts per relation, then unbind every fact and check that the correct object decodes. Each relation is one HD bundle; keeping every per-relation bundle under its SNR ceiling is what preserves accuracy, and real workloads spread facts across many relations for exactly this reason.

Facts seeded: 200 (8 relations × 25 facts)
Correct unbinds: 200 of 200
Per-relation-shard accuracy: 100.0%
What this number means

A single HD bundle holds ~150-250 facts cleanly before the cosine signal-to-noise ratio degrades; above that you start seeing decode collisions. At 25 facts per relation, every bundle in this run sits well under that ceiling, which is why all 200 unbinds decode correctly. In practice, use a relation budget: split high-cardinality predicates across multiple sub-relations, or use the typed Records substrate for fact-like data instead of HD KG. The substrate is designed for tens of thousands of facts when spread across relations.
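The SNR ceiling can be reproduced with a toy MAP-style bipolar HD model. The dimensionality, codebook size, and encoding below are illustrative assumptions for the sketch, not Neruva's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality (illustrative)

def unbind_accuracy(n_facts, n_entities=500):
    """Bundle n_facts bound (key, object) pairs, then decode every object back."""
    keys = rng.choice([-1.0, 1.0], size=(n_facts, D))
    objs = rng.choice([-1.0, 1.0], size=(n_entities, D))
    target = rng.integers(0, n_entities, size=n_facts)
    # Bind elementwise, bundle by majority sign (MAP-style encoding).
    bundle = np.sign((keys * objs[target]).sum(axis=0))
    # Unbind each fact with its key, decode as the nearest codebook object.
    scores = (bundle * keys) @ objs.T          # (n_facts, n_entities)
    return float((scores.argmax(axis=1) == target).mean())

print(unbind_accuracy(25))    # comfortably under the ceiling
print(unbind_accuracy(600))   # far past it: decode collisions appear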

Cost vs LLM token-stuffing

3,125× cheaper per recall.

Many agent stacks "remember" by re-prepending recall context to every LLM call. That recall slice gets billed at frontier-model input rates per turn. Replacing it with a single records_query shifts the unit cost from per-token to per-call.

Stuff-into-prompt: 5 KB of recall context on every Opus 4.7 turn
~1.25k input tokens × $5/M = $0.00625 / turn

Neruva: one records_query with typed filters
$2 per 1M calls = $0.0000020 / call

Ratio: 3,125× cheaper per recall on the same payload size

Opus 4.7 list pricing $5/M input. Other models differ.
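The ratio follows directly from the two unit prices; the 4-bytes-per-token conversion below is the usual rule of thumb, not an exact tokenizer count:

```python
# Per-turn cost of re-prepending a 5 KB recall slice at $5/M input tokens,
# assuming ~4 bytes per token (so 5 KB ≈ 1,250 tokens).
tokens_per_turn = 5_000 / 4
stuffing_usd = tokens_per_turn * 5 / 1_000_000

# Per-call cost of a records_query at $2 per 1M calls.
query_usd = 2 / 1_000_000

print(stuffing_usd)              # 0.00625 per turn
print(query_usd)                 # 0.0000020 per call
print(stuffing_usd / query_usd)  # 3125.0
```

Note the structural difference: the stuffing cost scales linearly with context size and model price, while the per-call cost is flat regardless of payload.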

Methodology

How we measure.

Reproduce these numbers yourself: clone the repo and run python probes/bench_substrate.py with your own NERUVA_API_KEY. The script is ~250 lines and has no dependencies beyond httpx.
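The measurement loop follows the standard warmup-then-measure pattern. A minimal sketch with a stub callable in place of the real httpx round-trip (function names here are illustrative, not the script's actual API):

```python
import time

def bench(op, warmup=10, measure=100):
    """Warm up, then time `op` and return sorted latencies in ms.

    `op` stands in for one HTTPS round-trip; in the real harness it
    would be an httpx call against api.neruva.io.
    """
    for _ in range(warmup):
        op()
    samples = []
    for _ in range(measure):
        t0 = time.perf_counter()
        op()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return sorted(samples)

# Stub op: ~1 ms of sleep in place of a network call.
samples = bench(lambda: time.sleep(0.001))
p50 = samples[len(samples) // 2]
print(f"p50 ≈ {p50:.2f} ms")
```

Warmup matters: the first few requests pay for TLS handshakes, connection setup, and any cold-start on the Cloud Run side, and would otherwise skew the tail percentiles.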

Raw measurements

Bring your own analysis.

The full benchmarks JSON is below -- copy it, ingest it, plot the histograms yourself. We update this file every time we run the harness; the timestamp at the top of the page is the last-measured-at time.

{
  "base_url": "https://api.neruva.io",
  "namespace": "bench-1778785199",
  "ts": 1778785199052,
  "warmup_n": 10,
  "measure_n": 100,
  "ops": [
    {
      "name": "records_ingest",
      "n": 100,
      "p50_ms": 97.55,
      "p95_ms": 107.56,
      "p99_ms": 142.31,
      "min_ms": 91.59,
      "max_ms": 142.61,
      "mean_ms": 98.84
    },
    {
      "name": "records_query_semantic",
      "n": 100,
      "p50_ms": 96.81,
      "p95_ms": 101.12,
      "p99_ms": 186.38,
      "min_ms": 91.67,
      "max_ms": 187.13,
      "mean_ms": 97.65
    },
    {
      "name": "records_query_typed",
      "n": 100,
      "p50_ms": 95.08,
      "p95_ms": 104.47,
      "p99_ms": 176.64,
      "min_ms": 90.42,
      "max_ms": 177.22,
      "mean_ms": 96.52
    },
    {
      "name": "records_timeline",
      "n": 100,
      "p50_ms": 97.13,
      "p95_ms": 104.57,
      "p99_ms": 453.39,
      "min_ms": 93.23,
      "max_ms": 456.74,
      "mean_ms": 101.41
    },
    {
      "name": "hd_kg_query",
      "n": 100,
      "p50_ms": 94.72,
      "p95_ms": 103.78,
      "p99_ms": 145.69,
      "min_ms": 89.8,
      "max_ms": 145.92,
      "mean_ms": 95.52
    },
    {
      "name": "hd_analogy",
      "n": 100,
      "p50_ms": 98.64,
      "p95_ms": 109.07,
      "p99_ms": 263.5,
      "min_ms": 93.29,
      "max_ms": 264.81,
      "mean_ms": 101.19
    },
    {
      "name": "hd_causal_observe",
      "n": 100,
      "p50_ms": 94.93,
      "p95_ms": 102.93,
      "p99_ms": 151.4,
      "min_ms": 90.32,
      "max_ms": 151.79,
      "mean_ms": 95.88
    },
    {
      "name": "hd_causal_intervene",
      "n": 100,
      "p50_ms": 96.85,
      "p95_ms": 107.59,
      "p99_ms": 167.24,
      "min_ms": 91.78,
      "max_ms": 167.73,
      "mean_ms": 98.21
    }
  ],
  "side_checks": {
    "determinism": {
      "reruns": 20,
      "unique_results": 1,
      "all_identical": true,
      "sample": {
        "candidate": 15,
        "candidate_bits": [
          1,
          1,
          1,
          1,
          0,
          0,
          0,
          0,
          0,
          0
        ],
        "cosine": 1.0000001192092896,
        "runner_up": 0.02392578311264515,
        "ambiguity": 0.0239256639033556,
        "confidence": 0.9760743360966444
      }
    },
    "kg_accuracy": {
      "n_facts": 200,
      "n_relations": 8,
      "facts_per_relation": 25,
      "correct": 200,
      "accuracy": 1
    },
    "cost_vs_opus47": {
      "records_query_usd_per_call": 0.000002,
      "context_stuffing_opus47_5kb_per_turn_usd": 0.00625,
      "ratio": 3125,
      "notes": "Opus 4.7 list pricing $5/M input. Other models differ."
    }
  }
}
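For example, ranking ops by median latency is a few lines of standard json parsing. The snippet inlines a two-op excerpt so it is self-contained; point it at the full payload above in practice:

```python
import json

# Two-op excerpt of the payload above, inlined for self-containment;
# load the full file (e.g. with json.load) in practice.
excerpt = """
{"ops": [
  {"name": "records_ingest", "p50_ms": 97.55, "p99_ms": 142.31},
  {"name": "hd_kg_query", "p50_ms": 94.72, "p99_ms": 145.69}
]}
"""
doc = json.loads(excerpt)
for op in sorted(doc["ops"], key=lambda o: o["p50_ms"]):
    print(f"{op['name']:<22} p50 {op['p50_ms']:7.2f} ms  p99 {op['p99_ms']:7.2f} ms")
```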

Sub-100ms substrate. Provable, not promised.

All numbers above measured from a client against the live public API. Reproduce them with one Python file and your own key.