Numbers, not claims.
All measurements below run against the live api.neruva.io from a client-side harness over public TLS -- no server-side shortcuts, no localhost, no preferential paths. The raw JSON is embedded at the bottom of this page.
Sub-second end-to-end. Sub-100ms substrate.
End-to-end, client-measured latency: the numbers include TLS, Cloud Run routing, server-side compute, and the round-trip from the caller's machine. The substrate ops themselves (HD KG, analogy, causal) typically take a small fraction of the network round-trip; the rest is packet flight time.
| Operation | Harness op | p50 | p95 | p99 | mean |
|---|---|---|---|---|---|
| `records_ingest` (write + auto-embed) | `records_ingest` | 97.5 ms | 107.6 ms | 142.3 ms | 98.8 ms |
| `records_query` (semantic, topK=5) | `records_query_semantic` | 96.8 ms | 101.1 ms | 186.4 ms | 97.7 ms |
| `records_query` (typed-only filter) | `records_query_typed` | 95.1 ms | 104.5 ms | 176.6 ms | 96.5 ms |
| `records_timeline` (limit=20) | `records_timeline` | 97.1 ms | 104.6 ms | 453.4 ms | 101.4 ms |
| `hd_kg_query` (single-fact unbind) | `hd_kg_query` | 94.7 ms | 103.8 ms | 145.7 ms | 95.5 ms |
| `hd_analogy` (n_feat=8) | `hd_analogy` | 98.6 ms | 109.1 ms | 263.5 ms | 101.2 ms |
| `hd_causal_query` (observation, P(Y\|X)) | `hd_causal_observe` | 94.9 ms | 102.9 ms | 151.4 ms | 95.9 ms |
| `hd_causal_query` (intervention, P(Y\|do(X))) | `hd_causal_intervene` | 96.8 ms | 107.6 ms | 167.2 ms | 98.2 ms |
Same seed, same answer. Every time.
The substrate is deterministic from a seed -- a property no model-in-the-loop architecture can claim. We verify this by issuing the identical analogy query 20 times against the live API and checking that the outputs are bit-identical, as in the sketch below.
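A minimal sketch of that check, assuming only `httpx` and a `NERUVA_API_KEY` environment variable. The `/hd/analogy` route and payload shape here are placeholders, not the published API -- `probes/bench_substrate.py` in the repo has the real request shapes; the logic of the check is the point:

```python
import os

import httpx

# Placeholder route and payload -- see probes/bench_substrate.py for the
# real request shapes. The check itself: same seed in, bit-identical
# bytes out, 20 times in a row.
client = httpx.Client(
    base_url="https://api.neruva.io",
    headers={"Authorization": f"Bearer {os.environ['NERUVA_API_KEY']}"},
)
payload = {"seed": 42, "n_feat": 8, "query": "king:queen :: man:?"}  # hypothetical

# Compare raw response bytes, not parsed floats, so nothing can round away.
outputs = {client.post("/hd/analogy", json=payload).text for _ in range(20)}
assert len(outputs) == 1, f"non-deterministic: {len(outputs)} distinct outputs"
print("20/20 reruns bit-identical")
```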
Calibrated confidence with a known SNR ceiling.
HD knowledge-graph queries return the bound object plus a calibrated confidence score. We seed 200 synthetic facts of shape (person, born_in, city) spread across 8 relations (25 facts per relation), query every subject, and compare each returned object against the originally-bound one. The live run below decodes all 200 correctly.
A single HD bundle holds roughly 150-250 facts cleanly before the cosine signal-to-noise ratio degrades; push past that and you start seeing decode collisions. In practice, use a relation budget: split high-cardinality predicates across multiple sub-relations, or move fact-like data to the typed Records substrate instead of HD KG. The substrate is designed for tens of thousands of facts when spread across relations -- at 25 facts per relation, each bundle in this run sits well under its SNR ceiling. One way a client could enforce that budget is sketched below.
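This sharding convention is illustrative, not a built-in API feature; the budget constant comes from the ~150-250 facts-per-bundle figure above:

```python
import hashlib

BUNDLE_BUDGET = 150  # conservative end of the ~150-250 facts-per-bundle ceiling

def sub_relation(relation: str, subject: str, expected_facts: int) -> str:
    """Shard a high-cardinality relation so each HD bundle stays under the
    SNR ceiling. Deterministic: the same subject always hashes to the same
    shard, so queries route exactly the way ingests did."""
    n_shards = max(1, -(-expected_facts // BUNDLE_BUDGET))  # ceiling division
    if n_shards == 1:
        return relation
    h = int(hashlib.sha256(subject.encode()).hexdigest(), 16)
    return f"{relation}__{h % n_shards}"

# ~10,000 born_in facts -> 67 sub-relations of ~150 facts each
print(sub_relation("born_in", "marie_curie", expected_facts=10_000))
```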
3,125× cheaper per recall.
Many agent stacks "remember" by re-prepending recall context to every LLM call. That recall slice gets billed at frontier-model input rates per turn. Replacing it with a single records_query shifts the unit cost from per-token to per-call.
| Recall path | Unit cost |
|---|---|
| `records_query` with typed filters | $0.000002 per call |
| 5 KB recall slice re-stuffed per turn (Opus 4.7, $5/M input) | $0.00625 per turn |

Opus 4.7 list pricing, $5/M input. Other models differ.
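The arithmetic behind the ratio, spelled out. The only assumption beyond the published rates is the rough 4-bytes-per-token heuristic used to size the 5 KB recall slice:

```python
# Context-stuffing baseline: a 5 KB recall slice re-prepended every turn,
# billed at Opus 4.7 list input pricing ($5 per million tokens).
recall_bytes = 5_000
tokens_per_turn = recall_bytes / 4                 # ~4 bytes/token -> 1,250 tokens
stuffing_usd = tokens_per_turn * 5 / 1_000_000     # $0.00625 per turn

# Substrate path: one records_query per recall at the published per-op rate.
query_usd = 2 / 1_000_000                          # $2/M calls -> $0.000002 per call

print(f"stuffing ${stuffing_usd:.5f}/turn vs query ${query_usd:.6f}/call "
      f"-> {stuffing_usd / query_usd:,.0f}x")      # 3,125x
```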
How we measure.
- All measurements run against `https://api.neruva.io` over public TLS. No localhost, no preferential routing, no server-side shortcuts.
- Each operation is warmed up 10 times to absorb cold-start cost (Cloud Run scale-to-zero adds a one-time penalty on the first call after idle), then measured 100 times.
- Latency is measured client-side via `time.perf_counter()` around the HTTP call -- the number includes TLS handshake, the round-trip from the caller's location to `us-central1`, and server-side compute. A condensed version of the loop is sketched after this list.
- Determinism is verified by comparing 20 reruns of the same analogy query and checking the outputs are bit-identical.
- KG accuracy seeds N synthetic facts and queries each subject, comparing the returned object against the originally-bound one.
- Cost ratio uses Opus 4.7 list pricing ($5/M input tokens) for the stuff-into-prompt baseline and our published per-op rate ($2 per million calls for `records_query`). Other models and cheaper tiers shift the absolute numbers but preserve the order of magnitude.
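The measurement loop, condensed. It assumes only `httpx` and a `NERUVA_API_KEY` environment variable; the endpoint path and payload are placeholders (the real harness is `probes/bench_substrate.py`):

```python
import os
import statistics
import time

import httpx

WARMUP_N, MEASURE_N = 10, 100

def bench(client: httpx.Client, path: str, payload: dict) -> dict:
    # Warmup absorbs Cloud Run's one-time cold-start penalty after idle.
    for _ in range(WARMUP_N):
        client.post(path, json=payload).raise_for_status()

    # Measure client-side around the whole HTTP call: TLS, flight time,
    # routing, and server-side compute all land inside the window.
    samples = []
    for _ in range(MEASURE_N):
        t0 = time.perf_counter()
        client.post(path, json=payload).raise_for_status()
        samples.append((time.perf_counter() - t0) * 1000)

    q = statistics.quantiles(samples, n=100)  # q[49]=p50, q[94]=p95, q[98]=p99
    return {"p50_ms": q[49], "p95_ms": q[94], "p99_ms": q[98],
            "mean_ms": statistics.fmean(samples)}

client = httpx.Client(
    base_url="https://api.neruva.io",
    headers={"Authorization": f"Bearer {os.environ['NERUVA_API_KEY']}"},
)
print(bench(client, "/records/query", {"type": "note", "limit": 5}))  # placeholder payload
```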
Reproduce these numbers yourself: clone the repo and run `python probes/bench_substrate.py` with your own `NERUVA_API_KEY`. The script is ~250 lines and has no dependencies beyond `httpx`.
Bring your own analysis.
The full benchmarks JSON is below -- copy it, ingest it, plot the histogram yourself. We update this file every time we run the harness; the timestamp at the top of the page is the last-measured-at time.
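For instance, assuming you have saved the blob below as `benchmarks.json`, a few lines will surface the tail behavior per op:

```python
import json

with open("benchmarks.json") as f:
    bench = json.load(f)

for op in bench["ops"]:
    tail = op["p99_ms"] - op["p50_ms"]  # how far the p99 sits above the median
    print(f"{op['name']:<24} p50={op['p50_ms']:7.2f} ms  "
          f"p99={op['p99_ms']:7.2f} ms  tail=+{tail:.1f} ms")
```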
```json
{
"base_url": "https://api.neruva.io",
"namespace": "bench-1778785199",
"ts": 1778785199052,
"warmup_n": 10,
"measure_n": 100,
"ops": [
{
"name": "records_ingest",
"n": 100,
"p50_ms": 97.55,
"p95_ms": 107.56,
"p99_ms": 142.31,
"min_ms": 91.59,
"max_ms": 142.61,
"mean_ms": 98.84
},
{
"name": "records_query_semantic",
"n": 100,
"p50_ms": 96.81,
"p95_ms": 101.12,
"p99_ms": 186.38,
"min_ms": 91.67,
"max_ms": 187.13,
"mean_ms": 97.65
},
{
"name": "records_query_typed",
"n": 100,
"p50_ms": 95.08,
"p95_ms": 104.47,
"p99_ms": 176.64,
"min_ms": 90.42,
"max_ms": 177.22,
"mean_ms": 96.52
},
{
"name": "records_timeline",
"n": 100,
"p50_ms": 97.13,
"p95_ms": 104.57,
"p99_ms": 453.39,
"min_ms": 93.23,
"max_ms": 456.74,
"mean_ms": 101.41
},
{
"name": "hd_kg_query",
"n": 100,
"p50_ms": 94.72,
"p95_ms": 103.78,
"p99_ms": 145.69,
"min_ms": 89.8,
"max_ms": 145.92,
"mean_ms": 95.52
},
{
"name": "hd_analogy",
"n": 100,
"p50_ms": 98.64,
"p95_ms": 109.07,
"p99_ms": 263.5,
"min_ms": 93.29,
"max_ms": 264.81,
"mean_ms": 101.19
},
{
"name": "hd_causal_observe",
"n": 100,
"p50_ms": 94.93,
"p95_ms": 102.93,
"p99_ms": 151.4,
"min_ms": 90.32,
"max_ms": 151.79,
"mean_ms": 95.88
},
{
"name": "hd_causal_intervene",
"n": 100,
"p50_ms": 96.85,
"p95_ms": 107.59,
"p99_ms": 167.24,
"min_ms": 91.78,
"max_ms": 167.73,
"mean_ms": 98.21
}
],
"side_checks": {
"determinism": {
"reruns": 20,
"unique_results": 1,
"all_identical": true,
"sample": {
"candidate": 15,
"candidate_bits": [
1,
1,
1,
1,
0,
0,
0,
0,
0,
0
],
"cosine": 1.0000001192092896,
"runner_up": 0.02392578311264515,
"ambiguity": 0.0239256639033556,
"confidence": 0.9760743360966444
}
},
"kg_accuracy": {
"n_facts": 200,
"n_relations": 8,
"facts_per_relation": 25,
"correct": 200,
"accuracy": 1
},
"cost_vs_opus47": {
"records_query_usd_per_call": 0.000002,
"context_stuffing_opus47_5kb_per_turn_usd": 0.00625,
"ratio": 3125,
"notes": "Opus 4.7 list pricing $5/M input. Other models differ."
}
}
}
```

Sub-100ms substrate. Provable, not promised.
All numbers above measured from a client against the live public API. Reproduce them with one Python file and your own key.