Numbers, not claims.
All measurements below run against the live api.neruva.io from a client-side harness over public TLS -- no server-side shortcuts, no localhost, no preferential paths. The raw JSON is embedded at the bottom of this page.
Sub-second end-to-end. Sub-100ms substrate.
End-to-end, client-measured latency: the numbers include TLS, Cloud Run routing, server-side compute, and the round-trip from the caller's machine. The substrate ops themselves (HD KG, analogy, causal) typically take a small fraction of the network round-trip; the rest is packet flight time.
| Operation | Harness op | p50 | p95 | p99 | mean |
|---|---|---|---|---|---|
| `records_ingest` (write + auto-embed) | `records_ingest` | 97.5 ms | 107.6 ms | 142.3 ms | 98.8 ms |
| `records_query` (semantic, topK=5) | `records_query_semantic` | 96.8 ms | 101.1 ms | 186.4 ms | 97.7 ms |
| `records_query` (typed-only filter) | `records_query_typed` | 95.1 ms | 104.5 ms | 176.6 ms | 96.5 ms |
| `records_timeline` (limit=20) | `records_timeline` | 97.1 ms | 104.6 ms | 453.4 ms | 101.4 ms |
| `hd_kg_query` (single-fact unbind) | `hd_kg_query` | 94.7 ms | 103.8 ms | 145.7 ms | 95.5 ms |
| `hd_analogy` (n_feat=8) | `hd_analogy` | 98.6 ms | 109.1 ms | 263.5 ms | 101.2 ms |
| `hd_causal_query` (observation, P(Y\|X)) | `hd_causal_observe` | 94.9 ms | 102.9 ms | 151.4 ms | 95.9 ms |
| `hd_causal_query` (intervention, P(Y\|do(X))) | `hd_causal_intervene` | 96.8 ms | 107.6 ms | 167.2 ms | 98.2 ms |
Same seed, same answer. Every time.
The substrate is deterministic from a seed -- a property no model-in-the-loop architecture can claim. We verify this by issuing the identical analogy query 20 times against the live API and checking that the outputs are bit-identical, as in the sketch below.
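A minimal sketch of that check, assuming only `httpx` and a `NERUVA_API_KEY` environment variable. The `/hd/analogy` route and payload shape here are placeholders, not the published API -- `probes/bench_substrate.py` in the repo has the real request shapes; the logic of the check is the point:

```python
import os

import httpx

# Placeholder route and payload -- see probes/bench_substrate.py for the
# real request shapes. The check itself: same seed in, bit-identical
# bytes out, 20 times in a row.
client = httpx.Client(
    base_url="https://api.neruva.io",
    headers={"Authorization": f"Bearer {os.environ['NERUVA_API_KEY']}"},
)
payload = {"seed": 42, "n_feat": 8, "query": "king:queen :: man:?"}  # hypothetical

# Compare raw response bytes, not parsed floats, so nothing can round away.
outputs = {client.post("/hd/analogy", json=payload).text for _ in range(20)}
assert len(outputs) == 1, f"non-deterministic: {len(outputs)} distinct outputs"
print("20/20 reruns bit-identical")
```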
Calibrated confidence with a known SNR ceiling.
HD knowledge-graph queries return the bound object plus a calibrated confidence score. We seed 200 synthetic facts of shape (person, born_in, city) spread across 8 relations (25 facts per relation), query every subject, and compare each returned object against the originally-bound one. The live run below decodes all 200 correctly.
A single HD bundle holds roughly 150-250 facts cleanly before the cosine signal-to-noise ratio degrades; push past that and you start seeing decode collisions. In practice, use a relation budget: split high-cardinality predicates across multiple sub-relations, or move fact-like data to the typed Records substrate instead of HD KG. The substrate is designed for tens of thousands of facts when spread across relations -- at 25 facts per relation, each bundle in this run sits well under its SNR ceiling. One way a client could enforce that budget is sketched below.
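This sharding convention is illustrative, not a built-in API feature; the budget constant comes from the ~150-250 facts-per-bundle figure above:

```python
import hashlib

BUNDLE_BUDGET = 150  # conservative end of the ~150-250 facts-per-bundle ceiling

def sub_relation(relation: str, subject: str, expected_facts: int) -> str:
    """Shard a high-cardinality relation so each HD bundle stays under the
    SNR ceiling. Deterministic: the same subject always hashes to the same
    shard, so queries route exactly the way ingests did."""
    n_shards = max(1, -(-expected_facts // BUNDLE_BUDGET))  # ceiling division
    if n_shards == 1:
        return relation
    h = int(hashlib.sha256(subject.encode()).hexdigest(), 16)
    return f"{relation}__{h % n_shards}"

# ~10,000 born_in facts -> 67 sub-relations of ~150 facts each
print(sub_relation("born_in", "marie_curie", expected_facts=10_000))
```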
3,125× cheaper per recall.
Many agent stacks "remember" by re-prepending recall context to every LLM call. That recall slice gets billed at frontier-model input rates per turn. Replacing it with a single records_query shifts the unit cost from per-token to per-call.
| Recall path | Unit cost |
|---|---|
| `records_query` with typed filters | $0.000002 per call |
| 5 KB recall slice re-stuffed per turn (Opus 4.7, $5/M input) | $0.00625 per turn |

Opus 4.7 list pricing, $5/M input. Other models differ.
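The arithmetic behind the ratio, spelled out. The only assumption beyond the published rates is the rough 4-bytes-per-token heuristic used to size the 5 KB recall slice:

```python
# Context-stuffing baseline: a 5 KB recall slice re-prepended every turn,
# billed at Opus 4.7 list input pricing ($5 per million tokens).
recall_bytes = 5_000
tokens_per_turn = recall_bytes / 4                 # ~4 bytes/token -> 1,250 tokens
stuffing_usd = tokens_per_turn * 5 / 1_000_000     # $0.00625 per turn

# Substrate path: one records_query per recall at the published per-op rate.
query_usd = 2 / 1_000_000                          # $2/M calls -> $0.000002 per call

print(f"stuffing ${stuffing_usd:.5f}/turn vs query ${query_usd:.6f}/call "
      f"-> {stuffing_usd / query_usd:,.0f}x")      # 3,125x
```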
How we measure.
- All measurements run against `https://api.neruva.io` over public TLS. No localhost, no preferential routing, no server-side shortcuts.
- Each operation is warmed up 10 times to absorb cold-start cost (Cloud Run scale-to-zero adds a one-time penalty on the first call after idle), then measured 100 times.
- Latency is measured client-side via `time.perf_counter()` around the HTTP call -- the number includes TLS handshake, the round-trip from the caller's location to `us-central1`, and server-side compute. A condensed version of the loop is sketched after this list.
- Determinism is verified by comparing 20 reruns of the same analogy query and checking the outputs are bit-identical.
- KG accuracy seeds N synthetic facts and queries each subject, comparing the returned object against the originally-bound one.
- Cost ratio uses Opus 4.7 list pricing ($5/M input tokens) for the stuff-into-prompt baseline and our published per-op rate ($2 per million calls for `records_query`). Other models and cheaper tiers shift the absolute numbers but preserve the order of magnitude.
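The measurement loop, condensed. It assumes only `httpx` and a `NERUVA_API_KEY` environment variable; the endpoint path and payload are placeholders (the real harness is `probes/bench_substrate.py`):

```python
import os
import statistics
import time

import httpx

WARMUP_N, MEASURE_N = 10, 100

def bench(client: httpx.Client, path: str, payload: dict) -> dict:
    # Warmup absorbs Cloud Run's one-time cold-start penalty after idle.
    for _ in range(WARMUP_N):
        client.post(path, json=payload).raise_for_status()

    # Measure client-side around the whole HTTP call: TLS, flight time,
    # routing, and server-side compute all land inside the window.
    samples = []
    for _ in range(MEASURE_N):
        t0 = time.perf_counter()
        client.post(path, json=payload).raise_for_status()
        samples.append((time.perf_counter() - t0) * 1000)

    q = statistics.quantiles(samples, n=100)  # q[49]=p50, q[94]=p95, q[98]=p99
    return {"p50_ms": q[49], "p95_ms": q[94], "p99_ms": q[98],
            "mean_ms": statistics.fmean(samples)}

client = httpx.Client(
    base_url="https://api.neruva.io",
    headers={"Authorization": f"Bearer {os.environ['NERUVA_API_KEY']}"},
)
print(bench(client, "/records/query", {"type": "note", "limit": 5}))  # placeholder payload
```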
Reproduce these numbers yourself: clone the repo and run `python probes/bench_substrate.py` with your own `NERUVA_API_KEY`. The script is ~250 lines and has no dependencies beyond `httpx`.
Bring your own analysis.
The full benchmarks JSON is below -- copy it, ingest it, plot the histogram yourself. We update this file every time we run the harness; the timestamp at the top of the page is the last-measured-at time.
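For instance, assuming you have saved the blob below as `benchmarks.json`, a few lines will surface the tail behavior per op:

```python
import json

with open("benchmarks.json") as f:
    bench = json.load(f)

for op in bench["ops"]:
    tail = op["p99_ms"] - op["p50_ms"]  # how far the p99 sits above the median
    print(f"{op['name']:<24} p50={op['p50_ms']:7.2f} ms  "
          f"p99={op['p99_ms']:7.2f} ms  tail=+{tail:.1f} ms")
```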
```json
{
"base_url": "https://api.neruva.io",
"namespace": "bench-1778785199",
"ts": 1778785199052,
"warmup_n": 10,
"measure_n": 100,
"ops": [
{
"name": "records_ingest",
"n": 100,
"p50_ms": 97.55,
"p95_ms": 107.56,
"p99_ms": 142.31,
"min_ms": 91.59,
"max_ms": 142.61,
"mean_ms": 98.84
},
{
"name": "records_query_semantic",
"n": 100,
"p50_ms": 96.81,
"p95_ms": 101.12,
"p99_ms": 186.38,
"min_ms": 91.67,
"max_ms": 187.13,
"mean_ms": 97.65
},
{
"name": "records_query_typed",
"n": 100,
"p50_ms": 95.08,
"p95_ms": 104.47,
"p99_ms": 176.64,
"min_ms": 90.42,
"max_ms": 177.22,
"mean_ms": 96.52
},
{
"name": "records_timeline",
"n": 100,
"p50_ms": 97.13,
"p95_ms": 104.57,
"p99_ms": 453.39,
"min_ms": 93.23,
"max_ms": 456.74,
"mean_ms": 101.41
},
{
"name": "hd_kg_query",
"n": 100,
"p50_ms": 94.72,
"p95_ms": 103.78,
"p99_ms": 145.69,
"min_ms": 89.8,
"max_ms": 145.92,
"mean_ms": 95.52
},
{
"name": "hd_analogy",
"n": 100,
"p50_ms": 98.64,
"p95_ms": 109.07,
"p99_ms": 263.5,
"min_ms": 93.29,
"max_ms": 264.81,
"mean_ms": 101.19
},
{
"name": "hd_causal_observe",
"n": 100,
"p50_ms": 94.93,
"p95_ms": 102.93,
"p99_ms": 151.4,
"min_ms": 90.32,
"max_ms": 151.79,
"mean_ms": 95.88
},
{
"name": "hd_causal_intervene",
"n": 100,
"p50_ms": 96.85,
"p95_ms": 107.59,
"p99_ms": 167.24,
"min_ms": 91.78,
"max_ms": 167.73,
"mean_ms": 98.21
}
],
"side_checks": {
"determinism": {
"reruns": 20,
"unique_results": 1,
"all_identical": true,
"sample": {
"candidate": 15,
"candidate_bits": [
1,
1,
1,
1,
0,
0,
0,
0,
0,
0
],
"cosine": 1.0000001192092896,
"runner_up": 0.02392578311264515,
"ambiguity": 0.0239256639033556,
"confidence": 0.9760743360966444
}
},
"kg_accuracy": {
"n_facts": 200,
"n_relations": 8,
"facts_per_relation": 25,
"correct": 200,
"accuracy": 1
},
"cost_vs_opus47": {
"records_query_usd_per_call": 0.000002,
"context_stuffing_opus47_5kb_per_turn_usd": 0.00625,
"ratio": 3125,
"notes": "Opus 4.7 list pricing $5/M input. Other models differ."
}
}
}
```

Sub-100ms substrate. Provable, not promised.
All numbers above measured from a client against the live public API. Reproduce them with one Python file and your own key.