Is HeurChain faster than Cognee?

On flat retrieval, yes: HeurChain measures 20.5 ms p95 under multi-tenant Docker load, while Cognee's graph traversal latency is variable and depends on graph depth and the chosen backend. The architectural difference is the point, though: HeurChain returns relevant passages quickly; Cognee answers structured reasoning questions over an entity graph. They solve overlapping but different problems.

Does HeurChain build a knowledge graph like Cognee?

No — that's deliberate. HeurChain ranks passages with BM25 plus dense vectors plus RRF, and lets the downstream LLM do the reasoning over the returned context. Cognee extracts entities and relationships into a graph at ingestion time (with an LLM call per document) so it can do graph reasoning at query time. Both approaches have legitimate use cases; they're not substitutes.

How much does HeurChain cost compared to Cognee?

Both have free options: Cognee is Apache-2.0 open source and Cognee Cloud has a free tier with limits; HeurChain self-host is MIT-licensed. The real cost difference at scale is ingestion: Cognee's per-document LLM extraction adds up for high-volume corpora. For a few hundred docs, it's negligible; for ongoing chat-log ingestion, it compounds with volume.

When is Cognee the better choice?

Four cases. First, queries that require entity relationship reasoning, not relevant-context lookups. Second, relationship-dense corpora (legal, scientific, organizational records). Third, when you want reasoning to happen at the storage layer rather than in the downstream LLM. Fourth, when your corpus is small enough that per-document ingest cost is a one-time tax.

When is HeurChain the better choice?

Four cases. First, when your agent memory needs are "give me the most relevant prior context" — the common case. Second, ingestion-cost-sensitive workloads where embed-only beats LLM-driven extraction per doc. Third, latency-sensitive retrieval. Fourth, multi-tenant SaaS with auditable isolation.

HeurChain vs Cognee — Honest Comparison

Methodology — read this first

How we measured this comparison

Dataset: LongMemEval-S (ICLR 2025), 500 questions across 6 reasoning categories.

HeurChain numbers: measured on the heurchain-benchmarks harness (sharded_bench.py and multitenant_bench.py) against the broker in the main repo. Both are public; you can rerun the whole thing.

Cognee numbers: from the topoteretes/cognee GitHub repo and cognee.ai docs where available; left blank otherwise. We do not fabricate competitor numbers.

What we measured: retrieval R@k, MRR, NDCG@10, p50/p95 latency, and end-to-end QA accuracy with three independent judge models.

Cross-judge QA validation (May 2026): We ran the same retrieved facts through three judges from independent model families — full results published here. Mean QA accuracy (6 categories × 30 tasks): Local 14B 32.8%, DeepSeek V3.1 671B 31.7%, Kimi K2.6 28.3%. The two frontier judges agreed with each other on 87.8% of per-question verdicts, validating each as an independent judge. The local-14B mean was confirmed directionally correct within 4.5 pp of frontier judges — no inflation at the headline level.
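To make the agreement figure concrete, here is a minimal sketch of how a per-question agreement rate between two judges can be computed. The function names and the toy verdict lists are illustrative assumptions; this is not the harness's code or its data.

```python
from typing import Sequence


def agreement_rate(verdicts_a: Sequence[bool], verdicts_b: Sequence[bool]) -> float:
    """Fraction of questions on which two judges return the same verdict."""
    if len(verdicts_a) != len(verdicts_b):
        raise ValueError("both judges must score the same questions")
    if not verdicts_a:
        raise ValueError("no verdicts to compare")
    same = sum(a == b for a, b in zip(verdicts_a, verdicts_b))
    return same / len(verdicts_a)


def accuracy(verdicts: Sequence[bool]) -> float:
    """Fraction of answers a judge marked correct."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(verdicts) / len(verdicts)


# Toy data (hypothetical): True means the judge accepted the answer.
judge_a = [True, False, False, True, False]
judge_b = [True, False, True, True, False]

print(f"agreement: {agreement_rate(judge_a, judge_b):.1%}")
print(f"accuracy A: {accuracy(judge_a):.1%}, accuracy B: {accuracy(judge_b):.1%}")
```

Agreement and accuracy are separate quantities: two judges can report similar mean accuracy while disagreeing on individual questions, which is why the per-question agreement rate is reported alongside the means.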

What the per-category swings showed: the cross-judge run exposed a v2 fact-extraction quality bottleneck (specific entity-action assignments stripped to meta-summaries) on multi-session, knowledge-update, and temporal-reasoning categories. Where extraction preserves the answer-bearing detail, all three judges converge. Where it doesn't, the local 14B "won" by confabulating answers the local 14B judge then accepted — frontier judges honestly refused. Smoking-gun example in the writeup.

What we still owe: a v3 extraction prompt that preserves entity-action-value triples, and a closed-weight frontier judge run (Claude Sonnet 4.6 via Anthropic API) for additional independent validation. Both queued.

Bias disclosure: this is our internal harness, written by us. Of course it favors what we built well. The cross-judge run is the way we expose that bias and report it honestly. If you're evaluating both, the most reliable move is to run them on your data.

Retrieval p95 (fair comparison)

Different shape

Cognee's graph traversal trades retrieval speed for relationship reasoning — by design. HeurChain measures 20.5 ms p95 on hybrid retrieval. Different output shapes; not the same metric.

Inspectable

Open

Harness + every benchmark number in a public repo. Reproduce or refute on your own data.

Architecture

No LLM ingest

HeurChain ingestion is embed-only. Cognee calls an LLM per document at ingest time to extract entities — that's where the graph reasoning comes from, but it's a real cost.

Retrieval quality

Retrieval-only metrics on LongMemEval-S

Cognee evaluates on knowledge-graph benchmarks (entity recall, multi-hop reasoning) — not retrieval R@k. We don't have apples-to-apples Cognee LongMemEval-S numbers to publish, and we won't fabricate them. The table below is HeurChain's measured performance.

Where this comparison gets muddy: Cognee's strength is knowledge-graph reasoning, not flat-passage retrieval. Comparing the two on R@k is category-mixing. Pick Cognee if your queries require entity relationship traversal ("what companies has this VC invested in alongside Acme?"). Pick HeurChain if your queries are "give me the most relevant context for this prompt." Both are legitimate; they're solving different problems with overlapping interfaces.

Metric	HeurChain (dense)	HeurChain (hybrid α=0.9)	Cognee
R@1	0.543	0.542	—
R@5	0.939	0.933	—
R@10	0.972	0.978	—
MRR	0.911	0.913	—
NDCG@10	0.911	0.914	—
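
For readers less familiar with the metrics in the table, the sketch below shows the conventional definitions of R@k, reciprocal rank (averaged over queries to get MRR), and NDCG@k with binary relevance. It assumes fractional recall (share of relevant passages found in the top k); the harness may define its metrics differently, and the ids in the example are made up.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical query: two relevant sessions, found at ranks 2 and 4.
ranked_ids = ["s3", "s1", "s7", "s2"]
relevant_ids = {"s1", "s2"}

print("R@1 :", recall_at_k(ranked_ids, relevant_ids, 1))
print("R@5 :", recall_at_k(ranked_ids, relevant_ids, 5))
print("RR  :", reciprocal_rank(ranked_ids, relevant_ids))
print("NDCG@10:", round(ndcg_at_k(ranked_ids, relevant_ids, 10), 3))
```

Each function scores a single query; the reported numbers would be means over all questions in the dataset.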

Latency

P95 retrieval latency in context

Latencies from different harnesses on different deployment topologies — not strictly apples-to-apples. The multi-tenant Docker number is the closest analog to what a SaaS would actually serve. The in-process number shows what the algorithm itself is capable of with the network removed.

System / configuration	P95 latency	Source	What it actually measures
HeurChain — multi-tenant load (Docker, 10 tenants concurrent)	20.5 ms	This benchmark	Closest to production SaaS scenario
Cognee (graph query)	Variable	Cognee docs	Depends on graph depth + backend
HeurChain — dense, in-process	35 ms	This benchmark	Algorithm-only ceiling; no network
HeurChain — BM25 only	4.6 ms	This benchmark	Keyword-only path; useful for hot queries
Mem0 (reference)	200 ms	Mem0 paper Table 1	Search latency; stack-specific
LangMem (reference)	59,820 ms	Mem0 paper Table 1	Vector scan; broken at LongMemEval scale
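
Because the table compares p95 figures from different sources, it helps to be explicit about what a p95 is. The sketch below uses the nearest-rank method on a list of per-request latencies; the sample values are invented, and the harness may use a different percentile method (for example, interpolation).

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [4.1, 5.0, 3.8, 21.0, 4.4, 4.9, 5.2, 4.0, 18.5, 4.6]

print("p50:", percentile(latencies_ms, 50), "ms")
print("p95:", percentile(latencies_ms, 95), "ms")
```

The median barely notices the two slow requests while the p95 is dominated by them, which is why tail latency is the more useful number for a serving scenario and why p95 values from different load patterns are hard to compare directly.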

Under the hood

Architecture comparison

Dimension	HeurChain	Cognee
Retrieval method	BM25 + dense (bge-m3) + RRF (tunable α)	LLM-driven entity + relationship extraction into a graph; traversal + vector search
Storage backend	Redis (vectors + BM25) + SQLite (metadata)	Graph DB (Kuzu / Neo4j / FalkorDB) + vector DB (LanceDB / Qdrant / others)
Ingestion model	Embed-only — lightweight	LLM call per document for entity extraction — heavyweight (paid at ingest)
Query model	Single retrieval call returns ranked passages	Cognee Search API: multi-step graph traversal returning structured results
Multi-tenant model	Per-tenant namespace + agent_id sub-isolation; published zero-leak verification	Operator-managed; depends on which graph DB + vector DB backends you wire up
Self-hosted option	Single Go binary + Redis + SQLite	Python service + your choice of graph DB + vector DB
API surface	REST + MCP SSE — auto-discovered by Claude Code, ChatGPT Apps	Python SDK + REST; MCP integration available
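
As an illustration of the "BM25 + dense + RRF" row, here is a minimal sketch of reciprocal rank fusion over two ranked lists. How α enters HeurChain's fusion is not specified above, so the weighting used here (α on the dense list, 1 − α on the BM25 list) is an assumption, as are the function name and the document ids. The constant k = 60 is the value conventionally used for RRF.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rrf(
    dense: List[str], bm25: List[str], alpha: float = 0.9, k: int = 60
) -> List[str]:
    """Fuse two ranked id lists with weighted reciprocal rank fusion.

    Each list contributes weight / (k + rank) to a document's score.
    Assumption: alpha weights the dense list, (1 - alpha) the BM25 list.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scores: Dict[str, float] = defaultdict(float)
    for weight, ranking in ((alpha, dense), (1.0 - alpha, bm25)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from the two retrievers.
dense_hits = ["a", "b", "c"]
bm25_hits = ["c", "a", "d"]

print(weighted_rrf(dense_hits, bm25_hits, alpha=0.5))
print(weighted_rrf(dense_hits, bm25_hits, alpha=1.0))
```

RRF works on ranks rather than raw scores, so BM25 scores and cosine similarities never need to be put on a common scale. Documents that appear in both lists accumulate score from each, which is what lets a passage ranked moderately well by both retrievers overtake one ranked highly by only one.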

Honest assessment

When Cognee is the better choice

We're not going to pretend HeurChain wins on every dimension. These are real cases where Cognee is the better fit:

Your queries require entity relationship reasoning. "Which companies in my portfolio have shared board members with companies acquired by Acme in the last 5 years?" That's a graph query, not a retrieval query. Cognee was built for exactly this shape.
You're indexing relationship-dense corpora. Legal documents, scientific papers, organizational records — content where relationships between entities are as important as the content itself. Cognee's LLM-driven extraction adds structure that ranked passages can't.
You want the storage layer to reason at query time. Cognee Search traverses the graph to answer multi-hop questions. HeurChain returns passages and lets the downstream LLM reason. If you want reasoning closer to storage, pick Cognee.
You can amortize the per-document ingest cost. If your corpus is small or stable, the LLM-driven ingest cost is a one-time tax for ongoing reasoning power. For batch-indexed reference corpora, that math works.

Where HeurChain fits

When HeurChain is the better choice

Your queries are "give me relevant context". Most agent memory use cases. If your downstream LLM does the reasoning over returned context, you don't need graph machinery in the storage layer.
Ingestion cost matters. There is no LLM in the ingest path; ingestion is embed-only. For chat logs, doc dumps, and web scrapes, the savings over per-document LLM extraction compound with volume.
Latency-sensitive retrieval. HeurChain measures 20.5 ms p95 on hybrid retrieval under multi-tenant Docker load. Cognee's graph traversal latency is variable and depends on graph depth and backend choice.
Multi-tenant SaaS with auditable isolation. Per-tenant namespacing with published zero-leak verification.

Pricing

Cost — for reference, not the headline

Most readers should pick on architecture fit, not price. Cognee is Apache-2.0 open source; Cognee Cloud has a free tier with limits. HeurChain self-host is MIT-licensed. The numbers below exist so you can see them, not because we think they should drive your decision.

If you...	HeurChain	Cognee
Hobby / kicking the tires	Free self-host (MIT)	Free self-host (Apache 2.0) or Cognee Cloud free tier
Solo developer, managed	$5/mo (Solo)	Cognee Cloud paid tiers
Team, shared workspace	$49.99/mo (Workgroup)	Cognee Cloud team pricing
Enterprise — SOC2, SAML	Custom	Custom

Both have free options. The real cost difference at scale is ingestion: Cognee runs an LLM call per document at ingest time for entity extraction (~$0.005-0.015 per doc at GPT-4o prices, depending on size). HeurChain ingestion is embed-only. If you're indexing high-volume content streams, that compounds; if you're indexing a few hundred documents, it's noise.
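
To see where "noise" turns into "compounds", the arithmetic can be sketched as below. It uses only the per-document range quoted above; the corpus sizes are hypothetical, and embedding cost is not modeled.

```python
from typing import Tuple

# Per-document LLM extraction cost range quoted above (USD, GPT-4o prices).
COST_PER_DOC_LOW = 0.005
COST_PER_DOC_HIGH = 0.015


def llm_ingest_cost(num_docs: int) -> Tuple[float, float]:
    """Return the (low, high) extraction cost in USD for num_docs documents."""
    if num_docs < 0:
        raise ValueError("num_docs must be non-negative")
    return num_docs * COST_PER_DOC_LOW, num_docs * COST_PER_DOC_HIGH


# Hypothetical corpus sizes, chosen only to show how the cost scales.
for label, n in [("300 reference docs", 300), ("1,000,000 chat messages", 1_000_000)]:
    low, high = llm_ingest_cost(n)
    print(f"{label}: ${low:,.2f} to ${high:,.2f}")
```

At 300 documents the extraction cost is $1.50 to $4.50; at one million it is $5,000 to $15,000, and it recurs for every new batch of content ingested.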

Don't take our word

Reproduce these numbers yourself

Clone heurchain-benchmarks and the main HeurChain repo; pull the LongMemEval-S dataset (instructions in the README).
Run python3 sharded_bench.py for the single-tenant baseline; python3 multitenant_bench.py --mode load --max-tenants 10 for the Docker multi-tenant number.
Re-run on your data — your conversation logs, your documents. Public benchmarks correlate with real workloads, but they're not the same thing.
If our numbers don't reproduce on your hardware, open an issue. We'll fix or correct.

HeurChain vs Cognee: different tools for different questions.

If HeurChain is a fit, the easiest start is the Solo plan.