Is HeurChain faster than Letta?

It's not a like-for-like comparison. Letta's memory I/O is one layer inside a full agent runtime and isn't exposed as a standalone latency metric. HeurChain measures 20.5 ms p95 as a pure memory service under multi-tenant load. If you care about end-to-end agent turn latency, a Letta figure would include LLM inference, tool calls, and memory, while HeurChain's number covers retrieval only. Choose based on what you actually need.

Can I use HeurChain with my existing ChatGPT or Claude agent?

Yes, that's the main use case. HeurChain is HTTP- and MCP-native: any agent that can make a web request can read from and write to your HeurChain tenant. Letta requires porting your agent logic into the Letta runtime. With HeurChain, your existing ChatGPT conversation, Claude Code session, Cursor plugin, or LangChain pipeline can share one memory store.
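As a rough illustration of "any agent that can make a web request," here is a sketch that uses only Python's standard library. The base URL, the /v1/memories and /v1/search routes, the bearer-token header, and the JSON field names are all hypothetical placeholders, not HeurChain's documented API; check your tenant's API reference for the real ones. The functions only build requests, so nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request

# HYPOTHETICAL values: HeurChain's real base URL, routes, auth scheme and
# payload fields are not specified here; substitute the ones for your tenant.
BASE_URL = "https://heurchain.example.com"
API_KEY = "YOUR_TENANT_API_KEY"


def _json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_write_request(agent_id: str, text: str) -> urllib.request.Request:
    """Store one memory for an agent (hypothetical /v1/memories route)."""
    return _json_post("/v1/memories", {"agent_id": agent_id, "text": text})


def build_search_request(agent_id: str, query: str, k: int = 5) -> urllib.request.Request:
    """Retrieve the top-k memories for a query (hypothetical /v1/search route)."""
    return _json_post("/v1/search", {"agent_id": agent_id, "query": query, "k": k})


if __name__ == "__main__":
    req = build_search_request("cursor-plugin", "What did we decide about the schema?")
    print(req.get_method(), req.full_url)
    # To actually send it against a real endpoint:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Because the same two calls can be issued from a ChatGPT action, a Claude Code session, or a LangChain tool, the memory store stays independent of whichever agent runtime issues them.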

How much does HeurChain cost compared to Letta?

Direct price comparison is misleading because the scopes differ. HeurChain Solo is $5/month for the memory layer only, while Letta Cloud pricing covers the full agent runtime. If you already have an agent runtime, HeurChain just adds memory. If you don't, Letta gives you more of the stack for the price. Both have free tiers.

When is Letta the better choice?

Four cases. First, when you don't have an agent runtime and want an opinionated, batteries-included framework. Second, when the MemGPT tiered-memory model directly fits your application. Third, when you want runtime-level memory integration enabling optimizations external memory can't match. Fourth, when you value the MemGPT research lineage and community.

When is HeurChain the better choice?

Four cases. First, when you already have agents (ChatGPT, Claude, Cursor, custom) and want to add memory without changing runtimes. Second, when memory needs to outlive any one agent: swap models, keep your data. Third, when memory is one composable piece of your stack, not an opinionated whole. Fourth, when you run multi-tenant SaaS that requires auditable isolation.

HeurChain vs Letta — Honest Comparison

Methodology — read this first

How we measured this comparison

Dataset: LongMemEval-S (ICLR 2025), 500 questions across 6 reasoning categories.

HeurChain numbers: measured on the heurchain-benchmarks harness (sharded_bench.py and multitenant_bench.py) against the broker in the main repo. Both are public; you can rerun the whole thing.

Letta numbers: from the MemGPT paper (arXiv 2310.08560) and current Letta docs where available; left blank otherwise. We do not fabricate competitor numbers.

What we measured: retrieval R@k, MRR, NDCG@10, p50/p95 latency, and end-to-end QA accuracy with three independent judge models.

Cross-judge QA validation (May 2026): we ran the same retrieved facts through three judges from independent model families; full results are published in the writeup. Mean QA accuracy (6 categories × 30 tasks): Local 14B 32.8%, DeepSeek V3.1 671B 31.7%, Kimi K2.6 28.3%. The two frontier judges agreed with each other on 87.8% of per-question verdicts, which supports treating each as a reliable independent judge. The local 14B mean landed within 4.5 pp of both frontier judges (1.1 pp above DeepSeek V3.1, 4.5 pp above Kimi K2.6), so it is directionally correct, with no material inflation at the headline level.
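The cross-judge statistics quoted above reduce to simple arithmetic over per-question verdicts. The sketch below assumes each judge emits one correct/incorrect verdict per question, in the same question order; the function names and toy data are hypothetical and not taken from the heurchain-benchmarks harness.

```python
from typing import Dict, List


def mean_accuracy(verdicts: List[bool]) -> float:
    """Share of questions a judge marked correct."""
    return sum(verdicts) / len(verdicts)


def pairwise_agreement(a: List[bool], b: List[bool]) -> float:
    """Share of questions on which two judges returned the same verdict."""
    if len(a) != len(b):
        raise ValueError("both judges must grade the same questions")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gap_pp(judges: Dict[str, List[bool]], reference: str) -> Dict[str, float]:
    """Percentage-point gap between the reference judge's mean and every other judge's."""
    ref = mean_accuracy(judges[reference])
    return {
        name: round(100 * (ref - mean_accuracy(v)), 1)
        for name, v in judges.items()
        if name != reference
    }


if __name__ == "__main__":
    # Toy verdicts for 4 questions; a real run has one entry per graded task.
    judges = {
        "local": [True, True, False, True],
        "frontier_a": [True, False, False, True],
        "frontier_b": [True, False, False, False],
    }
    print(pairwise_agreement(judges["frontier_a"], judges["frontier_b"]))  # 0.75
    print(gap_pp(judges, "local"))  # {'frontier_a': 25.0, 'frontier_b': 50.0}
```

Raw percent agreement does not correct for agreement by chance; when verdicts are heavily skewed toward one outcome, a chance-corrected statistic such as Cohen's kappa is a common complement.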

What the per-category swings showed: the cross-judge run exposed a v2 fact-extraction quality bottleneck (specific entity-action assignments stripped to meta-summaries) on multi-session, knowledge-update, and temporal-reasoning categories. Where extraction preserves the answer-bearing detail, all three judges converge. Where it doesn't, the local 14B "won" by confabulating answers the local 14B judge then accepted — frontier judges honestly refused. Smoking-gun example in the writeup.

What we still owe: a v3 extraction prompt that preserves entity-action-value triples, and a closed-weight frontier judge run (Claude Sonnet 4.6 via Anthropic API) for additional independent validation. Both queued.

Bias disclosure: this is our internal harness, written by us. Of course it favors what we built well. The cross-judge run is the way we expose that bias and report it honestly. If you're evaluating both, the most reliable move is to run them on your data.

Retrieval p95 (fair comparison)

Different shape

Letta's memory I/O is embedded in the agent loop, not exposed as a standalone retrieval metric. HeurChain measures 20.5 ms p95 as a pure memory layer under multi-tenant Docker load. These aren't directly comparable numbers — they measure different things.

Inspectable

Open

Harness + every benchmark number in a public repo. Reproduce or refute on your own data.

Architecture

Bring-your-agent

Any existing agent can call into HeurChain via HTTP or MCP. Letta provides the full runtime, so memory ships with the agent. Different value, different lock-in profile.

Retrieval quality

Retrieval-only metrics on LongMemEval-S

Letta doesn't publish standalone retrieval R@k: memory is one layer of a full agent runtime, evaluated end-to-end on task completion. Comparing retrieval directly would mean instrumenting Letta's internal layer, which we haven't done. The numbers below are HeurChain's own results on LongMemEval-S; treat the cross-system comparison as illustrative of architecture, not of "who wins."

Where this comparison gets muddy: Letta is an agent runtime, not a memory-only service. Comparing retrieval R@k apples-to-apples is category-mixing — Letta's memory is evaluated as part of the whole agent on task benchmarks (MMLU-style, AgentBench-style), not as a standalone retriever. Pick Letta if you want a memory-aware agent runtime. Pick HeurChain if you want a memory layer your existing agents can call into.

Metric	HeurChain (dense)	HeurChain (hybrid α=0.9)	Letta
R@1	0.543	0.542	—
R@5	0.939	0.933	—
R@10	0.972	0.978	—
MRR	0.911	0.913	—
NDCG@10	0.911	0.914	—
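For readers unfamiliar with these metrics, here is a minimal sketch of per-question recall@k, reciprocal rank (averaged over questions to get MRR), and binary-relevance NDCG@k. It assumes recall@k is the fraction of a question's evidence sessions that appear in the top k, which is one common definition; the harness may define it differently. Under that definition, a question with two evidence sessions scores at most 0.5 on R@1 even when its top hit is correct, which is one way R@1 can sit well below MRR, as it does in the table.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / (rank of the first relevant result); 0.0 if nothing relevant is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k with the usual 1 / log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)  # best case: all relevant items ranked first
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg


if __name__ == "__main__":
    # Toy question with two evidence sessions; the top hit is correct.
    ranked = ["s3", "s7", "s1", "s9"]
    relevant = {"s3", "s1"}
    print(recall_at_k(ranked, relevant, 1))   # 0.5
    print(reciprocal_rank(ranked, relevant))  # 1.0
    print(round(ndcg_at_k(ranked, relevant, 10), 3))
```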

Latency

P95 retrieval latency in context

These latencies come from different harnesses on different deployment topologies, so they are not strictly apples-to-apples. The multi-tenant Docker number is the closest analog to what a SaaS would actually serve. The in-process number shows what the algorithm itself is capable of with the network removed.

System / configuration	P95 latency	Source	What it actually measures
HeurChain — multi-tenant load (Docker, 10 tenants concurrent)	20.5 ms	This benchmark	Closest to production SaaS scenario
Letta agent loop (memory I/O)	Not published	Letta docs	Embedded in the agent loop; not exposed as a standalone metric
HeurChain — dense, in-process	35 ms	This benchmark	Algorithm-only ceiling; no network
HeurChain — BM25 only	4.6 ms	This benchmark	Keyword-only path; useful for hot queries
Mem0 (reference)	200 ms	Mem0 paper Table 1	Search latency on LOCOMO; stack-specific
LangMem (reference)	59,820 ms	Mem0 paper Table 1	Vector scan on LOCOMO, not LongMemEval; too slow for interactive use
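The p50/p95 figures are percentiles over per-query latency samples. This sketch uses the nearest-rank definition; that choice is an assumption, since interpolating methods (for example the default linear method of numpy.percentile) return slightly different values on small samples, and the harnesses behind each row may not compute it the same way. The function names are hypothetical.

```python
import math
import random
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p% of all samples are <= it (nearest-rank method)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> dict:
    """p50/p95 summary of per-query latencies, in milliseconds."""
    return {
        "p50_ms": percentile_nearest_rank(samples_ms, 50),
        "p95_ms": percentile_nearest_rank(samples_ms, 95),
    }


if __name__ == "__main__":
    # Toy data: 100 queries taking 1 ms, 2 ms, ..., 100 ms, in shuffled order.
    samples = [float(ms) for ms in range(1, 101)]
    random.shuffle(samples)
    print(latency_summary(samples))  # {'p50_ms': 50.0, 'p95_ms': 95.0}
```

A p95 only means something with enough samples: with 20 queries, the nearest-rank p95 is simply the second-slowest one.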

Under the hood

Architecture comparison

	HeurChain	Letta
Retrieval method	BM25 + dense (bge-m3) + RRF (tunable α)	Tiered virtual memory (core / archival / recall) with dense retrieval over archival
Storage backend	Redis (vectors + BM25) + SQLite (metadata)	Postgres + pgvector for archival memory
Position in stack	External memory service called by any agent (ChatGPT, Claude, Cursor, custom)	Memory lives inside the Letta agent runtime — agent + memory ship together
Multi-tenant model	Per-tenant namespace + agent_id sub-isolation; published zero-leak verification	Per-agent isolation within a Letta server; multi-tenant deployment is your responsibility
Self-hosted option	Single Go binary + Redis + SQLite	Letta server + Postgres + pgvector via Docker Compose
API surface	REST + MCP SSE — auto-discovered by Claude Code, ChatGPT Apps	Letta REST API (agent-centric) + Python SDK
Coupling to LLM choice	None — pure retrieval infrastructure	Letta loop expects an LLM with tool-calling; ships with provider integrations
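The table lists BM25 + dense retrieval fused with RRF and a tunable α. How HeurChain applies α is not specified here, so this sketch assumes one plausible reading: α weights the dense ranking's reciprocal-rank contribution and 1 − α weights BM25's, with k = 60, the conventional constant from the original Reciprocal Rank Fusion paper. The function name and defaults are hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_rrf(
    dense_ranking: Sequence[str],
    bm25_ranking: Sequence[str],
    alpha: float = 0.9,
    k: int = 60,
) -> List[str]:
    """Fuse two rankings with alpha-weighted Reciprocal Rank Fusion (assumed form).

    score(d) = alpha / (k + rank_dense(d)) + (1 - alpha) / (k + rank_bm25(d))
    A document missing from one list gets no contribution from that list.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scores: Dict[str, float] = {}
    for weight, ranking in ((alpha, dense_ranking), (1.0 - alpha, bm25_ranking)):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    dense = ["a", "b", "c"]  # ranking from the dense (embedding) retriever
    bm25 = ["c", "b", "d"]   # ranking from the BM25 keyword retriever
    print(weighted_rrf(dense, bm25, alpha=0.9))  # ['b', 'c', 'a', 'd']
```

With α = 0.9 the dense ranking dominates, yet documents ranked well by both lists ("b", "c") still overtake the dense-only top hit "a". Because RRF uses ranks rather than raw scores, it sidesteps normalizing BM25 scores against cosine similarities.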

Honest assessment

When Letta is the better choice

We're not going to pretend HeurChain wins on every dimension. These are real cases where Letta is the better fit:

You don't have an agent runtime yet. Letta gives you the whole loop: LLM orchestration, tool calling, memory. If you're starting from scratch and want one opinionated framework, that's a credible choice.
You like the MemGPT tiered-memory model. Core context / archival / recall is a thoughtful architecture from the original MemGPT paper. If that mental model fits your application, Letta implements it natively.
You want memory tightly coupled to one specific agent. When agent and memory ship together, certain optimizations (eager paging of relevant context, tool-mediated memory ops) are easier than via an external service.
You value the research lineage. MemGPT was an influential paper; Letta is its evolution. There's a real research community around the approach. For research-adjacent work, that matters.

Where HeurChain fits

When HeurChain is the better choice

You already have agents you want to add memory to. ChatGPT, Claude, Cursor, n8n, custom scripts — all of these can call HeurChain via HTTP or MCP. Letta would mean porting them into the Letta runtime.
You want memory to outlive any single agent runtime. Your data lives in your HeurChain tenant. Swap models, swap runtimes — memory stays. Letta-managed memory is coupled to the Letta runtime.
Memory is one piece of a larger stack. HeurChain is one service in your architecture. Letta is an opinionated whole. If your team prefers composable infrastructure, that fits.
Multi-tenant SaaS with auditable isolation. Per-tenant namespacing with published zero-leak verification.
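To make per-tenant namespacing with agent_id sub-isolation, and a zero-leak check, concrete, here is a toy in-memory model. Everything in it (the NamespacedStore class, the (tenant_id, agent_id) key layout, substring search, and the marker-probe leak count) is hypothetical; it illustrates the property being verified, not HeurChain's storage layout or its published verification procedure.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple


class NamespacedStore:
    """Toy store keyed by (tenant_id, agent_id). Hypothetical, for illustration only."""

    def __init__(self) -> None:
        self._data: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)

    def write(self, tenant_id: str, agent_id: str, text: str) -> None:
        self._data[(tenant_id, agent_id)].append(text)

    def search(self, tenant_id: str, query: str, agent_id: Optional[str] = None) -> List[str]:
        """Substring search confined to one tenant, optionally to one agent."""
        results: List[str] = []
        for (tenant, agent), texts in self._data.items():
            if tenant != tenant_id:
                continue  # tenant boundary: other tenants' records are never returned
            if agent_id is not None and agent != agent_id:
                continue  # agent_id sub-isolation within the tenant
            results.extend(t for t in texts if query in t)
        return results


def count_cross_tenant_leaks(store: NamespacedStore, tenants: List[str]) -> int:
    """Write one unique marker per tenant, then count markers visible to other tenants."""
    for tenant in tenants:
        store.write(tenant, "probe", f"marker-{tenant}")
    leaks = 0
    for tenant in tenants:
        for hit in store.search(tenant, "marker-"):
            if hit != f"marker-{tenant}":
                leaks += 1
    return leaks


if __name__ == "__main__":
    store = NamespacedStore()
    print(count_cross_tenant_leaks(store, [f"tenant-{i}" for i in range(10)]))
```

A real verification would send the probes through the public API under concurrent load rather than against an in-process object; the marker design matters because any nonzero count is an unambiguous leak.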

Pricing

Cost — for reference, not the headline

Most readers should pick on architecture fit, not price. Letta Cloud has a free tier; the runtime is Apache-2.0 open source. HeurChain self-host is MIT-licensed. The numbers below exist so you can see them, not because we think they should drive your decision.

If you...	HeurChain	Letta
Hobby / kicking the tires	Free self-host (MIT)	Free tier (Letta Cloud) or self-host (Apache 2.0)
Solo developer, managed	$5/mo (Solo — memory only)	Letta Cloud paid plans (full runtime included)
Team, shared workspace	$49.99/mo (Workgroup — memory only)	Letta team pricing varies
Enterprise — SOC2, SAML	Custom	Custom

Apples vs oranges note: Letta's price covers a full agent runtime (LLM orchestration, tool calling, memory — the whole loop). HeurChain's price covers only the memory layer; you bring the agent. If you don't have an agent runtime yet, Letta gives you more per dollar. If you already have one in ChatGPT / Claude / Cursor, HeurChain adds memory without forcing you to adopt a new runtime.

Don't take our word

Reproduce these numbers yourself

Clone heurchain-benchmarks and the main HeurChain repo; pull the LongMemEval-S dataset (instructions in the README).
Run python3 sharded_bench.py for the single-tenant baseline; python3 multitenant_bench.py --mode load --max-tenants 10 for the Docker multi-tenant number.
Re-run on your data — your conversation logs, your documents. Public benchmarks correlate with real workloads, but they're not the same thing.
If our numbers don't reproduce on your hardware, open an issue. We'll fix or correct.

HeurChain vs Letta: different scopes, different jobs.

If HeurChain is a fit, the easiest start is the Solo plan.