A coworker told me about a recent incident. He asked his local Claude-based agent to commit a repository to GitHub. The agent paused and replied that it needed credentials.
The credentials were on the machine. They had been there for weeks. The agent had read-access to the directory they sat in. It just didn't think to look.
After being reminded — "check local storage" — the agent found the credentials in under a second and proceeded. From the agent's point of view, the data didn't exist until the human pointed at it. From the human's point of view, the agent had amnesia about something it definitely knew yesterday.
This pattern is everywhere right now. It has at least four distinct flavors, and they show up in different parts of a session.
Type A — Context overflow
The agent forgets information from earlier in the same session because the conversation has exceeded its context window. This is the failure mode we covered yesterday — quoted directly from Kimi: "Truncated content is lost. There's no automatic backup or archive."
Type A is the canonical "AI memory problem." It's the one the entire AI memory vendor market got built around. It's also the least interesting in practice, because it has a clean solution: write things to durable storage as you go.
Type B — Cross-session ignorance
The agent doesn't know that data exists from a prior session at all. New session, fresh process, no continuity.
This is what most people mean when they say "AI has no memory." The agent isn't lying — it genuinely has no record of yesterday's conversation. Solving Type B requires building memory infrastructure outside the agent's process. HeurChain, MemGPT, Letta, Mem.ai, and (more recently) the major frontier-model providers' native "memory" features are all attempts to address this.
Type C — Retrieval miss
The agent KNOWS data is stored, but the query it issues to retrieve it doesn't match how the data was filed.
User stored the credential as ghtokenpersonal. The agent searches for github credentials. Embedding similarity is decent but not magic. The token exists; the agent's natural-language query just lands close-but-not-close-enough in vector space. The retrieval returns nothing, the agent concludes the data is missing, and the human gets asked anyway.
Type C is a search-quality problem. The fix is structured metadata: tags, categories, source tracking, recency boosts. Vector search alone isn't enough.
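A toy sketch makes the near-miss concrete. Here stdlib string similarity (difflib) stands in for embedding similarity — the names, the threshold, and the stored key/value are all illustrative assumptions, not the real retrieval stack:

```python
from difflib import SequenceMatcher

# Toy memory store: the credential was filed under a free-text key
# (hypothetical key and value, for illustration only).
stored = {"ghtokenpersonal": "a-github-token"}

def fuzzy_lookup(query: str, threshold: float = 0.8):
    """Stand-in for embedding retrieval: return the stored key most
    similar to the query if it clears the threshold, else None."""
    best_key, best_score = None, 0.0
    for key in stored:
        score = SequenceMatcher(None, query, key).ratio()
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= threshold else None

# The natural-language query lands close-but-not-close-enough:
print(fuzzy_lookup("github credentials"))  # None -> retrieval miss
print(fuzzy_lookup("ghtokenpersonal"))     # exact key -> hit
```

The data exists the whole time; only the query-to-key distance is wrong, which is exactly why a second, exact signal (tags) helps.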
Type D — Proactive-search failure
The agent has access to the storage system. It KNOWS the storage system contains data of the right kind. And yet it doesn't search before asking the human.
This is what happened to my coworker. The agent had heurchain-memory MCP tools available the entire time. The credentials were also sitting in ~/.config/git, in the OS keychain, and in an environment variable. The agent had multiple paths to the answer. None were taken.
Type D is the most embarrassing flavor of amnesia because the agent isn't actually forgetting anything — it's just not looking. The data isn't missing. The reflex to check it is missing.
Why Type D is so common
LLMs are trained to be helpful when asked. They are not trained to be paranoid about whether they already know something. The instruction-tuning gradient pushes agents toward "engage with the prompt and reply" rather than "first, scan everything you have access to and see if you can answer without bothering the user."
There's also a defensive reason for this. An agent that freely self-directs searches against arbitrary local storage is an agent vulnerable to prompt injection. If an LLM decides to "go check local files" because the user's prompt suggested it, an attacker can plant a malicious file that hijacks behavior. The safer-by-default posture is to wait for the user to point at something explicitly.
So we get the worst of both worlds: agents that aren't paranoid enough to remember what they already know, but ARE paranoid enough to make every retrieval an explicit human request.
The small set of primitives that would have fixed this
Of the four types, three are addressable with a surprisingly small set of memory features. We're shipping these to HeurChain in the next two weeks:
1. Tagged memory storage
Every POST /store call gets an optional tags: ["credential", "github"] array. Every POST /query gets an optional tags filter. A small published taxonomy of common tags (credential, decision, fact, project-context, system-config) keeps agents consistent.
This fixes Type C directly. The credential is stored with tag=credential. The query asks for tag=credential AND text contains "github". Vector-space fuzzy-match is no longer the only line of defense.
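A minimal sketch of the filter semantics, assuming the item shape and that the `contains` check also matches attached tags (so a key like `ghtokenpersonal` tagged `github` is still findable) — all names here are illustrative, not the shipped API:

```python
# Hypothetical stored items, as primitive 1 would file them.
memories = [
    {"text": "ghtokenpersonal", "tags": ["credential", "github"]},
    {"text": "prefers trunk-based dev", "tags": ["project-context"]},
]

def query(tags=None, contains=None):
    """Tag filter AND substring match -- the pre-filter a tagged
    POST /query could apply before any vector scoring."""
    hits = memories
    if tags:
        hits = [m for m in hits if set(tags) <= set(m["tags"])]
    if contains:
        # Match against the text OR the attached tags, so a cryptic
        # free-text key doesn't defeat the search.
        hits = [m for m in hits if contains in m["text"] or contains in m["tags"]]
    return hits

# tag=credential AND "github" finds the token even though its
# free-text key never literally contains "github":
print(query(tags=["credential"], contains="github"))
```

The exact-match tag filter narrows the candidate set before fuzzy scoring ever runs, so vector distance stops being the only line of defense.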
2. Memory manifest endpoint
A new GET /memory/manifest?tenant_id=... returns the structure of what's stored: counts per tag, recently-added items, the tag taxonomy in use. It's a one-call "what do I already know about this customer" check.
An agent's reflex becomes: hit GET /memory/manifest at the start of any non-trivial task. If the manifest says "credential: 3 items," the agent knows to query the credential tag before asking the human.
This is the Type D fix. The agent doesn't have to guess whether memory exists. The manifest answers the question in 50 milliseconds.
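One plausible shape for that manifest and the reflex it enables — the response fields and item layout are assumptions for illustration, not the real contract:

```python
from collections import Counter

memories = [  # hypothetical tenant store
    {"text": "ghtokenpersonal", "tags": ["credential"]},
    {"text": "staging deploy token", "tags": ["credential"]},
    {"text": "uses trunk-based dev", "tags": ["project-context"]},
]

def manifest(items):
    """Shape a GET /memory/manifest response might take:
    counts per tag plus the most recently added items."""
    counts = Counter(tag for m in items for tag in m["tags"])
    return {"counts": dict(counts), "recent": [m["text"] for m in items[-2:]]}

m = manifest(memories)
# Agent reflex: if the manifest shows credentials, query that tag
# before ever asking the human.
if m["counts"].get("credential", 0) > 0:
    print("search tag=credential before asking the human")
```

The point is the cheap existence check: one call answers "do I already know anything of this kind?" without running a full retrieval.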
3. Discovery augmentation
The GET / endpoint (which every agent can already hit for instructions) will start including a tenant-aware nudge: "You have 47 stored memories across these tags: credential (3), project-context (12), decision (8). Query the relevant tag before asking the human."
This pushes the proactive-search reflex right at first contact with the API. An agent reading the API contract sees explicit guidance to search before asking — bypassing the LLM's default "ask first" posture.
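Rendering that nudge is a one-liner over the same per-tag counts; the wording and function name below are illustrative, not the shipped contract:

```python
def discovery_nudge(tag_counts: dict) -> str:
    """Render the tenant-aware hint the GET / contract could embed
    (wording and shape are assumptions, not the shipped API)."""
    total = sum(tag_counts.values())
    tags = ", ".join(f"{t} ({n})" for t, n in sorted(tag_counts.items()))
    return (f"You have {total} stored memories across these tags: {tags}. "
            "Query the relevant tag before asking the human.")

print(discovery_nudge({"credential": 3, "decision": 8, "project-context": 12}))
```

Because the hint lives in the API's own self-description, it reaches the agent at first contact, before any task-specific prompting happens.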
What this doesn't solve
To be honest about it: HeurChain still can't make an agent call HeurChain. The agent has to be configured to do it. We can publish prompt templates, but we can't reach into someone's agent framework and rewrite the system prompt.
We also can't help with Type D if the data is in the OS keychain or an environment variable instead of HeurChain. Those are still the agent's responsibility to enumerate. What HeurChain offers is making the external memory surface fast, structured, and queryable enough that an agent will actually look there before bothering the user.
The pattern across products
The reason this matters as a positioning point: the entire current generation of agent memory products is optimized for Type A and Type B (don't lose context within a session; don't lose state across sessions). Type C and Type D — the cases where memory exists but isn't found, or exists but isn't checked — are barely addressed by any vendor.
The asks are small. Tags. A manifest. A discovery hint. Two weeks of work. The CaClaw-style "agent stares at a credential it has read-access to and asks the human anyway" failure mode goes away for any tenant that uses them.
If you've run into this pattern yourself, the $5 plan gets you a tenant, the tags shipping next week, and the discovery API today.