Show HN: A benchmark for the failure modes of agent memory

Category: devtools

Tags: benchmark, ai-agents, memory-testing, typescript

Score: 7.3/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 6)

Agent-memory-bench is an offline, reproducible benchmark that scores AI agent memory systems on four specific failure modes (retraction, collision, recall, and conflict) rather than on shallow retrieval metrics. It targets memory bugs that occur in real use, offers a clean interface for adding new systems, and includes adversarial test cases, which makes it a useful tool for improving agent reliability.
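
To make the failure-mode idea concrete, here is a minimal TypeScript sketch of what a recall check and a retraction check could look like. Everything in it is an assumption made for illustration: the `MemorySystem` interface, `createMemory`, and `runRetractionCase` are hypothetical names and not the repository's actual API. It also assumes that "retraction" means a withdrawn fact must no longer be returned.

```typescript
// Hypothetical adapter shape, for illustration only.
// This is NOT the actual agent-memory-bench interface.
interface MemorySystem {
  remember(fact: string): void;
  retract(fact: string): void;
  query(keyword: string): string[];
}

// Toy in-memory store. When honorRetractions is false it records
// retractions but still returns the withdrawn facts, which is the
// bug a retraction check is meant to catch.
function createMemory(honorRetractions: boolean): MemorySystem {
  const facts: string[] = [];
  const retracted = new Set<string>();
  return {
    remember(fact) {
      facts.push(fact);
    },
    retract(fact) {
      retracted.add(fact);
    },
    query(keyword) {
      const hits = facts.filter((f) => f.includes(keyword));
      return honorRetractions ? hits.filter((f) => !retracted.has(f)) : hits;
    },
  };
}

interface CaseResult {
  recall: boolean; // a stored fact is still retrievable
  retraction: boolean; // a withdrawn fact is no longer returned
}

// One hypothetical test case: store two facts, withdraw one, then query both.
function runRetractionCase(memory: MemorySystem): CaseResult {
  memory.remember("user lives in Berlin");
  memory.remember("user prefers dark mode");
  memory.retract("user lives in Berlin");

  const recall = memory.query("dark mode").includes("user prefers dark mode");
  const retraction = !memory.query("Berlin").includes("user lives in Berlin");
  return { recall, retraction };
}

const naive = runRetractionCase(createMemory(false));
const aware = runRetractionCase(createMemory(true));
console.log({ naive, aware });
```

In this sketch both stores pass the recall check, and only the one that honors retractions passes the retraction check. A retrieval-only metric would not separate the two, which is the gap the description points at.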

Target audience: AI engineers, ML researchers

Repository: https://github.com/Kausha3/agent-memory-bench · TypeScript · MIT
