Show HN: A benchmark for the failure modes of agent memory
Category: devtools
Tags: benchmark, ai-agents, memory-testing, typescript
Score: 7.3/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 6)
Agent-memory-bench is an offline, reproducible benchmark that scores AI agent memory systems on four specific failure modes (retraction, collision, recall, and conflict) rather than on shallow retrieval metrics. It targets real-world memory bugs, includes adversarial test cases, and offers a clean interface for adding new systems, which makes it a useful tool for improving agent reliability.
Target audience: AI engineers, ML researchers
Repository: https://github.com/Kausha3/agent-memory-bench · TypeScript · MIT
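To make the idea of scoring failure modes concrete, here is a minimal sketch of what a retraction check and a conflict check could look like. Everything in it is hypothetical and not taken from the agent-memory-bench repository: the `MemorySystem` interface, both example stores, and the `passesRetraction` and `passesConflict` functions are illustrative names. It also assumes that "retraction" means a withdrawn fact must no longer be recalled, and that "conflict" means a later write supersedes an earlier one; the project may define these differently.

```typescript
// Hypothetical adapter interface; NOT the repository's actual API.
interface MemorySystem {
  remember(key: string, value: string): void;
  retract(key: string): void;
  recall(key: string): string | undefined;
}

// Deliberately buggy store: it appends facts but ignores retractions.
class AppendOnlyMemory implements MemorySystem {
  private log: Array<{ key: string; value: string }> = [];

  remember(key: string, value: string): void {
    this.log.push({ key, value });
  }

  retract(_key: string): void {
    // Bug on purpose: the retraction is silently dropped.
  }

  recall(key: string): string | undefined {
    // Return the most recently remembered value for this key.
    for (let i = this.log.length - 1; i >= 0; i--) {
      const entry = this.log[i];
      if (entry && entry.key === key) return entry.value;
    }
    return undefined;
  }
}

// Store that honors retractions by deleting the key.
class MapMemory implements MemorySystem {
  private store = new Map<string, string>();

  remember(key: string, value: string): void {
    this.store.set(key, value);
  }

  retract(key: string): void {
    this.store.delete(key);
  }

  recall(key: string): string | undefined {
    return this.store.get(key);
  }
}

// Assumed meaning of "retraction": a withdrawn fact must not be recalled.
function passesRetraction(memory: MemorySystem): boolean {
  memory.remember("user.city", "Berlin");
  memory.retract("user.city");
  return memory.recall("user.city") === undefined;
}

// Assumed meaning of "conflict": the later of two writes wins.
function passesConflict(memory: MemorySystem): boolean {
  memory.remember("user.city", "Berlin");
  memory.remember("user.city", "Lisbon");
  return memory.recall("user.city") === "Lisbon";
}
```

In this sketch both stores pass the conflict check, but only `MapMemory` passes the retraction check. That is the kind of distinction a plain retrieval metric would not surface.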