Show HN: Bosun – a small model that keeps an agent's memory graph clean
Category: ai-ml
Tags: knowledge-graph, benchmark, ai-agent, memory, instruction-following, huggingface
Score: 7.0/10 (Innovation: 7, Technical: 6, Documentation: 8, Utility: 7)
WarrantBench is a benchmark that measures how well a judge (for example, a language model) decides whether each edge in a knowledge graph is warranted under a programmable instruction. It addresses a gap in agent memory management by measuring instruction-following for arbitrary rules. Its companion model, Bosun, is reported to outperform frontier LLMs on steerability. The project stands out for its focus on the 'judge' verb in the knowledge graph lifecycle, its deterministic gold labels, and its strong documentation.
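To make the evaluation idea concrete, here is a minimal sketch of scoring a judge against deterministic gold labels. It is not WarrantBench's actual data format or API: the `Edge` and `Case` types, the `age_days` field, the instruction strings, and the `rule_judge` stand-in are all hypothetical, chosen only to illustrate "instruction plus edge in, warranted or not out".

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Edge:
    """A knowledge-graph edge (hypothetical schema for illustration)."""
    source: str
    relation: str
    target: str
    age_days: int  # assumed metadata field, not taken from the benchmark


@dataclass(frozen=True)
class Case:
    """One benchmark item: an instruction, an edge, and its gold label."""
    instruction: str
    edge: Edge
    gold: bool  # True means the edge is warranted under the instruction


# A judge maps (instruction, edge) to a warranted / not-warranted decision.
Judge = Callable[[str, Edge], bool]


def rule_judge(instruction: str, edge: Edge) -> bool:
    """Toy stand-in for a model: it understands exactly one instruction."""
    if instruction == "drop edges older than 30 days":
        return edge.age_days <= 30
    # Any other instruction is ignored and the edge is kept.
    return True


def accuracy(judge: Judge, cases: Sequence[Case]) -> float:
    """Fraction of cases where the judge's decision matches the gold label."""
    if not cases:
        raise ValueError("need at least one case")
    correct = sum(judge(c.instruction, c.edge) == c.gold for c in cases)
    return correct / len(cases)


CASES = [
    Case("drop edges older than 30 days",
         Edge("alice", "works_at", "acme", age_days=10), gold=True),
    Case("drop edges older than 30 days",
         Edge("alice", "lives_in", "paris", age_days=90), gold=False),
    Case("keep only works_at edges",
         Edge("bob", "likes", "tea", age_days=1), gold=False),
]

score = accuracy(rule_judge, CASES)
print(f"accuracy: {score:.2f}")
```

The toy judge gets the two cases for the instruction it understands right and misses the third, which is the kind of failure a steerability measure is meant to expose: a judge that follows one rule well but does not adapt when the rule changes.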
Target audience: AI researchers, ML engineers, agent developers
Repository: https://huggingface.co/Hanno-Labs/bosun-xs · Python · MIT