Show HN: HermesBench – workflow reliability evals for personal AI agents
Category: ai-ml
Tags: benchmark, ai-agent, evaluation-harness
Score: 6.0/10 (Innovation: 6, Technical: 7, Documentation: 6, Utility: 5)
HermesBench is a reliability-first benchmark and evaluation harness for personal AI agent configurations, aimed at Hermes Agent setups. It ships 27 workflow recipes across 9 categories that test agent reliability on real-world tasks such as calendar, email, and finance. It separates driver and target adapters, combines deterministic checks with LLM judgment, and emphasizes agent-driven workflows.
Target audience: AI engineers, agent developers
Repository: https://verkyyi.github.io/hermesbench/ · HTML · MIT · 1 star
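
The description mentions pairing deterministic checks with LLM judgment behind a target adapter. One way such a scoring step could look is sketched below. This is not HermesBench's actual API: every name here (`TargetAdapter`, `Recipe`, `evaluate`, `stub_judge`, the 0.7 threshold) is a hypothetical stand-in, the LLM judge is replaced by a stub function, and the driver side is left out because the listing does not describe it.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class TargetAdapter(Protocol):
    """Hypothetical interface to the agent configuration under test."""

    def run(self, prompt: str) -> str: ...


# A deterministic check is a plain predicate over the agent's output.
Check = Callable[[str], bool]
# A judge stands in for an LLM call: (rubric, output) -> score in [0, 1].
Judge = Callable[[str, str], float]


@dataclass
class Recipe:
    """Hypothetical workflow recipe: a task plus how to grade it."""

    name: str
    category: str
    prompt: str
    checks: list[Check] = field(default_factory=list)
    rubric: str = ""


@dataclass
class Result:
    recipe: str
    checks_passed: bool
    judge_score: float
    passed: bool


def evaluate(recipe: Recipe, target: TargetAdapter, judge: Judge,
             threshold: float = 0.7) -> Result:
    """Run one recipe: hard checks gate the run, the judge grades quality."""
    output = target.run(recipe.prompt)
    checks_passed = all(check(output) for check in recipe.checks)
    # Skip the (normally expensive) judge call when a hard check already failed.
    score = judge(recipe.rubric, output) if checks_passed else 0.0
    return Result(recipe.name, checks_passed, score,
                  checks_passed and score >= threshold)


class CannedTarget:
    """Toy target that returns a fixed reply instead of calling an agent."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def run(self, prompt: str) -> str:
        return self.reply


def stub_judge(rubric: str, output: str) -> float:
    """Placeholder for an LLM judge; a real one would call a model."""
    return 1.0 if "Scheduled" in output else 0.0


recipe = Recipe(
    name="create-event",
    category="calendar",
    prompt="Book a dentist appointment on 2025-03-14 at 09:00.",
    checks=[lambda out: "2025-03-14" in out, lambda out: "09:00" in out],
    rubric="The reply confirms the event with the right date and time.",
)

good = evaluate(recipe, CannedTarget("Scheduled: Dentist, 2025-03-14 09:00"), stub_judge)
bad = evaluate(recipe, CannedTarget("Scheduled: Dentist, tomorrow morning"), stub_judge)
print(good.passed, bad.passed)
```

Running the deterministic checks first keeps failures reproducible and leaves the judge to grade only runs that already satisfy the hard requirements. Whether HermesBench orders the two steps this way is an assumption.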