Show HN: Agent-evals – Claude skill to build your own evals
Category: ai-ml
Tags: ai-evaluation, agentic-ai, claude-skill
Score: 3.8/10 (Innovation: 4, Technical: 3, Documentation: 4, Utility: 4)
Agent-evals is a Claude skill for evaluating agentic AI pipelines: it helps define metrics, generate test cases, and track regressions. It aims to bring systematic evaluation to AI agents, but its documentation is still immature and it has seen little community adoption so far.
Target audience: AI/ML engineers and researchers working with agentic systems
Repository: https://github.com/fsilavong/agent-eval · 13 stars