Show HN: Agent-evals – Claude skill to build your own evals

Category: ai-ml

Tags: ai-evaluation, agentic-ai, claude-skill

Score: 3.8/10 (Innovation: 4, Technical: 3, Documentation: 4, Utility: 4)

Agent-evals is a skill for Claude that helps evaluate agentic AI pipelines by defining metrics, generating test cases, and tracking regressions. It aims to bring systematic evaluation to AI agents, but its documentation is still immature and community adoption is limited.
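
The three steps named above (define a metric, run test cases, flag regressions) can be sketched generically as follows. This is not the project's API: `TestCase`, `exact_match`, `run_eval`, `has_regressed`, and `toy_agent` are hypothetical names chosen for illustration, and the agent is a stub rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TestCase:
    """One hypothetical eval case: an input prompt and the expected answer."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 if output equals expected (case/whitespace-insensitive), else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    agent: Callable[[str], str],
    cases: Sequence[TestCase],
    metric: Callable[[str, str], float],
) -> float:
    """Run the agent on every case and return the mean metric score."""
    if not cases:
        raise ValueError("at least one test case is required")
    scores = [metric(agent(case.prompt), case.expected) for case in cases]
    return sum(scores) / len(scores)


def has_regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the score drops more than `tolerance` below baseline."""
    return current < baseline - tolerance


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent (assumption): a fixed lookup table."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


CASES = [
    TestCase("2+2", "4"),
    TestCase("capital of France", "Paris"),
    TestCase("capital of Laos", "Vientiane"),  # the toy agent misses this one
]

score = run_eval(toy_agent, CASES, exact_match)
print(f"score={score:.2f} regressed={has_regressed(score, baseline=0.9)}")
```

In practice the baseline would come from a stored result of an earlier run, and the metric would likely be richer than exact match; both are left as assumptions here.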

Target audience: AI/ML engineers and researchers working with agentic systems

Repository: https://github.com/fsilavong/agent-eval · 13 stars
