Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs
Category: devtools
Tags: agent-skills, eval, llm-testing, typescript, cli
Score: 7.0/10 (Innovation: 7, Technical: 7, Documentation: 8, Utility: 6)
A test runner for Agent Skills (the open standard from Anthropic) that empirically measures whether a skill improves model outputs: it runs the same prompt with and without the skill, then has a judge model grade both outputs. It fills a validation gap in the Agent Skills ecosystem by combining a CLI, a TypeScript SDK, and static HTML reports for reproducible evaluation.
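To make the with/without comparison concrete, here is a minimal sketch of a paired evaluation loop. It is not the project's actual SDK: the names `Generate`, `Judge`, `runPairedEval`, and `summarize` are hypothetical, the 0-10 scoring scale is an assumption, and the model and judge are replaced by deterministic stubs so the example runs offline. Real model and judge calls would be asynchronous; the sketch stays synchronous for brevity.

```typescript
// Hypothetical types for illustration; none of these names come from agent-skills-eval.
type Generate = (prompt: string, skill?: string) => string;
type Judge = (prompt: string, output: string) => number; // assumed score scale: 0 to 10

interface CaseResult {
  prompt: string;
  baselineScore: number; // judge score for the output produced without the skill
  skillScore: number;    // judge score for the output produced with the skill
}

interface Summary {
  cases: number;
  skillWins: number;
  baselineWins: number;
  ties: number;
  meanDelta: number; // average of (skillScore - baselineScore)
}

// Run every prompt twice (without and with the skill) and grade both outputs.
function runPairedEval(
  prompts: string[],
  skill: string,
  generate: Generate,
  judge: Judge,
): CaseResult[] {
  return prompts.map((prompt) => ({
    prompt,
    baselineScore: judge(prompt, generate(prompt)),
    skillScore: judge(prompt, generate(prompt, skill)),
  }));
}

// Aggregate per-case scores into win counts and a mean score difference.
function summarize(results: CaseResult[]): Summary {
  let skillWins = 0;
  let baselineWins = 0;
  let ties = 0;
  let totalDelta = 0;
  for (const r of results) {
    const delta = r.skillScore - r.baselineScore;
    totalDelta += delta;
    if (delta > 0) skillWins++;
    else if (delta < 0) baselineWins++;
    else ties++;
  }
  const meanDelta = results.length > 0 ? totalDelta / results.length : 0;
  return { cases: results.length, skillWins, baselineWins, ties, meanDelta };
}

// Deterministic stubs standing in for real model and judge calls.
const stubGenerate: Generate = (prompt, skill) =>
  skill ? `${prompt} [following: ${skill}]` : prompt;
const stubJudge: Judge = (_prompt, output) =>
  output.includes("[following:") ? 8 : 5;

const summary = summarize(
  runPairedEval(
    ["Write a commit message", "Summarize this diff"],
    "commit-style",
    stubGenerate,
    stubJudge,
  ),
);
console.log(summary);
```

Reporting the mean delta alongside win counts is a design choice in this sketch: win counts alone hide how large the improvement is, while a mean alone can be dominated by a single outlier case.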
Target audience: backend devs, AI engineers, researchers
Repository: https://github.com/darkrishabh/agent-skills-eval · TypeScript · MIT · 261 stars