Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs

Category: devtools

Tags: agent-skills, eval, llm-testing, typescript, cli

Score: 7.0/10 (Innovation: 7, Technical: 7, Documentation: 8, Utility: 6)

This is a test runner for Agent Skills (the open standard from Anthropic) that measures empirically whether a skill improves model outputs: it runs the same prompt with and without the skill, then has a judge model grade both outputs. It fills a validation gap in the Agent Skills ecosystem, and it ships a CLI, a TypeScript SDK, and static HTML reports so that evaluations can be repeated and shared.
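The with/without comparison described above can be sketched roughly as follows. This is an illustrative sketch only, not the agent-skills-eval API: the `Model`, `Judge`, `Tally`, and `compareSkill` names are hypothetical, and the model and judge are passed in as plain async functions so that no particular provider is assumed. Alternating which output the judge sees first is an added assumption here, a common way to dampen position bias, and is not something the listing says the tool does.

```typescript
// Hypothetical shapes for illustration; not taken from agent-skills-eval.
type Model = (prompt: string, skill?: string) => Promise<string>;
type Verdict = "first" | "second" | "tie";
type Judge = (prompt: string, first: string, second: string) => Promise<Verdict>;

interface Tally {
  withSkill: number; // prompts where the skill-assisted output won
  baseline: number;  // prompts where the plain output won
  tie: number;       // prompts the judge could not separate
}

// Run every prompt twice (with and without the skill) and let the judge pick.
async function compareSkill(
  prompts: string[],
  skill: string,
  model: Model,
  judge: Judge
): Promise<Tally> {
  const tally: Tally = { withSkill: 0, baseline: 0, tie: 0 };

  for (let i = 0; i < prompts.length; i++) {
    const prompt = prompts[i];

    // Same prompt, two conditions: baseline and skill-assisted.
    const [baseline, treated] = await Promise.all([
      model(prompt),
      model(prompt, skill),
    ]);

    // Assumption: alternate presentation order so the judge's preference
    // for a position does not systematically favor one condition.
    const skillFirst = i % 2 === 0;
    const verdict = skillFirst
      ? await judge(prompt, treated, baseline)
      : await judge(prompt, baseline, treated);

    // Map the positional verdict back to the condition it refers to.
    if (verdict === "tie") {
      tally.tie++;
    } else if ((verdict === "first") === skillFirst) {
      tally.withSkill++;
    } else {
      tally.baseline++;
    }
  }

  return tally;
}
```

In practice `model` and `judge` would wrap real LLM calls, and the resulting tally is the kind of summary a report could be built from; a skill that does not clearly beat the baseline across a reasonable prompt set is a candidate for revision or removal.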

Target audience: backend devs, AI engineers, researchers

Repository: https://github.com/darkrishabh/agent-skills-eval · TypeScript · MIT · 261 stars
