Show HN: Claude Code skills for building LLM evals
Category: devtools
Tags: llm-evals, claude-code, ai-observability
Score: 5.5/10 (Innovation: 5, Technical: 4, Documentation: 7, Utility: 6)
This project packages a structured workflow for building and validating LLM evaluations as Claude Code skills. It guides developers through steps such as annotating logs, discovering failure patterns, and creating judge prompts, with the aim of improving LLM reliability. Its main appeal is a practical methodology that formalizes the often ad-hoc process of LLM evaluation.
Target audience: backend devs, data engineers
Repository: https://github.com/latitude-dev/eval-skills · 9 stars