Show HN: Claude Code skills for building LLM evals

Category: devtools

Tags: llm-evals, claude-code, ai-observability

Score: 5.5/10 (Innovation: 5, Technical: 4, Documentation: 7, Utility: 6)

This project packages a structured workflow for building and validating LLM evaluations as a set of Claude Code skills. It walks developers through steps such as annotating logs, discovering failure patterns, and creating judge prompts, with the aim of improving LLM reliability. Its main appeal is a practical methodology that formalizes what is usually an ad-hoc evaluation process.
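
To make the workflow concrete, here is a minimal Python sketch of two of the steps named above: tallying failure patterns from annotated logs, and checking how well a judge agrees with human labels. It is an illustration only. The `AnnotatedLog` schema, the function names, and the sample data are hypothetical and are not taken from the repository, which may structure these steps quite differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AnnotatedLog:
    """One hand-annotated log entry (hypothetical schema, not the repository's)."""
    output: str        # model output under review
    human_pass: bool   # human annotator's verdict
    failure_tag: str   # short failure label; empty when the output passed
    judge_pass: bool   # verdict from a candidate LLM judge


def failure_counts(logs):
    """Tally failure tags on human-failed entries to surface common patterns."""
    return Counter(log.failure_tag for log in logs if not log.human_pass)


def judge_agreement(logs):
    """Return (true-positive rate, true-negative rate) of the judge vs. human labels."""
    passes = [log for log in logs if log.human_pass]
    fails = [log for log in logs if not log.human_pass]
    tpr = sum(log.judge_pass for log in passes) / len(passes) if passes else 0.0
    tnr = sum(not log.judge_pass for log in fails) / len(fails) if fails else 0.0
    return tpr, tnr


# Made-up sample data, for illustration only.
logs = [
    AnnotatedLog("Refund issued.", True, "", True),
    AnnotatedLog("Your order shipped in 2031.", False, "hallucinated_date", False),
    AnnotatedLog("I cannot help.", False, "unnecessary_refusal", True),
    AnnotatedLog("Order #12 is on its way.", True, "", True),
    AnnotatedLog("Delivered on Feb 30.", False, "hallucinated_date", False),
]

print(failure_counts(logs).most_common())
print(judge_agreement(logs))
```

Reporting the pass and fail agreement rates separately, rather than a single accuracy figure, keeps a judge that approves nearly everything from looking good when most logged outputs happen to pass.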

Target audience: backend devs, data engineers

Repository: https://github.com/latitude-dev/eval-skills · 9 stars
