Show HN: GEDD – A Systematic Evidence Driven LLM as a Judge Framework

Category: devtools

Tags: llm-evaluation, ai-agents, annotation-tool, llm-as-judge, aws

Score: 6.8/10 (Innovation: 7, Technical: 6, Documentation: 8, Utility: 6)

GEDD is a systematic framework for domain experts to annotate AI agent failures and convert them into executable LLM-as-a-judge prompts, bridging product and engineering workflows. It provides an annotation-first web app with demos, codebook generation, and CI handoff, making expert review reproducible and operational. The project is interesting for its structured approach to turning qualitative human evaluation into quantitative release gates.
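The idea of turning judge verdicts into a release gate can be illustrated with a minimal sketch. This is not GEDD's actual API: the `Verdict` shape, the `pass_rate` and `release_gate` helpers, the failure code string, and the 0.9 default threshold are all hypothetical, chosen only to show how per-trace judgments might roll up into a pass/fail signal for CI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    """One judge decision for one agent trace (hypothetical shape)."""
    trace_id: str
    failure_code: Optional[str]  # None means the judge found no failure


def pass_rate(verdicts: List[Verdict]) -> float:
    """Fraction of traces for which the judge reported no failure."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passed = sum(1 for v in verdicts if v.failure_code is None)
    return passed / len(verdicts)


def release_gate(verdicts: List[Verdict], threshold: float = 0.9) -> bool:
    """Return True when the pass rate meets the assumed threshold."""
    return pass_rate(verdicts) >= threshold
```

In a CI job, a script along these lines could exit non-zero when `release_gate` returns `False`, which is one way a qualitative expert review could block a release.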

Target audience: ML engineers, product managers, domain experts, DevOps

Repository: https://github.com/aws-samples/sample-GEDD · Python · MIT-0 · 5 stars
