Show HN: GEDD – A Systematic Evidence Driven LLM as a Judge Framework
Category: devtools
Tags: llm-evaluation, ai-agents, annotation-tool, llm-as-judge, aws
Score: 6.8/10 (Innovation: 7, Technical: 6, Documentation: 8, Utility: 6)
GEDD is a systematic framework that lets domain experts annotate AI agent failures and convert those annotations into executable LLM-as-a-judge prompts, bridging product and engineering workflows. It provides an annotation-first web app with demos, codebook generation, and a CI handoff, so that expert review becomes reproducible and operational. The project stands out for its structured approach to turning qualitative human evaluation into quantitative release gates.
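The pipeline described above (expert codebook, then a judge prompt, then a CI release gate) can be illustrated with a minimal sketch. This is not GEDD's API: `CodebookEntry`, `build_judge_prompt`, `parse_verdict`, `release_gate`, and `stub_judge` are all hypothetical names, and the stub stands in for a real LLM call. It assumes one binary PASS/FAIL judge per failure mode and a gate that fails when the observed failure rate exceeds a chosen budget.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    """One expert-defined failure mode (hypothetical schema, not GEDD's)."""
    code: str
    definition: str
    fail_example: str


def build_judge_prompt(entry: CodebookEntry, transcript: str) -> str:
    """Render a binary PASS/FAIL judge prompt for a single failure mode."""
    return (
        f"You are evaluating an AI agent transcript for the failure mode '{entry.code}'.\n"
        f"Definition: {entry.definition}\n"
        f"Example of a failure: {entry.fail_example}\n\n"
        f"Transcript:\n{transcript}\n\n"
        "Answer with exactly one word: PASS or FAIL."
    )


def parse_verdict(raw: str) -> bool:
    """Return True if the judge reported a failure; reject any other output."""
    verdict = raw.strip().upper()
    if verdict not in ("PASS", "FAIL"):
        raise ValueError(f"unparseable judge output: {raw!r}")
    return verdict == "FAIL"


def release_gate(verdicts: list[bool], max_failure_rate: float) -> bool:
    """Open the gate only if the observed failure rate is within budget."""
    if not verdicts:
        raise ValueError("no verdicts to gate on")
    return sum(verdicts) / len(verdicts) <= max_failure_rate


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM judge: flags an unconfirmed refund."""
    transcript = prompt.split("Transcript:\n", 1)[1]
    return "FAIL" if "refund has been issued" in transcript else "PASS"


if __name__ == "__main__":
    entry = CodebookEntry(
        code="unsupported_claim",
        definition="The agent states an outcome that no tool call confirms.",
        fail_example="Tells the user a refund went through without checking.",
    )
    transcripts = [
        "User: Where is my order?\nAgent: It shipped yesterday (tracking tool confirmed).",
        "User: Can I get a refund?\nAgent: Your refund has been issued.",
        "User: When do you open?\nAgent: 9am, per the store policy document.",
        "User: Cancel my plan.\nAgent: Done; the billing tool confirmed cancellation.",
    ]
    verdicts = [parse_verdict(stub_judge(build_judge_prompt(entry, t))) for t in transcripts]
    # One failure out of four transcripts gives a 25% failure rate.
    print("gate at 10%:", release_gate(verdicts, 0.10))
    print("gate at 30%:", release_gate(verdicts, 0.30))
```

Keeping each judge binary and rejecting unparseable output makes verdicts easy to aggregate into a single rate that a CI job can compare against a threshold; a real setup would also check the judge's agreement with expert labels before trusting the gate.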
Target audience: ml engineers, product managers, domain experts, devops
Repository: https://github.com/aws-samples/sample-GEDD · Python · MIT-0 · 5 stars