Show HN: GEDD – Find what your AI agent gets wrong (before your users do)
Category: devtools
Tags: ai-agents, evaluation, llm-testing, grounded-theory, aws
Score: 7.5/10 (Innovation: 8, Technical: 7, Documentation: 8, Utility: 7)
GEDD applies grounded theory methodology to help domain experts systematically discover and document failure modes in AI agents, producing a production-ready evaluation pipeline. Instead of scoring agents against predefined rubrics, it derives error codes from observed evidence, and those codes evolve with the agent, which addresses a gap in AI agent evaluation. The project combines qualitative research methods with practical DevOps workflows, so non-technical stakeholders can drive quality assurance.
Target audience: product managers, domain experts, ML engineers
Repository: https://github.com/aws-samples/sample-GEDD · Python · MIT-0 · 1 star