Show HN: I made a small helper for checking model-graded answers
Category: devtools
Tags: llm-judge, evaluation-framework, ai-audit
Score: 6.5/10 (Innovation: 6, Technical: 7, Documentation: 7, Utility: 6)
CMG (the claim-memory-graph SDK) adds an audit layer to LLM-as-a-judge evaluations: the judge must back each verdict with explicit claims linked to evidence, and the tool then flags inconsistencies such as missing citations or rubric gaps. This makes model grading more transparent without requiring a second model, and the built-in CLI and web dashboards support manual review. Integrations with frameworks such as DeepEval and Inspect AI make it useful to researchers and engineers running large-scale evals.
Target audience: AI researchers, ML engineers, backend devs
Repository: https://github.com/MatteoLeonesi/claim-memory-graph-sdk · Python · License: NOASSERTION · 2 stars
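
To make the summary's audit idea concrete, here is a minimal sketch of a claim-to-evidence check that runs without a second model. It is not CMG's actual API: the `Claim` and `Verdict` classes, the `audit` function, and the flag wording are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement a judge makes to justify its verdict (hypothetical shape)."""
    text: str
    criterion: str  # rubric criterion this claim addresses
    evidence: list[str] = field(default_factory=list)  # cited spans from the graded answer


@dataclass
class Verdict:
    """A judge's score plus the claims offered in support of it (hypothetical shape)."""
    score: int
    claims: list[Claim]


def audit(verdict: Verdict, rubric: list[str]) -> list[str]:
    """Return human-readable flags; an empty list means nothing was flagged."""
    flags = []
    # A claim with no evidence attached is a missing citation.
    for claim in verdict.claims:
        if not claim.evidence:
            flags.append(f"missing citation: {claim.text!r}")
    # A rubric criterion that no claim addresses is a rubric gap.
    covered = {claim.criterion for claim in verdict.claims}
    for criterion in rubric:
        if criterion not in covered:
            flags.append(f"rubric gap: no claim addresses {criterion!r}")
    return flags


verdict = Verdict(
    score=4,
    claims=[
        Claim("States the capital correctly", "accuracy", ["Paris is the capital of France"]),
        Claim("Well organized", "clarity"),  # no evidence attached
    ],
)
flags = audit(verdict, rubric=["accuracy", "clarity", "completeness"])
for flag in flags:
    print(flag)
```

In this example the second claim cites nothing and no claim covers `completeness`, so the audit reports one missing citation and one rubric gap. Both checks are plain structural comparisons, which is why no additional model call is needed.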