Show HN: I made a small helper for checking model-graded answers

Category: devtools

Tags: llm-judge, evaluation-framework, ai-audit

Score: 6.5/10 (Innovation: 6, Technical: 7, Documentation: 7, Utility: 6)

CMG (Claim Memory Graph) provides an audit layer for LLM-as-a-judge evaluations: judges must back each verdict with explicit claims linked to evidence, and the tool then flags inconsistencies such as missing citations or rubric gaps. This makes model grading more transparent without requiring a second model, and it ships with a CLI and web dashboards for review. Integrations with frameworks like DeepEval and Inspect AI make it useful for researchers and engineers running large-scale eval runs.
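To make the audit idea concrete, here is a minimal sketch of the kind of check described above: each claim must cite known evidence, and every rubric criterion must be addressed by at least one claim. This is not CMG's actual API. The names `Claim`, `Verdict`, and `audit`, along with the data shapes, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    criterion: str  # rubric criterion this claim speaks to (hypothetical field)
    evidence_ids: list[str] = field(default_factory=list)


@dataclass
class Verdict:
    label: str
    claims: list[Claim]


def audit(verdict: Verdict, rubric: list[str], evidence: dict[str, str]) -> list[str]:
    """Return human-readable flags for one judge verdict (illustrative only)."""
    flags: list[str] = []
    covered: set[str] = set()
    for i, claim in enumerate(verdict.claims):
        covered.add(claim.criterion)
        if claim.criterion not in rubric:
            flags.append(f"claim {i}: criterion {claim.criterion!r} is not in the rubric")
        if not claim.evidence_ids:
            flags.append(f"claim {i}: no evidence cited")
        for eid in claim.evidence_ids:
            if eid not in evidence:
                flags.append(f"claim {i}: unknown evidence id {eid!r}")
    # Any rubric criterion that no claim addressed is a rubric gap.
    for criterion in rubric:
        if criterion not in covered:
            flags.append(f"rubric gap: {criterion!r} not addressed")
    return flags


rubric = ["accuracy", "completeness"]
evidence = {"e1": "The answer states that Paris is the capital of France."}
verdict = Verdict(
    label="pass",
    claims=[
        Claim("The answer is factually correct.", "accuracy", ["e1"]),
        Claim("The answer is well sourced.", "accuracy", []),
    ],
)
flags = audit(verdict, rubric, evidence)
for flag in flags:
    print(flag)
```

The checks are purely structural, which is why an audit of this kind needs no second model: it inspects what the judge wrote down rather than re-grading the answer.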

Target audience: AI researchers, ML engineers, backend devs

Repository: https://github.com/MatteoLeonesi/claim-memory-graph-sdk · Python · NOASSERTION · 2 stars
