Show HN: I made a small helper for checking model-graded answers
Category: library
Tags: llm-evaluation, model-judging, audit-tool
Score: 6.8/10 (Innovation: 6, Technical: 7, Documentation: 8, Utility: 6)
CMG (Claim Memory Graph) is a Python library that adds an audit layer to LLM-based judges: each verdict must be backed by explicit claims tied to evidence, and inconsistencies are flagged for human review. It does not try to correct the biases of model graders; instead it makes their verdicts transparent and traceable. This is relevant to AI evaluation pipelines in research and production, where trust in automated scoring is critical.
Target audience: data engineers, AI researchers, ML engineers
Repository: https://github.com/MatteoLeonesi/claim-memory-graph-sdk · Python · NOASSERTION · 3 stars
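To make the audit idea in the description concrete, here is a minimal sketch of checking a judge's verdict against its cited evidence. It is not CMG's actual API: the names `Claim`, `Verdict`, and `audit`, and the exact-substring check on quotes, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str          # what the judge asserts
    evidence_id: str   # which piece of evidence the claim points to
    quote: str         # exact span the judge says supports the claim


@dataclass
class Verdict:
    label: str
    claims: list[Claim] = field(default_factory=list)


def audit(verdict: Verdict, evidence: dict[str, str]) -> list[str]:
    """Return flags for human review; an empty list means nothing was flagged.

    Hypothetical checks: the verdict has at least one claim, every claim
    cites known evidence, and every quote appears verbatim in that evidence.
    """
    flags = []
    if not verdict.claims:
        flags.append("verdict has no supporting claims")
    for claim in verdict.claims:
        source = evidence.get(claim.evidence_id)
        if source is None:
            flags.append(f"claim cites unknown evidence: {claim.evidence_id}")
        elif claim.quote not in source:
            flags.append(f"quote not found in {claim.evidence_id}: {claim.quote!r}")
    return flags


# Made-up example: the judge says "correct" but two of its claims do not hold up.
evidence = {"answer": "The capital of Australia is Sydney."}
verdict = Verdict(
    label="correct",
    claims=[
        Claim("Answer names a city", "answer", "Sydney"),
        Claim("Answer names Canberra", "answer", "Canberra"),
        Claim("Matches the reference", "reference", "Canberra"),
    ],
)
flags = audit(verdict, evidence)
for flag in flags:
    print(flag)
```

The sketch only surfaces unsupported claims and leaves the verdict untouched, which mirrors the stated goal of making a grader traceable rather than correcting it.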