Show HN: I made a small helper for checking model-graded answers

Category: library

Tags: llm-evaluation, model-judging, audit-tool

Score: 6.8/10 (Innovation: 6, Technical: 7, Documentation: 8, Utility: 6)

CMG (Claim Memory Graph) is a Python library that adds an audit layer to LLM-based judges: each verdict must be backed by explicit claims tied to evidence, and inconsistencies are flagged for human review. It does not try to fix the well-known biases of model graders; instead, it makes the graders' verdicts transparent and traceable. This is relevant to AI evaluation pipelines, especially in research and production settings where trust in automated scoring is critical.
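To make the idea concrete, here is a minimal sketch of a claim-and-evidence audit. It is not the CMG API: the names `Claim`, `Verdict`, and `audit` are hypothetical, and the check used here (each cited quote must appear verbatim in the graded answer) is only one simple assumption about what an "inconsistency" could mean.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One reason the judge gives for its verdict (hypothetical structure)."""
    text: str
    evidence: str  # verbatim quote the judge says supports the claim


@dataclass
class Verdict:
    """A judge's label plus the claims that are supposed to justify it."""
    label: str
    claims: list[Claim] = field(default_factory=list)


def audit(verdict: Verdict, answer: str) -> list[str]:
    """Return flags for human review; an empty list means nothing was flagged.

    Assumption: a claim is suspicious when its quoted evidence does not
    appear verbatim in the answer being graded.
    """
    flags = []
    if not verdict.claims:
        flags.append("verdict has no supporting claims")
    for i, claim in enumerate(verdict.claims):
        if not claim.evidence.strip():
            flags.append(f"claim {i} cites no evidence")
        elif claim.evidence not in answer:
            flags.append(f"claim {i} cites evidence not found in the answer")
    return flags


answer = "The capital of France is Paris. It lies on the Seine."
verdict = Verdict(
    label="correct",
    claims=[
        Claim("Names the right capital", "The capital of France is Paris."),
        Claim("Mentions the population", "Paris has 2 million residents."),
    ],
)

flags = audit(verdict, answer)
for flag in flags:
    print(flag)  # prints: claim 1 cites evidence not found in the answer
```

The sketch leaves the verdict itself untouched and only reports what a reviewer should look at, which mirrors the stated goal of exposing grader behavior rather than correcting it.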

Target audience: data engineers, AI researchers, ML engineers

Repository: https://github.com/MatteoLeonesi/claim-memory-graph-sdk · Python · NOASSERTION · 3 stars
