Show HN: Rubric – test what your LLM agent did, not just what it said

Category: devtools

Tags: llm-testing, agent-evaluation, ci-integration

Score: 6.5/10 (Innovation: 6, Technical: 6, Documentation: 7, Utility: 7)

Rubric is an open-source Python library that tests LLM agent behavior by checking tool calls, their arguments, the execution trace, latency, and reasoning, rather than only the final output. It detects regressions in CI by diffing each run against a baseline and reporting the result in PR comments. It targets a known gap in agent evaluation (checking what the agent did, not just what it said) and integrates with LangGraph and OpenAI-based agents without extra wiring.
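
To make the idea concrete, here is a minimal sketch of behavior-level testing in plain Python. It does not use Rubric's actual API, which this listing does not describe; every name in it (`Trace`, `ToolCall`, `fake_agent`, `diff_against_baseline`, the flight tools) is hypothetical and chosen only for illustration. It shows asserting on which tools an agent called and with what arguments, then diffing the tool sequence against a stored baseline.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical structure)."""
    name: str
    args: dict[str, Any]


@dataclass
class Trace:
    """Everything the agent did during one run, plus its final answer."""
    calls: list[ToolCall] = field(default_factory=list)
    output: str = ""

    def record(self, name: str, **args: Any) -> None:
        self.calls.append(ToolCall(name, args))


def fake_agent(question: str) -> Trace:
    """Stand-in for a real agent: records two tool calls, then answers."""
    trace = Trace()
    trace.record("search_flights", origin="CAI", destination="LHR")
    trace.record("book_flight", flight_id="BA154")
    trace.output = "Booked flight BA154."
    return trace


def tool_sequence(trace: Trace) -> list[str]:
    """Ordered list of tool names the agent called."""
    return [call.name for call in trace.calls]


def diff_against_baseline(baseline: list[str], current: list[str]) -> dict[str, list[str]]:
    """Report tools that disappeared from or were added to the run."""
    return {
        "missing": [name for name in baseline if name not in current],
        "unexpected": [name for name in current if name not in baseline],
    }


if __name__ == "__main__":
    trace = fake_agent("Book me a flight from Cairo to London")

    # Assert on behavior: which tools ran, in what order, with what arguments.
    assert tool_sequence(trace) == ["search_flights", "book_flight"]
    assert trace.calls[0].args["destination"] == "LHR"

    # Regression check: compare against a previously stored tool sequence.
    baseline = ["search_flights", "book_flight"]
    print(diff_against_baseline(baseline, tool_sequence(trace)))
```

In a CI setting, the baseline would presumably be loaded from a file committed to the repository, and a non-empty diff would fail the job or be posted to the pull request.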

Target audience: AI/ML engineers, LLM app developers, backend devs

Repository: https://github.com/Kareem-Rashed/rubric-eval · Python · MIT · 10 stars
