Show HN: Rubric – test what your LLM agent did, not just what it said
Category: devtools
Tags: llm-testing, agent-evaluation, ci-integration
Score: 6.5/10 (Innovation: 6, Technical: 6, Documentation: 7, Utility: 7)
Rubric is an open-source Python library for testing LLM agent behavior: it checks tool calls, argument traces, latency, and reasoning rather than only the final output. It provides CI-native regression detection through baseline diffing and PR comments, and offers zero-wiring integration for LangGraph and OpenAI-based agents. This targets a known gap in agent evaluation, where tests often inspect what the agent said but not what it did.
Target audience: AI/ML engineers, LLM app developers, backend devs
Repository: https://github.com/Kareem-Rashed/rubric-eval · Python · MIT · 10 stars
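
To make the idea of testing "what the agent did" concrete, here is a minimal sketch of trace-level assertions and baseline diffing in plain Python. It does not use Rubric's actual API, which this listing does not show. The names `ToolCall`, `assert_called`, and `diff_against_baseline`, the example tool names, and the 1.5x latency tolerance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical trace format, not Rubric's)."""
    name: str
    args: dict = field(default_factory=dict)
    latency_ms: float = 0.0


def assert_called(trace, name, **expected_args):
    """Return the first call to `name` whose args include `expected_args`."""
    for call in trace:
        if call.name == name and all(
            call.args.get(key) == value for key, value in expected_args.items()
        ):
            return call
    raise AssertionError(f"no call to {name!r} with args {expected_args!r}")


def diff_against_baseline(baseline, current, latency_tolerance=1.5):
    """List regressions between a stored baseline trace and the current trace."""
    regressions = []

    # Behavioral check: did the agent call the same tools in the same order?
    base_names = [call.name for call in baseline]
    cur_names = [call.name for call in current]
    if base_names != cur_names:
        regressions.append(f"tool sequence changed: {base_names} -> {cur_names}")

    # Latency check: flag runs slower than tolerance x the baseline total.
    base_latency = sum(call.latency_ms for call in baseline)
    cur_latency = sum(call.latency_ms for call in current)
    if base_latency > 0 and cur_latency > base_latency * latency_tolerance:
        regressions.append(
            f"latency regressed: {base_latency:.0f}ms -> {cur_latency:.0f}ms"
        )
    return regressions


# Illustrative traces: the current run repeats a search and is slower overall.
baseline = [
    ToolCall("search_flights", {"dest": "CAI"}, 120.0),
    ToolCall("book_flight", {"flight_id": "MS986"}, 200.0),
]
current = [
    ToolCall("search_flights", {"dest": "CAI"}, 130.0),
    ToolCall("search_flights", {"dest": "CAI"}, 140.0),
    ToolCall("book_flight", {"flight_id": "MS986"}, 400.0),
]

assert_called(current, "book_flight", flight_id="MS986")
for line in diff_against_baseline(baseline, current):
    print(line)
```

In a CI setup, a list like the one returned by `diff_against_baseline` could be what gets posted as a PR comment; a non-empty list would fail the check even when the agent's final answer is unchanged.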