Show HN: Reliably Incorrect – explore LLM reliability with data visualizations

Category: ai-ml

Tags: llm-reliability, agent-tuning, probabilistic-programming

Score: 6.0/10 (Innovation: 7, Technical: 5, Documentation: 8, Utility: 4)

This project explores the reliability of LLM-based coding agents (specifically Claude Code) by treating their instruction directories (.claude/) as probabilistic programs. It's interesting because it frames agent tuning as a computable probability problem and introduces concepts like context drift and self-reflection loops to make non-deterministic systems more predictable.
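To make the "computable probability" framing concrete, here is a minimal sketch. It is not the project's actual model. It assumes each instruction is followed independently with some fixed probability, and it treats a self-reflection loop as a simple independent retry. The function names and the 0.95 per-step figure are hypothetical, chosen only for illustration.

```python
from math import prod


def chain_reliability(step_probs):
    """Probability that every step succeeds, assuming the steps are independent."""
    return prod(step_probs)


def with_retry(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts


# Hypothetical numbers: ten instructions, each followed 95% of the time.
steps = [0.95] * 10

baseline = chain_reliability(steps)
# Assumption: a self-reflection pass behaves like one extra independent attempt per step.
improved = chain_reliability([with_retry(p, 2) for p in steps])

print(f"baseline: {baseline:.3f}, with one retry per step: {improved:.3f}")
```

Under these assumptions, per-step reliability that looks high still compounds into a much lower end-to-end figure, which is the kind of effect a probabilistic view of an instruction directory is meant to expose. Real agent steps are unlikely to be fully independent, so the numbers are illustrative only.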

Target audience: ai-engineers, ml-researchers, backend-devs

Repository: https://adamsohn.com/reliably-incorrect/ · Python · 16 stars
