Show HN: I benchmarked how good LLMs are at proofreading English
Category: ai-ml
Tags: llm-benchmark, proofreading, agent-loop
Score: 6.8/10 (Innovation: 6, Technical: 7, Documentation: 8, Utility: 6)
ErrataBench is a benchmark and agentic framework for evaluating how well LLMs proofread text. Models find and fix errors through tool-calling loops. It provides a standardized dataset, detailed scoring, and support for custom endpoints, which makes it useful for comparing models on a single, narrowly scoped NLP task.
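The description mentions tool-calling loops and scoring without showing either. What follows is a minimal TypeScript sketch of both ideas, built on stated assumptions. The `replace`/`done` tool shapes, `runProofreadLoop`, `scoreWordEdits`, and the scripted stand-in model are hypothetical illustrations and not ErrataBench's API. The token-level precision/recall/F1 metric is a common generic choice swapped in for illustration, not necessarily how ErrataBench scores models. The sketch also assumes substitution-only errors, so the source, gold, and predicted texts line up token by token.

```typescript
// Hypothetical tool-call shapes for illustration; ErrataBench's real tool schema may differ.
type ToolCall =
  | { tool: "replace"; find: string; replaceWith: string }
  | { tool: "done" };

// Stand-in "model": any function that sees the current text and step and returns its next tool call.
type Model = (text: string, step: number) => ToolCall;

// Bounded tool-calling loop: apply "replace" calls until the model calls "done" or maxSteps runs out.
function runProofreadLoop(model: Model, input: string, maxSteps = 20): string {
  let text = input;
  for (let step = 0; step < maxSteps; step++) {
    const call = model(text, step);
    if (call.tool === "done") break;
    // Replace only the first occurrence; a find string that is not present is a no-op.
    const idx = text.indexOf(call.find);
    if (idx !== -1) {
      text = text.slice(0, idx) + call.replaceWith + text.slice(idx + call.find.length);
    }
  }
  return text;
}

interface EditScores {
  tp: number;
  fp: number;
  fn: number;
  precision: number;
  recall: number;
  f1: number;
}

// Hypothetical token-level scoring against a gold correction.
// Simplifying assumption: substitution-only errors, so all three texts have the same token count.
function scoreWordEdits(source: string, gold: string, predicted: string): EditScores {
  const src = source.split(/\s+/);
  const g = gold.split(/\s+/);
  const p = predicted.split(/\s+/);
  if (g.length !== src.length || p.length !== src.length) {
    throw new Error("sketch only handles substitutions with equal token counts");
  }
  let tp = 0, fp = 0, fn = 0;
  for (let i = 0; i < src.length; i++) {
    const goldChanged = g[i] !== src[i];
    const predChanged = p[i] !== src[i];
    if (goldChanged && predChanged && p[i] === g[i]) {
      tp++; // the error was fixed exactly as in the gold text
    } else {
      if (predChanged) fp++; // spurious edit, or a wrong fix for a real error
      if (goldChanged) fn++; // error missed, or fixed incorrectly
    }
  }
  const precision = tp + fp > 0 ? tp / (tp + fp) : 0;
  const recall = tp + fn > 0 ? tp / (tp + fn) : 0;
  const f1 = precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  return { tp, fp, fn, precision, recall, f1 };
}

// Usage: a scripted model that fixes two of the three errors, then stops.
const source = "Their going too the store tomorow.";
const gold = "They're going to the store tomorrow.";
const script: ToolCall[] = [
  { tool: "replace", find: "Their", replaceWith: "They're" },
  { tool: "replace", find: "too", replaceWith: "to" },
  { tool: "done" },
];
const scriptedModel: Model = (_text, step) => script[step] ?? { tool: "done" };

const predicted = runProofreadLoop(scriptedModel, source);
const scores = scoreWordEdits(source, gold, predicted);
console.log(predicted); // They're going to the store tomorow.
console.log(scores); // tp=2, fp=0, fn=1, precision=1, recall≈0.667, f1≈0.8
```

Capping the loop with `maxSteps` stops a model that never calls `done` from running forever. A real harness would also need to align insertions and deletions, which this token-by-token comparison ignores.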
Target audience: AI researchers, data engineers, ML engineers
Repository: https://github.com/reviseio/errata-bench · TypeScript · 2 stars