Show HN: I benchmarked how good LLMs are at proofreading English

Category: ai-ml

Tags: llm-benchmark, proofreading, agent-loop

Score: 6.8/10 (Innovation: 6, Technical: 7, Documentation: 8, Utility: 6)

ErrataBench is a benchmark and agentic framework for evaluating how well LLMs proofread text. Models run in tool-calling loops to find and fix errors. It provides a standardized dataset, detailed scoring, and support for custom endpoints, which makes it useful for comparing models on a narrowly scoped NLP task.
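The entry says ErrataBench drives models through tool-calling loops and scores the fixes they make. The TypeScript sketch below is a rough illustration of how such a loop and an edit-level scorer could fit together. It is not ErrataBench's actual API or scoring scheme. The `replace_text` tool, the `Model` type, and precision/recall computed over exact find/replace pairs are all hypothetical. A real harness would call an LLM endpoint where this sketch uses a scripted stand-in.

```typescript
// Hypothetical tool schema; not taken from ErrataBench.
type ToolCall = { tool: "replace_text"; find: string; replace: string };
type ModelTurn = ToolCall | { tool: "done" };

// A "model" is any function from the current text (plus prior edits) to its next action.
// In a real harness this would wrap a tool-calling LLM request.
type Model = (text: string, history: ToolCall[]) => ModelTurn;

// Apply one replacement to the first occurrence of `find`; leave text unchanged if absent.
function applyReplace(text: string, call: ToolCall): string {
  const idx = text.indexOf(call.find);
  if (idx === -1) return text;
  return text.slice(0, idx) + call.replace + text.slice(idx + call.find.length);
}

// Agent loop: keep asking the model for edits until it says "done" or the step budget runs out.
function proofreadLoop(model: Model, text: string, maxSteps = 10): { text: string; edits: ToolCall[] } {
  const edits: ToolCall[] = [];
  let current = text;
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(current, edits);
    if (turn.tool === "done") break;
    current = applyReplace(current, turn);
    edits.push(turn);
  }
  return { text: current, edits };
}

// Hypothetical scorer: precision/recall/F1 over exact (find, replace) pairs, duplicates ignored.
function scoreEdits(predicted: ToolCall[], gold: ToolCall[]): { precision: number; recall: number; f1: number } {
  const key = (c: ToolCall) => `${c.find}\u0000${c.replace}`;
  const dedupe = (keys: string[]) => keys.filter((k, i) => keys.indexOf(k) === i);
  const goldKeys = dedupe(gold.map(key));
  const predKeys = dedupe(predicted.map(key));
  const tp = predKeys.filter((k) => goldKeys.indexOf(k) !== -1).length;
  const precision = predKeys.length === 0 ? 0 : tp / predKeys.length;
  const recall = goldKeys.length === 0 ? 1 : tp / goldKeys.length;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Scripted stand-in for an LLM: fixes one typo, then stops.
const scriptedModel: Model = (text) =>
  text.indexOf("teh") !== -1 ? { tool: "replace_text", find: "teh", replace: "the" } : { tool: "done" };

const result = proofreadLoop(scriptedModel, "I saw teh cat.");
const score = scoreEdits(result.edits, [
  { tool: "replace_text", find: "teh", replace: "the" },
  { tool: "replace_text", find: "cat", replace: "cats" },
]);
console.log(result.text, score);
```

Here the loop returns "I saw the cat." with one recorded edit. Scored against a two-edit gold set, that gives precision 1 and recall 0.5. Exact string matching is deliberately strict. A real benchmark would probably need span- or diff-based alignment so that equivalent fixes phrased differently still count.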

Target audience: AI researchers, data engineers, ML engineers

Repository: https://github.com/reviseio/errata-bench · TypeScript · 2 stars
