Show HN: A 150M model that extracts verbatim evidence spans for RAG, no LLM call

Category: ai-ml

Tags: rag, hallucination-prevention, bert, evidence-extraction, ml

Score: 7.8/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 8)

Verbatim RAG is a minimal retrieval-augmented generation framework that prevents hallucination by composing responses from exact text spans extracted verbatim from the source documents, each with a citation. A 150M-parameter ModernBERT classifier performs the extraction, so no LLM call is needed. Its combination of span extraction, sparse embeddings, and CPU-only operation fills a known gap for lightweight, evidence-grounded QA, and it is backed by strong benchmark results and accompanying datasets.
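
To make the idea of composing an answer only from verbatim, cited spans concrete, here is a minimal Python sketch. It is not the project's API: `Span`, `split_sentences`, `toy_relevance`, and `answer` are hypothetical names, and the word-overlap scorer is only a stand-in for the ModernBERT classifier. What it illustrates is the invariant that every line of the response is an exact substring of a source document, tagged with its character offsets.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    doc_id: str
    start: int  # character offset of the span in the source document
    end: int    # exclusive end offset
    text: str   # always equal to source[start:end]


def split_sentences(doc_id: str, text: str) -> list[Span]:
    """Split a document into sentence spans while keeping exact offsets."""
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]?", text):
        raw = m.group()
        stripped = raw.strip()
        if not stripped:
            continue
        # Skip leading whitespace so the offsets point at the sentence itself.
        start = m.start() + (len(raw) - len(raw.lstrip()))
        spans.append(Span(doc_id, start, start + len(stripped), stripped))
    return spans


def toy_relevance(question: str, sentence: str) -> float:
    """Hypothetical stand-in for the classifier: share of question words in the sentence."""
    q = set(re.findall(r"\w+", question.lower()))
    s = set(re.findall(r"\w+", sentence.lower()))
    return len(q & s) / len(q) if q else 0.0


def answer(question: str, docs: dict[str, str], threshold: float = 0.3) -> str:
    """Compose a response from verbatim spans only; nothing is generated."""
    evidence = [
        span
        for doc_id, text in docs.items()
        for span in split_sentences(doc_id, text)
        if toy_relevance(question, span.text) >= threshold
    ]
    if not evidence:
        return "No supporting evidence found."
    return "\n".join(f'"{s.text}" [{s.doc_id}:{s.start}-{s.end}]' for s in evidence)
```

Calling `answer(question, {"d1": document_text})` returns one quoted line per selected span, or a fixed fallback message when nothing clears the threshold. Swapping `toy_relevance` for a trained span classifier would change which spans are selected, but not the guarantee that the output is copied from the sources.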

Target audience: backend devs, data engineers, ml researchers, devops

Repository: https://huggingface.co/KRLabsOrg/verbatim-rag-modern-bert-v2 · Python · MIT · 190 stars
