Show HN: PDF 2 Context – Convert PDF text to JSONL files
Category: cli-tool
Tags: pdf, rag, cli-tool, jsonl, ocr
Score: 5.0/10 (Innovation: 3, Technical: 5, Documentation: 7, Utility: 5)
PDF 2 Context is a CLI tool that converts directories of PDFs into chunked JSONL files suited to LLM and RAG pipelines. It combines text extraction, OCR fallback, and configurable chunking behind a terminal UI (TUI), which makes it practical for preprocessing document corpora.
Target audience: data engineers, ML engineers, backend devs
Repository: https://github.com/EwanValentine/pdf2context · Go
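To illustrate what "chunked JSONL" means in practice, here is a minimal Go sketch of the chunking and output step only. It is not taken from the pdf2context source: the `Chunk` record shape, the `chunkText` helper, and the `size` and `overlap` parameters are all hypothetical, and the tool's real fields and options may differ. It assumes the text has already been extracted from a PDF.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Chunk is a hypothetical record shape for one JSONL line.
// The field names are assumptions, not the tool's actual schema.
type Chunk struct {
	Source string `json:"source"`
	Index  int    `json:"index"`
	Text   string `json:"text"`
}

// chunkText splits text into windows of at most size runes, where each
// window starts (size - overlap) runes after the previous one.
func chunkText(source, text string, size, overlap int) ([]Chunk, error) {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil, fmt.Errorf("invalid size=%d overlap=%d", size, overlap)
	}
	runes := []rune(text) // work on runes so multi-byte characters are not split
	step := size - overlap
	var chunks []Chunk
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, Chunk{
			Source: source,
			Index:  len(chunks),
			Text:   string(runes[start:end]),
		})
		if end == len(runes) {
			break // the final window reached the end of the text
		}
	}
	return chunks, nil
}

func main() {
	// Placeholder input standing in for text extracted from a PDF.
	text := "The quick brown fox jumps over the lazy dog."
	chunks, err := chunkText("example.pdf", text, 20, 5)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// json.Encoder writes one JSON value per line, which is the JSONL layout.
	enc := json.NewEncoder(os.Stdout)
	for _, c := range chunks {
		if err := enc.Encode(c); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Overlapping windows are one common choice for retrieval pipelines because a sentence that straddles a boundary still appears whole in at least one chunk; a real preprocessor might instead split on sentences, paragraphs, or token counts.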