Show HN: Local CPU OCR for images, PDFs, webpages
Category: cli-tool
Tags: ocr, cpu-only, offline, cli-tool, python, paddleocr, document-digitization
Score: 6.8/10 (Innovation: 6, Technical: 6, Documentation: 8, Utility: 7)
textsnap is a CLI tool that runs OCR on the CPU for images, PDFs, and webpages using a quantized vision-language model, and outputs Markdown or plain text. It works offline, is portable, integrates with the clipboard, and needs no GPU or cloud service. Its documentation is thorough and covers security considerations, which makes the tool a practical choice for local document digitization.
Target audience: backend devs, devops, data engineers
Repository: https://github.com/kouhxp/textsnap · Python · MIT · 21 stars