Show HN: Udoc. Dependency-free document extraction in Rust
Category: library
Tags: document-extraction, rust, cli
Score: 7.3/10 (Innovation: 6, Technical: 8, Documentation: 8, Utility: 7)
Udoc is a dependency-free Rust library and CLI for extracting text, tables, and metadata from a wide range of document formats, including PDF, Office, and Markdown. It provides a unified document model across formats, streaming processing for large files, and a hook system for OCR and layout detection, all without external dependencies.
Target audience: backend devs, data engineers, devops
Repository: https://newelh.github.io/udoc/ · Rust · Apache-2.0 · 2 stars