Show HN: Turboquant.cpp – Quantize embeddings to 1-4 bits, no training (400 LoC)
Category: library
Tags: vector-quantization, embeddings, compression
Score: 6.5/10 (Innovation: 7, Technical: 7, Documentation: 6, Utility: 6)
TurboQuant.cpp implements a novel online vector quantization algorithm that compresses high-dimensional embeddings to 1-4 bits per coordinate with no training or codebook learning, while approximately preserving inner products and distances. It is a lightweight, theoretically grounded tool for cutting memory use and latency in embedding-heavy systems such as retrieval-augmented generation pipelines and vector databases.
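To make "1-4 bits per coordinate, no training" concrete, here is a minimal sketch of a generic training-free scalar quantizer. It is not TurboQuant's algorithm or the library's API: the names `QuantizedVec`, `quantize_uniform`, and `dequantize` are hypothetical, and the per-vector min/max range is an assumption chosen for simplicity. Codes are stored one per byte for readability; a real implementation would pack them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration; not a type from turboquant.cpp.
struct QuantizedVec {
    std::vector<std::uint8_t> codes;  // one code per coordinate (unpacked)
    float lo = 0.0f;                  // smallest coordinate of the input
    float step = 0.0f;                // width of one quantization level
    int bits = 0;                     // bits per coordinate
};

// Uniform scalar quantization over the vector's own [min, max] range.
// The range comes from the vector itself, so no training data is needed.
QuantizedVec quantize_uniform(const std::vector<float>& x, int bits) {
    assert(bits >= 1 && bits <= 8);
    QuantizedVec q;
    q.bits = bits;
    if (x.empty()) return q;

    const int max_code = (1 << bits) - 1;
    const float mn = *std::min_element(x.begin(), x.end());
    const float mx = *std::max_element(x.begin(), x.end());
    q.lo = mn;
    q.step = (mx - mn) / static_cast<float>(max_code);

    q.codes.reserve(x.size());
    for (float v : x) {
        // Round to the nearest level; a constant vector maps to code 0.
        long c = q.step > 0.0f ? std::lround((v - q.lo) / q.step) : 0L;
        c = std::max(0L, std::min(c, static_cast<long>(max_code)));
        q.codes.push_back(static_cast<std::uint8_t>(c));
    }
    return q;
}

// Reconstruct an approximation of the original vector from its codes.
std::vector<float> dequantize(const QuantizedVec& q) {
    std::vector<float> y;
    y.reserve(q.codes.size());
    for (std::uint8_t c : q.codes) {
        y.push_back(q.lo + static_cast<float>(c) * q.step);
    }
    return y;
}
```

With this scheme each reconstructed coordinate is within half a step of the original, so inner products and distances are only approximately preserved, and the error grows as the bit width shrinks.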
Target audience: backend developers, data engineers, ML engineers
Repository: https://github.com/RunEdgeAI/turboquant.cpp · C++ · MIT · 2 stars