Show HN: UltraCompress – first mathematically lossless 5-bit LLM compression

Category: ai-ml

Tags: llm-compression, quantization, transformers, gpu-optimization, open-source-ai

Score: 8.3/10 (Innovation: 9, Technical: 9, Documentation: 7, Utility: 8)

UltraCompress is compression infrastructure for trained transformers that claims "mathematically lossless" 5-bit compression of large language models of up to 405B parameters on a single consumer GPU; the project also reports minimal, not zero, perplexity degradation. It's interesting because it combines streaming compression, per-layer low-rank correction, and novel quantization techniques to make huge models deployable on limited hardware, potentially democratizing access to state-of-the-art LLMs.
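The summary names the techniques without showing them, so here is a generic, minimal sketch of how the three ideas usually fit together. It is not UltraCompress's implementation: per-row 5-bit affine quantization, a rank-1 correction of the quantization residual found by power iteration (a stand-in for whatever low-rank method the project uses), and a generator that processes one layer at a time. All function names (`quantize_row_5bit`, `rank1_correction`, `compress_layer`, `compress_stream`, `footprint_bytes`) are hypothetical, and plain Python lists stand in for tensors. The arithmetic explains why streaming matters: 405B weights at 5 bits need about 253 GB before any scales or metadata, far more than a single consumer GPU's memory.

```python
import math
import random
from typing import Iterable, Iterator

Matrix = list[list[float]]

LEVELS = 2 ** 5 - 1  # 5-bit codes span 0..31


def footprint_bytes(n_params: int, bits: int) -> int:
    """Raw storage for n_params weights at a bit width (ignores scales/metadata)."""
    return n_params * bits // 8


def quantize_row_5bit(row: list[float]) -> tuple[list[int], float, float]:
    """Hypothetical per-row asymmetric 5-bit quantization: w ~= lo + code * scale."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    codes = [min(LEVELS, max(0, round((w - lo) / scale))) for w in row]
    return codes, lo, scale


def dequantize_row(codes: list[int], lo: float, scale: float) -> list[float]:
    return [lo + c * scale for c in codes]


def rank1_correction(r: Matrix, iters: int = 20) -> tuple[list[float], list[float], float]:
    """Approximate top singular triplet (u, v, sigma) of residual r by power iteration.

    u and sigma are recomputed from the final v, so subtracting sigma * u v^T
    never increases the Frobenius error, even before full convergence.
    """
    n_rows, n_cols = len(r), len(r[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(r[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        v = [sum(r[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return [0.0] * n_rows, [0.0] * n_cols, 0.0
        v = [x / norm for x in v]
    rv = [sum(r[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in rv))
    u = [x / sigma for x in rv] if sigma > 0 else [0.0] * n_rows
    return u, v, sigma


def compress_layer(w: Matrix) -> tuple[Matrix, Matrix]:
    """Return (5-bit dequantized layer, same layer plus rank-1 residual fix)."""
    deq = [dequantize_row(*quantize_row_5bit(row)) for row in w]
    resid = [[a - b for a, b in zip(wr, dr)] for wr, dr in zip(w, deq)]
    u, v, sigma = rank1_correction(resid)
    fixed = [[d + sigma * ui * vj for d, vj in zip(drow, v)] for drow, ui in zip(deq, u)]
    return deq, fixed


def compress_stream(layers: Iterable[Matrix]) -> Iterator[Matrix]:
    """Handle one layer at a time so only a single layer is resident in memory."""
    for w in layers:
        yield compress_layer(w)[1]


def frob_err(a: Matrix, b: Matrix) -> float:
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


if __name__ == "__main__":
    rng = random.Random(0)
    layer = [[rng.gauss(0.0, 0.02) for _ in range(16)] for _ in range(8)]
    deq, fixed = compress_layer(layer)
    print("error, 5-bit only:", frob_err(layer, deq))
    print("error, 5-bit + rank-1:", frob_err(layer, fixed))
    print(footprint_bytes(405_000_000_000, 5) / 1e9, "GB")  # 253.125 GB
```

The design choice in this sketch is a trade: the correction stores one extra column vector, one row vector, and a scalar per layer in full precision, which costs a little storage but lowers reconstruction error. Note that this toy pipeline is lossy; it does not reproduce whatever sense of "lossless" the project intends.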

Target audience: machine learning engineers, AI researchers, backend devs

Repository: https://github.com/sipsalabs/ultracompress · Python · License: NOASSERTION (not recognized by GitHub) · 9 stars
