Show HN: Cascade – A bare-metal C++ proxy that cuts LLM API bills by 70%
Category: infrastructure
Tags: llm-routing, ai-proxy, cost-optimization
Score: 6.8/10 (Innovation: 6, Technical: 8, Documentation: 7, Utility: 6)
Cascade is a bare-metal C++ proxy that routes each LLM API request to a cost-effective model, using a distilled ONNX classifier that runs locally. The project claims bill reductions of up to 75% with under 5 ms latency. Pairing a custom C++ runtime with local ML inference targets the latency bottlenecks of Python-based or SaaS routing solutions. It is worth a look for teams that want to cut enterprise LLM costs without sacrificing performance.
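The routing idea in the summary can be illustrated with a minimal sketch. This is not Cascade's code: the model names, prices, threshold, and the keyword-based `toy_difficulty` scorer are all hypothetical stand-ins, with the scorer playing the role that the distilled ONNX classifier would play in the real proxy. The sketch is in Python for brevity and shows only the decision logic, not the C++ runtime.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Route:
    model: str          # hypothetical model identifier
    cost_per_1k: float  # hypothetical price in USD per 1K tokens


# Assumed two-tier setup; a real deployment could have more tiers.
CHEAP = Route("small-model", 0.0005)
STRONG = Route("large-model", 0.01)


def toy_difficulty(prompt: str) -> float:
    """Stand-in for a learned classifier: returns a score in [0, 1]."""
    hard_markers = ("prove", "refactor", "step by step", "analyze")
    lowered = prompt.lower()
    hits = sum(marker in lowered for marker in hard_markers)
    length_term = min(len(prompt) / 2000, 1.0)
    return min(1.0, 0.3 * hits + 0.5 * length_term)


def choose_route(
    prompt: str,
    classify: Callable[[str], float] = toy_difficulty,
    threshold: float = 0.5,  # assumed cut-off, would be tuned in practice
) -> Route:
    """Send prompts scored at or above the threshold to the stronger model."""
    return STRONG if classify(prompt) >= threshold else CHEAP


if __name__ == "__main__":
    for text in (
        "What is the capital of France?",
        "Prove this theorem step by step and analyze the result",
    ):
        print(choose_route(text).model, "<-", text)
```

Any savings depend on how many requests the classifier can safely send to the cheaper tier, which is why the threshold is left as a parameter here.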
Target audience: backend devs, devops, machine-learning engineers
Repository: https://cascade-router.github.io/cascade-router/ · Python · 1 star