Show HN: Cascade – A bare-metal C++ proxy that cuts LLM API bills by 70%

Category: infrastructure

Tags: llm-routing, ai-proxy, cost-optimization

Score: 6.8/10 (Innovation: 6, Technical: 8, Documentation: 7, Utility: 6)

Cascade is a bare-metal C++ proxy that dynamically routes LLM API requests to cost-effective models using a distilled ONNX classifier. The project claims bill reductions of up to 75% with under 5 ms of latency. By pairing a custom C++ runtime with local ML inference, it targets the latency bottlenecks of Python-based or SaaS routing solutions. Its main appeal is the potential to cut enterprise LLM costs without sacrificing performance.

Target audience: backend devs, devops, machine-learning engineers

Repository: https://cascade-router.github.io/cascade-router/ · Python · 1 star
