Show HN: Ludion – routing AI inference by observed WebGPU behavior

Category: infrastructure

Tags: ai-inference, load-balancer, webgpu

Score: 7.3/10 (Innovation: 7, Technical: 8, Documentation: 7, Utility: 7)

Ludion is a TypeScript load balancer that routes each AI inference request either to the user's local GPU via WebGPU or to a cloud server, based on observed device capability. It exposes on-device execution and cloud fallback behind a single OpenAI-compatible, drop-in API and decides per request where inference runs. The project is notable for its pragmatic approach to hybrid edge/cloud inference and for its detailed documentation of setup and runtime configuration.
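
To make the per-request routing idea concrete, here is a minimal sketch of one way a router could choose between local and cloud execution from observed throughput. It is not Ludion's actual code or API: `ObservedRouter`, `Observation`, `minTokensPerSecond`, and the smoothing factor `alpha` are all hypothetical names chosen for illustration, and the exponentially weighted moving average is an assumed heuristic, not something the project description confirms.

```typescript
// Hypothetical sketch: none of these names are taken from Ludion.
type Target = "local" | "cloud";

interface Observation {
  // Measured throughput of a completed local (WebGPU) inference run.
  tokensPerSecond: number;
}

class ObservedRouter {
  // Smoothed local throughput; null until the first observation arrives.
  private ewma: number | null = null;
  private readonly minTokensPerSecond: number;
  private readonly alpha: number;

  constructor(minTokensPerSecond: number, alpha: number = 0.3) {
    this.minTokensPerSecond = minTokensPerSecond;
    this.alpha = alpha;
  }

  // Fold a new measurement into the exponentially weighted moving average.
  record(obs: Observation): void {
    this.ewma =
      this.ewma === null
        ? obs.tokensPerSecond
        : this.alpha * obs.tokensPerSecond + (1 - this.alpha) * this.ewma;
  }

  // Decide where the next request should run.
  // In a browser, `hasWebGpu` could be derived from `"gpu" in navigator`.
  choose(hasWebGpu: boolean): Target {
    if (!hasWebGpu) return "cloud";
    // No data yet: try the local device once so there is something to observe.
    if (this.ewma === null) return "local";
    return this.ewma >= this.minTokensPerSecond ? "local" : "cloud";
  }
}
```

In this sketch the caller would invoke `choose` before each request and `record` after each local run, so that a device that turns out to be slow in practice is moved to the cloud path regardless of what its advertised capabilities suggest.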

Target audience: backend devs, frontend devs, devops

Repository: https://ludion.ai/ · TypeScript · MIT · 2 stars
