Show HN: Rapid-MLX – Run local LLMs on Mac, 2-3x faster than alternatives

Category: ai-ml

Tags: llm, apple-silicon, local-ai, openai-api, mlx

Score: 7.3/10 (Innovation: 6, Technical: 7, Documentation: 8, Utility: 8)

Rapid-MLX is a high-performance local LLM server for Apple Silicon Macs that provides an OpenAI-compatible API, enabling developers to run models like Qwen and Gemma locally with 2-4x faster inference than alternatives like llama.cpp. It's particularly interesting because it combines MLX's Apple-optimized backend with comprehensive tool-calling support and a detailed Model-Harness Index (MHI) that benchmarks agent framework compatibility.

Target audience: backend devs, ai engineers, mac developers

Repository: https://github.com/raullenchai/Rapid-MLX · Python · Apache-2.0 · 268 stars

View on Hacker News