Show HN: Gemma 3 inference in pure C++ with Metal acceleration

Category: library

Tags: llm-inference, metal, apple-silicon

Score: 5.3/10 (Innovation: 4, Technical: 6, Documentation: 6, Utility: 5)

MetalChat is a C++ framework for running LLM inference on Apple Silicon, with support for Meta Llama and Google Gemma models. It runs computation on the Apple GPU through the Metal API and ships as both a library and a CLI. The API is at an early stage and still unstable. The project's main appeal is its narrow focus on efficient local LLM inference on macOS.

Target audience: ml-engineers, ios-developers, backend-devs

Repository: https://github.com/ybubnov/metalchat · C++ · GPL-3.0 · 20 stars
