Show HN: Gemma 3 inference in pure C++ with Metal acceleration
Category: library
Tags: llm-inference, metal, apple-silicon
Score: 5.3/10 (Innovation: 4, Technical: 6, Documentation: 6, Utility: 5)
MetalChat is a C++ framework for running LLM inference on Apple Silicon with Metal acceleration, supporting Meta Llama and Google Gemma models. It runs on the Apple GPU through the Metal API and ships as both a library and a CLI. The project is at an early stage and its API is still unstable. Its niche is efficient local LLM inference on macOS.
Target audience: ml-engineers, ios-developers, backend-devs
Repository: https://github.com/ybubnov/metalchat · C++ · GPL-3.0 · 20 stars