Show HN: Alloy – a Torch backend and inference engine for Apple Silicon

Category: infrastructure

Tags: gpu-compute, apple-silicon, llm-inference, torch-compile, metal

Score: 7.8/10 (Innovation: 8, Technical: 9, Documentation: 7, Utility: 7)

Alloy is a compiler and runtime for GPU compute kernels on Apple Silicon that provides a `torch.compile` backend and LLM serving. GPU kernels are written in Python and compiled to Metal through a tile IR pipeline, which enables features such as cooperative tiled GEMM and automatic operator fusion. It also supports inference with several model formats, including GGUF and MLX.
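For readers unfamiliar with the term, the idea behind a tiled GEMM can be sketched in plain Python. This is only an illustration of the general technique, not Alloy's kernel code or API: the function name `tiled_matmul` and the `tile` parameter are hypothetical, and a real GPU kernel would assign each output tile to a group of threads and stage the input tiles in fast on-chip memory.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), visiting the output in tile x tile blocks.

    Hypothetical illustration of tiling; not taken from the Alloy codebase.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # row offset of the output tile
        for j0 in range(0, n, tile):          # column offset of the output tile
            for k0 in range(0, k, tile):      # step through the shared dimension tile by tile
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc         # partial sum accumulated across k tiles
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b))
```

Each output tile only needs one tile of `a` and one tile of `b` at a time, which is what makes the pattern a good fit for GPUs with small, fast local memory.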

Target audience: backend devs, machine learning engineers, devops

Repository: https://github.com/rayanht/alloy · Python · Apache-2.0 · 1 star
