Show HN: MinLlama – Llama 3.2 inference in ~100 lines of NumPy

Category: ai-ml

Tags: llm-inference, numpy, education

Score: 5.5/10 (Innovation: 4, Technical: 7, Documentation: 6, Utility: 5)

MinLlama is a minimal, pure-NumPy implementation of Llama 3.2 inference that demonstrates the core transformer architecture in ~100 lines. The repository also includes variants with a KV cache and with PyTorch and JAX backends. It is useful as a learning tool for understanding how modern LLM inference works under the hood without heavy dependencies.
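To give a sense of what a KV cache does during inference, here is a minimal NumPy sketch. It is not taken from the MinLlama repository: the names `KVCache`, `attend`, `decode_with_cache`, and `decode_without_cache` are hypothetical, and the example assumes a single attention head with no RoPE, normalization, or grouped-query attention, all of which a real Llama model uses.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, K, V):
    # q: (d,), K and V: (t, d). Scaled dot-product attention for one query.
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

class KVCache:
    """Hypothetical cache: stores the keys and values of all past tokens."""
    def __init__(self, d):
        self.K = np.zeros((0, d))
        self.V = np.zeros((0, d))

    def append(self, k, v):
        self.K = np.vstack([self.K, k[None, :]])
        self.V = np.vstack([self.V, v[None, :]])

def decode_with_cache(xs, Wq, Wk, Wv):
    # Each step projects only the newest token and reuses cached K/V.
    cache = KVCache(Wk.shape[1])
    outs = []
    for x in xs:
        cache.append(x @ Wk, x @ Wv)
        outs.append(attend(x @ Wq, cache.K, cache.V))
    return np.stack(outs)

def decode_without_cache(xs, Wq, Wk, Wv):
    # Each step re-projects the whole prefix: same result, more work.
    outs = []
    for t in range(len(xs)):
        K = xs[: t + 1] @ Wk
        V = xs[: t + 1] @ Wv
        outs.append(attend(xs[t] @ Wq, K, V))
    return np.stack(outs)
```

Both functions compute the same causal attention outputs. The cached version avoids recomputing key and value projections for earlier tokens, which is the saving a KV cache provides during token-by-token generation.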

Target audience: AI researchers, ML engineers, students

Repository: https://github.com/timothygao8710/minLlama · Python · MIT
