Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

Category: other

Tags: gpt-2, cuda, transformer, from-scratch, flash-attention, language-model, bpe-tokenizer

Score: 7.5/10 (Innovation: 7, Technical: 10, Documentation: 8, Utility: 5)

NanoEuler is a GPT-2-scale language model written from scratch in C/CUDA with no ML libraries. It includes hand-written forward and backward passes, a byte-level BPE tokenizer, and a full training pipeline from pretraining through supervised fine-tuning. The combination of a complete from-scratch implementation, gradient checks that verify the backward passes, and a hand-written FlashAttention running on a consumer GPU makes it a useful educational and research artifact for understanding transformer internals.
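For readers unfamiliar with gradient checking, the idea is to compare a hand-written backward pass against a central finite-difference estimate of the same derivative. The sketch below is not taken from the NanoEuler repository; `toy_loss`, `toy_grad`, and `grad_check` are hypothetical names, and the quadratic loss is a stand-in chosen only because its gradient is known exactly.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical toy loss: L(w) = 0.5 * sum_i w[i]^2, so dL/dw[i] = w[i]. */
double toy_loss(const double *w, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += 0.5 * w[i] * w[i];
    return s;
}

/* Hand-written analytic gradient ("backward pass") for toy_loss. */
void toy_grad(const double *w, size_t n, double *g) {
    for (size_t i = 0; i < n; i++) g[i] = w[i];
}

/* Compare the analytic gradient g against central differences of toy_loss.
 * Returns the largest relative error over all parameters.
 * w is perturbed in place and restored before returning. */
double grad_check(double *w, size_t n, const double *g, double eps) {
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double saved = w[i];
        w[i] = saved + eps;
        double loss_plus = toy_loss(w, n);
        w[i] = saved - eps;
        double loss_minus = toy_loss(w, n);
        w[i] = saved; /* restore the parameter */

        double numeric = (loss_plus - loss_minus) / (2.0 * eps);
        double denom = fabs(numeric) + fabs(g[i]) + 1e-12; /* avoid 0/0 */
        double err = fabs(numeric - g[i]) / denom;
        if (err > worst) worst = err;
    }
    return worst;
}
```

In a real model the same loop would perturb one weight at a time and rerun the full forward pass, which is slow, so such checks are usually run on a small configuration and in double precision.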

Target audience: ML researchers, systems programmers, and engineers learning deep learning internals

Repository: https://github.com/JustVugg/nanoeuler · Cuda · MIT
