Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch
Category: other
Tags: gpt-2, cuda, transformer, from-scratch, flash-attention, language-model, bpe-tokenizer
Score: 7.5/10 (Innovation: 7, Technical: 10, Documentation: 8, Utility: 5)
NanoEuler is a GPT-2-scale language model built entirely from scratch in C/CUDA without any ML libraries. It features hand-written forward and backward passes, a byte-level BPE tokenizer, and a full training pipeline from pretraining through supervised fine-tuning. The combination of a complete from-scratch implementation, backward passes verified with gradient checks, and a hand-written FlashAttention kernel running on a consumer GPU makes it a standout educational and research artifact for understanding transformer internals.
Target audience: ML researchers, systems programmers, and engineers learning deep learning internals
Repository: https://github.com/JustVugg/nanoeuler · Cuda · MIT