Show HN: A Transformer Is All You Need
Category: ai-ml
Tags: mechanistic-interpretability, transformer-interpretability, weight-attribution, singular-value-decomposition, llm-analysis
Score: 7.0/10 (Innovation: 8, Technical: 9, Documentation: 4, Utility: 7)
This project introduces a hybrid weight–activation probe for decoder-only transformers. Given a prompt, it identifies the specific weights, layers, and singular directions responsible for the model's next-token decision, which the project frames as a fundamental gap in mechanistic interpretability. It combines SVD-based alignment of residual-stream activations with a cross-layer transformer to produce per-layer and per-weight-family attribution, and validates the result with gradient-attribution perturbation. The approach is demonstrated on GPT-2, Pythia, Mistral, and LLaMA 3, where the project reports uniform weight-family attribution as an unexpected emergent property.
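To make the idea of attributing a decision to singular directions concrete, here is a minimal NumPy sketch. It is not the project's code: the function name `singular_direction_scores`, the random matrix `W`, the activation `x`, and the readout direction `t` are all hypothetical stand-ins. It only illustrates the general identity that a linear layer's contribution along a readout direction can be split exactly into one term per singular direction of the weight matrix.

```python
import numpy as np

def singular_direction_scores(W, x, t):
    """Split the scalar t @ (W @ x) into one additive term per singular direction of W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Term i is s_i * (v_i . x) * (u_i . t); the terms sum to t @ W @ x.
    return S * (Vt @ x) * (U.T @ t)

# Hypothetical toy inputs (assumptions for illustration, not taken from the project).
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))   # stand-in weight matrix, shape (d_out, d_in)
x = rng.normal(size=4)        # stand-in residual-stream activation
t = rng.normal(size=6)        # stand-in readout direction, e.g. a next-token logit direction

scores = singular_direction_scores(W, x, t)
top = int(np.argmax(np.abs(scores)))  # index of the direction with the largest share
print("per-direction scores:", scores)
print("dominant singular direction:", top)
```

Because the per-direction terms sum exactly to `t @ W @ x`, ranking them by magnitude gives one simple way to ask which singular directions dominate a given readout. How the project itself aligns activations with weights may differ.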
Target audience: AI researchers, ML engineers, interpretability researchers
Repository: https://zenodo.org/records/20906443 · Python · GPL-3.0 · 263 stars