Show HN: Mapping Sonnet's thinking process via flame charts
Category: other
Tags: benchmark, lambda-calculus, ai-evaluation, typescript, reasoning-trace
Score: 6.5/10 (Innovation: 6, Technical: 8, Documentation: 7, Utility: 5)
λ-bench evaluates AI models on 120 pure lambda calculus programming problems written in Lamb, a custom minimal lambda calculus language. The project stands out for testing algorithmic reasoning in a purely functional paradigm and for using flame charts to analyze variance across reasoning traces.
Target audience: AI researchers, machine learning engineers, programming language theorists
Repository: https://adamsohn.com/lambda-variance/ · TypeScript · 47 stars