Show HN: JazzBench, an LLM reasoning benchmark using jazz improvisation
Category: ai-ml
Tags: llm-benchmark, jazz-improvisation, music-ai
Score: 6.0/10 (Innovation: 7, Technical: 6, Documentation: 7, Utility: 4)
JazzBench is a niche LLM reasoning benchmark that evaluates how well language models predict Charlie Parker's improvisation choices over given chord progressions, scored with formal music-theoretic metrics. It tests a form of soft, multi-constraint reasoning that typical benchmarks do not cover, and it includes a reproducible evaluation pipeline with baselines and results for Claude models.
Target audience: AI researchers, machine learning engineers, musicologists
Repository: https://flatnine.co/blog/i-built-my-own-eval · Python