Show HN: JazzBench, an LLM reasoning benchmark using jazz improvisation

Category: ai-ml

Tags: llm-benchmark, jazz-improvisation, music-ai

Score: 6.0/10 (Innovation: 7, Technical: 6, Documentation: 7, Utility: 4)

JazzBench is a niche LLM reasoning benchmark that evaluates how well language models predict Charlie Parker's improvisation choices over given chord progressions, scored with formal music-theoretic metrics. It tests a rare form of soft, multi-constraint reasoning that typical benchmarks do not cover, and it ships a reproducible evaluation pipeline with baselines and results for Claude models.
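As a rough illustration of what a formal music-theoretic metric could look like, here is a minimal sketch that scores the fraction of predicted notes that are chord tones. The metric, the function names (`chord_pitch_classes`, `chord_tone_ratio`), and the chord table are assumptions made for this example; they are not taken from JazzBench, whose actual metrics may differ.

```python
# Hypothetical sketch: not JazzBench's actual metric or API.

# Map note names to pitch classes (0-11), including common enharmonic spellings.
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10,
    "B": 11,
}

# Intervals in semitones above the root for a few seventh-chord qualities.
CHORD_INTERVALS = {
    "maj7": (0, 4, 7, 11),
    "m7": (0, 3, 7, 10),
    "7": (0, 4, 7, 10),
}


def chord_pitch_classes(root, quality):
    """Return the set of pitch classes that belong to the chord."""
    root_pc = NOTE_TO_PC[root]
    return {(root_pc + interval) % 12 for interval in CHORD_INTERVALS[quality]}


def chord_tone_ratio(notes, root, quality):
    """Fraction of the given notes that are chord tones (0.0 for no notes)."""
    if not notes:
        return 0.0
    tones = chord_pitch_classes(root, quality)
    hits = sum(1 for note in notes if NOTE_TO_PC[note] in tones)
    return hits / len(notes)


# Example: a model's predicted notes over a G7 chord.
predicted = ["G", "A", "B", "C"]
print(chord_tone_ratio(predicted, "G", "7"))
```

A single ratio like this captures only one constraint. A benchmark built on soft, multi-constraint reasoning would presumably combine several such signals, since good improvisation also uses passing tones and tensions outside the chord.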

Target audience: AI researchers, machine learning engineers, musicologists

Repository: https://flatnine.co/blog/i-built-my-own-eval · Python
