Show HN: Kitchen Rush, an Overcooked-inspired LLM tool-calling benchmark

Category: ai-ml

Tags: llm-benchmark, tool-calling, latency-testing

Score: 7.3/10 (Innovation: 7, Technical: 7, Documentation: 9, Utility: 6)

Kitchen Rush is an LLM tool-calling benchmark, inspired by the game Overcooked, that treats latency as a first-class metric. It evaluates both the correctness and the speed of model decisions in a deterministic kitchen simulation and offers two leaderboards for different latency budgets. Scoring speed alongside correctness brings the measurement closer to real-world agent performance, which makes it relevant to developers of voice assistants and live-ops systems.
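
To make the idea of latency-aware scoring concrete, here is a minimal sketch in Python. It is not taken from the Kitchen Rush repository: the `Decision` record, the `score` and `leaderboards` functions, the budget names, and the sample numbers are all hypothetical, and the rule that a decision only counts if it is both correct and within budget is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One tool call made by a model (hypothetical record, not the repo's schema)."""
    correct: bool      # did the call match what the simulation expected?
    latency_s: float   # seconds the model took to respond


def score(decisions: list[Decision], budget_s: float) -> float:
    """Fraction of decisions that were both correct and within the latency budget."""
    if not decisions:
        return 0.0
    on_time = sum(1 for d in decisions if d.correct and d.latency_s <= budget_s)
    return on_time / len(decisions)


def leaderboards(
    runs: dict[str, list[Decision]], budgets: dict[str, float]
) -> dict[str, list[tuple[str, float]]]:
    """Rank every model once per latency budget, highest score first."""
    return {
        name: sorted(
            ((model, score(ds, budget)) for model, ds in runs.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        for name, budget in budgets.items()
    }


# Made-up sample data: one quick but imperfect model, one accurate but slow model.
runs = {
    "fast-model": [Decision(True, 0.4), Decision(False, 0.3),
                   Decision(True, 0.5), Decision(True, 0.6)],
    "slow-model": [Decision(True, 2.5), Decision(True, 3.0),
                   Decision(True, 0.9), Decision(True, 2.0)],
}
boards = leaderboards(runs, {"strict": 1.0, "relaxed": 5.0})
for name, ranking in boards.items():
    print(name, ranking)
```

With this sample data the two boards disagree: the quicker model leads under the strict budget and the slower, more accurate one leads under the relaxed budget, which is the kind of trade-off a pair of latency-budget leaderboards can expose.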

Target audience: backend devs, data engineers, AI researchers

Repository: https://github.com/bassimeledath/kitchen-rush · Python · Apache-2.0
