Show HN: Kitchen Rush, an Overcooked-inspired LLM tool-calling benchmark
Category: ai-ml
Tags: llm-benchmark, tool-calling, latency-testing
Score: 7.3/10 (Innovation: 7, Technical: 7, Documentation: 9, Utility: 6)
Kitchen Rush is a benchmark for LLM tool calling, inspired by the game Overcooked, that treats latency as a first-class metric. It evaluates both the correctness and the speed of model decisions in a deterministic kitchen simulation and offers dual leaderboards for different latency budgets. Measuring speed alongside correctness makes it relevant to developers of latency-sensitive agents such as voice assistants and live-ops systems.
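To make the idea of latency-aware scoring concrete, here is a minimal sketch of how a decision could count only when it is both correct and within a latency budget, with one ranking per budget. This is an illustration under stated assumptions, not Kitchen Rush's actual scoring: the `Decision` record, the `score_under_budget` and `leaderboards` functions, the budget names and values, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One tool call made by a model (hypothetical record, not the repo's schema)."""
    correct: bool      # did the call match the expected action?
    latency_s: float   # seconds the model took to respond


def score_under_budget(decisions: list[Decision], budget_s: float) -> float:
    """Fraction of decisions that are both correct and within the budget."""
    if not decisions:
        return 0.0
    hits = sum(1 for d in decisions if d.correct and d.latency_s <= budget_s)
    return hits / len(decisions)


def leaderboards(
    runs: dict[str, list[Decision]],
    budgets: dict[str, float],
) -> dict[str, list[tuple[str, float]]]:
    """Rank every model once per latency budget, best score first."""
    boards = {}
    for board_name, budget_s in budgets.items():
        scored = [
            (model, score_under_budget(ds, budget_s))
            for model, ds in runs.items()
        ]
        boards[board_name] = sorted(scored, key=lambda p: p[1], reverse=True)
    return boards


# Made-up sample data: a quick but error-prone model and a slow but accurate one.
runs = {
    "fast-model": [Decision(True, 0.4), Decision(False, 0.3), Decision(True, 0.5)],
    "slow-model": [Decision(True, 2.0), Decision(True, 2.5), Decision(True, 1.8)],
}
budgets = {"strict": 1.0, "relaxed": 3.0}  # assumed budgets, in seconds

boards = leaderboards(runs, budgets)
for name, ranking in boards.items():
    print(name, ranking)
```

With this sample data the two boards rank the models in opposite order, which is the kind of trade-off a separate leaderboard per latency budget is meant to expose.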
Target audience: backend developers, data engineers, AI researchers
Repository: https://github.com/bassimeledath/kitchen-rush · Python · Apache-2.0