Show HN: Rogue-Bench – LLMs play the game Rogue
Category: ai-ml
Tags: benchmark, llm-agents, roguelike
Score: 5.5/10 (Innovation: 5, Technical: 6, Documentation: 6, Utility: 5)
Rogue-Bench is a benchmark framework that evaluates how well LLM-based agents play the classic dungeon crawler Rogue. Agents interact with a headless Rogue executable over pipes, and the framework provides a reproducible, metrics-driven environment for measuring their performance. The project sits at the intersection of game AI and LLM evaluation.
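To make the "headless executable via pipes" idea concrete, here is a minimal sketch of an agent loop that drives a child process over stdin/stdout. It is not Rogue-Bench's actual API: the stand-in game (`FAKE_GAME`), the one-line-per-command protocol, and the names `run_episode` and `scripted_policy` are all assumptions made for illustration. In a real setup, the policy would presumably be an LLM call that maps the latest observation to a keystroke.

```python
import subprocess
import sys

# Hypothetical stand-in for the headless game: it reads one command per line
# and answers with exactly one line. The real Rogue executable's protocol is
# not described in this listing, so this is only an assumption.
FAKE_GAME = r"""
import sys
x = 0
for line in sys.stdin:
    cmd = line.strip()
    if cmd == "q":
        print("bye", flush=True)
        break
    if cmd == "l":
        x += 1
    elif cmd == "h":
        x -= 1
    print(f"x={x}", flush=True)
"""


def scripted_policy(observation: str) -> str:
    """Placeholder for an LLM: move right until x reaches 2, then quit."""
    return "q" if observation == "x=2" else "l"


def run_episode(policy, max_steps: int = 10):
    """Drive the child process over pipes and return (command, reply) pairs."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", FAKE_GAME],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    transcript = []
    observation = ""
    try:
        for _ in range(max_steps):
            command = policy(observation)
            # Write one command and flush so the child sees it immediately.
            proc.stdin.write(command + "\n")
            proc.stdin.flush()
            # Block until the child answers with one line.
            observation = proc.stdout.readline().strip()
            transcript.append((command, observation))
            if observation == "bye":
                break
    finally:
        # Closing stdin ends the child's read loop if it is still running.
        proc.stdin.close()
        proc.wait(timeout=5)
        proc.stdout.close()
    return transcript


if __name__ == "__main__":
    for command, reply in run_episode(scripted_policy):
        print(command, "->", reply)
```

A fixed step budget and a scripted policy keep the run deterministic, which is the kind of property a reproducible benchmark would need; swapping `scripted_policy` for a model-backed function leaves the pipe handling unchanged.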
Target audience: AI researchers, machine learning engineers, game AI enthusiasts
Repository: https://iwhalen.github.io/rogue-bench/ · Python · GPL-3.0 · 2 stars