Show HN: Rogue-Bench – LLMs play the game Rogue

Category: ai-ml

Tags: benchmark, llm-agents, roguelike

Score: 5.5/10 (Innovation: 5, Technical: 6, Documentation: 6, Utility: 5)

Rogue-Bench is a benchmark framework that evaluates how well LLM-based agents play the classic dungeon crawler Rogue. Agents interact with a headless Rogue executable over pipes, and the framework provides a reproducible, metrics-driven environment for measuring their performance, connecting game AI with LLM evaluation.
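To make the "headless executable via pipes" idea concrete, here is a minimal sketch of driving a child process over stdin/stdout from Python. It is not Rogue-Bench's actual code: the `PipedGame` class, the `run_episode` helper, and the line-based command protocol are all hypothetical, and a tiny stand-in script plays the role of the game binary so the example runs anywhere.

```python
import subprocess
import sys

# Stand-in for the headless game binary (an assumption for illustration):
# a small child script that reads one command per line and replies with
# one line describing its state. "l" moves right, "h" moves left, "q" quits.
FAKE_GAME = r'''
import sys
x = 0
for line in sys.stdin:
    cmd = line.strip()
    if cmd == "q":
        print("bye", flush=True)
        break
    if cmd == "l":
        x += 1
    elif cmd == "h":
        x -= 1
    print(f"pos={x}", flush=True)
'''


class PipedGame:
    """Hypothetical wrapper that talks to a child process over pipes."""

    def __init__(self, argv):
        # text=True gives str I/O; bufsize=1 requests line buffering.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def step(self, command):
        # Send one command, then block until the child answers with one line.
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()


def run_episode(actions):
    """Feed a fixed list of actions to the stand-in game and collect replies."""
    game = PipedGame([sys.executable, "-c", FAKE_GAME])
    try:
        return [game.step(action) for action in actions]
    finally:
        game.close()
```

In a benchmark setting, the fixed action list would be replaced by an agent that reads each reply and chooses the next command. A real game emits full screen updates rather than single lines, so the framing of each observation would need more care than this sketch shows.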

Target audience: AI researchers, machine learning engineers, game AI enthusiasts

Repository: https://iwhalen.github.io/rogue-bench/ · Python · GPL-3.0 · 2 stars
