Show HN: Many GPUs – A Capacity Planner for Your LLM Inference

Category: devtools

Tags: llm-inference, capacity-planning, gpu-sizing, simulation, streamlit

Score: 6.3/10 (Innovation: 5, Technical: 6, Documentation: 7, Utility: 7)

HowManyGPUs is a capacity planning tool that estimates the number of GPUs needed to serve an LLM inference workload based on throughput, memory bandwidth, and KV cache constraints. It combines analytical formulas with discrete-event simulation in SimPy and presents results via a Streamlit app, making GPU sizing more accessible to practitioners.
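To make the analytical side concrete, here is a minimal sketch of how a GPU count could be derived from KV cache size, memory capacity, and memory bandwidth. It is not taken from the HowManyGPUs repository: the function names, parameters, and the simplifying assumptions (weights and KV cache shard perfectly across GPUs, decode is purely bandwidth-bound, every decode step reads the weights and the full KV cache once) are all hypothetical and chosen only for illustration.

```python
import math


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache one token occupies (hypothetical helper).

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes


def estimate_gpus(weight_bytes, gpu_mem_bytes, gpu_bw_bytes_per_s,
                  concurrent_seqs, context_tokens, target_tokens_per_s,
                  kv_per_token):
    """Rough GPU count under idealized assumptions (illustrative only).

    Assumes perfect sharding across GPUs and ignores activations,
    fragmentation, compute limits, and interconnect overhead.
    """
    # Memory constraint: weights plus KV cache for every in-flight sequence.
    kv_total = concurrent_seqs * context_tokens * kv_per_token
    gpus_for_memory = math.ceil((weight_bytes + kv_total) / gpu_mem_bytes)

    # Bandwidth constraint: each decode step reads weights and KV cache once
    # and emits one token per concurrent sequence.
    bytes_per_step = weight_bytes + kv_total
    steps_per_s = target_tokens_per_s / concurrent_seqs
    required_bw = bytes_per_step * steps_per_s
    gpus_for_bandwidth = math.ceil(required_bw / gpu_bw_bytes_per_s)

    # The tighter of the two constraints decides the count (at least one GPU).
    return max(1, gpus_for_memory, gpus_for_bandwidth)


# Illustrative numbers only, not measurements of any real model or GPU.
kv = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
gpus = estimate_gpus(
    weight_bytes=16_000_000_000,
    gpu_mem_bytes=80_000_000_000,
    gpu_bw_bytes_per_s=2_000_000_000_000,
    concurrent_seqs=256,
    context_tokens=4096,
    target_tokens_per_s=2000,
    kv_per_token=kv,
)
print(kv, gpus)
```

In this example the memory constraint dominates: the KV cache for 256 concurrent 4096-token sequences is far larger than the weights. A closed-form estimate like this ignores queueing and bursty arrivals, which is presumably the gap a discrete-event simulation is meant to cover.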

Target audience: ML engineers, infrastructure engineers, and DevOps teams managing LLM serving deployments

Repository: https://github.com/fhalde/howmanygpus · Python · MIT
