Show HN: Many GPUs – A Capacity Planner for Your LLM Inference
Category: devtools
Tags: llm-inference, capacity-planning, gpu-sizing, simulation, streamlit
Score: 6.3/10 (Innovation: 5, Technical: 6, Documentation: 7, Utility: 7)
HowManyGPUs is a capacity planning tool that estimates the number of GPUs needed to serve an LLM inference workload based on throughput, memory bandwidth, and KV cache constraints. It combines analytical formulas with discrete-event simulation in SimPy and presents results via a Streamlit app, making GPU sizing more accessible to practitioners.
Target audience: ML engineers, infrastructure engineers, and DevOps teams managing LLM serving deployments
Repository: https://github.com/fhalde/howmanygpus · Python · MIT
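
To make the three constraints in the summary concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the HowManyGPUs repository and may differ from the formulas the tool actually uses. All function names, parameters, and the example hardware and model numbers are assumptions chosen for illustration. It assumes weights and KV cache are sharded evenly across GPUs, that decode is purely memory-bandwidth bound, and that scaling across GPUs is perfect; it does not attempt the discrete-event simulation part.

```python
import math


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache size for one token: keys and values (2x) for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def gpus_for_memory(weight_bytes, kv_total_bytes, gpu_mem_bytes, usable_fraction=0.9):
    """GPUs needed so that weights plus KV cache fit in usable GPU memory.

    Assumption: a single copy of the weights is sharded across all GPUs.
    """
    return math.ceil((weight_bytes + kv_total_bytes) / (gpu_mem_bytes * usable_fraction))


def gpus_for_decode_bandwidth(weight_bytes, kv_total_bytes, concurrent_seqs,
                              target_tokens_per_s, gpu_bw_bytes_per_s):
    """GPUs needed to hit a decode throughput target when bandwidth bound.

    Assumption: each decode step reads all weights and the whole KV cache once
    and emits one token per concurrent sequence.
    """
    bytes_per_step = weight_bytes + kv_total_bytes
    steps_per_s = target_tokens_per_s / concurrent_seqs
    return math.ceil(steps_per_s * bytes_per_step / gpu_bw_bytes_per_s)


def estimate_gpus(weight_bytes, kv_per_token, concurrent_seqs, tokens_per_seq,
                  target_tokens_per_s, gpu_mem_bytes, gpu_bw_bytes_per_s):
    """Take the larger of the memory-bound and bandwidth-bound GPU counts."""
    kv_total = kv_per_token * concurrent_seqs * tokens_per_seq
    by_memory = gpus_for_memory(weight_bytes, kv_total, gpu_mem_bytes)
    by_bandwidth = gpus_for_decode_bandwidth(
        weight_bytes, kv_total, concurrent_seqs, target_tokens_per_s, gpu_bw_bytes_per_s
    )
    return max(by_memory, by_bandwidth)


if __name__ == "__main__":
    # Illustrative values only: an 8B-class model in fp16 with grouped-query
    # attention, and a hypothetical GPU with 80 GB memory and 2 TB/s bandwidth.
    kv_per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
    gpus = estimate_gpus(
        weight_bytes=16e9,
        kv_per_token=kv_per_token,
        concurrent_seqs=64,
        tokens_per_seq=4096,
        target_tokens_per_s=5000,
        gpu_mem_bytes=80e9,
        gpu_bw_bytes_per_s=2e12,
    )
    print(f"KV cache per token: {kv_per_token} bytes")
    print(f"Estimated GPUs: {gpus}")
```

Taking the maximum of the two counts reflects that whichever constraint is tighter determines the fleet size. A real planner would also need to account for prefill compute, batching dynamics, and queueing under bursty arrivals, which is where a simulation is more informative than closed-form estimates.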