Show HN: TurboPrefill – Multi-GPU prefill acceleration for llama.cpp
Category: library
Tags: llama-cpp, multi-gpu, prefill-optimization
Score: 7.0/10 (Innovation: 8, Technical: 7, Documentation: 7, Utility: 6)
TurboPrefill is an optimization overlay for llama.cpp that accelerates multi-GPU prefill by reordering ubatch execution into a pipeline, inspired by production-line scheduling. It reports up to a 2.23x speedup on long-context prompts without changing model weights or outputs, and targets a niche but impactful scenario: multi-GPU layer-split inference.
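To make the pipelining idea concrete, here is a minimal toy model, not TurboPrefill's actual code. It assumes each GPU is one pipeline stage holding a contiguous slice of layers, that every (ubatch, stage) step costs one time unit, and that transfers between GPUs are free. The function names are hypothetical and are not part of llama.cpp or TurboPrefill.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only: n_stages GPUs in a layer split, n_ubatches micro-batches,
// unit cost per (ubatch, stage) step, zero transfer cost. All names are
// hypothetical illustrations, not TurboPrefill or llama.cpp APIs.

// Baseline: each ubatch traverses every stage before the next ubatch starts,
// so only one GPU is busy at any moment.
long sequential_makespan(int n_ubatches, int n_stages) {
    assert(n_ubatches > 0 && n_stages > 0);
    return static_cast<long>(n_ubatches) * n_stages;
}

// Pipelined: stage s may start ubatch u as soon as stage s-1 has finished
// ubatch u and stage s itself has finished ubatch u-1.
long pipelined_makespan(int n_ubatches, int n_stages) {
    assert(n_ubatches > 0 && n_stages > 0);
    std::vector<long> stage_free(n_stages, 0);  // time each stage becomes idle
    long finish = 0;
    for (int u = 0; u < n_ubatches; ++u) {
        long ready = 0;  // time this ubatch's input to the current stage is available
        for (int s = 0; s < n_stages; ++s) {
            long start = std::max(ready, stage_free[s]);
            ready = start + 1;      // unit cost per step
            stage_free[s] = ready;  // stage is busy until then
        }
        finish = ready;
    }
    return finish;
}
```

In this model, 8 ubatches over 2 stages take 16 steps sequentially and 9 steps pipelined. Real gains depend on factors the toy ignores, such as inter-GPU transfer time and uneven per-stage cost, so it should not be read as a prediction of the reported speedup.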
Target audience: backend devs
Repository: https://github.com/sergey-automation/TurboPrefill · C++ · MIT