Show HN: Llama CPU Benchmarks
Category: other
Tags: llm-benchmarks, cpu-inference, tool-calling
Score: 5.0/10 (Innovation: 4, Technical: 5, Documentation: 6, Utility: 5)
This project benchmarks small tool-calling LLMs on CPU, comparing models such as Qwen3.5-4B and Gemma-4-E4B-it with various KV-cache compression techniques. It reports some surprising results, such as Gemma outperforming the speedup tricks, which makes it relevant to anyone optimizing LLM inference on commodity hardware.
Target audience: backend devs, data engineers
Repository: https://deemwar-products.github.io/llama-cpu-benchmarks/ · Python