Show HN: Llama CPU Benchmarks

Category: other

Tags: llm-benchmarks, cpu-inference, tool-calling

Score: 5.0/10 (Innovation: 4, Technical: 5, Documentation: 6, Utility: 5)

This project benchmarks small tool-calling LLMs on CPU, comparing models such as Qwen3.5-4B and Gemma-4-E4B-it with various KV-cache compression techniques. It reports some surprising results, such as Gemma outperforming the speedup tricks, which makes it relevant to anyone optimizing LLM inference on commodity hardware.

Target audience: backend devs, data engineers

Repository: https://deemwar-products.github.io/llama-cpu-benchmarks/ · Python
