Show HN: KV-psi, using Linux PSI to trim an LLM KV cache
Category: library
Tags: llm, kv-cache, memory-management, linux-psi, reference-implementation
Score: 5.5/10 (Innovation: 7, Technical: 6, Documentation: 5, Utility: 4)
KV-psi is a reference implementation that uses Linux Pressure Stall Information (PSI) to trim an LLM's KV cache dynamically during inference when the system comes under memory pressure. It pairs a system-level memory pressure signal with LLM cache management, which is an innovative combination, though for now it is a niche experimental tool that depends on specific Linux features and manual setup.
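To make the idea concrete, here is a minimal sketch of reading a PSI memory signal and turning it into a trim decision. It is not taken from the KV-psi repository. The `/proc/pressure/memory` path and its `some`/`full` line format are the standard Linux PSI interface, but `parse_psi`, `tokens_to_keep`, the 10% threshold, the 128-token floor, and the "drop at most half" rule are all hypothetical choices made for illustration.

```python
from pathlib import Path

# Standard PSI file; present on Linux 4.20+ kernels with PSI enabled.
PSI_MEMORY = Path("/proc/pressure/memory")


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {"avg10": ...}, "full": {...}}."""
    out = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        out[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return out


def tokens_to_keep(cached_tokens: int, some_avg10: float,
                   threshold: float = 10.0, min_tokens: int = 128) -> int:
    """Hypothetical policy (an assumption, not KV-psi's actual rule).

    Below the threshold, keep the whole cache. Above it, drop a share of
    tokens that grows linearly with pressure, up to half the cache.
    """
    if some_avg10 <= threshold:
        return cached_tokens
    # Fraction of the way from the threshold to 100% stalled time.
    excess = min((some_avg10 - threshold) / (100.0 - threshold), 1.0)
    keep = int(cached_tokens * (1.0 - 0.5 * excess))
    # Never go below the floor, and never report more than is cached.
    return max(keep, min(min_tokens, cached_tokens))


SAMPLE = (
    "some avg10=55.00 avg60=12.50 avg300=3.10 total=123456\n"
    "full avg10=1.25 avg60=0.40 avg300=0.10 total=7890\n"
)
```

On a PSI-enabled machine, `parse_psi(PSI_MEMORY.read_text())["some"]["avg10"]` would supply the live pressure value in place of `SAMPLE`. How the cache is actually shortened (for example, which tokens are evicted) depends on the inference stack and is left out here.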
Target audience: backend developers, ML engineers, systems programmers
Repository: https://github.com/infiniteregrets/kv-psi · Python