Show HN: Khazad – Transparent Semantic Cache for LLM Calls on Redis Vector Sets
Category: infrastructure
Tags: semantic-cache, llm, redis, python, caching
Score: 7.5/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 7)
Khazad is a transport-layer semantic cache for LLM API calls. It intercepts HTTP traffic and serves semantically equivalent requests from a Redis vector cache, with no changes to application code. Its caching is both model-aware and conversation-aware and it supports streaming responses, so it aims to cut latency and API cost for high-volume LLM workloads.
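To make the idea of model-aware semantic caching concrete, here is a minimal in-memory sketch. It is not Khazad's code or API: the `SemanticCache` class, the toy bag-of-words `embed` function, and the 0.8 threshold are all assumptions chosen for illustration, and a plain Python dict stands in for the Redis vector store.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache: entries are partitioned by model name (model-aware)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff for a cache hit
        self.entries = {}  # model name -> list of (embedding, response)

    def put(self, model, prompt, response):
        self.entries.setdefault(model, []).append((embed(prompt), response))

    def get(self, model, prompt):
        """Return the most similar cached response for this model, or None on a miss."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self.entries.get(model, []):
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

With this sketch, a prompt stored under one model name is returned for a close paraphrase sent to the same model, while the same prompt sent to a different model is a miss. A real deployment would replace the linear scan with a vector index and use embeddings from an actual model.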
Target audience: backend devs, data engineers, ML engineers
Repository: https://github.com/GuglielmoCerri/khazad · Python · MIT · 2 stars