Show HN: Khazad – Transparent Semantic Cache for LLM Calls on Redis Vector Sets

Category: infrastructure

Tags: semantic-cache, llm, redis, python, caching

Score: 7.5/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 7)

Khazad is a transport-layer semantic cache for LLM API calls: it intercepts HTTP traffic and serves semantically equivalent requests from a Redis vector cache, with no changes to application code. It combines model-aware and conversation-aware caching with streaming support, which can reduce latency and API cost for high-volume LLM workloads that see many repeated or near-duplicate prompts.
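
To make the idea of a model-aware semantic cache concrete, here is a minimal in-memory sketch. It is not Khazad's code or API: the `SemanticCache` class, the `embed` helper, the bag-of-words "embedding", and the 0.8 threshold are all assumptions chosen for illustration. A real deployment would presumably use a proper embedding model and a Redis vector index in place of the Python dict.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache: entries are partitioned by model name."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff for a hit
        self.entries = {}  # model name -> list of (embedding, response)

    def put(self, model, prompt, response):
        self.entries.setdefault(model, []).append((embed(prompt), response))

    def get(self, model, prompt):
        """Return the best cached response for this model, or None on a miss."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self.entries.get(model, []):
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("model-a", "what is the capital of France", "Paris")

# A near-duplicate prompt for the same model is served from the cache.
hit = cache.get("model-a", "what is the capital city of France")
# The same prompt for a different model is a miss (model-aware).
miss_other_model = cache.get("model-b", "what is the capital of France")
# An unrelated prompt is a miss.
miss_unrelated = cache.get("model-a", "explain quicksort in Python")
```

Partitioning entries by model keeps a response generated by one model from being served for a request addressed to another, which is one plausible reading of "model-aware" in the description above.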

Target audience: backend devs, data engineers, ML engineers

Repository: https://github.com/GuglielmoCerri/khazad · Python · MIT · 2 stars
