Show HN: Thaw – Git branch for a running LLM (fork agents, skip prefill)

Category: infrastructure

Tags: llm-inference, model-forking, kv-cache-snapshots, rl-training, agent-branching

Score: 8.5/10 (Innovation: 8, Technical: 9, Documentation: 9, Utility: 8)

Thaw provides a primitive for snapshotting and forking running LLM inference sessions, so that parallel agent branches and RL rollouts can start from a shared state with sub-second fork times instead of repeating a costly prefill. It combines weight freeze, KV cache serialization, and scheduler state restoration with pipelined DMA transfers, and reports bit-identical restores and a 400x amortization over cold boot. It targets RL training and agent-based workflows where repeated prefills dominate costs.
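To make the fork-and-skip-prefill idea concrete, here is a minimal toy sketch in plain Python. It is not Thaw's API: `ToySession`, `snapshot`, `restore`, and the `prefill_steps` counter are hypothetical names invented for illustration, and the "KV cache" is just a list of tuples rather than GPU tensors. It only shows the accounting: the prompt is processed once, and each branch resumes from a copy of the cached state.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for a running inference session (not Thaw's API)."""
    kv_cache: list = field(default_factory=list)  # one entry per processed token
    prefill_steps: int = 0                        # tokens this session had to prefill

    def prefill(self, prompt_tokens):
        # Simulate the expensive pass: one cache entry per prompt token.
        for tok in prompt_tokens:
            self.kv_cache.append(("kv", tok))
            self.prefill_steps += 1

    def snapshot(self):
        # Capture the cache so later forks can skip prefill.
        return copy.deepcopy(self.kv_cache)

    @classmethod
    def restore(cls, snap):
        # Rebuild a session from a snapshot without recomputing anything.
        return cls(kv_cache=copy.deepcopy(snap), prefill_steps=0)

    def decode(self, token):
        # Append one generated token's cache entry.
        self.kv_cache.append(("kv", token))


prompt = list(range(1000))       # pretend this is a long shared prompt
base = ToySession()
base.prefill(prompt)             # paid once
snap = base.snapshot()

# Fork four branches from the same snapshot; each diverges independently.
branches = [ToySession.restore(snap) for _ in range(4)]
for i, branch in enumerate(branches):
    branch.decode(10_000 + i)

total_prefill = base.prefill_steps + sum(b.prefill_steps for b in branches)
```

In this toy, `total_prefill` stays at the prompt length no matter how many branches are created, which is the cost profile the summary describes. A real system would also have to capture scheduler state and move tensors between device and host memory, which this sketch leaves out.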

Target audience: ML engineers, RL post-training teams, agent framework developers

Repository: https://github.com/thaw-ai/thaw · Python · Apache-2.0 · 4 stars
