Show HN: Stateful Inference with 99% Token Savings

Category: library

Tags: llm-inference, stateful-memory, kv-cache, token-savings, neural-ledger

Score: 8.8/10 (Innovation: 9, Technical: 9, Documentation: 9, Utility: 8)

NLS (Neural Ledger System) is an inference architecture that captures a model's internal states (KV tensors) while it processes input and re-injects them later, giving an LLM stateful, persistent memory without reprocessing the chat history. The project reports over 99% token savings in long conversations. It combines phantom token injection, attention-based scoring, and cross-session persistence to cut the cost and latency of resending the full conversation on every turn, and it includes benchmarks for agentic recall and behavioral parity. The approach targets a well-known limitation of transformer inference, which keeps no state between requests, and has practical implications for cost-effective, long-context AI applications.
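To see why savings of this size are plausible in long conversations, here is a minimal arithmetic sketch. It is not taken from the NLS repository: the function names are hypothetical, and it assumes a stateless baseline that resends the whole history on every turn versus a stateful setup that only processes each turn's new tokens. It ignores response tokens and any overhead of storing or re-injecting state.

```python
def tokens_processed(turn_lengths):
    """Return (stateless_total, stateful_total) input tokens for a conversation.

    turn_lengths: number of new tokens added at each turn (hypothetical input).
    """
    stateless = 0
    history = 0
    for n in turn_lengths:
        history += n          # the conversation grows by this turn's tokens
        stateless += history  # stateless baseline: full history is re-sent
    stateful = sum(turn_lengths)  # stateful assumption: only new tokens are processed
    return stateless, stateful


def savings(turn_lengths):
    """Fraction of input tokens avoided by the stateful setup."""
    stateless, stateful = tokens_processed(turn_lengths)
    return 1 - stateful / stateless


if __name__ == "__main__":
    turns = [100] * 200  # 200 turns of 100 new tokens each
    stateless, stateful = tokens_processed(turns)
    print(stateless, stateful, round(savings(turns), 4))
```

Under these assumptions, equal-length turns give savings of 1 - 2/(T + 1) for T turns, so the figure only crosses 99% once a conversation reaches roughly 200 turns; short conversations save far less.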

Target audience: backend developers, data engineers, AI/ML researchers

Repository: https://github.com/umbecanessa/neural-ledger-system · License: NOASSERTION · 5 stars
