Show HN: ChonkLM – Tiny language models running offline in the browser

Category: ai-ml

Tags: webgpu, llm-inference, offline-ai

Score: 6.8/10 (Innovation: 6, Technical: 8, Documentation: 6, Utility: 7)

ChonkLM is a browser-based inference runtime that runs small language models (135M–600M parameters) entirely offline using WebGPU. Model weights are split into small GGUF shards served from Cloudflare Workers and persisted with the Cache API, so a model stays usable offline once its shards have been downloaded. The approach moves LLM inference onto end-user devices without server-side inference costs.
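The shard-and-cache idea can be illustrated with a minimal cache-first loader. This is a sketch under stated assumptions, not ChonkLM's actual code: the names `ShardStore`, `shardUrls`, `concatShards`, and `loadModel`, and the `shard-0000.bin` naming scheme, are all hypothetical. Storage and network access are injected so the logic does not depend on a browser.

```typescript
// Hypothetical sketch: none of these names come from ChonkLM itself.

/** Minimal persistent store; in a browser this could wrap the Cache API. */
interface ShardStore {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, bytes: Uint8Array): Promise<void>;
}

/** Network access is injected so the loader can run outside a browser. */
type FetchBytes = (url: string) => Promise<Uint8Array>;

/** Build shard URLs such as <base>/shard-0000.bin (assumed naming scheme). */
function shardUrls(base: string, count: number): string[] {
  return Array.from({ length: count }, (_, i) =>
    `${base}/shard-${String(i).padStart(4, "0")}.bin`);
}

/** Join shards back into one contiguous buffer, preserving order. */
function concatShards(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

/** Cache-first load: touch the network only for shards not yet stored. */
async function loadModel(
  urls: string[],
  store: ShardStore,
  fetchBytes: FetchBytes,
): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  for (const url of urls) {
    let bytes = await store.get(url);
    if (bytes === undefined) {
      bytes = await fetchBytes(url);
      await store.put(url, bytes); // persist for later offline use
    }
    parts.push(bytes);
  }
  return concatShards(parts);
}
```

In a browser, a `ShardStore` could be backed by the standard Cache API (`caches.open`, `cache.match`, `cache.put`) and `FetchBytes` by `fetch` followed by `response.arrayBuffer()`. Once every shard is in the store, `loadModel` completes without any network request.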

Target audience: frontend developers, data engineers, AI researchers

Repository: https://chonklm.com · TypeScript · NOASSERTION
