Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Category: ai-ml

Tags: tiny-ai, tool-calling, on-device-ml

Score: 7.8/10 (Innovation: 8, Technical: 9, Documentation: 7, Utility: 7)

Needle is a 26M-parameter model that distills Gemini's tool-calling capability into a compact architecture built on Simple Attention Networks, small enough for on-device inference and fine-tuning. Its encoder-decoder design uses cross-attention and tied embeddings and drops feed-forward (FFN) layers entirely, and it is reported to perform well on single-shot function calls despite its size.
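To give a rough feel for why tying embeddings and dropping FFN layers shrinks a model, here is a minimal parameter-count sketch. Everything in it is an assumption for illustration: the `Config` class, the helper names, and the dimensions (vocabulary 8192, width 512, 4 encoder and 4 decoder layers) are hypothetical and are not Needle's actual configuration. Biases and normalization weights are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical encoder-decoder dimensions (not Needle's real config)."""
    vocab_size: int
    d_model: int
    enc_layers: int
    dec_layers: int
    ffn_mult: int = 4  # hidden width of a conventional FFN, as a multiple of d_model


def attention_params(d_model: int) -> int:
    # Q, K, V and output projections, each d_model x d_model (biases ignored).
    return 4 * d_model * d_model


def ffn_params(d_model: int, ffn_mult: int) -> int:
    # Two linear layers: d_model -> ffn_mult * d_model -> d_model (biases ignored).
    return 2 * ffn_mult * d_model * d_model


def count_params(cfg: Config, use_ffn: bool, tie_embeddings: bool) -> int:
    embed = cfg.vocab_size * cfg.d_model
    # Untied models carry a separate output projection of the same size.
    embeddings = embed if tie_embeddings else 2 * embed
    ffn = ffn_params(cfg.d_model, cfg.ffn_mult) if use_ffn else 0
    attn = attention_params(cfg.d_model)
    # Encoder layer: self-attention (+ optional FFN).
    encoder = cfg.enc_layers * (attn + ffn)
    # Decoder layer: self-attention + cross-attention (+ optional FFN).
    decoder = cfg.dec_layers * (2 * attn + ffn)
    return embeddings + encoder + decoder


if __name__ == "__main__":
    cfg = Config(vocab_size=8192, d_model=512, enc_layers=4, dec_layers=4)
    slim = count_params(cfg, use_ffn=False, tie_embeddings=True)
    full = count_params(cfg, use_ffn=True, tie_embeddings=False)
    print(f"attention-only, tied embeddings: {slim:,}")
    print(f"with FFN, untied embeddings:     {full:,}")
```

With these made-up dimensions the attention-only, tied-embedding variant comes out at well under half the size of the conventional layout, which is the kind of saving the summary above points to. The figures say nothing about Needle's real layer counts or widths.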

Target audience: backend devs, data engineers, AI researchers

Repository: https://github.com/cactus-compute/needle · Python · MIT · 2 stars
