Show HN: Realtime voice agent that sees, hears, and interrupts – on a CPU laptop
Category: ai-ml
Tags: voice-agent, real-time, computer-vision, audio-processing, python, cpu-only
Score: 7.5/10 (Innovation: 7, Technical: 8, Documentation: 9, Utility: 6)
A CPU-only real-time voice agent that reproduces the behaviors shown in Thinking Machines' Interaction Models demo: speech, proactive responses keyed to what the camera sees, live translation, and interruptible background tasks. It combines commodity models (Silero VAD, YOLO11-pose, Piper TTS, and LLMs called over an API) with a Python event loop and WebRTC echo cancellation to achieve low-latency interaction on a single laptop. The pragmatic architecture and detailed performance tuning make it an instructive DIY approach to advanced voice interfaces.
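The summary does not show the project's code, so the following is only a minimal sketch of the general "barge-in" pattern a Python event loop can use to let the user interrupt the agent mid-sentence. It is not the repository's implementation. Assumptions: `is_speech` is a hypothetical RMS-energy threshold standing in for Silero VAD, a Python list stands in for the audio output device, and microphone frames arrive as plain lists of floats.

```python
import asyncio
import math


def is_speech(frame: list[float], threshold: float = 0.1) -> bool:
    """Hypothetical stand-in for Silero VAD: speech if the frame's RMS energy crosses a threshold."""
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms >= threshold


async def speak(chunks: list[str], played: list[str], chunk_delay: float = 0.01) -> None:
    """Emit TTS chunks one at a time; each await is a point where cancellation can take effect."""
    for chunk in chunks:
        played.append(chunk)  # stand-in for writing a synthesized audio chunk to the speaker
        await asyncio.sleep(chunk_delay)


async def listen(frames: list[list[float]], speaking: asyncio.Task, frame_delay: float = 0.01) -> bool:
    """Consume mic frames and cancel playback on the first frame classified as speech."""
    for frame in frames:
        await asyncio.sleep(frame_delay)  # simulate real-time frame arrival
        if is_speech(frame) and not speaking.done():
            speaking.cancel()  # barge-in: stop talking as soon as the user starts
            return True
    return False


async def run_turn(chunks: list[str], frames: list[list[float]]) -> tuple[list[str], bool]:
    """Run one agent turn: speak while listening, and report what was played and whether it was cut off."""
    played: list[str] = []
    speaking = asyncio.create_task(speak(chunks, played))
    interrupted = await listen(frames, speaking)
    try:
        await speaking  # finishes normally, or raises CancelledError if we interrupted it
    except asyncio.CancelledError:
        pass
    return played, interrupted


if __name__ == "__main__":
    silence, loud = [0.0] * 160, [0.5] * 160
    played, interrupted = asyncio.run(run_turn([f"chunk{i}" for i in range(100)], [silence, silence, loud]))
    print(f"interrupted={interrupted}, chunks played before cut-off={len(played)}")
```

Running playback and listening as separate tasks on one event loop keeps the interruption check independent of TTS generation; in a real system, echo cancellation (WebRTC AEC in this project) matters because otherwise the agent's own voice leaking into the microphone could trigger the VAD.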
Target audience: backend developers, AI engineers, ML researchers
Repository: https://github.com/kouhxp/cheap-im · Python · 15 stars