Show HN: 150M Mandarin transcription model with real-time metadata detection
Category: ai-ml
Tags: asr, speech-recognition, ctc, multi-task-learning, mandarin, meta-tagging, nemo
Score: 7.3/10 (Innovation: 7, Technical: 8, Documentation: 7, Utility: 7)
PromptingNemo is a training toolkit that lets ASR models emit transcriptions and structured meta-tags (such as age, gender, emotion, and entity types) in a single CTC decoding pass, so no separate downstream models are needed. It combines CTC loss with inline tag tokenization and provides a full pipeline for data preparation, training, and evaluation, with pretrained models available for multiple languages. The project also supports audio-visual ASR and custom tokenizer training, and its documentation covers installation, usage, and architecture.
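To illustrate what "inline tag tokenization" in a single CTC pass could look like on the output side, here is a minimal sketch in plain Python. It is not the toolkit's API: the tag token format (`AGE_…`, `GENDER_…`, `EMOTION_…`), the `<blank>` symbol, and both helper functions are assumptions made for illustration only. The idea is that tag tokens live in the same vocabulary as text tokens, so a standard greedy CTC collapse yields one sequence that can then be split into transcript and metadata.

```python
import re
from typing import Dict, List, Tuple

BLANK = "<blank>"  # hypothetical CTC blank symbol

# Hypothetical tag format: KEY_VALUE tokens that share the vocabulary with text tokens.
TAG_PATTERN = re.compile(r"^(AGE|GENDER|EMOTION)_(.+)$")


def ctc_greedy_collapse(frame_tokens: List[str], blank: str = BLANK) -> List[str]:
    """Standard greedy CTC post-processing: merge repeated tokens, then drop blanks."""
    collapsed: List[str] = []
    prev = None
    for tok in frame_tokens:
        if tok != prev and tok != blank:
            collapsed.append(tok)
        prev = tok
    return collapsed


def split_tags(tokens: List[str]) -> Tuple[str, Dict[str, str]]:
    """Separate text tokens from tag tokens in one decoded sequence."""
    text_tokens: List[str] = []
    tags: Dict[str, str] = {}
    for tok in tokens:
        match = TAG_PATTERN.match(tok)
        if match:
            tags[match.group(1).lower()] = match.group(2)
        else:
            text_tokens.append(tok)
    # Mandarin text is joined without spaces.
    return "".join(text_tokens), tags


# Per-frame argmax tokens, as a CTC model might produce them (made-up example).
frames = ["<blank>", "你", "你", "<blank>", "好", "好", "AGE_30_45",
          "<blank>", "GENDER_FEMALE", "GENDER_FEMALE", "EMOTION_HAPPY"]

transcript, meta = split_tags(ctc_greedy_collapse(frames))
print(transcript, meta)
```

Entity tags that wrap a span of words would need start and end markers and are left out of this sketch.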
Target audience: data engineers, machine learning engineers, speech researchers
Repository: https://huggingface.co/WhissleAI/STT-meta-ZH-150m · Python · Apache-2.0 · 11 stars