Show HN: Overlapping Speaker Transcription Model

Category: ai-ml

Tags: speech-recognition, transcription, whisper, multi-speaker, ai

Score: 7.0/10 (Innovation: 7, Technical: 7, Documentation: 8, Utility: 7)

Chorus-v1 is a fine-tuned Whisper model that transcribes overlapping two-speaker audio into separate, timestamped transcripts, one per speaker. It runs a single forward pass per speaker, so no separate diarization step is needed. It's interesting because it addresses a common pain point in meeting transcription, speaker overlap, with a token-conditioning approach that builds speaker separation directly into the ASR model.
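
As a rough illustration of what token conditioning could look like, here is a minimal sketch, not the model's actual interface. The first three prefix tokens are Whisper's standard control tokens. The speaker tokens `<|speaker1|>` and `<|speaker2|>`, the helper names, and the `decode` callable are all hypothetical stand-ins, since the listing does not say how Chorus-v1 selects a speaker.

```python
from typing import Callable, Dict, List

# Standard Whisper control tokens for English transcription with timestamps.
WHISPER_PREFIX = ["<|startoftranscript|>", "<|en|>", "<|transcribe|>"]

# HYPOTHETICAL speaker-selection tokens; the real model may use different ones.
SPEAKER_TOKENS = ["<|speaker1|>", "<|speaker2|>"]


def build_decoder_prompt(speaker_index: int) -> List[str]:
    """Return a decoder prefix that requests one speaker's transcript."""
    return WHISPER_PREFIX + [SPEAKER_TOKENS[speaker_index]]


def transcribe_all_speakers(
    audio: object,
    decode: Callable[[object, List[str]], str],
) -> Dict[str, str]:
    """Run one decoding pass per speaker over the same audio.

    `decode` is a placeholder for whatever actually runs the model; it
    receives the audio and the conditioning prefix and returns text.
    """
    return {
        token: decode(audio, build_decoder_prompt(i))
        for i, token in enumerate(SPEAKER_TOKENS)
    }
```

The point of the sketch is the shape of the loop: the same audio is decoded twice, and only the conditioning prefix changes, which is why no separate diarization stage appears anywhere.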

Target audience: data engineers, backend devs, devops

Repository: https://huggingface.co/Trelis/Chorus-v1
