Show HN: Marlin-2B: a tiny VLM to extract structured information from videos
Category: ai-ml
Tags: video-understanding, temporal-grounding, vlm
Score: 7.3/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 7)
Marlin-2B is a compact 2B-parameter video-language model that extracts structured, timestamped scene and event descriptions from videos. It combines dense captioning and temporal grounding in a single deployable model, and is reported to be competitive with much larger models such as Gemini while running on consumer GPUs.
Target audience: backend devs
Repository: https://huggingface.co/NemoStation/Marlin-2B
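For backend developers consuming this kind of output, here is a minimal sketch of how timestamped event descriptions might be handled downstream. It does not call the model. The JSON shape (`start`, `end`, `description`, times in seconds) and the names `VideoEvent`, `parse_events`, and `events_in_window` are assumptions for illustration, not Marlin-2B's documented output format or API; check the repository for the real schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoEvent:
    """One timestamped event description (hypothetical schema)."""
    start_s: float
    end_s: float
    description: str


def parse_events(raw_json: str) -> list[VideoEvent]:
    """Parse a JSON array of events and return them sorted by start time.

    Assumes each item has "start", "end", and "description" keys; this
    shape is an illustration, not the model's confirmed output format.
    """
    events = []
    for item in json.loads(raw_json):
        event = VideoEvent(
            start_s=float(item["start"]),
            end_s=float(item["end"]),
            description=str(item["description"]),
        )
        if event.end_s < event.start_s:
            raise ValueError(f"event ends before it starts: {event}")
        events.append(event)
    return sorted(events, key=lambda e: e.start_s)


def events_in_window(
    events: list[VideoEvent], t0: float, t1: float
) -> list[VideoEvent]:
    """Return events whose interval overlaps the open window (t0, t1)."""
    return [e for e in events if e.start_s < t1 and e.end_s > t0]
```

A typical use would be to parse the model's response once with `parse_events`, then call `events_in_window` to answer questions such as "what happens between 0:05 and 0:06?". Validating the interval at parse time keeps malformed timestamps from propagating into later queries.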