Show HN: 2500 vision benchmarks / evals for Vision Language Models

Category: ai-ml

Tags: ai-evaluation, vision-language-models, benchmark-dataset

Score: 7.3/10 (Innovation: 7, Technical: 6, Documentation: 8, Utility: 8)

An auto-updating catalog of 2,671 vision-language model (VLM) benchmarks, curated by scanning arXiv daily and classifying papers with Claude. It addresses the discovery problem in VLM evaluation by providing a structured, programmatic dataset that keeps pace with the fast-moving multimodal AI research landscape.
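The listing does not show how the daily arXiv scan is implemented. As a minimal sketch, assuming the public arXiv export API is the source, the first step of such a scan could build a query for the newest submissions in relevant categories. The helper name `build_daily_query`, the `cs.CV`/`cs.CL` categories and the `benchmark` keyword are illustrative assumptions, not taken from the repository. The Claude classification step would run afterwards on the returned entries and is not shown.

```python
from urllib.parse import urlencode

# Public arXiv export API endpoint (returns an Atom feed).
ARXIV_API = "http://export.arxiv.org/api/query"


def build_daily_query(categories=("cs.CV", "cs.CL"),
                      keywords=("benchmark",),
                      max_results=200):
    """Build an arXiv API URL for the newest papers matching categories and keywords.

    Hypothetical helper: categories and keywords are illustrative guesses,
    not the repository's actual scanning configuration.
    """
    # OR together the category filters, e.g. "cat:cs.CV OR cat:cs.CL".
    cat_clause = " OR ".join(f"cat:{c}" for c in categories)
    # OR together abstract keyword filters, e.g. "abs:benchmark".
    kw_clause = " OR ".join(f"abs:{k}" for k in keywords)
    params = {
        "search_query": f"({cat_clause}) AND ({kw_clause})",
        "start": 0,
        "max_results": max_results,
        # Newest submissions first, so a daily run sees the latest papers.
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # urlencode percent-encodes ':' and parentheses and turns spaces into '+'.
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_daily_query())
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) returns an Atom feed of candidate papers. Sorting by `submittedDate` in descending order keeps each daily run focused on recent submissions rather than relevance-ranked results.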

Target audience: ai-researchers, ml-engineers, data-scientists

Repository: https://github.com/Overshoot-ai/vlm-benchmarks · Python · MIT · 1 star
