Show HN: Autosynth – generating synthetic data with strong/weak model filtering

Category: library

Tags: synthetic-data, llm, data-augmentation

Score: 6.8/10 (Innovation: 6, Technical: 8, Documentation: 7, Utility: 6)

Autosynth is a Python framework for generating synthetic datasets through an agentic loop: an LLM proposes candidates, a weak and a strong solver each attempt them, and a judge filters the results. The approach is inspired by Meta FAIR's Autodata paper. A domain-agnostic plugin architecture and support for verifiable, rubric-based, or judge-based acceptance make it adaptable to a range of data-generation use cases. The project is early-stage but technically solid, with an event-sourced pipeline over SQLite and multi-provider LLM orchestration.
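To make the weak/strong filtering idea concrete, here is a minimal sketch of one plausible reading of it: keep a proposed task only when the strong solver reaches the reference answer and the weak solver does not. This is not Autosynth's API. The names `Task`, `filter_by_solver_gap`, `weak_solver`, and `strong_solver` are hypothetical, the solvers are toy stand-ins for LLM calls, and the exact acceptance rule the project uses may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Task:
    """A proposed example with a reference answer (hypothetical structure)."""
    prompt: str
    answer: str


# A solver maps a prompt to an answer string; in practice this would wrap an LLM call.
Solver = Callable[[str], str]


def filter_by_solver_gap(tasks: Iterable[Task], weak: Solver, strong: Solver) -> List[Task]:
    """Keep tasks the strong solver gets right and the weak solver gets wrong.

    Assumed acceptance rule: tasks both solvers pass are treated as too easy,
    and tasks the strong solver fails are treated as unverified or too hard.
    """
    kept = []
    for task in tasks:
        strong_ok = strong(task.prompt).strip() == task.answer
        weak_ok = weak(task.prompt).strip() == task.answer
        if strong_ok and not weak_ok:
            kept.append(task)
    return kept


def weak_solver(prompt: str) -> str:
    # Toy stand-in: only handles single-digit addition.
    a, b = (int(x) for x in prompt.split("+"))
    return str(a + b) if a < 10 and b < 10 else "?"


def strong_solver(prompt: str) -> str:
    # Toy stand-in: handles any integer addition.
    a, b = (int(x) for x in prompt.split("+"))
    return str(a + b)


proposals = [
    Task("2+3", "5"),      # both solvers pass: dropped as too easy
    Task("17+25", "42"),   # only the strong solver passes: kept
    Task("40+2", "41"),    # wrong reference answer, strong solver disagrees: dropped
]
accepted = filter_by_solver_gap(proposals, weak_solver, strong_solver)
print([t.prompt for t in accepted])
```

Exact string matching stands in for the verifiable acceptance mode here; a rubric-based or judge-based mode would swap that comparison for a scoring call.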

Target audience: data engineers, ML researchers, AI engineers

Repository: https://github.com/Ahmad8864/autosynth · Python · MIT · 1 star
