Show HN: 97% on SWE-bench Verified with subscription-token agents
Category: ai-ml
Tags: swe-bench, agent-pipeline, reproducibility
Score: 7.5/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 7)
A three-stage agent pipeline for SWE-bench Verified that chains recon, craft, and audit skills and reports a 97% pass rate. Its emphasis on auditability, frozen artifacts, and append-only commit history makes the result easier to reproduce and to check skeptically.
Target audience: developers evaluating LLM code generation capabilities, ML researchers
Repository: https://github.com/kimjune01/swebench-verified · Shell · GPL-3.0