Show HN: 97% on SWE-bench Verified with subscription-token agents

Category: ai-ml

Tags: swe-bench, agent-pipeline, reproducibility

Score: 7.5/10 (Innovation: 7, Technical: 8, Documentation: 8, Utility: 7)

A three-stage agent pipeline for SWE-bench Verified that chains recon, craft, and audit skills, with a reported pass rate of 97%. Its emphasis on auditability, frozen artifacts, and an append-only commit history makes the result easier to reproduce and easier for skeptical readers to check.

Target audience: developers evaluating LLM code generation capabilities, ML researchers

Repository: https://github.com/kimjune01/swebench-verified · Shell · GPL-3.0
