Show HN: AST-guard - A gradient-immune structural guard against RL reward hacking
Category: security
Tags: ai-safety, reward-hacking, ast-analysis
Score: 6.3/10 (Innovation: 7, Technical: 7, Documentation: 7, Utility: 4)
ast-guard is a deterministic AST-based pre-execution gate for LLM-generated code, designed to detect structural reward hacking in RL training loops. Its key innovation is gradient-immune structural analysis that cannot be bypassed through model reasoning, forcing attackers into detectable semantic hacks. The project is well-researched with empirical RL validation, but remains an experimental research artifact with niche utility.
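To make the idea of a deterministic pre-execution gate concrete, here is a minimal sketch using Python's standard `ast` module. It is not ast-guard's actual implementation or API: the deny-lists and the function names `structural_violations` and `gate` are hypothetical, chosen only to illustrate how generated code can be rejected on structure alone, before it is ever run.

```python
import ast

# Hypothetical deny-lists for illustration; ast-guard's real rules are not described here.
BANNED_CALLS = {"eval", "exec", "exit", "quit", "__import__"}
BANNED_ATTR_CALLS = {("sys", "exit"), ("os", "_exit")}
BANNED_DUNDER_DEFS = {"__eq__", "__ne__"}


def structural_violations(source: str) -> list[str]:
    """Walk the AST of `source` and list rule violations without executing it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Direct calls such as eval(...) or exit(...)
            if isinstance(func, ast.Name) and func.id in BANNED_CALLS:
                found.append(f"line {node.lineno}: call to {func.id}()")
            # Attribute calls such as sys.exit(...)
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and (func.value.id, func.attr) in BANNED_ATTR_CALLS
            ):
                found.append(
                    f"line {node.lineno}: call to {func.value.id}.{func.attr}()"
                )
        # Overriding comparison methods is one way to make any equality test pass
        elif isinstance(node, ast.FunctionDef) and node.name in BANNED_DUNDER_DEFS:
            found.append(f"line {node.lineno}: overrides {node.name}")
    return found


def gate(source: str) -> bool:
    """Return True only if the code passes every structural rule."""
    return not structural_violations(source)
```

In an RL loop, a gate like this would sit between generation and execution: a sample that fails `gate` is never run and can be given a fixed penalty. Because the decision depends only on the parsed syntax tree, the same input always yields the same verdict.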
Target audience: researchers, ai-engineers
Repository: https://github.com/Nick-is-building/ast-guard · Python · MIT · 1 star