Show HN: AST-guard, a gradient-immune structural guard against RL reward hacking

Category: security

Tags: ai-safety, reward-hacking, ast-analysis

Score: 6.3/10 (Innovation: 7, Technical: 7, Documentation: 7, Utility: 4)

ast-guard is a deterministic, AST-based pre-execution gate for LLM-generated code, designed to detect structural reward hacking in RL training loops. Its key innovation is gradient-immune structural analysis: the check is deterministic, so it cannot be bypassed through model reasoning, which forces attackers into detectable semantic hacks. The project is well researched and includes empirical RL validation, but it remains an experimental research artifact with niche utility.
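
To make the idea of a deterministic AST-based pre-execution gate concrete, here is a minimal sketch using only Python's standard ast module. It is not ast-guard's actual API or rule set: the names find_violations and gate and both deny-lists are hypothetical, chosen for illustration. The sketch parses candidate code without running it and rejects a few structural patterns that could short-circuit a test harness.

```python
import ast

# Hypothetical deny-lists for illustration only; not ast-guard's real rules.
BANNED_CALLS = {"exit", "quit", "eval", "exec"}
BANNED_ATTR_CALLS = {("sys", "exit"), ("os", "_exit")}


def find_violations(source):
    """Return a list of structural violations, without executing the code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Bare calls such as exit() or eval(...).
            if isinstance(func, ast.Name) and func.id in BANNED_CALLS:
                violations.append(f"line {node.lineno}: call to {func.id}()")
            # Attribute calls such as sys.exit(...) or os._exit(...).
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and (func.value.id, func.attr) in BANNED_ATTR_CALLS
            ):
                violations.append(
                    f"line {node.lineno}: call to {func.value.id}.{func.attr}()"
                )
        # An assertion on the literal True can never fail.
        elif isinstance(node, ast.Assert):
            if isinstance(node.test, ast.Constant) and node.test.value is True:
                violations.append(f"line {node.lineno}: vacuous assert True")
    return violations


def gate(source):
    """Allow execution only when no structural violation is found."""
    return not find_violations(source)
```

The same input always produces the same verdict, and nothing in the check is learned, so there is no gradient for a policy to exploit. A purely structural check of this kind says nothing about semantic hacks such as returning hard-coded expected values, which need separate detection.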

Target audience: researchers, ai-engineers

Repository: https://github.com/Nick-is-building/ast-guard · Python · MIT · 1 star
