Show HN: Ported Cerebras REAP to MLX – Prune MoE Experts on a MacBook

Category: ai-ml

Tags: mlx, model-pruning, mixture-of-experts

Score: 6.3/10 (Innovation: 5, Technical: 6, Documentation: 8, Utility: 6)

REAP MLX ports Cerebras's REAP method for pruning Mixture-of-Experts (MoE) models to Apple Silicon. It runs entirely on MLX-LM, so MoE compression experiments can be done locally without a CUDA stack. Its main draws: it brings expert pruning to consumer hardware, uses a clean adapter-based architecture to support different model families, and records detailed telemetry for comparing runs.
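For readers unfamiliar with expert pruning, here is a minimal sketch of the general idea as I understand it from the REAP method: score each expert by its router weight times the norm of its output, averaged over the calibration tokens routed to it, then drop the lowest-scoring experts. This is plain Python for illustration only, not code from the reap-mlx repository. The function names `expert_saliency` and `experts_to_keep` and the nested-list data layout are hypothetical.

```python
import math


def expert_saliency(gates, outputs, top_k):
    """Score each expert by mean(gate * ||output||) over the tokens routed to it.

    gates[t][e]   -- router weight of expert e for token t (hypothetical layout)
    outputs[t][e] -- output vector of expert e for token t (hypothetical layout)
    """
    num_experts = len(gates[0])
    totals = [0.0] * num_experts
    counts = [0] * num_experts
    for token_gates, token_outputs in zip(gates, outputs):
        # A token only contributes to the experts it is routed to (top-k by gate).
        active = sorted(range(num_experts),
                        key=lambda e: token_gates[e],
                        reverse=True)[:top_k]
        for e in active:
            norm = math.sqrt(sum(v * v for v in token_outputs[e]))
            totals[e] += token_gates[e] * norm
            counts[e] += 1
    # Experts that were never selected get a score of 0.0.
    return [totals[e] / counts[e] if counts[e] else 0.0
            for e in range(num_experts)]


def experts_to_keep(saliency, num_prune):
    """Return the indices of experts that survive after dropping the lowest scores."""
    order = sorted(range(len(saliency)), key=lambda e: saliency[e])
    pruned = set(order[:num_prune])
    return [e for e in range(len(saliency)) if e not in pruned]
```

In a real run the gate weights and expert outputs would come from forward passes over a calibration set, and the surviving experts' weights and router rows would be copied into a smaller checkpoint. How the repository actually does this is not shown here.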

Target audience: machine learning engineers, researchers interested in model compression, Apple Silicon developers

Repository: https://github.com/egesabanci/reap-mlx · Python · MIT
