Dropping the experts I never use — a home MoE-pruning experiment
From a 35-billion-parameter MoE model, I cut out the 'experts' that my typical workload barely uses: dev work, project management, and family-vacation tasks. Spoiler: the model got 25% smaller, its perplexity rose by 1.13%, and getting there took a whole night of gotcha-hunting.