Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
Published in arXiv preprint, 2026
We introduce Muon-OGD, an optimizer that augments Muon’s matrix-aware preconditioning with a spectral orthogonal gradient projection mechanism. The projection restricts updates to directions that are orthogonal, in the appropriate spectral subspace, to representations of previously learned tasks. This mitigates catastrophic forgetting in LLM continual learning while preserving the stability and efficiency benefits of Muon.
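
The general idea of combining a gradient projection with a Muon-style orthogonalized step can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it uses a plain Euclidean projection rather than a spectral-norm one, and it computes the orthogonal factor with an exact SVD where Muon uses a Newton-Schulz approximation. The function names, the `basis` argument (a stand-in for directions associated with earlier tasks), the learning rate, and the tolerance are all hypothetical.

```python
import numpy as np


def project_out(grad, basis):
    """Remove the part of `grad` that lies in the column span of `basis`.

    Assumption: the columns of `basis` are linearly independent and stand in
    for directions used by previously learned tasks.
    """
    q, _ = np.linalg.qr(basis)       # orthonormal basis of the protected span
    return grad - q @ (q.T @ grad)   # Euclidean projection onto its complement


def orthogonalize(mat, rtol=1e-10):
    """Return U_r V_r^T built from the numerically nonzero singular directions.

    Muon approximates this orthogonal factor with a Newton-Schulz iteration;
    an exact SVD is used here for clarity. Dropping near-zero singular values
    keeps the result inside the column space of `mat`.
    """
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    keep = s > rtol * s.max()
    return u[:, keep] @ vt[keep, :]


def projected_muon_like_step(weight, grad, basis, lr=0.02):
    """One illustrative step: project the gradient, then orthogonalize it."""
    corrected = project_out(grad, basis)   # hypothetical corrected matrix
    return weight - lr * orthogonalize(corrected)


# Small usage example with random data (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 6))
grad = rng.standard_normal((8, 6))
basis = rng.standard_normal((8, 3))

new_weight = projected_muon_like_step(weight, grad, basis)
delta = new_weight - weight
print(np.abs(basis.T @ delta).max())  # size of the update along protected directions
```

In this sketch the truncation inside `orthogonalize` matters: after projection the matrix is rank-deficient, and keeping singular vectors that belong to zero singular values could reintroduce components along the protected directions.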

C = span{C₁, …, Cₖ}, (2) the spectral-norm geometry that gives Muon its update rule, and (3) a dual gradient update that produces a corrected matrix H before applying the Muon-like step.

Recommended citation: Lu, B., Deng, Z., Zhang, R., Hu, B., Zhao, Y., Tian, Y., Mou, C., Lin, G., & Li, X. (2026). "Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning." arXiv:2605.08949.
