Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

Published as an arXiv preprint, 2026

We introduce Muon-OGD, an optimizer that augments Muon’s matrix-aware preconditioning with a spectral orthogonal gradient projection mechanism. The projection restricts updates to directions that are orthogonal, in the appropriate spectral subspace, to representations of previously learned tasks. This mitigates catastrophic forgetting in LLM continual learning while preserving the stability and efficiency benefits of Muon.
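To make the idea of an orthogonal gradient projection concrete, here is a minimal NumPy sketch of the classical version: it removes from a gradient matrix its component in the span of a set of protected matrices, using the Frobenius inner product. This illustrates the general technique only and is not the paper's spectral projection. The function name `project_out`, the list-of-matrices representation, and the assumption that the protected matrices are linearly independent are all hypothetical choices made for the example.

```python
import numpy as np

def project_out(grad, protected):
    """Remove from `grad` its component in span(protected).

    Orthogonality is measured with the Frobenius inner product
    <A, B> = sum(A * B). Assumes the matrices in `protected` are
    linearly independent and share the shape of `grad`.
    """
    if not protected:
        return grad.copy()
    # Flatten each protected matrix into a column: shape (m*n, k).
    basis = np.stack([p.ravel() for p in protected], axis=1)
    # Reduced QR gives an orthonormal basis of the protected span.
    q, _ = np.linalg.qr(basis)
    g = grad.ravel()
    # Subtract the part of the gradient that lies in the span.
    g_perp = g - q @ (q.T @ g)
    return g_perp.reshape(grad.shape)

# Hypothetical usage with random data.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
Cs = [rng.standard_normal((4, 3)) for _ in range(2)]
H = project_out(G, Cs)
```

After the call, `H` has zero Frobenius inner product (up to floating-point error) with every matrix in `Cs`, and projecting it a second time leaves it unchanged.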

Muon-OGD combines continual learning with a protected subspace, projection-based Frobenius-norm geometry, and the spectral-norm geometry of the Muon update.
Figure. Muon-OGD combines (1) a projection-based continual-learning objective enforcing orthogonality to a protected subspace C = span{C₁,…,C_k}, (2) the spectral-norm geometry that gives Muon its update rule, and (3) a dual gradient update that produces a corrected matrix H before applying the Muon-like step.
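The spectral-norm geometry mentioned in the caption can be sketched as follows. For a full-rank matrix H with thin SVD H = U S Vᵀ, the direction U Vᵀ maximizes the Frobenius inner product with H among all matrices of spectral norm at most 1, and Muon's update approximates this direction (in practice with a Newton-Schulz iteration rather than an explicit SVD). The code below uses an exact SVD for clarity. How the corrected matrix H is obtained from the dual gradient update is not reproduced here, and the names `spectral_direction`, `muon_like_step`, and the learning rate `lr` are hypothetical; momentum is omitted.

```python
import numpy as np

def spectral_direction(h):
    """Return U @ V^T from the thin SVD of h.

    For full-rank h this keeps the singular vectors of h and
    sets every singular value to 1.
    """
    u, _, vt = np.linalg.svd(h, full_matrices=False)
    return u @ vt

def muon_like_step(weight, h, lr=0.02):
    """Apply one orthogonalized step (no momentum) along h."""
    return weight - lr * spectral_direction(h)

# Hypothetical usage: H stands in for the corrected matrix.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))
W = np.zeros((4, 3))
W_next = muon_like_step(W, H)
```

Because every singular value of the step direction equals 1, the step size is governed by `lr` alone and does not depend on the scale of H.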

Recommended citation: Lu, B., Deng, Z., Zhang, R., Hu, B., Zhao, Y., Tian, Y., Mou, C., Lin, G., & Li, X. (2026). "Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning." arXiv:2605.08949.