Pion: A Novel Optimizer for Overcoming Limitations of Muon in Various Learning Scenarios
Among matrix-aware optimizers, Muon stands out as a powerful tool that leverages Newton-Schulz (NS) iteration to enforce uniform spectral whitening of its updates. This whitening has proven effective at enhancing exploration and lets Muon surpass AdamW in large language model (LLM) pretraining. However, recent research has uncovered limitations of Muon beyond pretraining. In cross-modality vision-language-action (VLA) training, action-module gradients are inherently low-rank, so uniform spectral whitening amplifies their noisy tail directions. Similarly, in reinforcement learning with verifiable rewards (RLVR), where low signal-to-noise ratio (SNR) gradients are prevalent and per-head specialization from prior training needs to be preserved, Muon's whitening can be unstable.
To address these challenges, a new optimizer called Pion has been introduced as a drop-in replacement for Muon. Pion keeps the computational efficiency of its predecessor while implementing a two-stage Promotion+Suppression mechanism known as the high-pass NS iteration. This design induces a sharp spectral high-pass effect: dominant singular values are anchored at 1 while noisy tail components are suppressed towards 0, with controllable filter strength. Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape operation at no additional cost.
In empirical evaluations on the LIBERO and LIBERO-Plus benchmarks with l_1-regression and flow-matching architectures, Pion consistently outperforms both Muon and AdamW. For instance, with VLA-Adapter it reaches a 100% success rate on LIBERO Object after just 1,500 training steps, compared with 97.0% for Muon and only 32.2% for AdamW.
Pion's advantages extend beyond simulation to real-world grasp-and-place tasks on a Franka Research 3 robot, using a pi_0.5 backbone under the DROID setup. In RLVR post-training of Qwen3-1.7B and Qwen3-4B models with GRPO and GMPO, Pion also outperforms AdamW on the MATH and GSM8K benchmarks, while Muon's performance degrades to zero. Overall, Pion addresses Muon's limitations beyond pretraining through its high-pass NS iteration and a per-head mode that preserves pretrained per-head heterogeneity.
- **Muon Limitations:**
  - In cross-modality vision-language-action (VLA) training, action-module gradients are inherently low-rank, so uniform whitening amplifies noisy tail directions.
  - In reinforcement learning with verifiable rewards (RLVR), gradients have a low signal-to-noise ratio (SNR) and pretrained per-head specialization must be preserved; Muon's whitening can be unstable there.
- **Introduction of Pion:**
  - A new optimizer introduced as a drop-in replacement for Muon to address these limitations.
  - Uses a two-stage Promotion+Suppression mechanism, the high-pass NS iteration, in place of uniform spectral whitening.
- **Pion Features and Benefits:**
  - Maintains Muon's computational efficiency while inducing a sharp spectral high-pass effect: dominant singular values are anchored at 1 and noisy tail components are suppressed towards 0.
  - Supports a per-head mode that applies updates independently across attention heads via a reshape, at no additional cost.
- **Performance Comparison:**
  - Pion consistently outperforms both Muon and AdamW on the LIBERO and LIBERO-Plus benchmarks with l_1-regression and flow-matching architectures.
  - Its advantages carry over from simulation to real-world grasp-and-place tasks on the Franka Research 3 robot.
Summary
Muon Limitations: Muon struggles when a model must learn to see, understand language, and act together. The useful training signal for the action part lies in only a few directions, and Muon boosts the noisy remaining directions as much as the useful ones. It also becomes unstable in reward-based learning, where the training signal is weak compared with the noise.
Introduction of Pion: Pion is a new optimizer created to replace Muon and fix these problems. It uses a two-step method called Promotion+Suppression that keeps the strong directions of an update and damps the weak, noisy ones.
Pion Features and Benefits: Pion costs about the same to run as Muon while focusing on the important directions and ignoring the noisy ones. It can also update each attention head independently without extra cost.
Performance Comparison: Pion performs better than both Muon and AdamW on the LIBERO and LIBERO-Plus benchmarks, and it also does well on real-world robot tasks.
Definitions
- Cross-modality vision-language-action (VLA): Training a single model to combine visual input and language instructions in order to produce actions, as in robot control.
- Reinforcement learning with verifiable rewards (RLVR): Learning through trial and error where the reward comes from an automatic check of the output, such as whether the final answer to a math problem is correct.
- Signal-to-Noise Ratio (SNR): The ratio between useful information (signal) and unwanted interference (noise).
- Optimizer: An algorithm that updates a model's parameters during training to reduce its loss.
- Spectral whitening: Rescaling a matrix so that all of its singular values become equal, which gives every direction of the update the same weight.
- Empirical evaluations: Tests based on practical observations rather than theory.
Introduction
In machine learning, optimizers play a crucial role in training deep neural networks. These algorithms adjust the weights and biases of a model to minimize its loss function. One optimizer that has gained attention recently is Muon, which uses uniform spectral whitening to enhance exploration and surpasses optimizers such as AdamW in large language model (LLM) pretraining.
However, recent research has uncovered potential limitations of Muon beyond pretraining. In particular, it may face challenges in cross-modality vision-language-action (VLA) training and reinforcement learning with verifiable rewards (RLVR) tasks. To address these challenges, a new optimizer called Pion has been introduced as a drop-in replacement for Muon.
The Limitations of Muon
While Muon's uniform spectral whitening has proven effective in LLM pretraining, it is not suitable for every learning scenario. In VLA training, action-module gradients are inherently low-rank: only a few directions carry signal, and the rest is a noisy tail. Whitening pushes every singular value to the same magnitude, so it amplifies those noisy tail directions. Similarly, in RLVR tasks, where low signal-to-noise ratio (SNR) gradients are common and per-head specialization from prior training needs to be preserved, Muon's whitening can be unstable.
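The amplification effect can be reproduced on a toy gradient. The following is a minimal sketch, not Muon's implementation: it uses the classic cubic Newton-Schulz iteration run to convergence (practical Muon implementations typically use a tuned higher-order polynomial for only a few steps), and the matrix sizes, rank, and noise scale are made-up values chosen for illustration.

```python
import numpy as np

def newton_schulz_whiten(grad, steps=40):
    """Classic cubic Newton-Schulz iteration (a stand-in, not Muon's tuned polynomial)."""
    # Dividing by the Frobenius norm puts every singular value in [0, 1].
    x = grad / (np.linalg.norm(grad) + 1e-12)
    for _ in range(steps):
        # Each step maps a singular value s to 1.5*s - 0.5*s**3, which drifts toward 1.
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

# Hypothetical toy gradient: a rank-2 signal plus small dense noise.
rng = np.random.default_rng(0)
rows, cols, rank = 32, 16, 2
signal = rng.standard_normal((rows, rank)) @ rng.standard_normal((rank, cols))
noise = 0.01 * rng.standard_normal((rows, cols))
grad = signal + noise

sv_before = np.linalg.svd(grad, compute_uv=False)
sv_after = np.linalg.svd(newton_schulz_whiten(grad), compute_uv=False)

def tail_share(sv):
    """Fraction of squared singular-value mass outside the top `rank` directions."""
    return float(np.sum(sv[rank:] ** 2) / np.sum(sv ** 2))

print("tail share before whitening:", tail_share(sv_before))
print("tail share after whitening: ", tail_share(sv_after))
```

Before whitening, the noise directions carry a negligible share of the gradient. Once every singular value has been driven to 1, the 14 noise directions account for 14 of the 16 equally weighted directions of the update, which is the amplification described above.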
The Introduction of Pion
To overcome these limitations of Muon, researchers have introduced Pion as an alternative optimizer that maintains the computational efficiency of its predecessor while addressing its shortcomings. The key innovation behind Pion is its two-stage Promotion+Suppression mechanism known as the high-pass NS iteration.
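This design induces a sharp spectral high-pass effect: dominant singular values are anchored at 1 while noisy tail components are suppressed towards 0, with controllable filter strength. Filtering out the tail is what lets Pion handle low-SNR gradients. Pion also supports a per-head mode in which the update is applied independently to each attention head via a simple reshape at no additional cost, which helps maintain per-head specialization in RLVR tasks.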
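To make the high-pass idea concrete, here is a minimal sketch of a two-stage, matmul-only spectral filter. The source does not give Pion's update polynomials, so everything specific here is an assumption: the promotion stage reuses the classic cubic Newton-Schulz step, the suppression stage uses a hypothetical quintic s -> 2.5 s^3 - 1.5 s^5 (chosen only because 0 and 1 are its attracting fixed points, separated at sqrt(2/3) ≈ 0.816), and the helper names, step counts, and toy spectrum are illustrative. The per-head part assumes heads are stored as contiguous row blocks of a projection weight.

```python
import numpy as np

def high_pass_sketch(grad, promote_steps=5, suppress_steps=3):
    """Illustrative two-stage spectral filter (NOT Pion's actual coefficients).

    Accepts one matrix or a batch of matrices in the last two axes.
    """
    # Per-matrix Frobenius normalisation keeps every singular value in [0, 1].
    x = grad / (np.linalg.norm(grad, axis=(-2, -1), keepdims=True) + 1e-12)
    for _ in range(promote_steps):
        # Promotion (cubic Newton-Schulz step): s -> 1.5*s - 0.5*s**3 lifts values toward 1.
        xtx = np.swapaxes(x, -1, -2) @ x
        x = 1.5 * x - 0.5 * x @ xtx
    for _ in range(suppress_steps):
        # Suppression (hypothetical quintic): s -> 2.5*s**3 - 1.5*s**5.
        # Values above sqrt(2/3) are pulled to 1, values below are pushed to 0.
        xtx = np.swapaxes(x, -1, -2) @ x
        x3 = x @ xtx
        x = 2.5 * x3 - 1.5 * x3 @ xtx
    return x

def matrix_with_spectrum(singular_values, rows, rng):
    """Build a random matrix with the given singular values (toy data helper)."""
    cols = len(singular_values)
    u, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    v, _ = np.linalg.qr(rng.standard_normal((cols, cols)))
    return u @ np.diag(singular_values) @ v.T

rng = np.random.default_rng(0)
# Three dominant directions and a flat noisy tail (made-up values).
spectrum = np.array([3.0, 2.0, 1.5] + [0.02] * 13)
grad = matrix_with_spectrum(spectrum, rows=32, rng=rng)
filtered_sv = np.linalg.svd(high_pass_sketch(grad), compute_uv=False)

# Per-head mode: reshape (heads * head_dim, d_model) into a batch of per-head matrices,
# filter each head independently, then reshape back. Heads differ in gradient scale.
num_heads, head_dim, d_model = 4, 32, 16
stacked = np.concatenate(
    [matrix_with_spectrum(spectrum * (h + 1), head_dim, rng) for h in range(num_heads)]
)
per_head_update = high_pass_sketch(
    stacked.reshape(num_heads, head_dim, d_model)
).reshape(stacked.shape)
head_sv = np.linalg.svd(
    per_head_update.reshape(num_heads, head_dim, d_model), compute_uv=False
)
```

In this sketch the number of promotion steps acts as the filter-strength knob: fewer promotion steps leave more singular values below the separating point, so more of the spectrum is suppressed. Because the normalisation is applied per matrix in the batch, each head keeps its own dominant directions regardless of how its gradient scale compares with the other heads.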
Empirical Evaluations
To test the effectiveness of Pion, researchers evaluated it on the LIBERO and LIBERO-Plus benchmarks with l_1-regression and flow-matching architectures. Pion consistently outperformed both Muon and AdamW. With VLA-Adapter, it reached a 100% success rate on LIBERO Object after just 1,500 training steps, compared with 97.0% for Muon and only 32.2% for AdamW.
Pion's advantages also extend beyond simulation to real-world grasp-and-place tasks on the Franka Research 3 robot, using a pi_0.5 backbone under the DROID setup.
In RLVR post-training of Qwen3-1.7B and Qwen3-4B models with GRPO and GMPO, Pion also outperformed AdamW on the MATH and GSM8K benchmarks, while Muon's performance degraded to zero.
Conclusion
Pion addresses the limitations of Muon beyond pretraining through its high-pass NS iteration and its support for preserving pretrained per-head heterogeneity. In the reported experiments it outperforms both Muon and AdamW in VLA training, in simulation and on a real robot, and outperforms AdamW in RLVR post-training, where Muon's performance degrades to zero.