Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

AI-generated keywords: Matrix-aware optimizers, Muon, Pion, Newton-Schulz iterations, spectral gradient orthogonalization

AI-generated Key Points

**Muon Limitations:**
In cross-modality vision-language-action (VLA) training, inherently low-rank action-module gradients cause whitening to amplify noisy tail directions.
In reinforcement learning with verifiable rewards (RLVR), low Signal-to-Noise Ratio (SNR) gradients and the need to preserve per-head specialization from prior training make whitening unstable.
**Introduction of Pion:**
A new optimizer introduced as a drop-in replacement for Muon to address its limitations.
Replaces Muon's uniform spectral whitening with a two-stage Promotion+Suppression mechanism, called the high-pass NS iteration.
**Pion Features and Benefits:**
Maintains computational efficiency while inducing a sharp spectral high-pass effect and suppressing noisy tail components.
Offers support for per-head mode enabling independent updates across attention heads at no additional cost.
**Performance Comparison:**
In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both Muon and AdamW across l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures.
Pion reaches a 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter (Muon: 97.0%, AdamW: 32.2%), and its advantage extends to a real Franka Research 3 robot on three grasp-and-place tasks.

Authors: Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu

arXiv: 2605.19282v1 (cs.LG)

License: CC BY 4.0

Abstract: Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a pi_0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.
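
To make the whitening described in the abstract concrete, here is a minimal NumPy sketch of Newton-Schulz orthogonalization. It is an illustration under stated assumptions, not the authors' code: it uses the classic cubic update X <- 1.5 X - 0.5 X X^T X rather than whatever tuned polynomial a given Muon implementation uses, and the function name `newton_schulz_whiten`, the step count, and the test spectrum are all hypothetical.

```python
import numpy as np

def newton_schulz_whiten(momentum: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the orthogonal factor U V^T of `momentum` with matmuls only.

    Assumption: classic cubic Newton-Schulz. Each step maps every singular
    value s to 1.5*s - 0.5*s**3, which converges to 1 for 0 < s < sqrt(3).
    """
    # Dividing by the Frobenius norm bounds all singular values by 1.
    x = momentum / np.linalg.norm(momentum)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x

# Build a 4x6 matrix with a known, fast-decaying spectrum (hypothetical values).
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((4, 4)))
v, _ = np.linalg.qr(rng.standard_normal((6, 4)))
spectrum = np.array([1.0, 0.5, 0.1, 0.01])
momentum = u @ np.diag(spectrum) @ v.T

whitened = newton_schulz_whiten(momentum)
whitened_spectrum = np.linalg.svd(whitened, compute_uv=False)
print(np.round(whitened_spectrum, 4))
```

All four singular values of the result end up close to 1, including the direction that started at 0.01. That equal treatment of strong and weak directions is the uniform whitening the abstract refers to.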

Submitted to arXiv on 19 May 2026

Results of the summarizing process for the arXiv paper: 2605.19282v1

Pion: A Novel Optimizer for Overcoming Limitations of Muon in Various Learning Scenarios

In the realm of matrix-aware optimizers, Muon stands out as a tool that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization, driving all singular values of the momentum matrix toward 1. This uniform spectral whitening enhances exploration and outperforms AdamW in large language model (LLM) pretraining. The paper shows, however, that it can lead to fundamental limitations beyond pretraining, in two regimes.

The first is cross-modality vision-language-action (VLA) training. Action-module gradients are inherently low-rank, so pushing every singular value toward 1 amplifies noisy tail directions. The second is reinforcement learning with verifiable rewards (RLVR), where gradients have a low Signal-to-Noise Ratio (SNR) and per-head specialization from prior training needs to be preserved; here whitening becomes unstable.

To address these challenges, the authors propose Pion, a drop-in replacement for Muon. Pion keeps Muon's computational efficiency but replaces uniform spectral whitening with a two-stage Promotion+Suppression mechanism, called the high-pass NS iteration. This design induces a sharp spectral high-pass effect: dominant singular values are anchored at 1 while noisy tail components are suppressed toward 0, with controllable filter strength. Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost.

In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both Muon and AdamW across l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures. For instance, it reaches a 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, compared with 97.0% for Muon and only 32.2% for AdamW. The advantage extends beyond simulation to a real Franka Research 3 robot with a pi_0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training of Qwen3-1.7B/4B models with GRPO and GMPO, Pion also outperforms AdamW on the MATH and GSM8K benchmarks, while Muon collapses to zero.

Overall, Pion addresses the limitations of Muon beyond pretraining through its high-pass NS iteration and its per-head mode, which preserves pretrained per-head heterogeneity.
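
The per-head mode mentioned above can be sketched as a reshape followed by a batched spectral step. The following is a rough illustration, not the authors' implementation. It assumes that attention heads are stacked along the row axis of the projection matrix, and it substitutes exact SVD-based whitening for the NS-style iteration so the result is easy to check. The names `whiten` and `per_head_whiten` are hypothetical.

```python
import numpy as np

def whiten(matrices: np.ndarray) -> np.ndarray:
    """Exact whitening U V^T of one matrix or a stack of matrices (via SVD)."""
    u, _, vt = np.linalg.svd(matrices, full_matrices=False)
    return u @ vt

def per_head_whiten(update: np.ndarray, num_heads: int) -> np.ndarray:
    """Whiten each head's row block independently (assumed row-wise head layout)."""
    rows, cols = update.shape
    if rows % num_heads != 0:
        raise ValueError("rows must be divisible by num_heads")
    # One (head_dim, cols) slice per head; the reshape copies no head data around.
    heads = update.reshape(num_heads, rows // num_heads, cols)
    return whiten(heads).reshape(rows, cols)

rng = np.random.default_rng(0)
num_heads, head_dim, d_model = 4, 8, 32
update = rng.standard_normal((num_heads * head_dim, d_model))

per_head = per_head_whiten(update, num_heads)
full = whiten(update)

# Spectrum of every head block after the per-head step.
head_spectra = np.linalg.svd(
    per_head.reshape(num_heads, head_dim, d_model), compute_uv=False
)
print(head_spectra.min(), head_spectra.max())
```

Because the batched call processes all heads at once, the per-head variant needs no Python loop over heads. The result differs from whitening the full matrix, since directions are no longer mixed across heads.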

Summary

Muon Limitations: Muon struggles when training models that combine seeing, understanding language, and acting. The learning signal for the action part is concentrated in a few directions, and Muon ends up amplifying the noisy ones. It also becomes unstable in reinforcement learning with automatically checkable rewards, where the learning signal is weak compared with the noise.

Introduction of Pion: Pion is a new optimizer designed to replace Muon and fix these problems. It uses a two-stage method called Promotion+Suppression to decide which parts of an update to keep.

Pion Features and Benefits: Pion runs as efficiently as Muon while keeping the strong, informative directions of an update and damping the weak, noisy ones. It can also update each attention head independently at no extra cost.

Performance Comparison: Pion performs better than both Muon and AdamW in robot-learning tests on the LIBERO and LIBERO-Plus benchmarks, in simulation and on a real robot. In reinforcement-learning tests on math problems it outperforms AdamW, while Muon fails.

Definitions

- Cross-modality vision-language-action (VLA): Training a model that combines what it sees, the language instructions it receives, and the actions it takes.
- Reinforcement learning with verifiable rewards (RLVR): Learning through trial and error where success can be checked automatically, for example by verifying the answer to a math problem.
- Signal-to-Noise Ratio (SNR): The ratio between useful information (signal) and unwanted interference (noise).
- Optimizer: An algorithm that adjusts a model's parameters during training to reduce its error.
- Spectral whitening: A process that rescales an update so that all of its directions carry equal strength, whether they are informative or noisy.
- Empirical evaluations: Tests based on practical observations rather than theory.

Introduction

In machine learning, optimizers play a crucial role in training deep neural networks: they adjust a model's weights and biases to minimize its loss function. One optimizer that has gained attention recently is Muon, which uses Newton-Schulz (NS) iterations to whiten the spectrum of its update, driving all singular values of the momentum matrix toward 1. This uniform spectral whitening enhances exploration and outperforms AdamW in large language model (LLM) pretraining. However, the paper summarized here shows that Muon can run into limitations beyond pretraining, in particular in cross-modality vision-language-action (VLA) training and in reinforcement learning with verifiable rewards (RLVR). To address these challenges, the authors propose Pion as a drop-in replacement for Muon.

The Limitations of Muon

While Muon's uniform spectral whitening has proven effective in LLM pretraining, it does not suit every learning scenario. In VLA training, action-module gradients are inherently low-rank, so whitening amplifies noisy tail directions. In RLVR, gradients have a low Signal-to-Noise Ratio (SNR) and per-head specialization from prior training needs to be preserved, which makes whitening unstable.
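
A small numerical example may help show why whitening hurts when the gradient is close to low-rank. The sketch below uses hypothetical numbers: a synthetic 8x16 "gradient" with one dominant direction and a weak tail. It applies exact whitening (all singular values set to 1) and measures how much of the update's energy still lies along the dominant direction. The helper `top_direction_energy` is an illustrative name, not something from the paper.

```python
import numpy as np

def top_direction_energy(update: np.ndarray, u1: np.ndarray, v1: np.ndarray) -> float:
    """Fraction of the squared Frobenius norm of `update` lying along u1 v1^T."""
    return float((u1 @ update @ v1) ** 2 / np.sum(update ** 2))

rng = np.random.default_rng(0)
rows, cols = 8, 16
u, _ = np.linalg.qr(rng.standard_normal((rows, rows)))
v, _ = np.linalg.qr(rng.standard_normal((cols, rows)))

# Hypothetical spectrum: one dominant direction plus a weak noisy tail.
spectrum = np.array([1.0] + [0.01] * (rows - 1))
gradient = u @ np.diag(spectrum) @ v.T

# Exact whitening: keep the singular vectors, set every singular value to 1.
whitened = u @ v.T

before = top_direction_energy(gradient, u[:, 0], v[:, 0])
after = top_direction_energy(whitened, u[:, 0], v[:, 0])
print(f"signal share before whitening: {before:.4f}")
print(f"signal share after whitening:  {after:.4f}")
```

Before whitening, almost all of the energy sits in the dominant direction. After whitening, that direction holds only one eighth of it, because each of the eight directions now has equal weight. The seven tail directions, which here stand in for noise, make up the rest of the update.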

The Introduction of Pion

To overcome these limitations, the authors introduce Pion, which maintains Muon's computational efficiency while addressing its shortcomings. The key innovation is a two-stage Promotion+Suppression mechanism, called the high-pass NS iteration, which replaces uniform spectral whitening. It induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost.
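
The spectral target described above (dominant singular values at 1, tail at 0) can be illustrated with an idealized filter. This is not Pion's algorithm: the summary does not give the Promotion+Suppression polynomials, so the sketch uses an explicit SVD and a hard threshold instead of an NS-style iteration. The function `ideal_high_pass` and its relative `cutoff` parameter are hypothetical stand-ins for the "controllable filter strength".

```python
import numpy as np

def ideal_high_pass(momentum: np.ndarray, cutoff: float) -> np.ndarray:
    """Idealized spectral high-pass filter (illustration only, uses an SVD).

    Singular values at or above `cutoff` times the largest one are set to 1,
    all others to 0. `cutoff` is a hypothetical knob, not a Pion parameter.
    """
    u, s, vt = np.linalg.svd(momentum, full_matrices=False)
    keep = (s >= cutoff * s[0]).astype(momentum.dtype)  # 1 = dominant, 0 = tail
    return (u * keep) @ vt  # scale the columns of u by the 0/1 mask

# Hypothetical momentum with two dominant directions and a weak tail.
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((4, 4)))
v, _ = np.linalg.qr(rng.standard_normal((6, 4)))
momentum = u @ np.diag([1.0, 0.8, 0.05, 0.01]) @ v.T

filtered = ideal_high_pass(momentum, cutoff=0.5)
filtered_spectrum = np.linalg.svd(filtered, compute_uv=False)
print(np.round(filtered_spectrum, 4))
```

With this spectrum and cutoff, the two dominant directions are kept at unit strength and the two tail directions are removed. Uniform whitening would instead raise all four to 1.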

Empirical Evaluations

The authors evaluate Pion in VLA training on the LIBERO and LIBERO-Plus benchmarks with l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures. Pion consistently outperforms both Muon and AdamW, for example reaching a 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, compared with 97.0% for Muon and only 32.2% for AdamW. The advantage extends beyond simulation to a real Franka Research 3 robot with a pi_0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training of Qwen3-1.7B/4B models with GRPO and GMPO, Pion also outperforms AdamW on the MATH and GSM8K benchmarks, while Muon collapses to zero.

Conclusion

Pion addresses the limitations of Muon beyond pretraining through its high-pass NS iteration and its per-head mode, which preserves pretrained per-head heterogeneity. In the reported experiments it outperforms both Muon and AdamW in VLA training on LIBERO and LIBERO-Plus, its advantage carries over to a real robot, and it outperforms AdamW in RLVR post-training, where Muon collapses.

Created on 25 May 2026

