MoBA: Mixture of Block Attention for Long-Context LLMs

AI-generated keywords: Large Language Models, Mixture of Block Attention, Long-Context Tasks, Efficient Attention Computation, Artificial Intelligence

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Paper titled "MoBA: Mixture of Block Attention for Long-Context LLMs" introduces a solution to scaling effective context length in large language models (LLMs) without high computational complexity
  • Proposes Mixture of Block Attention (MoBA), which applies the principles of Mixture of Experts (MoE) to the attention mechanism, allowing the model to determine where to attend autonomously rather than through predefined biases
  • Offers a novel architecture that excels in long-context tasks and can seamlessly transition between full and sparse attention, enhancing efficiency without compromising performance
  • Successfully deployed to support Kimi's long-context requests with superior performance compared to existing approaches
  • Code available at https://github.com/MoonshotAI/MoBA for further exploration and implementation
  • Presented by the authors as a significant advance in efficient attention computation for LLMs working with extended context lengths

Authors: Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu

15 pages
License: CC BY-NC-ND 4.0

Abstract: Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention which are task-specific, or radically modify the attention mechanism into linear approximations, whose performance in complex reasoning tasks remains inadequately explored. In this work, we propose a solution that adheres to the "less structure" principle, allowing the model to determine where to attend autonomously, rather than introducing predefined biases. We introduce Mixture of Block Attention (MoBA), an innovative approach that applies the principles of Mixture of Experts (MoE) to the attention mechanism. This novel architecture demonstrates superior performance on long-context tasks while offering a key advantage: the ability to seamlessly transition between full and sparse attention, enhancing efficiency without the risk of compromising performance. MoBA has already been deployed to support Kimi's long-context requests and demonstrates significant advancements in efficient attention computation for LLMs. Our code is available at https://github.com/MoonshotAI/MoBA.
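
The abstract describes the idea only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes one plausible reading of "applying MoE to attention": keys are split into fixed-size blocks, each query scores every earlier block by its dot product with that block's mean key, and it then attends only to the top-scoring blocks plus its own block, under a causal mask. The function names, the mean-key gate, and the `block_size` and `top_k` parameters are all assumptions made for illustration and are not taken from this page.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_weighted_sum(scores, values):
    """Softmax over scores, then the weighted average of the value vectors."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[j] for e, v in zip(exps, values)) / total for j in range(dim)]


def block_sparse_attention(q, k, v, block_size, top_k):
    """Causal attention in which each query attends to at most top_k key blocks.

    Hypothetical sketch: the gate (query . mean key of a block) is an assumption.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        cur = i // block_size  # block that contains position i
        # Gate: score every earlier (complete) block by q_i . mean(keys in block).
        gate = []
        for b in range(cur):
            keys = k[b * block_size:(b + 1) * block_size]
            mean_key = [sum(col) / len(keys) for col in zip(*keys)]
            gate.append((dot(q[i], mean_key), b))
        # Always keep the current block; fill the remaining slots with the
        # best-scoring earlier blocks.
        gate.sort(reverse=True)
        chosen = {cur} | {b for _, b in gate[:max(top_k - 1, 0)]}
        # Attend only to positions in chosen blocks that are not after i.
        idx = [j for j in range(i + 1) if j // block_size in chosen]
        scores = [dot(q[i], k[j]) * scale for j in idx]
        out.append(softmax_weighted_sum(scores, [v[j] for j in idx]))
    return out


def full_causal_attention(q, k, v):
    """Ordinary causal attention, expressed as a single block covering everything."""
    return block_sparse_attention(q, k, v, block_size=len(q), top_k=1)
```

In this sketch, setting `top_k` to at least the number of blocks selects every block, so the output matches full causal attention; that is one way to picture the transition between full and sparse attention that the abstract mentions. Smaller values of `top_k` reduce the number of keys each query touches.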

Submitted to arXiv on 18 Feb. 2025

Results of the summarizing process for the arXiv paper: 2502.13189v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In their paper titled "MoBA: Mixture of Block Attention for Long-Context LLMs," Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, and Jiezhong Qiu introduce a solution to the challenge of scaling effective context length in large language models (LLMs) without incurring prohibitive computational complexity. They address the limitations of traditional attention mechanisms, whose cost grows quadratically with context length, by proposing Mixture of Block Attention (MoBA), an approach that applies the principles of Mixture of Experts (MoE) to the attention mechanism. The authors emphasize the importance of allowing models to autonomously determine where to attend rather than imposing predefined biases. MoBA offers an architecture that performs well on long-context tasks while providing the flexibility to transition seamlessly between full and sparse attention. This capability improves efficiency without compromising performance. Notably, MoBA has already been deployed to support Kimi's long-context requests, and the abstract reports superior performance on long-context tasks. The authors make their code available at https://github.com/MoonshotAI/MoBA for further exploration and implementation. Overall, this work shows how MoBA can help large language models handle tasks with extended context lengths more efficiently.
Created on 08 Apr. 2025

