HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

AI-generated keywords: HELM

AI-generated Key Points

  • HELM is a family of Hyperbolic Large Language Models (LLMs) that operate fully in hyperbolic space
  • HELM models leverage the properties of hyperbolic space to capture semantic hierarchies and geometric structures in natural language
  • HELM-MICE is a Mixture-of-Curvature Experts model where each expert operates in a distinct curvature space to encode fine-grained geometric structures from text
  • HELM-D is a dense model that, like HELM-MICE, addresses the representational inflexibility and poor scalability of existing hyperbolic LMs
  • The researchers developed hyperbolic Multi-Head Latent Attention (HMLA) for HELM-MICE, enabling efficient training and inference with a reduced KV cache
  • For both models, the researchers developed hyperbolic equivalents of rotary positional encodings and RMS normalization
  • Fully hyperbolic LLMs, trained for the first time at billion-parameter scale, showed consistent gains of up to 4% over the popular Euclidean architectures used in LLaMA and DeepSeek on benchmarks such as MMLU and ARC
  • HELM models offer enhanced reasoning capabilities by embracing non-Euclidean geometries like hyperbolic space for large-scale LM pretraining
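
The key points rest on the idea of hyperbolic spaces with different curvatures. As a minimal sketch, assuming the Lorentz (hyperboloid) model, which this summary does not name, a Euclidean vector can be mapped onto the hyperboloid of curvature -c with the exponential map at the origin, and distances on that hyperboloid depend on c. The helper names and the curvature values below are hypothetical stand-ins for per-expert curvatures; the code illustrates the geometry only, not HELM-MICE's routing or its actual implementation.

```python
import math

def minkowski_dot(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def origin(dim, c):
    """Origin of the hyperboloid {x : <x, x>_L = -1/c, x0 > 0} in R^(dim+1)."""
    return [1.0 / math.sqrt(c)] + [0.0] * dim

def expmap0(v, c):
    """Map a tangent vector v at the origin onto the hyperboloid of curvature -c."""
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return origin(len(v), c)
    scale = math.sinh(sqrt_c * norm) / (sqrt_c * norm)
    return [math.cosh(sqrt_c * norm) / sqrt_c] + [scale * a for a in v]

def distance(x, y, c):
    """Geodesic distance between two points on the hyperboloid of curvature -c."""
    arg = max(-c * minkowski_dot(x, y), 1.0)  # clamp rounding error just below 1
    return math.acosh(arg) / math.sqrt(c)

# Two orthogonal unit vectors, embedded under hypothetical per-expert curvatures.
u, w = [1.0, 0.0], [0.0, 1.0]
for c in (0.5, 1.0, 2.0):
    print(c, round(distance(expmap0(u, c), expmap0(w, c), c), 3))
```

For this pair of vectors the geodesic distance grows with c, which is one way to see how experts with different curvatures can give the same inputs a different geometric spread.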

Authors: Neil He, Rishabh Anand, Hiren Madhu, Ali Maatouk, Smita Krishnaswamy, Leandros Tassiulas, Menglin Yang, Rex Ying

License: CC BY 4.0

Abstract: Large language models (LLMs) have shown great success in text modeling tasks across domains. However, natural language exhibits inherent semantic hierarchies and nuanced geometric structure, which current LLMs do not capture completely owing to their reliance on Euclidean operations. Recent studies have also shown that not respecting the geometry of token embeddings leads to training instabilities and degradation of generative capabilities. These findings suggest that shifting to non-Euclidean geometries can better align language models with the underlying geometry of text. We thus propose to operate fully in hyperbolic space, known for its expansive, scale-free, and low-distortion properties. We introduce HELM, a family of HypErbolic Large Language Models, offering a geometric rethinking of the Transformer-based LLM that addresses the representational inflexibility, missing set of necessary operations, and poor scalability of existing hyperbolic LMs. We additionally introduce a Mixture-of-Curvature Experts model, HELM-MICE, where each expert operates in a distinct curvature space to encode more fine-grained geometric structure from text, as well as a dense model, HELM-D. For HELM-MICE, we further develop hyperbolic Multi-Head Latent Attention (HMLA) for efficient, reduced-KV-cache training and inference. For both models, we develop essential hyperbolic equivalents of rotary positional encodings and RMS normalization. We are the first to train fully hyperbolic LLMs at billion-parameter scale, and evaluate them on well-known benchmarks such as MMLU and ARC, spanning STEM problem-solving, general knowledge, and commonsense reasoning. Our results show consistent gains from our HELM architectures (up to 4%) over popular Euclidean architectures used in LLaMA and DeepSeek, highlighting the efficacy and enhanced reasoning afforded by hyperbolic geometry in large-scale LM pretraining.
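
The abstract names hyperbolic equivalents of rotary positional encodings and RMS normalization but does not give their form. The sketch below shows one plausible construction, assuming the Lorentz (hyperboloid) model with curvature -c: each operation acts only on the space-like coordinates, and the time coordinate is then recomputed so the output stays on the hyperboloid. The functions `lift`, `lorentz_rope`, and `lorentz_rmsnorm` are hypothetical; this is not presented as the paper's exact definition.

```python
import math

def minkowski_dot(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(space, c):
    """Recompute the time coordinate so (x0, space) lies on {x : <x, x>_L = -1/c}."""
    x0 = math.sqrt(1.0 / c + sum(a * a for a in space))
    return [x0] + list(space)

def lorentz_rope(x, pos, c, base=10000.0):
    """Hypothetical hyperbolic RoPE: rotate pairs of space-like coordinates by pos * theta_k."""
    space = x[1:]
    d = len(space)
    assert d % 2 == 0, "space-like dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # standard RoPE frequency for pair i // 2
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = space[i], space[i + 1]
        out += [a * cos_t - b * sin_t, a * sin_t + b * cos_t]
    return lift(out, c)  # rotation preserves |space|, so x0 is unchanged up to rounding

def lorentz_rmsnorm(x, c, gain=None, eps=1e-6):
    """Hypothetical hyperbolic RMSNorm: RMS-normalize the space-like part, then re-lift."""
    space = x[1:]
    rms = math.sqrt(sum(a * a for a in space) / len(space) + eps)
    if gain is None:
        gain = [1.0] * len(space)
    return lift([g * a / rms for g, a in zip(gain, space)], c)

c = 1.0
q, k = lift([0.2, -0.1, 0.4, 0.3], c), lift([0.5, 0.1, -0.2, 0.6], c)
# Same relative offset (3) at different absolute positions gives the same score.
s1 = minkowski_dot(lorentz_rope(q, 2, c), lorentz_rope(k, 5, c))
s2 = minkowski_dot(lorentz_rope(q, 10, c), lorentz_rope(k, 13, c))
print(abs(s1 - s2) < 1e-9)  # True
```

Because rotating the space-like coordinates is a Lorentz isometry, the Lorentzian inner product between a rotated query and key depends only on their relative position, which is the property that makes RoPE useful in the Euclidean setting.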

Submitted to arXiv on 30 May 2025

Summary of the arXiv paper 2505.24722v1

Researchers have introduced HELM, a family of Hyperbolic Large Language Models (LLMs) that operate fully in hyperbolic space. The approach targets a limitation of current LLMs: their reliance on Euclidean operations, which do not fully capture the semantic hierarchies and geometric structure inherent in natural language. By exploiting the expansive, scale-free, and low-distortion properties of hyperbolic space, HELM offers a geometric rethinking of the Transformer-based LLM and addresses the representational inflexibility, missing operations, and poor scalability of existing hyperbolic LMs.

The family has two variants. HELM-MICE is a Mixture-of-Curvature Experts model in which each expert operates in a space of distinct curvature, so that different experts can encode different fine-grained geometric structure in text. HELM-D is a dense model. For HELM-MICE, the researchers develop hyperbolic Multi-Head Latent Attention (HMLA), which reduces the KV cache for more efficient training and inference; for both models, they develop hyperbolic equivalents of rotary positional encodings and RMS normalization.

The researchers are the first to train fully hyperbolic LLMs at billion-parameter scale. Evaluated on benchmarks such as MMLU and ARC, spanning STEM problem-solving, general knowledge, and commonsense reasoning, the HELM models show consistent gains of up to 4% over the popular Euclidean architectures used in LLaMA and DeepSeek. The authors take this as evidence that hyperbolic geometry improves reasoning in large-scale LM pretraining and that non-Euclidean geometries are a promising direction for language modeling.
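
HMLA is credited with a reduced KV cache. In the Euclidean Multi-Head Latent Attention design introduced with DeepSeek-V2, each token caches one compressed latent vector plus a small decoupled RoPE key, instead of full per-head keys and values. The back-of-the-envelope arithmetic below assumes HMLA keeps that caching pattern; every size in it is hypothetical and not taken from HELM's configuration.

```python
def kv_cache_bytes(elems_per_token_layer, n_layers, seq_len, bytes_per_elem=2):
    """Total KV-cache size for one sequence (bytes_per_elem=2 assumes bf16/fp16)."""
    return elems_per_token_layer * n_layers * seq_len * bytes_per_elem

# Hypothetical model configuration (not HELM's actual hyperparameters).
n_layers, n_heads, d_head = 16, 16, 64
d_latent, d_rope = 256, 32
seq_len = 2048

mha = kv_cache_bytes(2 * n_heads * d_head, n_layers, seq_len)  # full K and V for every head
mla = kv_cache_bytes(d_latent + d_rope, n_layers, seq_len)     # compressed latent + shared RoPE key
print(mha / 2**20, mla / 2**20, round(mha / mla, 2))  # 128.0 18.0 7.11
```

Under these assumed sizes the latent cache is about 7x smaller per sequence; the real saving depends on the latent and head dimensions the model actually uses.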
Created on 04 Mar. 2026
