PRAGMA: Revolut Foundation Model

AI-generated keywords: PRAGMA, financial systems, banking event sequences, Transformer-based architecture, multi-source events

AI-generated Key Points

Vast amounts of transactional and event-level data are generated in modern financial systems, encoding valuable economic signals.
PRAGMA is a series of foundational models designed for analyzing multi-source banking event sequences.
The methodology involves pre-training a Transformer-based architecture using masked modeling on a diverse banking event dataset.
The PRAGMA model is tailored to the discrete and variable-length nature of financial records, supporting tasks such as credit scoring, fraud detection, and lifetime value prediction.
PRAGMA achieves strong performance across multiple domains directly from raw event sequences by training a simple linear model on top of extracted embeddings and further enhancing it through lightweight fine-tuning.
Banking event sequences present unique challenges due to their variable-length records with mixed categorical, numerical, and free-text fields, as well as long-tailed patterns in length and irregular time intervals.
PRAGMA fills the gap by offering an encoder-style foundation model that combines multi-source events with static profile state through masked modeling on a large-scale user history corpus.
The architecture of PRAGMA includes two encoder branches for profile state and events fused by a history encoder, allowing tokens to attend to both past and future context during reconstruction tasks or learning record-level representations from complete histories.
After pre-training, PRAGMA can be adapted efficiently through embedding probe setting or LoRA fine-tuning methods for fast specialization while maintaining shared backbone parameters across tasks.
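
The masked-modeling idea in the key points above can be illustrated with a minimal sketch. The source does not specify how PRAGMA selects or encodes masked positions, so the field names, the `[MASK]` placeholder, the masking rate, and the `mask_events` helper are all assumptions made for illustration only.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_events(events, mask_rate=0.15, seed=0):
    """Hide a random subset of field values.

    Returns the corrupted events plus a dict mapping (event index, field)
    to the original value the model would be asked to reconstruct.
    """
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, event in enumerate(events):
        new_event = dict(event)  # copy so the original history is untouched
        for field, value in event.items():
            if rng.random() < mask_rate:
                new_event[field] = MASK
                targets[(i, field)] = value
        corrupted.append(new_event)
    return corrupted, targets


# Toy history with mixed categorical, numerical and free-text fields
# (field names are illustrative assumptions).
history = [
    {"source": "card", "amount": 12.5, "note": "coffee shop"},
    {"source": "transfer", "amount": 300.0, "note": "rent"},
    {"source": "app", "amount": 0.0, "note": "login"},
]

corrupted, targets = mask_events(history, mask_rate=0.3, seed=1)
```

A bidirectional encoder would then be trained to predict each entry of `targets` from the corrupted sequence, using context on both sides of the hidden value.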

Authors: Maxim Ostroukhov, Ruslan Mikhailov, Vladimir Iashin, Artem Sokolov, Andrei Akshonov, Vitaly Protasov, Dmitrii Beloborodov, Vince Mullin, Roman Yokunda Enzmann, Georgios Kolovos, Jason Renders, Pavel Nesterov, Anton Repushko

arXiv: 2604.08649v1 (cs.LG)

License: CC BY 4.0

Abstract: Modern financial systems generate vast quantities of transactional and event-level data that encode rich economic signals. This paper presents PRAGMA, a family of foundation models for multi-source banking event sequences. Our approach pre-trains a Transformer-based architecture with masked modelling on a large-scale, heterogeneous banking event corpus using a self-supervised objective tailored to the discrete, variable-length nature of financial records. The resulting model supports a wide range of downstream tasks such as credit scoring, fraud detection, and lifetime value prediction: strong performance can be achieved by training a simple linear model on top of the extracted embeddings and can be further improved with lightweight fine-tuning. Through extensive evaluation on downstream tasks, we demonstrate that PRAGMA achieves superior performance across multiple domains directly from raw event sequences, providing a general-purpose representation layer for financial applications.

Submitted to arXiv on 09 Apr. 2026

Results of the summarizing process for the arXiv paper: 2604.08649v1

Comprehensive Summary

Modern financial systems generate vast amounts of transactional and event-level data that encode valuable economic signals. This paper introduces PRAGMA, a family of foundation models for multi-source banking event sequences. The methodology pre-trains a Transformer-based architecture with masked modeling on a diverse banking event dataset. The approach is tailored to the discrete, variable-length nature of financial records, so the model supports downstream tasks such as credit scoring, fraud detection, and lifetime value prediction. PRAGMA achieves strong performance across multiple domains directly from raw event sequences: a simple linear model trained on top of the extracted embeddings already performs well, and lightweight fine-tuning improves results further in tasks related to risk management, product analytics, and operational efficiency within the financial sector.

Banking event sequences present unique challenges compared to traditional text data because they consist of variable-length records with mixed categorical, numerical, and free-text fields. These histories also exhibit long-tailed patterns in length and irregular time intervals with daily and weekly cycles. Moreover, this complexity limits what can be reported and used for decision-making in practical deployments. Existing solutions address specific aspects of the problem but fall short of a comprehensive solution for handling complex banking event sequences. PRAGMA fills this gap by offering an encoder-style foundation model that combines multi-source events with static profile state through masked modeling on a large-scale user history corpus.

The architecture of PRAGMA includes two encoder branches, for profile state and for events, that are fused by a history encoder. This bidirectional design allows tokens to attend to both past and future context, whether the model is reconstructing masked inputs or learning record-level representations from complete histories. After pre-training, PRAGMA can be adapted efficiently in two ways. The embedding probe setting trains a lightweight head on top of extracted embeddings. LoRA (Low-Rank Adaptation) fine-tuning updates only a small fraction of parameters, which allows fast specialization while the backbone parameters stay shared across tasks. Overall, PRAGMA is a versatile solution for complex banking event sequences, providing transferable representations for discriminative financial tasks across domains within the industry.
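
To make the LoRA adaptation mentioned above more concrete, here is a minimal sketch of the low-rank update from the general LoRA formulation, W' = W + (alpha / r) · B · A, where W is frozen and only A and B are trained. The layer size, rank, and `alpha` are assumptions for illustration; the source does not state which layers or ranks PRAGMA uses.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical layer size and rank.
d_out, d_in, r = 64, 64, 4
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen backbone weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
b = [[0.0] * r for _ in range(d_out)]                               # trainable, d_out x r, zero-initialised

w_eff = lora_effective_weight(w, a, b, alpha=4.0)

full_params = d_out * d_in        # parameters updated by full fine-tuning of this layer
lora_params = r * (d_in + d_out)  # parameters updated by LoRA for this layer
```

Because `b` starts at zero, the adapted layer initially behaves exactly like the pre-trained one. Each task only needs its own small `a` and `b`, which is how the backbone can stay shared across tasks.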

Layman's Summary

Summary
- A lot of important information is created in financial systems.
- PRAGMA is a family of models for studying banking events from different sources.
- The method trains a computer program to fill in deliberately hidden parts of records of various banking events.
- PRAGMA helps with tasks like deciding whether someone can borrow money, finding fraud, and predicting how valuable customers are.
- PRAGMA works well when a simple model is trained on top of the key information it extracts, and it gets better with some light adjustments.

Definitions
- Vast: Very big or huge
- Transactional: Involving buying, selling, or exchanging goods or services
- Event-level data: Information about specific happenings or occurrences
- Encoding: Turning information into a coded form
- Valuable: Something that is very useful or important

Blog article

In the ever-evolving world of finance, vast amounts of transactional and event-level data are generated on a daily basis. These data hold valuable economic signals that can provide insights into customer behavior, risk management, and operational efficiency within the financial sector. However, analyzing such complex and diverse data poses unique challenges compared to traditional text data. To address these challenges, a team of researchers has developed PRAGMA, a family of foundation models for multi-source banking event sequences. This paper introduces the methodology behind PRAGMA and highlights its versatility in achieving strong performance across multiple domains directly from raw event sequences.

The Need for PRAGMA

Financial records present unique challenges because they are variable-length and mix categorical, numerical, and free-text fields. These histories also exhibit long-tailed patterns in length and irregular time intervals with daily and weekly cycles. This complexity limits what can be reported and used for decision-making in practical deployments. Existing solutions have addressed specific aspects of this problem but fall short of a comprehensive solution for handling complex banking event sequences. This is where PRAGMA comes in, offering an encoder-style foundation model that combines multi-source events with static profile state through masked modeling on a large-scale user history corpus.

The Architecture of PRAGMA

The architecture of PRAGMA includes two encoder branches, for profile state and for events, that are fused by a history encoder. This bidirectional design allows tokens to attend to both past and future context, whether the model is reconstructing masked inputs or learning record-level representations from complete histories.

Pre-Training Process

The pre-training process uses a Transformer-based architecture with masked modeling on a diverse banking event dataset. The masked objective is tailored to discrete, variable-length financial records, and the resulting model supports downstream tasks such as credit scoring, fraud detection, and lifetime value prediction.

Adaptation Techniques

After pre-training, the model can be adapted efficiently through two techniques: the embedding probe setting and LoRA fine-tuning. The embedding probe setting trains a lightweight head on top of the extracted embeddings, while LoRA (Low-Rank Adaptation) updates only a small fraction of parameters for fast specialization while the backbone parameters stay shared across tasks.

Benefits of PRAGMA

PRAGMA proves its versatility by achieving strong performance across multiple domains directly from raw event sequences. A simple linear model trained on top of the extracted embeddings already performs well, and lightweight fine-tuning improves results further in tasks related to risk management, product analytics, and operational efficiency within the financial sector.

Conclusion

PRAGMA stands out as a versatile and effective solution for handling complex banking event sequences. Its transferable representations for discriminative financial tasks across domains make it a useful tool for financial institutions looking to gain insights from their data. With its pre-training process and adaptation techniques, PRAGMA addresses the challenges of analyzing multi-source banking event sequences.
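
As an illustration of the embedding probe setting described under Adaptation Techniques, the sketch below fits a logistic-regression head on fixed vectors that stand in for extracted embeddings. The synthetic data, the embedding dimension, and the helper names are assumptions; the source does not say which linear model or optimiser the authors use.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class for one embedding vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_linear_probe(embeddings, labels, lr=0.5, epochs=100):
    """Fit logistic regression with full-batch gradient descent.

    Only the probe's weights and bias are updated; the embeddings are
    treated as fixed inputs, as if produced by a frozen backbone.
    """
    dim, n = len(embeddings[0]), len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic stand-in for extracted embeddings: the label is mostly
# determined by the first coordinate, the other coordinates are noise.
rng = random.Random(0)
embeddings, labels = [], []
for i in range(100):
    y = i % 2
    center = 1.0 if y else -1.0
    embeddings.append([center + rng.gauss(0, 0.3)] + [rng.gauss(0, 1.0) for _ in range(3)])
    labels.append(y)

weights, bias = train_linear_probe(embeddings, labels)
accuracy = sum(
    (predict(weights, bias, x) >= 0.5) == bool(y) for x, y in zip(embeddings, labels)
) / len(labels)
```

The appeal of this setting is that the backbone is run once to produce embeddings, after which each downstream task needs only a small, cheap-to-train head.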

Created on 30 Jun. 2026
