A New DAPO Algorithm for Stock Trading

AI-generated keywords: Reinforcement learning, Financial trading, DAPO algorithm, Large Language Models (LLMs), Automated trading strategies

AI-generated Key Points

Ruijian Zha and Bojun Liu from Columbia University explore reinforcement learning techniques in financial trading
Focus on Dynamic Sampling Policy Optimization (DAPO) algorithm combined with Large Language Models (LLMs)
Introduce a novel trading agent integrating Group Relative Policy Optimization (GRPO) with insights from DAPO
Agent incorporates LLM-based signals related to risk and sentiment from financial news sources
Performance evaluated on NASDAQ-100 index using FNSPID dataset
Agent achieves cumulative return of 230.49% and Information Ratio of 0.37, outperforming CPPO-DeepSeek baseline
Training time reduced from 8 hours to 2.5 hours over 100 epochs, along with decreased RAM usage
Research demonstrates potential for data-efficient trading agents by combining reinforcement learning algorithms like DAPO with LLM-based signals from financial news sources

Authors: Ruijian Zha, Bojun Liu

arXiv: 2505.06408v1 (cs.CE)

Accepted to IEEE IDS 2025 Special Track: Financial Reinforcement Learning and Foundation Models (FinRLFM). 3 pages, 2 figures, 3 tables

License: CC BY 4.0

Abstract: Recent advances in reinforcement learning, such as Dynamic Sampling Policy Optimization (DAPO), show strong performance when paired with large language models (LLMs). Motivated by this success, we ask whether similar gains can be realized in financial trading. We design a trading agent that combines an improved Group Relative Policy Optimization (GRPO) algorithm, augmented with ideas from DAPO, with LLM-based risk and sentiment signals extracted from financial news. On the NASDAQ-100 index (FNSPID dataset), our agent attains a cumulative return of 230.49 percent and an information ratio of 0.37, outperforming the CPPO-DeepSeek baseline. It also cuts training time from about 8 hours to 2.5 hours over 100 epochs while markedly reducing RAM usage. The proposed RL-LLM framework offers a scalable path toward data-efficient trading agents. Code: https://github.com/Ruijian-Zha/FinRL-DAPO-SR/

Submitted to arXiv on 09 May. 2025

In their paper titled "A New DAPO Algorithm for Stock Trading," Ruijian Zha and Bojun Liu from the Department of Computer Science at Columbia University explore the application of reinforcement learning techniques in financial trading. Their starting point is the Dynamic Sampling Policy Optimization (DAPO) algorithm, which has shown strong results when paired with Large Language Models (LLMs). The study introduces a trading agent built on an improved Group Relative Policy Optimization (GRPO) algorithm augmented with ideas from DAPO. The agent also incorporates LLM-based risk and sentiment signals extracted from financial news. The approach is evaluated on the NASDAQ-100 index using the FNSPID dataset. The agent achieves a cumulative return of 230.49% and an Information Ratio of 0.37, surpassing the CPPO-DeepSeek baseline. Training time over 100 epochs drops from approximately 8 hours to 2.5 hours, and RAM usage is markedly reduced. The authors present this RL-LLM framework as a scalable path toward data-efficient trading agents.

Summary
- Two researchers from Columbia University studied how computers can learn to make better decisions in finance.
- They used a special algorithm called DAPO along with language models to help the computer learn.
- They created a new trading program that combines different techniques to make smart choices.
- The program uses signals from news about risks and feelings in the financial world.
- They tested the program on a stock market index and found it did really well.

Definitions
- Reinforcement learning: A type of machine learning where a computer learns by trying different actions and receiving rewards or punishments based on its decisions.
- Algorithm: A set of instructions or rules followed by a computer to solve a problem or perform a task.
- Policy Optimization: Refers to improving decision-making strategies in reinforcement learning algorithms.
- Language Models: Programs that understand and generate human language text, often used for tasks like translation or summarization.

Introduction

Interest in applying machine learning to financial trading has grown in recent years, driven by the prospect of more efficient and profitable data-driven strategies. In their paper titled "A New DAPO Algorithm for Stock Trading," Ruijian Zha and Bojun Liu from Columbia University's Department of Computer Science explore the use of reinforcement learning in stock trading. The study takes its motivation from the Dynamic Sampling Policy Optimization (DAPO) algorithm, which has shown strong results when paired with Large Language Models (LLMs), and asks whether similar gains can be realized in trading.

The DAPO Algorithm

Reinforcement learning is a branch of machine learning in which an agent learns to make decisions by interacting with an environment. In financial trading, this means an agent that learns from historical market data to make profitable trades. DAPO is a policy-optimization method that, as its name indicates, relies on dynamic sampling. In the LLM setting where DAPO was introduced, dynamic sampling discards groups of sampled outputs whose rewards are all identical, because such groups yield a zero group-relative advantage and therefore no gradient signal. Training batches are filled only with groups that still carry a learning signal, so updates are not spent on uninformative samples.
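
The filtering idea behind dynamic sampling can be sketched in a few lines. This is a minimal illustration of the rule as used in the LLM setting, not the authors' implementation; the function names and the toy reward values are assumptions made for this example, and the summary does not say how the paper adapts the rule to trading.

```python
from typing import List, Sequence


def has_learning_signal(rewards: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True if the rewards in a group are not all (numerically) identical."""
    return max(rewards) - min(rewards) > tol


def dynamic_sampling_filter(groups: Sequence[Sequence[float]]) -> List[List[float]]:
    """Keep only groups whose rewards differ.

    A group with identical rewards has zero group-relative advantage for
    every sample, so it contributes nothing to the policy gradient.
    """
    return [list(g) for g in groups if len(g) > 0 and has_learning_signal(g)]


# Toy reward groups (made-up numbers, not from the paper).
groups = [
    [1.0, 1.0, 1.0, 1.0],   # all identical -> dropped
    [0.0, 1.0, 0.0, 1.0],   # mixed -> kept
    [0.0, 0.0, 0.0, 0.0],   # all identical -> dropped
    [0.2, -0.1, 0.4, 0.0],  # mixed -> kept
]
kept = dynamic_sampling_filter(groups)
print(len(kept))  # 2
```

In a full training loop, sampling would typically continue until enough informative groups have been collected to fill a batch; that loop is omitted here.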

Enhancing GRPO with Insights from DAPO

Rather than applying DAPO directly, Zha and Liu start from Group Relative Policy Optimization (GRPO) and augment it with ideas from DAPO. GRPO is a policy-gradient method that scores each sample relative to the other samples in its group, which removes the need for a separately learned value function. The resulting trading agent outperforms the CPPO-DeepSeek baseline on the reported metrics.
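
As a rough sketch of the two ingredients mentioned above, the code below computes group-relative advantages in the GRPO style and a clipped surrogate term with separate lower and upper clip ranges, an idea associated with DAPO. It is an illustration under stated assumptions, not the paper's code: the function names, the use of the population standard deviation, the default clip values, and the toy rewards are all choices made for this example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """PPO-style clipped objective for one sample with decoupled clip bounds.

    ratio is pi_new(a|s) / pi_old(a|s); eps_low and eps_high are
    illustrative defaults, not values taken from the paper.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


# Toy per-sample rewards for one group (made-up numbers).
rewards = [0.02, -0.01, 0.03, 0.00]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Because the advantages are centered on the group mean, they sum to approximately zero, which is why a group with identical rewards carries no learning signal.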

LLM-Based Signals

In addition to the improved GRPO algorithm, the proposed trading agent uses LLM-based signals derived from financial news. These signals describe risk and sentiment in the market. LLMs are large neural networks trained on vast amounts of text, which lets them assess the tone and content of news articles. By analyzing financial news, an LLM can produce risk and sentiment signals that inform the agent's trading decisions.
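
The summary does not say how the risk and sentiment signals enter the agent. One hypothetical option is to scale the raw trading reward by the two scores, as sketched below. Everything here is an assumption made for illustration: the function name, the positive score scale with a neutral value of 3, and the exponents alpha and beta are not taken from the paper.

```python
def adjust_reward(raw_reward: float, sentiment: float, risk: float,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Scale a raw reward up with sentiment and down with risk (hypothetical).

    sentiment and risk are assumed to be positive scores, e.g. on a
    1 to 5 scale where 3 is neutral. With equal scores and alpha == beta
    the reward is left unchanged.
    """
    if sentiment <= 0 or risk <= 0:
        raise ValueError("sentiment and risk scores must be positive")
    return raw_reward * (sentiment ** alpha) / (risk ** beta)


# Toy usage with made-up scores.
neutral = adjust_reward(0.01, sentiment=3, risk=3)
favorable = adjust_reward(0.01, sentiment=5, risk=1)
unfavorable = adjust_reward(0.01, sentiment=1, risk=5)
print(neutral, favorable, unfavorable)
```

An alternative design would append the two scores to the agent's state vector instead of reshaping the reward; either choice would need to be checked against the authors' repository.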

Evaluation and Results

To evaluate their approach, Zha and Liu tested it on the NASDAQ-100 index using the FNSPID dataset. The agent achieved a cumulative return of 230.49% and an Information Ratio of 0.37, outperforming the CPPO-DeepSeek baseline. The proposed method also reduced training time over 100 epochs from approximately 8 hours to 2.5 hours, while markedly reducing RAM usage.
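
For readers unfamiliar with the two reported metrics, the sketch below shows common textbook definitions. It assumes simple per-period returns and an unannualized Information Ratio; the paper's exact conventions (return frequency, benchmark, annualization) are not given in this summary, and the return series are made-up toy numbers.

```python
from statistics import mean, stdev
from typing import Sequence


def cumulative_return(period_returns: Sequence[float]) -> float:
    """Compound simple per-period returns into one cumulative return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def information_ratio(portfolio_returns: Sequence[float],
                      benchmark_returns: Sequence[float]) -> float:
    """Mean active return divided by the sample std of active returns."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    return mean(active) / stdev(active)


# Toy return series (made-up numbers, not from the paper).
portfolio = [0.010, -0.004, 0.007, 0.002]
benchmark = [0.008, -0.005, 0.004, 0.003]
print(cumulative_return(portfolio))
print(information_ratio(portfolio, benchmark))

# A cumulative return of 230.49% means each unit invested grows to
# 1 + 2.3049 units over the evaluation period.
final_value_per_unit = 1.0 + 2.3049
```

On the efficiency side, going from about 8 hours to 2.5 hours corresponds to a speedup of roughly 8 / 2.5 = 3.2 times.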

Conclusion

The research by Zha and Liu shows how reinforcement learning and LLM-derived news signals can be combined for automated trading. By augmenting GRPO with ideas from DAPO and adding risk and sentiment signals from financial news, they built a trading agent that outperforms the CPPO-DeepSeek baseline while training faster and using less RAM. The authors present the framework as a scalable path toward data-efficient trading agents.

Created on 16 May. 2025
