Combining Financial Data and News Articles for Stock Price Movement Prediction Using Large Language Models

AI-generated keywords: Financial Markets

AI-generated Key Points

  • Factors to consider for predicting financial markets and stock price movements:
      • Company's performance
      • Historic price movements
      • Industry-specific events
      • Influence of human factors like social media and press coverage
  • Methodology used for analysis:
      • Combining structured financial data with unstructured textual news articles using Large Language Models (LLMs)
      • Layered summarization approach for handling the vast amount of information in news articles
      • Utilizing advanced language models such as GPT-3 and GPT-4 for classification tasks
  • Key financial variables extracted from income statements, balance sheets, cash flow statements, and historical pricing data:
      • Total revenue
      • Net income
      • Free cash flow
      • Total assets
      • Price momentum
      • Forward return
  • Achievements in predicting stock price movements:
      • Promising results through retrieval augmentation techniques and LLMs in zero-shot, two-shot, and four-shot settings
  • Overall methodology aims to enhance accuracy by combining advanced language models with structured financial data and unstructured textual news articles.
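
As an illustration of the zero-shot, two-shot, and four-shot prompting mentioned in the key points, the following minimal sketch shows one way tabular metrics and retrieved news chunks could be serialized into a single k-shot prompt. It is not the authors' implementation: the `Example` class, the `build_prompt` function, the "up"/"down" label set, the prompt wording, and all values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Example:
    """A company snapshot: tabular metrics, retrieved news chunks, optional label."""
    ticker: str
    metrics: Dict[str, float]
    news_chunks: List[str]
    label: str = ""  # hypothetical label set: "up" or "down"; empty for the query


def render_example(ex: Example) -> str:
    """Serialize structured metrics and unstructured news into one text block."""
    lines = [f"Company: {ex.ticker}", "Financial metrics:"]
    lines += [f"- {name}: {value}" for name, value in ex.metrics.items()]
    lines.append("Relevant news:")
    lines += [f"- {chunk}" for chunk in ex.news_chunks]
    # The query has no label, so its block ends with an open "Movement:" slot.
    lines.append(f"Movement: {ex.label}".rstrip())
    return "\n".join(lines)


def build_prompt(query: Example, shots: Sequence[Example], k: int) -> str:
    """Build a k-shot prompt: instruction, k labeled examples, then the query."""
    if not 0 <= k <= len(shots):
        raise ValueError("k must be between 0 and the number of labeled examples")
    instruction = ("Given a company's financial metrics and recent news, "
                   "predict the stock price movement. Answer 'up' or 'down'.")
    blocks = [instruction]
    blocks += [render_example(s) for s in shots[:k]]
    blocks.append(render_example(query))
    return "\n\n".join(blocks)


# Made-up illustrative values, not data from the paper.
shots = [
    Example("AAA", {"total_revenue": 120.0, "net_income": 15.0},
            ["AAA beats earnings estimates."], "up"),
    Example("BBB", {"total_revenue": 80.0, "net_income": -4.0},
            ["BBB recalls a flagship product."], "down"),
]
query = Example("CCC", {"total_revenue": 95.0, "net_income": 6.0},
                ["CCC announces a new product line."])

zero_shot = build_prompt(query, shots, k=0)
two_shot = build_prompt(query, shots, k=2)
```

Setting `k=0` yields only the instruction and the query block, while `k=2` prepends two labeled demonstrations; the resulting string would then be sent to whichever LLM is used as the classifier.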

Authors: Ali Elahi, Fatemeh Taghvaei

9 pages, 5 figures
License: CC BY 4.0

Abstract: Predicting financial markets and stock price movements requires analyzing a company's performance, historic price movements, and industry-specific events, alongside the influence of human factors such as social media and press coverage. We assume that financial reports (such as income statements, balance sheets, and cash flow statements), historical price data, and recent news articles can collectively represent the aforementioned factors. We combine financial data in tabular format with textual news articles and employ pre-trained Large Language Models (LLMs) to predict market movements. Recent research in LLMs has demonstrated that they are able to perform both tabular and text classification tasks, making them our primary model to classify the multi-modal data. We utilize retrieval augmentation techniques to retrieve and attach relevant chunks of news articles to financial metrics related to a company and prompt the LLMs in zero, two, and four-shot settings. Our dataset contains news articles collected from different sources, historic stock prices, and financial report data for the 20 companies with the highest trading volume across different industries in the stock market. We utilized recently released language models for our LLM-based classifier, including GPT-3 and GPT-4, and LLaMA-2 and LLaMA-3 models. We introduce an LLM-based classifier capable of performing classification tasks using a combination of tabular (structured) and textual (unstructured) data. By using this model, we predicted the movement of a given stock's price in our dataset with weighted F1-scores of 58.5% and 59.1% and a Matthews Correlation Coefficient of 0.175 for both the 3-month and 6-month periods.
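
For readers unfamiliar with the two metrics reported in the abstract, here is a minimal sketch of how a weighted F1-score and a binary Matthews Correlation Coefficient (MCC) can be computed from scratch. The function names, the "up"/"down" labels, and the toy predictions are assumptions for illustration; they are not the paper's data or evaluation code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1 averaged with weights equal to each class's true count."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because n > 0.
        total += n * (2 * tp) / (2 * tp + fp + fn)
    return total / len(y_true)


def binary_mcc(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
               positive: Hashable) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, not results from the paper.
y_true = ["up", "up", "up", "down", "down", "down"]
y_pred = ["up", "up", "down", "down", "down", "up"]

f1 = weighted_f1(y_true, y_pred)            # 2/3 for this toy input
mcc = binary_mcc(y_true, y_pred, "up")      # 1/3 for this toy input
```

An MCC of 0 corresponds to chance-level prediction and 1 to perfect prediction, which gives some context for the reported value of 0.175.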

Submitted to arXiv on 02 Nov. 2024

Results of the summarizing process for the arXiv paper: 2411.01368v1

Accurately predicting financial markets and stock price movements requires analyzing a company's performance, historic price movements, industry-specific events, and the influence of human factors such as social media and press coverage. This involves examining financial reports (income statements, balance sheets, and cash flow statements), historical price data, and recent news articles. The authors combine structured financial data with unstructured textual news articles and use pre-trained Large Language Models (LLMs) for the prediction task.

To handle the large volume of news available about each company on a given date, the study uses a layered summarization approach. News articles are first filtered on key metadata such as title, subtitle, publication date, and company-related keywords. The selected articles are then summarized to extract the most relevant information for further analysis.

The dataset includes news articles from various sources, historic stock price data, and financial report data for the 20 companies with the highest trading volume across different industries in the stock market. Language models such as GPT-3 and GPT-4 are used for classification over both tabular and textual data. Retrieval augmentation attaches relevant chunks of news articles to a company's financial metrics, and the LLMs are prompted in zero-shot, two-shot, and four-shot settings. The abstract reports weighted F1-scores of 58.5% and 59.1% and a Matthews Correlation Coefficient of 0.175 for the 3-month and 6-month periods.

The study also describes the financial variables extracted from income statements, balance sheets, cash flow statements, and historical pricing data:

  • Total revenue: revenue generated by a company before expenses are deducted
  • Net income: income remaining after all expenses are deducted
  • Free cash flow: cash generated by operations
  • Total assets: assets owned or controlled by a company
  • Price momentum: relative strength and direction of a stock's price movement over past months
  • Forward return: expected return on an investment over a future period

Overall, the methodology combines language models with structured financial data and unstructured textual news articles to improve the accuracy of market movement prediction. Through filtering and summarization, together with an LLM-based classifier that handles multi-modal inputs, the authors aim to provide useful insight into stock price movements for investors.
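
To make the price momentum and forward return variables concrete, the sketch below computes both as simple relative price changes over a monthly price series. The summary gives only verbal descriptions, so these exact formulas, the function names, the "up"/"down" labeling rule, and the price values are assumptions rather than the study's definitions.

```python
from typing import Sequence


def price_momentum(prices: Sequence[float], t: int, lookback: int) -> float:
    """Relative price change over the past `lookback` periods ending at index t."""
    if lookback <= 0 or t - lookback < 0 or t >= len(prices):
        raise IndexError("not enough price history for this lookback")
    return prices[t] / prices[t - lookback] - 1.0


def forward_return(prices: Sequence[float], t: int, horizon: int) -> float:
    """Relative price change over the next `horizon` periods starting at index t."""
    if horizon <= 0 or t < 0 or t + horizon >= len(prices):
        raise IndexError("not enough future prices for this horizon")
    return prices[t + horizon] / prices[t] - 1.0


def movement_label(fwd: float) -> str:
    """Hypothetical binary target: 'up' for a positive forward return, else 'down'."""
    return "up" if fwd > 0 else "down"


# Made-up month-end closing prices, not data from the paper.
monthly_prices = [100.0, 110.0, 121.0, 108.9]

momentum = price_momentum(monthly_prices, t=2, lookback=2)   # 121 / 100 - 1
fwd = forward_return(monthly_prices, t=2, horizon=1)         # 108.9 / 121 - 1
label = movement_label(fwd)
```

In this framing, momentum is a backward-looking input feature, while the forward return is only known after the fact and would serve as the basis for the prediction target over a chosen horizon such as 3 or 6 months.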
Created on 12 Nov. 2024
