In their paper titled "Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning," authors Jiaxi Liu, Yidong Zhang, Xiaoqing Wang, Yuming Deng, and Xingyu Wu introduce an innovative framework for addressing dynamic pricing challenges in E-commerce using deep reinforcement learning (DRL) techniques. The study focuses on modeling the dynamic pricing problem as a Markov Decision Process (MDP) by utilizing four distinct groups of business data to represent different states during each time period. The researchers propose three key contributions compared to existing DRL-based dynamic pricing algorithms. Firstly, they extend the traditional discrete set problem to a continuous price set, enhancing the flexibility and accuracy of pricing decisions. Secondly, instead of directly using revenue as the reward function, they introduce a novel metric called the difference of revenue conversion rates (DRCR), which proves to be more effective in optimizing pricing strategies. Thirdly, they address the cold-start issue of MDP by pre-training and evaluating models with carefully selected historical sales data. To evaluate their approach, the team conducts offline assessments using real datasets from Alibaba Inc. and online field experiments on Tmall.com, a prominent online shopping platform owned by Alibaba Inc. The results indicate that DRCR outperforms traditional revenue-based metrics commonly used in literature. Furthermore, extensive field experiments conducted over several months on 1000 stock keeping units (SKUs) demonstrate that continuous price sets yield superior performance compared to discrete sets. Ultimately, the researchers show that their framework significantly surpasses manual pricing strategies implemented by operational experts. 
Overall, this study provides valuable insights into leveraging deep reinforcement learning for dynamic pricing optimization in E-commerce settings and highlights the importance of innovative reward functions and continuous price sets for achieving superior performance in online retail environments.
- Authors introduce a framework for dynamic pricing in E-commerce using deep reinforcement learning (DRL)
- Model dynamic pricing as a Markov Decision Process (MDP) with four groups of business data representing different states
- Three key contributions compared to existing DRL-based dynamic pricing algorithms:
  - Extend the discrete price set to a continuous price set for more flexibility and accuracy
  - Introduce a novel metric called Difference of Revenue Conversion Rates (DRCR) as the reward function
  - Address the cold-start issue of MDP through pre-training with historical sales data
- Offline assessments on real datasets from Alibaba Inc. and online field experiments on Tmall.com show superiority of DRCR over traditional revenue-based metrics
- Continuous price sets outperform discrete sets in extensive field experiments on 1000 stock keeping units (SKUs)
- Framework surpasses manual pricing strategies implemented by operational experts, showcasing superior performance in online retail environments
Summary
- Authors created a new way to decide prices for online shopping using smart learning.
- They used different groups of business information to make decisions about prices.
- Their method is better than other similar methods in three main ways:
  - It can handle more price options for better results.
  - It uses a new way to measure success called DRCR.
  - It starts working faster by using old sales data first.
- Tests on real data and online shops proved their method works best.
- Trying many different prices online worked better than just a few.
Definitions
- Framework: A basic structure or plan that helps organize things.
- Dynamic pricing: Changing the price of something based on different factors like demand or competition.
- Deep reinforcement learning (DRL): A type of smart computer system that learns from its own experiences to make decisions.
- Markov Decision Process (MDP): A way to model decision-making as a series of steps where each step depends only on the current situation, not past ones.
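- Revenue conversion rate: The revenue earned per visitor, that is, the revenue from a product divided by the number of people who viewed it. DRCR is the difference between two such rates.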
- Cold-start issue: Starting a process without enough information or experience.
- Stock keeping units (SKUs): Unique codes used to identify products in stores.
Introduction
Dynamic pricing is a common practice in the E-commerce industry, where prices of products are adjusted based on various factors such as demand, competition, and inventory levels. With the rise of online shopping platforms, there has been an increasing need for efficient and effective dynamic pricing strategies to stay competitive in the market. Traditional methods of manual pricing have proven to be inadequate in keeping up with the constantly changing market conditions. This has led researchers to explore new approaches for dynamic pricing optimization.
In their paper titled "Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning," Jiaxi Liu et al. introduce a novel framework that utilizes deep reinforcement learning techniques to address dynamic pricing challenges in E-commerce settings. The study focuses on modeling the problem as a Markov Decision Process (MDP) and proposes three key contributions compared to existing DRL-based algorithms.
The Framework
The proposed framework by Liu et al. consists of four main components: state representation, action selection, reward function, and pre-training strategy.
State Representation
To model the dynamic pricing problem as an MDP, four groups of business data represent the state in each time period. The paper groups them as price features, sales features, customer traffic features, and competitiveness features. The state is updated at each time step from the latest data on the E-commerce platform.
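As a rough illustration, the state for one product in one time period could be assembled by concatenating the four feature groups into a single flat vector. This is a minimal sketch, not the authors' implementation: the function name `build_state`, the group sizes, and all numbers are hypothetical.

```python
from typing import List, Sequence

NUM_GROUPS = 4  # the text describes four groups of business data per state


def build_state(feature_groups: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the feature groups for one time period into a flat state vector."""
    if len(feature_groups) != NUM_GROUPS:
        raise ValueError(
            f"expected {NUM_GROUPS} feature groups, got {len(feature_groups)}"
        )
    state: List[float] = []
    for group in feature_groups:
        state.extend(float(x) for x in group)
    return state


# Made-up numbers for one product in one time period (hypothetical features).
example_groups = [
    [19.9, 0.85],     # first feature group
    [120.0, 2388.0],  # second feature group
    [1500.0],         # third feature group
    [0.97],           # fourth feature group
]
state = build_state(example_groups)
```

Keeping the groups separate until the last step makes it easy to add or normalize features within one group without touching the others.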
Action Selection
One significant contribution of this study is extending the action space from a discrete set of prices to a continuous price range, which allows finer-grained price selection. To learn continuous actions from continuous states, the team uses Deep Deterministic Policy Gradient (DDPG), an actor-critic algorithm in which a neural-network actor outputs the action directly.
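The difference between the two action spaces can be sketched as follows. A discrete agent picks an index into a preset price grid, while a continuous agent emits a number that is mapped onto a price range. The sketch assumes the actor output lies in [-1, 1] (as with a tanh output layer); that range, the function names, and the price bounds are illustrative assumptions, not details from the paper.

```python
from typing import List


def continuous_price(action: float, p_min: float, p_max: float) -> float:
    """Map an actor output in [-1, 1] linearly onto the price range [p_min, p_max]."""
    clipped = max(-1.0, min(1.0, action))  # guard against out-of-range outputs
    return p_min + (clipped + 1.0) / 2.0 * (p_max - p_min)


def discrete_price(action_index: int, price_grid: List[float]) -> float:
    """Pick one of the preset prices, as a discrete-action agent would."""
    return price_grid[action_index]


# Hypothetical product priced between 80 and 100.
grid = [80.0, 90.0, 100.0]
p_disc = discrete_price(1, grid)              # only grid values are reachable
p_cont = continuous_price(0.5, 80.0, 100.0)   # any value in the range is reachable
```

With the grid above, a discrete agent can never choose a price such as 95, whereas the continuous mapping reaches it with an action of 0.5.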
Reward Function
Instead of using revenue directly as the reward, the researchers propose a novel metric called the difference of revenue conversion rates (DRCR). The revenue conversion rate normalizes revenue by customer traffic, so the reward reflects how well a price turns visits into revenue rather than how many visitors happened to arrive. DRCR is the difference between the revenue conversion rates of two periods, so a positive reward means the pricing decision improved the rate.
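A minimal sketch of such a reward is shown below. It assumes that the revenue conversion rate is revenue divided by the number of unique visitors and that DRCR subtracts a reference period's rate from the current period's rate. Both assumptions, along with the function names and numbers, are illustrative and should be checked against the paper's exact definition.

```python
def revenue_conversion_rate(revenue: float, visitors: int) -> float:
    """Revenue per visitor (assumed definition); 0.0 when there was no traffic."""
    if visitors <= 0:
        return 0.0
    return revenue / visitors


def drcr_reward(
    cur_revenue: float,
    cur_visitors: int,
    ref_revenue: float,
    ref_visitors: int,
) -> float:
    """Difference between the current and the reference revenue conversion rate."""
    current = revenue_conversion_rate(cur_revenue, cur_visitors)
    reference = revenue_conversion_rate(ref_revenue, ref_visitors)
    return current - reference


# Made-up numbers: traffic doubles, so raw revenue rises although each visit earns less.
revenue_gain = 1500.0 - 1000.0                    # raw revenue difference
reward = drcr_reward(1500.0, 1000, 1000.0, 500)   # rate falls from 2.0 to 1.5
```

In this example a revenue-only reward would credit the price change with a gain of 500, while the rate-based reward is negative because the extra revenue came from a traffic surge rather than from better pricing.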
Pre-training Strategy
One major challenge in applying an MDP to dynamic pricing is the cold-start issue: an untrained agent has no experience to draw on, and learning from scratch on live products is costly. To address this issue, Liu et al. pre-train and evaluate their models on carefully selected historical sales data before deploying them online.
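One common way to use historical data before going online is to convert logged records into (state, action, reward, next state) transitions and seed a replay buffer with them, so the agent can be trained offline first. The sketch below shows only that pattern; it is not the authors' procedure, and the `Transition` type, the function name, and the numbers are hypothetical.

```python
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: List[float]
    action: float       # the price that was set in this period
    reward: float
    next_state: List[float]


def transitions_from_history(
    states: Sequence[List[float]],
    prices: Sequence[float],
    rewards: Sequence[float],
) -> List[Transition]:
    """Turn a chronological log for one product into (s, a, r, s') tuples."""
    if not (len(states) == len(prices) + 1 == len(rewards) + 1):
        raise ValueError("need one more state than prices and rewards")
    return [
        Transition(states[t], prices[t], rewards[t], states[t + 1])
        for t in range(len(prices))
    ]


# Made-up log: three observed states, two pricing decisions in between.
logged_states = [[1.0, 0.2], [1.1, 0.3], [0.9, 0.1]]
logged_prices = [95.0, 90.0]
logged_rewards = [0.4, -0.1]

replay_buffer: Deque[Transition] = deque(maxlen=10_000)
replay_buffer.extend(
    transitions_from_history(logged_states, logged_prices, logged_rewards)
)
```

An off-policy learner can then sample from `replay_buffer` for its first updates instead of exploring prices on live products.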
Evaluation
To evaluate their approach, Liu et al. conduct offline assessments using real datasets from Alibaba Inc., one of China's largest E-commerce companies. They also perform online field experiments on Tmall.com, a prominent online shopping platform owned by Alibaba Inc.
The offline results show that DRCR is a more suitable reward function than the revenue-only reward commonly used in the literature. Furthermore, extensive field experiments conducted over several months on 1000 stock keeping units (SKUs) demonstrate that continuous price sets yield superior performance compared to discrete sets.
In addition to these evaluations, the team also compares their framework against manual pricing strategies implemented by operational experts. The results show that their approach significantly surpasses manual strategies in terms of both revenue and conversion rates.
Conclusion
The study by Liu et al. provides valuable insights into leveraging deep reinforcement learning for dynamic pricing optimization in E-commerce settings. Their framework addresses key limitations of manual pricing and, in the reported experiments, outperforms both its discrete-price variant and the prices set by operational experts.
One key takeaway from this research paper is the importance of the reward function and the action space. In the reported experiments, using DRCR as the reward and choosing prices from a continuous set were each more effective than the alternatives they replaced, namely a revenue-only reward and a discrete price set.
Overall, this study highlights the potential of deep reinforcement learning techniques in addressing dynamic pricing challenges in E-commerce and provides a strong foundation for future research in this area.