Learning Robust Policies via Interpretable Hamilton-Jacobi Reachability-Guided Disturbances

AI-generated keywords: Robotics, Deep Reinforcement Learning, Adversarial Attacks, Hamilton-Jacobi Reachability, Resilience

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Deep Reinforcement Learning (RL) successful in navigating complex dynamics
  • Challenge: susceptibility to unknown disturbances and adversarial attacks
  • Innovative framework integrates model-based control with adversarial RL training
  • Utilization of Hamilton-Jacobi reachability-guided disturbances for training
  • Aim: fortify system against unforeseen challenges, enhance robustness
  • Evaluation across three tasks: reach-avoid game in simulation and real-world, quadrotor stabilization task in virtual setting
  • Learned critic network aligns closely with ground-truth HJ value function
  • Policy network performance comparable to other learning-based methods
  • Promising avenue for enhancing resilience of deep RL systems in robotics

Authors: Hanyang Hu, Xilun Zhang, Xubo Lyu, Mo Chen

Abstract: Deep Reinforcement Learning (RL) has shown remarkable success in robotics with complex and heterogeneous dynamics. However, its vulnerability to unknown disturbances and adversarial attacks remains a significant challenge. In this paper, we propose a robust policy training framework that integrates model-based control principles with adversarial RL training to improve robustness without the need for external black-box adversaries. Our approach introduces a novel Hamilton-Jacobi reachability-guided disturbance for adversarial RL training, where we use interpretable worst-case or near-worst-case disturbances as adversaries against the robust policy. We evaluated its effectiveness across three distinct tasks: a reach-avoid game in both simulation and real-world settings, and a highly dynamic quadrotor stabilization task in simulation. We validate that our learned critic network is consistent with the ground-truth HJ value function, while the policy network shows comparable performance with other learning-based methods.
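To make the idea of a "reachability-guided disturbance" concrete, here is a minimal sketch of how a worst-case disturbance can be read off a gridded value function. It is not the authors' implementation. It assumes additive disturbance dynamics (x_dot = u + d), a norm-bounded disturbance, and the convention that a larger value means a safer state, so the adversary pushes the state along the negative value gradient. The signed-distance field below is only a stand-in for a real HJ value function, and all names (`value_gradient`, `worst_case_disturbance`, `d_max`) are hypothetical.

```python
import numpy as np


def value_gradient(value, spacing, idx):
    """Finite-difference gradient of a gridded value function at grid index idx."""
    grads = np.gradient(value, *spacing)  # one array per grid axis
    return np.array([g[idx] for g in grads])


def worst_case_disturbance(grad_v, d_max):
    """Disturbance with norm <= d_max that minimizes grad_v . d.

    Assumes the disturbance enters the dynamics additively, so the
    minimizer points along the negative value gradient.
    """
    norm = np.linalg.norm(grad_v)
    if norm == 0.0:
        return np.zeros_like(grad_v)  # flat value: no preferred direction
    return -d_max * grad_v / norm


# Stand-in value function (assumption): signed distance to a unit disc
# obstacle at the origin. A real HJ value function would come from a solver.
xs = np.linspace(-3.0, 3.0, 61)
ys = np.linspace(-3.0, 3.0, 61)
X, Y = np.meshgrid(xs, ys, indexing="ij")
V = np.sqrt(X**2 + Y**2) - 1.0

idx = (50, 30)  # grid point near the state (2.0, 0.0)
grad = value_gradient(V, (xs[1] - xs[0], ys[1] - ys[0]), idx)
d_star = worst_case_disturbance(grad, d_max=0.5)
print("value gradient:", grad)
print("worst-case disturbance:", d_star)
```

In this toy setting the disturbance pushes the state straight toward the obstacle, which is why such disturbances can be called interpretable. A near-worst-case variant could, for example, perturb or scale `d_star`; how the paper does that is not stated in the metadata.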

Submitted to arXiv on 29 Sep. 2024

Results of the summarizing process for the arXiv paper: 2409.19746v1

The paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

Deep Reinforcement Learning (RL) has been highly successful in robotics on systems with complex and heterogeneous dynamics. However, its susceptibility to unknown disturbances and adversarial attacks remains a persistent challenge. To address this, the authors propose a training framework that integrates model-based control principles with adversarial RL training, improving robustness without relying on external black-box adversaries.

The key idea is to use Hamilton-Jacobi (HJ) reachability-guided disturbances during adversarial RL training: interpretable worst-case or near-worst-case disturbances act as the adversary against the robust policy. The aim is to make robotic systems more reliable in dynamic and uncertain environments.

The approach was evaluated on three tasks: a reach-avoid game in both simulation and the real world, and a highly dynamic quadrotor stabilization task in simulation. The learned critic network was found to be consistent with the ground-truth HJ value function, and the policy network performed comparably to other learning-based methods. Overall, the work points to a promising way to improve the resilience of deep RL systems in robotics by using interpretable disturbances guided by HJ reachability.
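As an illustration of what comparing a learned critic with a ground-truth HJ value function could look like, the sketch below computes two simple agreement measures over a set of sampled states. This is an assumption for illustration only: the metadata does not say which metric the authors used, and the sketch presumes the critic outputs have already been mapped to the same scale and sign convention as the HJ value function. The function name and the sample numbers are hypothetical.

```python
import numpy as np


def compare_value_functions(critic_values, hj_values):
    """Compare critic outputs with HJ values sampled at the same states.

    Returns the mean absolute error and the fraction of states where both
    agree on the sign (i.e. on which side of the zero level set they lie).
    """
    critic_values = np.asarray(critic_values, dtype=float)
    hj_values = np.asarray(hj_values, dtype=float)
    if critic_values.shape != hj_values.shape:
        raise ValueError("critic and HJ values must be sampled at the same states")
    mean_abs_error = float(np.mean(np.abs(critic_values - hj_values)))
    sign_agreement = float(np.mean((critic_values >= 0) == (hj_values >= 0)))
    return {"mean_abs_error": mean_abs_error, "sign_agreement": sign_agreement}


# Made-up sample values, for demonstration only.
hj = np.array([-1.0, -0.2, 0.3, 1.5])
critic = np.array([-0.9, 0.1, 0.4, 1.4])
report = compare_value_functions(critic, hj)
print(report)
```

Sign agreement matters because, in reachability analysis, the zero level set of the value function separates safe from unsafe states; a critic can have a small average error and still misclassify states near that boundary.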
Created on 07 May 2025
