DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

AI-generated keywords: Autonomous driving, Interpretability, Large language models, DriveGPT4, End-to-end system

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Significant advancements in autonomous driving in the past decade in both academic research and industry applications
  • Lack of interpretability in decision-making processes is a major challenge hindering widespread adoption and further development
  • DriveGPT4 is an interpretable end-to-end autonomous driving system leveraging multimodal large language models (LLMs)
  • DriveGPT4 can interpret vehicle actions, provide transparent reasoning, engage in dialogue with users, and predict low-level control signals
  • Utilizes a custom visual instruction tuning dataset tailored for autonomous driving tasks to learn from visual cues and textual instructions simultaneously
  • Demonstrates superior performance compared to traditional methods and video understanding LLMs across multiple evaluation tasks
  • Capable of zero-shot generalization to unseen scenarios, without additional training data for them
  • Sets a new standard for future developments in enhancing interpretability in autonomous driving systems

Authors: Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kenneth K. Y. Wong, Zhenguo Li, Hengshuang Zhao

The project page is available at https://tonyxuqaq.github.io/projects/DriveGPT4/

Abstract: In the past decade, autonomous driving has experienced rapid development in both academia and industry. However, its limited interpretability remains a significant unsolved problem, severely hindering autonomous vehicle commercialization and further development. Previous approaches utilizing small language models have failed to address this issue due to their lack of flexibility, generalization ability, and robustness. Recently, multimodal large language models (LLMs) have gained considerable attention from the research community for their capability to process and reason non-text data (e.g., images and videos) by text. In this paper, we present DriveGPT4, an interpretable end-to-end autonomous driving system utilizing LLMs. DriveGPT4 is capable of interpreting vehicle actions and providing corresponding reasoning, as well as answering diverse questions posed by human users for enhanced interaction. Additionally, DriveGPT4 predicts vehicle low-level control signals in an end-to-end fashion. These capabilities stem from a customized visual instruction tuning dataset specifically designed for autonomous driving. To the best of our knowledge, DriveGPT4 is the first work focusing on interpretable end-to-end autonomous driving. When evaluated on multiple tasks alongside conventional methods and video understanding LLMs, DriveGPT4 demonstrates superior qualitative and quantitative performance. Additionally, DriveGPT4 can be generalized in a zero-shot fashion to accommodate more unseen scenarios. The project page is available at https://tonyxuqaq.github.io/projects/DriveGPT4/ .

Submitted to arXiv on 02 Oct. 2023


Results of the summarizing process for the arXiv paper: 2310.01412v1


Over the past decade, autonomous driving has advanced significantly in both academic research and industry applications. However, the lack of interpretability in decision-making remains a major challenge that hinders widespread adoption and further development. Previous attempts using small language models have not succeeded because of limited flexibility, generalization ability, and robustness. Recently, multimodal large language models (LLMs) have attracted growing interest within the research community. Building on this trend, Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kenneth K. Y. Wong, Zhenguo Li, and Hengshuang Zhao introduced DriveGPT4, an interpretable end-to-end autonomous driving system that leverages LLMs.

DriveGPT4 can interpret vehicle actions and provide the reasoning behind them. It can also answer diverse questions from human users about its driving behavior, and it predicts low-level vehicle control signals in an end-to-end manner. These capabilities stem from a custom visual instruction tuning dataset designed specifically for autonomous driving, which lets the model learn from visual cues and textual instructions together.

According to the authors, DriveGPT4 is, to the best of their knowledge, the first work focusing on interpretable end-to-end autonomous driving. When evaluated on multiple tasks alongside conventional methods and video understanding LLMs, it demonstrates superior performance both qualitatively and quantitatively. It can also be generalized in a zero-shot fashion to unseen scenarios, that is, without additional training data for those scenarios.

For more information on DriveGPT4, visit the project page at https://tonyxuqaq.github.io/projects/DriveGPT4/. With its approach to enhancing interpretability in autonomous driving systems, DriveGPT4 sets a new standard for future developments in this field.
Created on 19 Mar. 2024

