Executable Code Actions Elicit Better LLM Agents

AI-generated keywords: Large Language Model (LLM) agents, CodeAct, Python interpreter, API-Bank, CodeActInstruct

AI-generated Key Points

  • The study examines how Large Language Model (LLM) agents can tackle real-world challenges by expressing their actions as executable Python code.
  • LLM agents can perform a broad range of actions, but when prompted to emit JSON or text in a pre-defined format they are limited by a constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools).
  • The proposed approach, CodeAct, consolidates LLM agents' actions into a unified action space of Python code. Integrated with a Python interpreter, it executes code actions and can revise prior actions or emit new ones in response to new observations over multiple turns.
  • An analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives, with up to a 20% higher success rate.
  • An instruction-tuning dataset, CodeActInstruct, consisting of 7k multi-turn interactions using CodeAct, has been collected to train the open-source LLM agent developed in this study.
  • CodeActInstruct can be combined with existing data to improve models on agent-oriented tasks without compromising their general capability.
  • CodeActAgent, fine-tuned from Llama2 and Mistral, is integrated with a Python interpreter and tailored to perform sophisticated tasks (e.g., model training) using existing libraries while autonomously self-debugging.
  • Experiments with various LLMs indicate that CodeAct also benefits agent performance on basic tasks involving atomic tool use.

Authors: Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji

Code, data, model, and demo are available at https://github.com/xingyaoww/code-act
License: CC BY 4.0

Abstract: Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
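
The contrast the abstract draws between pre-defined action formats and code actions can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the tools `get_price` and `convert_currency`, the JSON schema, and the helpers `run_json_action` and `run_code_action` are all hypothetical names invented for this example.

```python
import json

# Hypothetical tools, invented for illustration only (not from the paper).
def get_price(item: str) -> float:
    prices = {"apple": 1.5, "pear": 2.0}
    return prices[item]

def convert_currency(amount: float, rate: float) -> float:
    return round(amount * rate, 2)

TOOLS = {"get_price": get_price, "convert_currency": convert_currency}

def run_json_action(action_text: str):
    """Pre-defined format: each action names exactly one tool and its arguments."""
    action = json.loads(action_text)
    return TOOLS[action["tool"]](**action["args"])

def run_code_action(code: str) -> dict:
    """Code action: Python that may compose several tools and use control flow.

    exec() on untrusted code is unsafe; a real system would need a sandbox.
    """
    namespace = dict(TOOLS)  # expose the tools as ordinary Python functions
    exec(code, namespace)
    return namespace

# A JSON action can only look up one price per turn.
json_result = run_json_action('{"tool": "get_price", "args": {"item": "apple"}}')

# A single code action composes both tools and loops over several items.
code_action = """
total = sum(get_price(item) for item in ["apple", "pear"])
result = convert_currency(total, rate=0.5)
"""
code_result = run_code_action(code_action)["result"]
```

Under these assumptions, the JSON format would need three separate turns (two lookups and one conversion) to reach the value that the code action computes in one.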

Submitted to arXiv on 01 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.01030v1

The study "Executable Code Actions Elicit Better LLM Agents" examines how Large Language Model (LLM) agents can tackle real-world challenges by expressing their actions as executable Python code. LLM agents can perform a broad range of actions, such as invoking tools and controlling robots, but they are typically prompted to emit JSON or text in a pre-defined format, which constrains the action space (e.g., to the scope of pre-defined tools) and restricts flexibility (e.g., multiple tools cannot be composed). The proposed approach, CodeAct, consolidates agents' actions into a unified action space of Python code. Integrated with a Python interpreter, it executes code actions and can revise prior actions or emit new ones in response to new observations over multi-turn interactions. An analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives, with up to a 20% higher success rate. This result motivated the authors to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users in natural language. To train it, they collected CodeActInstruct, an instruction-tuning dataset of 7k multi-turn interactions using CodeAct, and they show that it can be combined with existing data to improve models on agent-oriented tasks without compromising their general capability. The resulting CodeActAgent, fine-tuned from Llama2 and Mistral, is integrated with a Python interpreter and tailored to perform sophisticated tasks (e.g., model training) using existing libraries while autonomously self-debugging. Experiments with various LLMs also indicate that CodeAct benefits agent performance on basic tasks involving atomic tool use.
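
The multi-turn behavior described above (execute a code action, observe the result, revise on error) can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the paper's agent: `scripted_policy` is a hypothetical stand-in for an LLM, and `execute` and `run_episode` are names invented for this example.

```python
import contextlib
import io
import traceback

def execute(code: str, namespace: dict) -> str:
    """Run one code action; the observation is its stdout plus any traceback."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # unsafe for untrusted code; sandbox in practice
    except Exception:
        return buffer.getvalue() + traceback.format_exc()
    return buffer.getvalue()

def scripted_policy(history: list) -> str:
    """Hypothetical stand-in for an LLM: emits buggy code, then fixes it after an error."""
    if not history:
        return "print(sum(values) / len(values))"  # fails: values is undefined
    last_observation = history[-1][1]
    if "NameError" in last_observation:
        return "values = [2, 4, 6]\nprint(sum(values) / len(values))"
    return ""  # empty string means the policy has nothing more to do

def run_episode(max_turns: int = 5) -> list:
    namespace = {}  # interpreter state persists across turns
    history = []    # list of (code action, observation) pairs
    for _ in range(max_turns):
        code = scripted_policy(history)
        if not code:
            break
        observation = execute(code, namespace)
        history.append((code, observation))
    return history

history = run_episode()
```

Here the first action raises a `NameError`, the traceback is returned as the observation, and the second action succeeds. Feeding the interpreter's error message back as an observation is what makes the self-debugging behavior possible in this sketch.
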
Created on 09 Apr. 2025
