AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

AI-generated keywords: Learning paradigm

AI-generated Key Points

  • Introduces a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning
  • Enables low-cost continual adaptation through memory-based online reinforcement learning
  • Formulated as a Memory-augmented Markov Decision Process (M-MDP) with a neural case-selection policy
  • The agent model, AgentFly, attains top-1 on GAIA validation (87.88% Pass@3) and 79.40% on the GAIA test set
  • Reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method
  • Explores integration of external tools into language agents for multi-hop tool calls in long-horizon tasks
  • Proposes Agentic Reinforcement Learning as a training paradigm to enable dynamic interactions with external tool environments
  • Incorporates case-based reasoning into planning to facilitate strategic tool calls and improve performance in web research tasks

Authors: Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, Jun Wang

License: CC BY 4.0

Abstract: In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting, namely AgentFly, which attains top-1 on GAIA validation ($87.88\%$ Pass@$3$) and $79.40\%$ on the test set. It reaches $66.6\%$ F1 and $80.4\%$ PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds $4.7\%$ to $9.6\%$ absolute points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/AgentFly.
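The memory write and read operations described in the abstract can be illustrated with a minimal sketch of a non-parametric case memory. This is not the AgentFly implementation: the names `Case`, `CaseBank`, `embed`, and `cosine` are hypothetical, and the bag-of-words embedding is an assumed stand-in for whatever text encoder a real system would use. Writing appends a (state, action, reward) case after feedback arrives; reading retrieves the most similar past cases to guide the next decision.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumed stand-in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Case:
    state: str     # task or query the agent faced
    action: str    # plan or action the agent took
    reward: float  # feedback received from the environment


@dataclass
class CaseBank:
    """Hypothetical non-parametric episodic memory."""
    cases: list = field(default_factory=list)

    def write(self, state: str, action: str, reward: float) -> None:
        # Memory rewriting: store the new experience together with its feedback.
        self.cases.append(Case(state, action, reward))

    def read(self, query: str, k: int = 2) -> list:
        # Memory reading: return the k stored cases most similar to the query.
        q = embed(query)
        ranked = sorted(self.cases, key=lambda c: cosine(q, embed(c.state)), reverse=True)
        return ranked[:k]


bank = CaseBank()
bank.write("find the population of Paris", "search the web, then extract the number", 1.0)
bank.write("compute the integral of x squared", "run a symbolic math tool", 1.0)
retrieved = bank.read("find the population of Rome", k=1)
```

Here the LLM itself is never updated: adaptation comes only from what is written to and retrieved from the memory. A learned (parametric) retrieval policy, as the abstract mentions, would replace the fixed cosine ranking in `read`.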

Submitted to arXiv on 22 Aug. 2025

Results of the summarizing process for the arXiv paper: 2508.16153v1

In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Our method enables low-cost continual adaptation through memory-based online reinforcement learning, formalized as a Memory-augmented Markov Decision Process (M-MDP) with a neural case-selection policy. Past experiences are stored in an episodic memory, and the policy is continually updated based on environmental feedback.

Our agent model, AgentFly, achieves top performance on four datasets, each representing a different research challenge: GAIA for long-horizon planning, DeepResearcher for real-time web-based research, SimpleQA for factual accuracy, and HLE for exploration at the frontier of human knowledge. Performance comparisons show that AgentFly outperforms existing methods on open-domain QA datasets.

We also explore the integration of external tools into language agents in the context of multi-hop tool calls for long-horizon tasks. We propose Agentic Reinforcement Learning as a training paradigm to enable dynamic interactions with external tool environments. Incorporating case-based reasoning into planning facilitates strategic tool calls, leading to consistently strong performance. Experimental results on the DeepResearcher dataset demonstrate that AgentFly augmented with MCP tools achieves significant improvements in F1 scores compared to baseline methods such as CoT + RAG. This highlights the effectiveness of real-time online retrieval tools in enhancing agent performance in web research tasks.

Overall, our study presents a scalable and efficient approach for developing generalist LLM agents capable of continuous learning without gradient updates. Our findings contribute to advancing machine learning towards open-ended skill acquisition and deep research scenarios.
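For readers unfamiliar with the F1 metric mentioned above, the following is a minimal sketch of token-level F1 as it is commonly computed for open-domain QA. It is an assumption that the reported F1 follows this convention; the summary does not specify the exact normalization, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A verbose but correct answer is penalized through precision, while an incomplete answer is penalized through recall.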
Created on 25 Aug. 2025
