WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

AI-generated keywords: WebExplorer

AI-generated Key Points

WebExplorer is a systematic data generation approach that combines model-based exploration with iterative, long-to-short query evolution to address the scarcity of challenging training data for information seeking.
The method creates challenging query-answer pairs that require multi-step reasoning and complex web navigation.
WebExplorer-8B is trained on the curated dataset through supervised fine-tuning followed by reinforcement learning; it supports a 128K context length and up to 100 tool calling turns for long-horizon problem solving.
WebExplorer-8B achieves state-of-the-art performance at its scale across diverse information-seeking benchmarks, with higher accuracy than WebSailor-72B on BrowseComp-en/zh and the best results among models up to 100B parameters on WebWalkerQA and FRAMES.
The model generalizes well to the HLE benchmark even though it is trained only on knowledge-intensive QA data.
For exploration, a powerful LLM is prompted to construct the information space autonomously, starting from a seed entity and example QA pairs.
The authors present the approach as a practical path toward long-horizon web agents.

Authors: Junteng Liu, Yunji Li, Chi Zhang, Jingyang Li, Aili Chen, Ke Ji, Weiyu Cheng, Zijia Wu, Chengyu Du, Qidi Xu, Jiayuan Song, Zhengmao Zhu, Wenhu Chen, Pengyu Zhao, Junxian He

arXiv: 2509.06501v1 (cs.CL)

License: CC BY 4.0

Abstract: The paradigm of Large Language Models (LLMs) has increasingly shifted toward agentic applications, where web browsing capabilities are fundamental for retrieving information from diverse online sources. However, existing open-source web agents either demonstrate limited information-seeking abilities on complex tasks or lack transparent implementations. In this work, we identify that the key challenge lies in the scarcity of challenging data for information seeking. To address this limitation, we introduce WebExplorer: a systematic data generation approach using model-based exploration and iterative, long-to-short query evolution. This method creates challenging query-answer pairs that require multi-step reasoning and complex web navigation. By leveraging our curated high-quality dataset, we successfully develop advanced web agent WebExplorer-8B through supervised fine-tuning followed by reinforcement learning. Our model supports 128K context length and up to 100 tool calling turns, enabling long-horizon problem solving. Across diverse information-seeking benchmarks, WebExplorer-8B achieves the state-of-the-art performance at its scale. Notably, as an 8B-sized model, WebExplorer-8B is able to effectively search over an average of 16 turns after RL training, achieving higher accuracy than WebSailor-72B on BrowseComp-en/zh and attaining the best performance among models up to 100B parameters on WebWalkerQA and FRAMES. Beyond these information-seeking tasks, our model also achieves strong generalization on the HLE benchmark even though it is only trained on knowledge-intensive QA data. These results highlight our approach as a practical path toward long-horizon web agents.

Submitted to arXiv on 08 Sep. 2025

Results of the summarizing process for the arXiv paper: 2509.06501v1

Comprehensive Summary

The authors present WebExplorer, a systematic data generation approach that combines model-based exploration with iterative, long-to-short query evolution to address the scarcity of challenging data for information seeking. The method creates challenging query-answer pairs that require multi-step reasoning and complex web navigation. Using the curated dataset, the authors train the web agent WebExplorer-8B through supervised fine-tuning followed by reinforcement learning. The model supports a 128K context length and up to 100 tool calling turns, enabling long-horizon problem solving; after RL training it searches over an average of 16 turns.

WebExplorer-8B achieves state-of-the-art performance at its scale across diverse information-seeking benchmarks. It reaches higher accuracy than WebSailor-72B on BrowseComp-en/zh and the best performance among models up to 100B parameters on WebWalkerQA and FRAMES. The model also generalizes well to the HLE benchmark even though it is trained only on knowledge-intensive QA data.

For data generation, the authors prompt a powerful LLM to construct the information space autonomously. Given a seed entity and example QA pairs, the model iteratively searches and browses, then synthesizes challenging query-answer pairs that require reasoning across multiple connections. The authors present these results as evidence that the approach is a practical path toward long-horizon web agents.

Summary

WebExplorer is a way to create hard practice questions for AI systems that look things up on the web. An AI model explores websites, writes question-answer pairs based on what it finds, and then rewrites the questions to make them harder. WebExplorer-8B is a web agent trained on these questions. It can search the web over many steps, and it does well on several tests of information seeking. Even though it is trained only on knowledge-heavy questions and answers, it also does well on a test outside that area.

Definitions

- Systematic: Done in an organized and planned way.
- Model-based: Carried out by an AI model (here, a language model explores the web) rather than by fixed rules.
- Exploration: Looking around or searching for something.
- Iterative: Doing something repeatedly to improve it.
- Query: A question or a request for information.
- Evolution: Gradual development or change over time.
- Supervised fine-tuning: Further training a model on worked examples of the desired behavior.
- Reinforcement learning: Learning through trial and error with rewards for success.
- Benchmark: A standard or point of reference for comparison.
- Generalization: Applying knowledge to new situations beyond what was learned.
- Autonomous: Able to work independently without constant help.

Introduction

Web browsing is a fundamental capability for agentic applications of large language models (LLMs), because it lets an agent retrieve information from diverse online sources. Existing open-source web agents, however, either show limited information-seeking ability on complex tasks or lack transparent implementations. The authors trace this to a scarcity of challenging training data. To address it, they developed WebExplorer, a systematic data generation approach that uses model-based exploration and iterative, long-to-short query evolution. Their paper, "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents," reports how this data improves the performance of long-horizon web agents.

The Challenge of Limited Data in Information Seeking

The authors identify the scarcity of challenging data as the key obstacle to building effective web agents for information seeking. An agent trained mostly on easy questions has little opportunity to learn multi-step reasoning or long sequences of search and browsing actions. WebExplorer addresses this by generating new, challenging query-answer pairs through model-based exploration and query evolution, and the authors use the resulting curated dataset for training.

The WebExplorer Approach

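The work has two parts: a data generation pipeline that combines model-based exploration with iterative, long-to-short query evolution, and WebExplorer-8B, the web agent trained on the resulting data.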

WebExplorer-8B

WebExplorer-8B is a web agent trained on the curated WebExplorer dataset through supervised fine-tuning followed by reinforcement learning. The model supports a 128K context length (the maximum number of tokens it can take as input) and up to 100 tool calling turns (the maximum number of tool actions the agent takes on one problem). These limits allow WebExplorer-8B to handle long-horizon information-seeking tasks; after RL training, it searches over an average of 16 turns.

The authors evaluated WebExplorer-8B on several benchmarks, including BrowseComp-en/zh, WebWalkerQA, and FRAMES. Despite its 8B size, it achieves higher accuracy than WebSailor-72B on BrowseComp-en/zh and the best performance among models up to 100B parameters on WebWalkerQA and FRAMES. It also generalizes well to the HLE benchmark even though it is trained only on knowledge-intensive QA data.
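To make the turn and context limits concrete, here is a minimal Python sketch of a tool-calling loop with both budgets. It is an illustration under stated assumptions, not the authors' implementation: `run_agent`, `Step`, `toy_policy`, the tool names `search` and `browse`, and the whitespace-based token count are all hypothetical, and treating 128K as 128,000 tokens is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_TURNS = 100               # tool-calling turn limit from the summary
MAX_CONTEXT_TOKENS = 128_000  # "128K" context; the exact token count is an assumption


@dataclass
class Step:
    """One policy decision: call a tool, or give a final answer."""
    tool: Optional[str] = None
    argument: str = ""
    answer: Optional[str] = None


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real agent would use the model's tokenizer.
    return len(text.split())


def run_agent(
    question: str,
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = MAX_TURNS,
    max_context_tokens: int = MAX_CONTEXT_TOKENS,
) -> Tuple[Optional[str], int]:
    """Return (answer or None, number of tool calls made)."""
    history = [f"Question: {question}"]
    tool_calls = 0
    while tool_calls < max_turns:
        # Stop if the accumulated history no longer fits in the context window.
        if sum(count_tokens(m) for m in history) > max_context_tokens:
            return None, tool_calls
        step = policy(history)
        if step.answer is not None:
            return step.answer, tool_calls
        observation = tools[step.tool](step.argument)
        tool_calls += 1
        history.append(f"Action: {step.tool}({step.argument})")
        history.append(f"Observation: {observation}")
    return None, tool_calls  # turn budget exhausted without an answer


def toy_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for the model: search, browse, then answer."""
    seen = sum(1 for m in history if m.startswith("Observation:"))
    if seen == 0:
        return Step(tool="search", argument="example query")
    if seen == 1:
        return Step(tool="browse", argument="https://example.com")
    return Step(answer="example answer")


toy_tools = {
    "search": lambda q: f"results for {q}",
    "browse": lambda url: f"content of {url}",
}

answer, calls = run_agent("toy question", toy_policy, toy_tools)
print(answer, calls)
```

In this sketch the loop ends in one of three ways: the policy answers, the turn budget runs out, or the history outgrows the context budget. A larger turn budget only helps if the context window can hold the observations gathered along the way, which is why the two limits are reported together.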

Model-Based Exploration and Query Evolution

For data generation, the authors propose model-based exploration: they prompt a powerful large language model (LLM) to construct the information space autonomously. Given a seed entity and example QA pairs, the model conducts iterative search and browsing actions, then synthesizes a query-answer pair that requires reasoning across multiple connections. The queries are then refined through iterative, long-to-short evolution to make them more challenging. The resulting query-answer pairs form the training data for WebExplorer-8B.
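The explore-then-synthesize flow described above can be sketched in a few lines of Python. This is a toy under stated assumptions rather than the paper's pipeline: `build_qa`, `ToyModel`, its methods, and the stub `search` and `browse` functions are hypothetical, the entity and answer are made up, and modeling "long-to-short" evolution as removing one explicit clue per round is an assumption about what the term means.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QAPair:
    query: str
    answer: str


class ToyModel:
    """Hypothetical stand-in for the prompted LLM."""

    # Explicit clues the toy evolution step removes, one per round.
    CLUES = [", winner of the Example Prize", " in 1990"]

    def propose_query(self, seed: str, notes: List[str]) -> Optional[str]:
        # Search for the seed once, then stop exploring.
        return seed if not notes else None

    def synthesize(self, notes: List[str], examples: List[QAPair]) -> QAPair:
        # A real model would write this from the notes, guided by the examples.
        return QAPair(
            "Which city hosts the museum founded in 1990 by Example Entity, "
            "winner of the Example Prize?",
            "Example City",
        )

    def evolve(self, qa: QAPair) -> QAPair:
        # Drop the first remaining clue; the answer must stay unchanged.
        for clue in self.CLUES:
            if clue in qa.query:
                return QAPair(qa.query.replace(clue, "", 1), qa.answer)
        return qa


def build_qa(
    seed: str,
    examples: List[QAPair],
    model: ToyModel,
    search: Callable[[str], List[str]],
    browse: Callable[[str], str],
    explore_steps: int = 3,
    evolve_rounds: int = 2,
) -> List[QAPair]:
    """Explore from a seed, synthesize a QA pair, then evolve the query."""
    notes: List[str] = []
    for _ in range(explore_steps):
        query = model.propose_query(seed, notes)
        if query is None:
            break
        for url in search(query):
            notes.append(browse(url))
    qa = model.synthesize(notes, examples)
    versions = [qa]
    for _ in range(evolve_rounds):
        qa = model.evolve(qa)
        versions.append(qa)
    return versions


history = build_qa(
    seed="Example Entity",
    examples=[QAPair("example question?", "example answer")],
    model=ToyModel(),
    search=lambda q: [f"https://example.com/{q.replace(' ', '_')}"],
    browse=lambda url: f"text of {url}",
)
for version in history:
    print(version.query)
```

Each evolution round keeps the answer fixed while giving the solver fewer direct clues, so a later version of the query should require more searching to resolve than the first one.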

Conclusion

In conclusion, the paper "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" addresses the scarcity of challenging data for training web agents. Its data generation approach, which combines model-based exploration with iterative, long-to-short query evolution, produces the query-answer pairs used to train WebExplorer-8B through supervised fine-tuning followed by reinforcement learning. The resulting model achieves state-of-the-art performance at its scale on information-seeking benchmarks, surpasses WebSailor-72B on BrowseComp-en/zh, and generalizes to the HLE benchmark. The authors present these results as a practical path toward long-horizon web agents.

Created on 11 Sep. 2025

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.