Recursive Agent Harnesses

AI-generated keywords: Recursive Agent Harnesses, PricewaterhouseCoopers, Anthropic, RAH framework, long-context reasoning

AI-generated Key Points

  • The Recursive Agent Harness (RAH) is introduced as harness recursion, a code-first extension of the model recursion used in Recursive Language Models (RLMs)
  • A parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads
  • Small subtasks are handled through structured function calls
  • In a controlled evaluation with the backbone fixed at GPT-5, RAH raises the Codex coding-agent baseline from 71.75% to 81.36% on Oolong-Synthetic
  • With a stronger backbone, Claude Sonnet 4.5, the same design reaches 89.77%
  • Because the backbone is held fixed in the GPT-5 comparison, the gain there is attributable to the harness rather than the model
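
The idea of harness recursion in the points above can be illustrated with a small Python sketch. It is not the authors' implementation: `run_harness`, `count_labels_directly`, and the `MAX_LINES` budget are hypothetical names, the "subagent" is an ordinary function standing in for a full agent harness, and recursing more than one level deep is an assumption made only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

MAX_LINES = 4  # hypothetical per-harness context budget, measured in lines


def count_labels_directly(lines):
    """Small subtask handled by a plain function call, with no subagent."""
    return Counter(line.split(":", 1)[0] for line in lines)


def run_harness(lines):
    """Stand-in for one agent harness.

    If the context fits the budget, solve it directly. Otherwise split it
    and hand each half to a child harness, running the children in parallel.
    """
    if len(lines) <= MAX_LINES:
        return count_labels_directly(lines)

    mid = len(lines) // 2
    halves = [lines[:mid], lines[mid:]]
    # Each call owns its own pool, so nested levels cannot starve each other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        partials = list(pool.map(run_harness, halves))

    total = Counter()
    for partial in partials:
        total.update(partial)  # merge child results in the parent
    return total


context = [f"{label}: message {i}" for i, label in enumerate(["spam", "ham"] * 5)]
result = run_harness(context)
```

The recursive result matches what a single direct pass over the whole context would return; the point of the sketch is only the control flow, where each level decides between a direct function call and spawning children.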

Authors: Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah

License: CC BY 4.0

Abstract: Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between these two lines of work, where the recursive unit is a full agent harness with filesystem tools, code execution, and planning rather than a model call with no tools. We call this the Recursive Agent Harness (RAH) and frame it as harness recursion, the code-first extension to the model recursion of RLMs. A parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads and uses structured function calls for small subtasks. We provide a controlled evaluation on long-context reasoning. With the backbone held fixed at GPT-5 to match the published Codex and RLM baselines, RAH improves the Codex coding-agent baseline from 71.75% to 81.36% on Oolong-Synthetic (199 samples, 13 context-length buckets up to 4M tokens), a gain attributable to the harness rather than the model. With a stronger backbone, Claude Sonnet 4.5, the same design reaches 89.77%.
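
As a rough illustration of the "generates and runs an executable script" step described in the abstract, the following Python sketch has a parent write a script to disk, execute it, and read back the aggregated result. Everything here is an assumption for illustration: `parent_agent`, `GENERATED_SCRIPT`, and the `subagent` function are hypothetical, the script text is fixed rather than model-generated, and the subagent is a simple counting function rather than a harness with tools.

```python
import json
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Script text a parent agent might emit (hypothetical, hard-coded here).
GENERATED_SCRIPT = textwrap.dedent('''
    import json
    import sys
    from concurrent.futures import ThreadPoolExecutor

    def subagent(chunk):
        # Stand-in for a subagent harness working on one chunk.
        return sum(1 for line in chunk if "ERROR" in line)

    def main():
        lines = json.load(sys.stdin)
        size = 3  # illustrative chunk size
        chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
        with ThreadPoolExecutor() as pool:
            counts = list(pool.map(subagent, chunks))
        json.dump({"per_chunk": counts, "total": sum(counts)}, sys.stdout)

    main()
''')


def parent_agent(lines):
    """Write the generated script to a scratch directory, run it, parse its output."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "spawn_subagents.py"
        script.write_text(GENERATED_SCRIPT)
        done = subprocess.run(
            [sys.executable, str(script)],
            input=json.dumps(lines),
            capture_output=True,
            text=True,
            check=True,  # raise if the generated script fails
        )
    return json.loads(done.stdout)


log = ["ok", "ERROR a", "ok", "ERROR b", "ERROR c", "ok", "ok"]
result = parent_agent(log)
# result == {"per_chunk": [1, 2, 0], "total": 3}
```

Running the script in a separate process keeps the parent's state isolated from whatever the generated code does, which is one plausible reason to prefer a script over in-process calls; the paper's actual execution setup is not described in this summary.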

Submitted to arXiv on 11 Jun. 2026

Results of the summarizing process for the arXiv paper: 2606.13643v1

In "Recursive Agent Harnesses," Elias Lumer, Sahil Sen, Kevin Paul, and Vamse Kumar Subbiah of PricewaterhouseCoopers study harness recursion, which they frame as the code-first extension of the model recursion used in Recursive Language Models (RLMs). RLMs showed that recursion over model calls is effective for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. The authors name the pattern that sits between these two lines of work the Recursive Agent Harness (RAH): the recursive unit is a full agent harness with filesystem tools, code execution, and planning, rather than a model call with no tools. In RAH, a parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads and uses structured function calls for small subtasks.

The evaluation targets long-context reasoning on Oolong-Synthetic (199 samples across 13 context-length buckets, up to 4 million tokens). With the backbone held fixed at GPT-5 to match the published Codex and RLM baselines, RAH raises the Codex coding-agent baseline from 71.75% to 81.36%. Because the model is the same in both runs, the authors attribute this gain to the harness rather than the model. With a stronger backbone, Claude Sonnet 4.5, the same design reaches 89.77%.
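
The reported accuracies can be put in perspective with a little arithmetic. The sketch below uses only the three figures quoted above; the helper name `accuracy_gain` and the choice to express the gain as a relative error reduction are illustrative, not taken from the paper.

```python
def accuracy_gain(baseline_pct, improved_pct):
    """Return (absolute gain in points, relative error reduction in percent)."""
    abs_gain = improved_pct - baseline_pct
    baseline_error = 100.0 - baseline_pct  # share of the score still missing
    rel_error_reduction = 100.0 * abs_gain / baseline_error
    return round(abs_gain, 2), round(rel_error_reduction, 1)


# Same backbone (GPT-5): Codex baseline vs. RAH.
harness_points, harness_error_cut = accuracy_gain(71.75, 81.36)
# harness_points == 9.61, harness_error_cut == 34.0

# Same design (RAH): GPT-5 vs. Claude Sonnet 4.5.
backbone_points, backbone_error_cut = accuracy_gain(81.36, 89.77)
# backbone_points == 8.41, backbone_error_cut == 45.1
```

Only the first comparison is controlled in the sense the abstract describes, since it holds the backbone fixed. The second changes the model while keeping the harness design, so it describes a backbone effect rather than a harness effect.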
Created on 13 Jun. 2026
