Recursive Agent Harnesses

AI-generated keywords: Recursive Agent Harnesses PricewaterhouseCoopers Anthropic RAH framework long-context reasoning

AI-generated Key Points

- The Recursive Agent Harness (RAH) is introduced as harness recursion, a code-first extension of the model recursion used in Recursive Language Models (RLMs)
- The recursive unit is a full agent harness with filesystem tools, code execution, and planning, rather than a model call with no tools
- A parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads and uses structured function calls for small subtasks
- With the backbone held fixed at GPT-5, RAH improves on the Codex coding-agent baseline from 71.75% to 81.36% on Oolong-Synthetic (199 samples, 13 context-length buckets up to 4M tokens)
- With a stronger backbone, Claude Sonnet 4.5, the same design reaches 89.77%
- Because the backbone is fixed in the GPT-5 comparison, the authors attribute that gain to the harness rather than the model
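
The size of the reported gain can be read in more than one way. The short sketch below works through the arithmetic on the figures quoted above; the three accuracy values come from the abstract, while the function names and the choice of metrics (percentage-point gain, relative gain, share of baseline errors removed) are illustrative assumptions, not measures reported by the authors.

```python
# Accuracies (percent) as quoted in the abstract.
CODEX_BASELINE = 71.75  # Codex coding-agent baseline, GPT-5 backbone
RAH_GPT5 = 81.36        # RAH with the same GPT-5 backbone
RAH_SONNET = 89.77      # RAH with Claude Sonnet 4.5 (different backbone)


def absolute_gain(before: float, after: float) -> float:
    """Gain in percentage points."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Gain as a percentage of the starting accuracy."""
    return 100.0 * (after - before) / before


def error_reduction(before: float, after: float) -> float:
    """Percentage of the baseline's errors that were removed."""
    return 100.0 * (after - before) / (100.0 - before)


if __name__ == "__main__":
    print(round(absolute_gain(CODEX_BASELINE, RAH_GPT5), 2))    # percentage points
    print(round(relative_gain(CODEX_BASELINE, RAH_GPT5), 2))    # percent, relative
    print(round(error_reduction(CODEX_BASELINE, RAH_GPT5), 2))  # percent of errors removed
```

On the GPT-5 comparison this gives 9.61 percentage points, a relative gain of about 13.4%, and roughly a 34% reduction in errors. The 89.77% figure is left out of the comparison on purpose: it changes the backbone as well as the harness, so the difference cannot be credited to the harness alone.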


Authors: Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah

arXiv: 2606.13643v1 (cs.CL)

License: CC BY 4.0

Abstract: Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between these two lines of work, where the recursive unit is a full agent harness with filesystem tools, code execution, and planning rather than a model call with no tools. We call this the Recursive Agent Harness (RAH) and frame it as harness recursion, the code-first extension to the model recursion of RLMs. A parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads and uses structured function calls for small subtasks. We provide a controlled evaluation on long-context reasoning. With the backbone held fixed at GPT-5 to match the published Codex and RLM baselines, RAH improves the Codex coding-agent baseline from 71.75% to 81.36% on Oolong-Synthetic (199 samples, 13 context-length buckets up to 4M tokens), a gain attributable to the harness rather than the model. With a stronger backbone, Claude Sonnet 4.5, the same design reaches 89.77%.
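
The fan-out pattern described in the abstract can be pictured with a small sketch. This is not the authors' implementation: `CountRequest`, `CountResult`, `structured_call`, `run_subagent_harness`, `parent_script`, and the size-based routing rule are all hypothetical names and assumptions chosen for illustration, and the "subagent harness" is a local stand-in so that the example runs offline. It shows only the shape of the idea: a parent-generated script splits the workload, dispatches the pieces in parallel, and aggregates typed results.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CountRequest:
    """Structured input for one unit of work (hypothetical schema)."""
    chunk_id: int
    lines: Tuple[str, ...]
    label: str


@dataclass(frozen=True)
class CountResult:
    """Structured output returned to the parent (hypothetical schema)."""
    chunk_id: int
    count: int


def structured_call(req: CountRequest) -> CountResult:
    """Small subtask handled as a typed function call; no harness is spawned."""
    hits = sum(1 for line in req.lines if line.endswith(req.label))
    return CountResult(req.chunk_id, hits)


def run_subagent_harness(req: CountRequest) -> CountResult:
    """Stand-in for spawning a full subagent harness (assumption).

    A real harness would have its own filesystem tools, code execution, and
    planning loop. Here it simply delegates so the sketch stays runnable.
    """
    return structured_call(req)


def parent_script(lines: List[str], label: str,
                  chunk_size: int = 4, small_threshold: int = 2) -> int:
    """Split the context, dispatch chunks in parallel, and sum partial counts."""
    requests = [
        CountRequest(i // chunk_size, tuple(lines[i:i + chunk_size]), label)
        for i in range(0, len(lines), chunk_size)
    ]

    def dispatch(req: CountRequest) -> CountResult:
        # Routing rule is an assumption: tiny chunks skip the harness.
        if len(req.lines) <= small_threshold:
            return structured_call(req)
        return run_subagent_harness(req)

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(dispatch, requests))
    return sum(r.count for r in results)


if __name__ == "__main__":
    context = ["a: spam", "b: ham", "c: spam", "d: spam", "e: ham", "f: spam"]
    print(parent_script(context, "spam"))
```

With the six-line toy context, the first chunk of four lines goes to the harness stand-in and the remaining two lines go through the structured call, so the parent only ever sees compact typed results rather than the full context.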

Submitted to arXiv on 11 Jun. 2026


Results of the summarizing process for the arXiv paper: 2606.13643v1


In "Recursive Agent Harnesses," Elias Lumer, Sahil Sen, Kevin Paul, and Vamse Kumar Subbiah from PricewaterhouseCoopers study harness recursion, which they frame as the code-first extension of the model recursion used in Recursive Language Models (RLMs). The paper sits between two lines of work: RLMs, which showed that recursion over model calls is effective for long-context reasoning, and production coding agents, which have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. In the Recursive Agent Harness (RAH), the recursive unit is a full agent harness with filesystem tools, code execution, and planning, rather than a model call with no tools. A parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads and uses structured function calls for small subtasks. The authors report a controlled evaluation on long-context reasoning, with the backbone held fixed at GPT-5 to match the published Codex and RLM baselines. On Oolong-Synthetic (199 samples across 13 context-length buckets, up to 4 million tokens), RAH improves on the Codex coding-agent baseline from 71.75% to 81.36%. Because the backbone is the same in both runs, the authors attribute this gain to the harness rather than the model. With a stronger backbone, Claude Sonnet 4.5, the same design reaches 89.77%.
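
An evaluation split into context-length buckets is usually reported per bucket as well as overall. The sketch below shows one way such an aggregation could be computed. The record layout, the function names, and the toy scores are assumptions made up for illustration; the source does not give the bucket boundaries or per-bucket results, and it does not say whether the headline accuracy is averaged over samples or over buckets (this sketch averages over samples).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# One record per sample: (context length in tokens, score in [0, 1]).
Record = Tuple[int, float]


def bucket_accuracy(records: List[Record]) -> Dict[int, float]:
    """Mean score per context-length bucket, as a percentage."""
    by_bucket = defaultdict(list)
    for tokens, score in records:
        by_bucket[tokens].append(score)
    return {tokens: 100.0 * mean(scores)
            for tokens, scores in sorted(by_bucket.items())}


def overall_accuracy(records: List[Record]) -> float:
    """Mean score over all samples, as a percentage."""
    return 100.0 * mean(score for _, score in records)


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not results from the paper.
    toy = [(1_000, 1.0), (1_000, 0.0), (4_000_000, 1.0), (4_000_000, 1.0)]
    print(bucket_accuracy(toy))
    print(overall_accuracy(toy))
```

Reporting both views matters for long-context work, since a single overall number can hide a drop in accuracy at the largest context lengths.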


Summary
- The Recursive Agent Harness (RAH) is a way of organizing AI agents so that a large problem is split into smaller pieces.
- A main (parent) agent writes and runs a script that starts several smaller agents, which work on different pieces at the same time.
- For very small pieces, the script uses simple structured function calls instead of starting a full agent.
- In testing on a long-context reasoning benchmark, RAH raised accuracy from 71.75% to 81.36% with the same underlying model (GPT-5).
- With a stronger underlying model, Claude Sonnet 4.5, the same design reached 89.77%.

Definitions
- Recursive Agent Harness (RAH): a setup in which an agent with its own tools can start further agents of the same kind to handle parts of a task.
- Recursion: solving a problem by breaking it into smaller parts and solving each part in the same way.

Introduction: Coding agents are AI tools that help developers write and run code. One challenge they face is long-context reasoning: accurately processing very large inputs, which in this paper's evaluation reach 4 million tokens. In "Recursive Agent Harnesses," Elias Lumer, Sahil Sen, Kevin Paul, and Vamse Kumar Subbiah from PricewaterhouseCoopers study harness recursion as a way to improve long-context reasoning in coding agents.

Harness Recursion: The paper builds on two lines of work. Recursive Language Models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. The Recursive Agent Harness (RAH) names the pattern between the two: the recursive unit is a full agent harness with filesystem tools, code execution, and planning, rather than a model call with no tools. A parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads and uses structured function calls for small subtasks. The authors frame this as harness recursion, the code-first extension of the model recursion of RLMs.

Evaluation: The authors evaluate RAH on Oolong-Synthetic, which comprises 199 samples across 13 context-length buckets of up to 4 million tokens. They hold the backbone fixed at GPT-5 to match the published Codex and RLM baselines. Under that setup, RAH improves on the Codex coding-agent baseline from 71.75% to 81.36%. Because the backbone is the same in both runs, the authors attribute the gain to the harness rather than the model. With a stronger backbone, Claude Sonnet 4.5, the same RAH design reaches 89.77%.

Future Work: Possible directions for follow-up work include evaluating RAH with additional backbone models and examining how factors such as programming language or project complexity affect its performance.

Conclusion: "Recursive Agent Harnesses" by Lumer et al. addresses long-context reasoning, one of the major challenges for coding agents. By recursing over full agent harnesses rather than tool-less model calls, RAH improves on the Codex baseline in a controlled evaluation where the backbone is held fixed. The approach may also be relevant to other AI tools that must handle large amounts of information.

Created on 13 Jun. 2026



