Code as Agent Harness

AI-generated keywords: Large Language Models, Agent Harnesses, Code as Infrastructure, Multi-Agent Environments, Agentic AI Systems

AI-generated Key Points

- Recent large language models (LLMs) have shown a strong ability to understand and generate code across many domains
- In emerging agentic systems, code is an operational substrate for agent reasoning, action execution, environment modeling, and verification
- The agent-harness framing treats code as the central component of agent infrastructure
- The survey is organized around three connected layers:
  - Harness interface: code connects agents to reasoning, action, and environment modeling
  - Harness mechanisms: planning, memory, and tool use, plus feedback-driven control for adaptability and reliability
  - Scaling the harness: from single-agent setups to multi-agent settings, where shared code artifacts support coordination
- Representative methods and applications span coding assistants, robotics, recommendation systems, DevOps, and enterprise workflows
- Open challenges in harness engineering include evaluation beyond task completion, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight for safety-critical actions, and extension to multimodal environments


Authors: Xuying Ning, Katherine Tieu, Dongqi Fu, Tianxin Wei, Zihao Li, Yuanchen Bei, Jiaru Zou, Mengting Ai, Zhining Liu, Ting-Wei Li, Lingjie Chen, Yanjun Zhao, Ke Yang, Bingxuan Li, Cheng Qian, Gaotang Li, Xiao Lin, Zhichen Zeng, Ruizhong Qiu, Sirui Chen, Yifan Sun, Xiyuan Yang, Ruida Wang, Rui Pan, Chenyuan Yang, Dylan Zhang, Liri Fang, Zikun Cui, Yang Cao, Pan Chen, Dorothy Sun, Ren Chen, Mahesh Srinivasan, Nipun Mathur, Yinglong Xia, Hong Li, Hong Yan, Pan Lu, Lingming Zhang, Tong Zhang, Hanghang Tong, Jingrui He

arXiv: 2605.18747v1 (cs.CL)

GitHub: https://github.com/YennNing/Awesome-Code-as-Agent-Harness-Papers

License: CC BY 4.0

Abstract: Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substrate for agent reasoning, acting, environment modeling, and execution-based verification. We frame this shift through the lens of agent harnesses and introduce code as agent harness: a unified view that centers code as the basis for agent infrastructure. To systematically study this perspective, we organize the survey around three connected layers. First, we study the harness interface, where code connects agents to reasoning, action, and environment modeling. Second, we examine harness mechanisms: planning, memory, and tool use for long-horizon execution, together with feedback-driven control and optimization that make harness reliable and adaptive. Third, we discuss scaling the harness from single-agent systems to multi-agent settings, where shared code artifacts support multi-agent coordination, review, and verification. Across these layers, we summarize representative methods and practical applications of code as agent harness, spanning coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows. We further outline open challenges for harness engineering, including evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across multiple agents, human oversight for safety-critical actions, and extensions to multimodal environments. By centering code as the harness of agentic AI, this survey provides a unified roadmap toward executable, verifiable, and stateful AI agent systems.
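To make the first two layers concrete, here is a minimal sketch of a single-agent harness loop. It is an illustration under stated assumptions, not the survey's method: the `Harness` class, `run_python` tool, and `scripted_policy` are hypothetical names, and the scripted policy stands in for an LLM. The sketch shows code acting as the interface (a tool registry), as memory (an observation log), and as feedback-driven control (retry until an execution-based check passes).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, str]  # (tool name, argument, observation)


@dataclass
class Harness:
    """Hypothetical minimal harness: tool registry, memory, verification loop."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: List[Step] = field(default_factory=list)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def act(self, tool: str, arg: str) -> str:
        # Execute one action and record the observation, including failures,
        # so the policy can react to them on the next step.
        if tool not in self.tools:
            obs = f"error: unknown tool {tool!r}"
        else:
            try:
                obs = self.tools[tool](arg)
            except Exception as exc:
                obs = f"error: {exc}"
        self.memory.append((tool, arg, obs))
        return obs

    def run(self, policy, verify, max_steps: int = 5) -> Optional[str]:
        # Feedback-driven control: keep acting until verification passes.
        for _ in range(max_steps):
            tool, arg = policy(self.memory)
            obs = self.act(tool, arg)
            if verify(obs):
                return obs
        return None


def run_python(src: str) -> str:
    # Illustrative only: a real harness would sandbox untrusted code.
    scope: dict = {}
    exec(src, scope)
    return str(scope.get("result"))


def scripted_policy(memory: List[Step]) -> Tuple[str, str]:
    # Stand-in for an LLM: the first attempt is buggy, the second is a repair
    # issued after the error shows up in memory.
    if memory and memory[-1][2].startswith("error"):
        return "python", "result = sum(range(1, 5))"
    return "python", "result = 1 / 0"


harness = Harness()
harness.register("python", run_python)
outcome = harness.run(scripted_policy, verify=lambda obs: obs == "10")
```

In this sketch the first action fails, the error is stored in memory, and the second action passes verification. The design choice worth noting is that failures are recorded as observations rather than raised, which is what lets the loop use them as feedback.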

Submitted to arXiv on 18 May 2026


Comprehensive summary: Recent large language models (LLMs) have shown a strong ability to understand and generate code, from competitive programming to repository-level software engineering. In emerging agentic systems, however, code is no longer just a final output. It is an operational substrate that underpins agent reasoning, action execution, environment modeling, and verification. The survey frames this shift through the concept of agent harnesses, in which code is the central component of agent infrastructure, and organizes the discussion into three connected layers. First, it examines the harness interface, where code connects agents to reasoning, action, and environment modeling. Second, it covers harness mechanisms: planning, memory, and tool use for long-horizon execution, together with feedback-driven control that makes the harness adaptive and reliable. Third, it discusses scaling the harness from single-agent setups to multi-agent settings, where shared code artifacts support coordination, review, and verification among agents. The representative methods and applications it surveys span coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows.
The survey also outlines open challenges in harness engineering: evaluating performance beyond task completion, verifying system behavior under incomplete feedback, improving the harness without regressions, keeping shared state consistent across multiple agents, incorporating human oversight for safety-critical actions, and extending the harness to multimodal environments. By treating code as the foundation of agentic AI, the survey offers a roadmap toward executable, verifiable, and stateful AI agent systems.
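One of the challenges listed above, consistent shared state across multiple agents, can be illustrated with a small sketch. This is one common technique (optimistic concurrency with version checks), chosen here for illustration; the survey is not stated to prescribe it. `SharedArtifactStore`, `StaleWriteError`, and the file name `solver.py` are hypothetical.

```python
from typing import Dict, Tuple


class StaleWriteError(Exception):
    """Raised when an agent writes against an outdated version."""


class SharedArtifactStore:
    """Hypothetical versioned store for code artifacts shared by agents."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Tuple[int, str]] = {}

    def read(self, path: str) -> Tuple[int, str]:
        # Unknown paths start at version 0 with empty content.
        return self._artifacts.get(path, (0, ""))

    def write(self, path: str, content: str, expected_version: int) -> int:
        # Compare-and-set: accept the write only if the caller saw the
        # latest version, so concurrent edits cannot silently overwrite.
        current_version, _ = self.read(path)
        if current_version != expected_version:
            raise StaleWriteError(
                f"{path}: expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._artifacts[path] = (new_version, content)
        return new_version


store = SharedArtifactStore()

# Two agents read the same (empty) artifact at version 0.
coder_version, _ = store.read("solver.py")
reviewer_version, _ = store.read("solver.py")

# The coder agent commits first.
store.write("solver.py", "def solve(): return 1", coder_version)

# The reviewer's write is now stale; it must re-read and reapply its change.
try:
    store.write("solver.py", "def solve(): return 2", reviewer_version)
    conflict_detected = False
except StaleWriteError:
    conflict_detected = True
    reviewer_version, current = store.read("solver.py")
    store.write("solver.py", current + "  # reviewed", reviewer_version)

final_version, final_content = store.read("solver.py")
```

The stale write is rejected rather than merged, which pushes conflict resolution back to the agent that holds outdated state. A real multi-agent harness would likely need locking or a transactional backend on top of this idea.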


Layman's Summary: Recent advances in large language models have shown that they can understand and create code in many different areas. Code is now very important for making AI agents work. Agents use code to think, do things, understand the world, and check whether things are correct. There are three main parts to look at: connecting agents with important parts like thinking, doing things, and understanding the world; planning how to do things, remembering important information, choosing tools, and using feedback to make sure everything works well; and making sure that many agents can work together by sharing code and coordinating with each other. Code is used this way in many areas, such as helping with coding tasks, robots, giving recommendations, and managing business processes efficiently.

Definitions:
- Advancements: Improvements or progress made in something.
- Large Language Models (LLMs): Advanced computer programs that can understand and generate human languages.
- Code: Instructions written for computers to perform specific tasks.
- Agent: A program or system that acts on behalf of a person or another program.
- Infrastructure: The basic systems that something needs in order to function.
- Interface: A point where two systems meet and interact with each other.
- Mechanisms: Systems or processes designed to achieve a particular result.
- Scalability: The ability of a system to handle growth without losing performance or efficiency.
- Artifacts: Things that people or programs create, such as files of code.
- Coordination: Working together smoothly towards a common goal.

Blog article: Recent large language models (LLMs) have become markedly better at understanding and generating code, from competitive programming to repository-level software engineering. With the emergence of agentic systems, however, code is no longer just a final output. It is an operational substrate that underpins agent reasoning, action execution, environment modeling, and verification. This shift is captured by the concept of agent harnesses, in which code is the central component of agent infrastructure. A recent survey titled "Code as Agent Harness" explores this perspective through three connected layers of harness engineering. First, it looks at the harness interface and how code connects agents to reasoning, action, and environment modeling. Second, it discusses harness mechanisms: planning, memory, and tool use for long-horizon execution, together with feedback-driven control for adaptability and reliability. Third, it addresses scaling the harness from single-agent setups to multi-agent settings, where shared code artifacts support coordination, review, and verification. This matters most in complex real-world scenarios where multiple agents have to work together. One key takeaway is that code gives agentic AI systems an executable and verifiable foundation.
By treating code as a foundation rather than as an afterthought or a mere output of an LLM, the survey offers a roadmap toward more robust and reliable AI agent systems. The representative methods and applications it covers span coding assistants, GUI/OS automation, embodied agents, scientific discovery, personalization and recommendation, DevOps, and enterprise workflows, which shows how widely code-centric approaches apply. The survey also outlines open challenges in harness engineering: evaluating performance beyond task completion, verifying system behavior under incomplete feedback, improving the harness without regressions, keeping shared state consistent across multiple agents, incorporating human oversight for safety-critical actions, and extending the harness to multimodal environments. In conclusion, "Code as Agent Harness" gives an overview of current research and practical applications of code in agentic AI systems, and points toward agent systems that can handle complex real-world scenarios.
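The human-oversight challenge mentioned above can also be sketched in a few lines. This is a hedged illustration only: the action names in `SAFETY_CRITICAL`, the `gated_execute` function, and the callbacks are all hypothetical, and a real system would route approval to an actual person rather than a lambda.

```python
from typing import Callable, List

# Hypothetical set of actions that require human sign-off.
SAFETY_CRITICAL = {"delete_branch", "deploy_production"}


def gated_execute(
    action: str,
    execute: Callable[[str], str],
    approve: Callable[[str], bool],
    audit_log: List[str],
) -> str:
    """Run an action, asking for approval first if it is safety-critical."""
    if action in SAFETY_CRITICAL and not approve(action):
        audit_log.append(f"blocked:{action}")
        return "blocked"
    audit_log.append(f"executed:{action}")
    return execute(action)


audit_log: List[str] = []
results = [
    gated_execute(
        action,
        execute=lambda a: f"done {a}",
        approve=lambda a: False,  # stand-in for a human who declines
        audit_log=audit_log,
    )
    for action in ["run_tests", "deploy_production"]
]
```

Routine actions pass straight through, while the gated action is blocked and recorded. Keeping the audit log separate from the return value means blocked attempts stay visible for later review.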

Created on 13 Jun. 2026


