CooperBench: Why Coding Agents Cannot be Your Teammates Yet

AI-generated keywords: Collaborative coding, AI agents, Coordination capabilities, CooperBench, Social intelligence

AI-generated Key Points

  • In collaborative coding, AI agents need effective communication skills and conflict resolution abilities to function successfully as teammates.
  • Current AI agents lack the coordination capabilities needed for seamless collaboration, as the CooperBench benchmark shows.
  • CooperBench consists of over 600 collaborative coding tasks across 12 libraries in 4 programming languages; each task assigns two agents different features that may conflict without proper coordination.
  • The "curse of coordination" was observed: agents achieved on average 30% lower success rates when working together than when performing both tasks individually.
  • Three key issues hindering effective collaboration among AI agents were identified: jammed communication channels, deviations from commitments, and incorrect expectations about others' plans and communication.
  • Large-scale simulation revealed rare emergent coordination behaviors, including role division, resource division, and negotiation.
  • The study advocates for prioritizing the development of social intelligence in AI agents over individual capability enhancement to improve collaborative capabilities.

Authors: Arpandeep Khatua, Hao Zhu, Peter Tran, Arya Prabhudesai, Frederic Sadrieh, Johann K. Lieberwirth, Xinkai Yu, Yicheng Fu, Michael J. Ryan, Jiaxin Pei, Diyi Yang

https://cooperbench.com
The first two authors contributed equally. The 3rd to 6th authors contributed equally.
License: CC BY-SA 4.0

Abstract: Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This contrasts sharply with human teams, where adding teammates typically improves productivity. Our analysis reveals three key issues: (1) communication channels become jammed with vague, ill-timed, and inaccurate messages; (2) even with effective communication, agents deviate from their commitments; and (3) agents often hold incorrect expectations about others' plans and communication. Through large-scale simulation, we also observe rare but interesting emergent coordination behavior including role division, resource division, and negotiation. Our research presents a novel benchmark for collaborative coding and calls for a shift from pursuing individual agent capability to developing social intelligence.
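The headline comparison in the abstract can be illustrated with a small sketch. It assumes, as one reading of the abstract, that the "solo" condition has a single agent implement both features and the "cooperative" condition has two agents implement one feature each. The record layout, the function names, and the sample outcomes are hypothetical and are not taken from the paper. The abstract does not say whether the 30% figure is an absolute or a relative drop, so the sketch reports both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskOutcome:
    """Outcome of one two-feature task (hypothetical record layout)."""
    solo_passed: bool  # assumed: one agent implemented both features, tests passed
    coop_passed: bool  # assumed: two agents, one feature each, tests passed


def success_rate(flags):
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def coordination_gap(outcomes):
    """Compare solo and cooperative success rates over the same tasks."""
    solo = success_rate(o.solo_passed for o in outcomes)
    coop = success_rate(o.coop_passed for o in outcomes)
    absolute = solo - coop
    relative = absolute / solo if solo > 0 else 0.0
    return {
        "solo": solo,
        "coop": coop,
        "absolute_drop": absolute,
        "relative_drop": relative,
    }


# Made-up outcomes for illustration only; these are not results from the paper.
outcomes = (
    [TaskOutcome(True, True)] * 4
    + [TaskOutcome(True, False)] * 2
    + [TaskOutcome(False, False)] * 4
)
gap = coordination_gap(outcomes)
print(gap)
```

With these invented outcomes the solo rate is 0.6 and the cooperative rate is 0.4, which shows why the two readings differ: the absolute drop is 20 percentage points while the relative drop is about 33%.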

Submitted to arXiv on 19 Jan. 2026

Results of the summarizing process for the arXiv paper: 2601.13295v2

Collaborative coding requires AI agents to communicate effectively and resolve conflicts with their teammates. The authors hypothesize that current agents lack these coordination capabilities and introduce CooperBench to test this. The benchmark comprises over 600 collaborative coding tasks spanning 12 libraries in 4 programming languages. Each task assigns two agents distinct features that can be implemented independently but may conflict without proper coordination. The tasks are grounded in real open-source repositories and come with expert-written tests.

Evaluating state-of-the-art coding agents on CooperBench reveals what the authors call the "curse of coordination": agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This contrasts sharply with human teams, where adding teammates typically improves productivity. The analysis identifies three key issues: (1) communication channels become jammed with vague, ill-timed, and inaccurate messages; (2) even with effective communication, agents deviate from their commitments; and (3) agents often hold incorrect expectations about others' plans and communication.

Large-scale simulation also surfaces rare but interesting emergent coordination behaviors, including role division, resource division, and negotiation. The paper presents a novel benchmark for collaborative coding and calls for a shift from pursuing individual agent capability to developing social intelligence. It underscores that effective teamwork among AI agents remains an open challenge.
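The task setup described above (two agents, two features that are independent in principle but may conflict) can be sketched as a small data structure. This is an illustrative model only. It assumes that overlapping file edits are a reasonable proxy for a potential conflict, and that a cooperative run counts as a success only when both features' tests pass. Neither assumption is confirmed by the summary, and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureAssignment:
    """One agent's share of a two-feature task (hypothetical layout)."""
    agent: str
    feature: str
    files_touched: frozenset  # files the agent plans to edit


def overlapping_files(a, b):
    """Files both agents plan to edit: an assumed proxy for conflict risk."""
    return sorted(a.files_touched & b.files_touched)


def cooperative_success(tests_a_pass, tests_b_pass):
    """Assumed success criterion: both features' tests must pass."""
    return tests_a_pass and tests_b_pass


# Invented example task; the feature and file names are placeholders.
a = FeatureAssignment("agent_1", "add retry option",
                      frozenset({"client.py", "config.py"}))
b = FeatureAssignment("agent_2", "add timeout option",
                      frozenset({"client.py", "README.md"}))

shared = overlapping_files(a, b)
print(shared)
print(cooperative_success(True, False))
```

In this sketch the shared file is where the two agents would need to coordinate, for example by agreeing on who edits which region before either starts.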
Created on 01 Jul. 2026
