Uber's Failover Architecture: Reconciling Reliability and Efficiency in Hyperscale Microservice Infrastructure

AI-generated keywords: Global platform, Uber, Failover Architecture, Service-Level Agreements (SLAs), Automated safeguards

AI-generated Key Points

  • Operating a global, real-time platform at Uber's scale requires resilient and cost-efficient infrastructure
  • Uber's Failover Architecture (UFA) replaces the costly 2x capacity model with a differentiated architecture aligned to business criticality
  • Critical services retain failover guarantees, while non-critical services opportunistically use failover buffer capacity reserved for critical services during steady state
  • UFA reduces steady-state provisioning from 2x to 1.3x, raising utilization from around 20% to approximately 30% while maintaining an availability rate of 99.97%
  • UFA has hardened over 4,000 unsafe dependencies and eliminated over one million CPU cores
  • Future extensions of UFA will expand beyond stateless services to offer differentiated SLAs for stateful services
  • Open directions include combining static analysis with generative AI, developing tools for certifying fail-open behavior at scale, and collaborating with cloud providers towards guaranteed elastic capacity at hyperscale
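
The provisioning and utilization figures in the key points can be cross-checked with simple arithmetic. The sketch below is not the paper's accounting. It assumes that total load stays constant, that utilization scales inversely with provisioned capacity, and (for the core estimate) that the entire four-million-core baseline was provisioned at 2x. The function names are hypothetical.

```python
def utilization_after(util_before: float, factor_before: float, factor_after: float) -> float:
    """Utilization once the same load runs on proportionally less capacity.

    Assumption: load is unchanged, so utilization is inversely proportional
    to the provisioning factor.
    """
    return util_before * factor_before / factor_after


def cores_saved(baseline_cores: float, factor_before: float, factor_after: float) -> float:
    """Upper-bound estimate of cores removed.

    Assumption: every baseline core belonged to a service provisioned at
    factor_before. The source does not state this.
    """
    return baseline_cores * (1 - factor_after / factor_before)


new_util = utilization_after(0.20, 2.0, 1.3)
saved = cores_saved(4_000_000, 2.0, 1.3)
print(f"utilization after: {new_util:.1%}")
print(f"upper-bound cores saved: {saved:,.0f}")
```

Under these assumptions the utilization comes out near 31%, in line with the reported rise from about 20% to about 30%, and the estimate of roughly 1.4 million cores is an upper bound that is compatible with the reported "over one million".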

Authors: Mayank Bansal, Milind Chabbi, Kenneth Bogh, Srikanth Prodduturi, Kevin Xu, Amit Kumar, David Bell, Ranjib Dey, Yufei Ren, Sachin Sharma, Juan Marcano, Shriniket Kale, Subhav Pradhan, Ivan Beschastnikh, Miguel Covarrubias, Chien-Chih Liao, Sandeep Koushik Sheshadri, Wen Luo, Kai Song, Ashish Samant, Sahil Rihan, Nimish Sheth, Uday Kiran Medisetty

License: CC BY 4.0

Abstract: Operating a global, real-time platform at Uber's scale requires infrastructure that is both resilient and cost-efficient. Historically, reliability was ensured through a costly 2x capacity model (each service provisioned to handle global traffic independently across two regions), leaving half the fleet idle. We present Uber's Failover Architecture (UFA), which replaces the uniform 2x model with a differentiated architecture aligned to business criticality. Critical services retain failover guarantees, while non-critical services opportunistically use failover buffer capacity reserved for critical services during steady state. During rare "full-peak" failovers, non-critical services are selectively preempted and rapidly restored, with differentiated Service-Level Agreements (SLAs) using on-demand capacity. Automated safeguards, including dependency analysis and regression gates, ensure critical services continue to function even while non-critical services are unavailable. The quantitative impact is significant: UFA reduces steady-state provisioning from 2x to 1.3x, raising utilization from ~20% to ~30% while sustaining 99.97% availability. To date, UFA has hardened over 4,000 unsafe dependencies and eliminated over one million CPU cores from a baseline of about four million cores.
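
The preemption behavior described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how non-critical services are selected for preemption, so the smallest-first greedy order below is an assumption, as are the service names, the core counts, and the `plan_failover` helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    critical: bool
    cores: int  # cores needed in the surviving region at failover load


def plan_failover(services, region_capacity):
    """Return (kept, preempted) services for the surviving region.

    Critical services are always kept. Non-critical services keep running
    only while spare cores remain (smallest first, an assumed policy);
    the rest are preempted and would be restored later on other capacity.
    """
    critical = [s for s in services if s.critical]
    needed = sum(s.cores for s in critical)
    if needed > region_capacity:
        raise ValueError("region cannot hold critical services at failover load")

    spare = region_capacity - needed
    kept, preempted = list(critical), []
    non_critical = sorted((s for s in services if not s.critical), key=lambda s: s.cores)
    for s in non_critical:
        if s.cores <= spare:
            kept.append(s)
            spare -= s.cores
        else:
            preempted.append(s)
    return kept, preempted


# Hypothetical fleet: names and sizes are illustrative only.
fleet = [
    Service("trip-dispatch", True, 600),
    Service("payments", True, 300),
    Service("batch-reports", False, 250),
    Service("email-digest", False, 50),
]
kept, preempted = plan_failover(fleet, region_capacity=1000)
print("kept:", [s.name for s in kept])
print("preempted:", [s.name for s in preempted])
```

In this example the two critical services consume 900 of the 1,000 cores, the small non-critical service still fits in the remaining buffer, and the large one is preempted.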

Submitted to arXiv on 07 Mar. 2026


Results of the summarizing process for the arXiv paper: 2603.07345v1

Operating a global, real-time platform at Uber's scale requires infrastructure that is both resilient and cost-efficient. Historically, reliability was ensured through a costly 2x capacity model (each service provisioned to handle global traffic independently across two regions), leaving half the fleet idle.

We present Uber's Failover Architecture (UFA), which replaces the uniform 2x model with a differentiated architecture aligned to business criticality. Critical services retain failover guarantees, while non-critical services opportunistically use failover buffer capacity reserved for critical services during steady state. During rare "full-peak" failovers, non-critical services are selectively preempted and rapidly restored, with differentiated Service-Level Agreements (SLAs) using on-demand capacity. Automated safeguards, including dependency analysis and regression gates, ensure critical services continue to function even while non-critical services are unavailable.

The quantitative impact of UFA is significant: it reduces steady-state provisioning from 2x to 1.3x, raising utilization from around 20% to approximately 30% while sustaining 99.97% availability. To date, UFA has hardened over 4,000 unsafe dependencies and eliminated over one million CPU cores from a baseline of about four million cores.

Future extensions of UFA will expand beyond stateless services to offer differentiated SLAs for stateful services. Several open directions remain in this space: combining static analysis with generative AI to automatically fix fail-close issues, developing general-purpose tools for certifying fail-open behavior at scale, and collaborating with cloud providers towards guaranteed elastic capacity at hyperscale.

We would like to express our gratitude to the numerous individuals from various teams at Uber who have made invaluable contributions to this project. Special thanks go to Abhishek Jha, Aditya Jain, Albert Greenberg, Arturo Bravo Rovirosa, Arun Krishnan, Christoffer Hansen, Darshil Kapadia, Deepanker Sachdeva, Egor Grishechko, Eric Chin, and many others for their dedication and support throughout this endeavor. Additionally, we extend our appreciation to David A. Maltz for his feedback and guidance as our paper shepherd.
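
As an illustration of the dependency analysis mentioned in the summary, the following sketch flags "unsafe" dependencies, taken here to mean a critical service calling a non-critical one without fail-open handling. That definition, the `Dependency` record, and all service names are assumptions for illustration; the summary does not describe how UFA's analysis is implemented.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    caller: str
    callee: str
    fail_open: bool  # True if the caller degrades gracefully when the callee is down


def unsafe_dependencies(deps, critical):
    """Return edges where a critical caller hard-depends on a non-critical callee."""
    return [
        d for d in deps
        if d.caller in critical and d.callee not in critical and not d.fail_open
    ]


# Hypothetical call graph.
critical_services = {"trip-dispatch", "payments"}
call_graph = [
    Dependency("trip-dispatch", "payments", fail_open=False),      # critical -> critical
    Dependency("trip-dispatch", "promo-banner", fail_open=True),   # safe: fails open
    Dependency("payments", "receipt-emailer", fail_open=False),    # unsafe: fails closed
]
flagged = unsafe_dependencies(call_graph, critical_services)
print([(d.caller, d.callee) for d in flagged])
```

A regression gate in this spirit could reject a change whenever the flagged list grows, though how UFA's gates actually decide is not covered by the summary.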
Created on 10 Mar. 2026
