📈 High Scalability · March 14, 2024

Scaling Uber: From Monolith to Microservices and Global Scale

This article details Uber's architectural evolution from a LAMP stack monolith to a globally scaled microservices platform, addressing challenges in real-time data processing, concurrency, and data storage. It highlights key technological decisions and custom-built tools for managing complexity inherent in massive distributed systems.


Uber's scaling journey began with a simple LAMP stack in 2009, which quickly faced limitations in handling concurrency and real-time demands. The initial move towards scalability involved adopting Node.js for real-time dispatch and Python for business logic, creating a 'two monolith' architecture with MongoDB/Redis and PostgreSQL. This setup allowed for initial growth and team separation, but the 'API' monolith eventually became a bottleneck due to growing features and conflicting deployments.

The Shift to Microservices

Around 2013, Uber embraced a microservice architecture to support its next phase of rapid growth. This pattern allowed for dedicated services per domain, each potentially using different languages and databases. While solving feature conflicts and enabling faster development, microservices introduced significant operational complexity. To mitigate this, Uber developed several internal tools:

  • Clay: A Python wrapper on Flask for standardized service frameworks, ensuring consistent monitoring, logging, and deployments.
  • TChannel over Hyperbahn: An in-house bi-directional RPC protocol for service discovery, resilience (fault tolerance, rate limiting, circuit breaking), and performance.
  • Apache Thrift (later gRPC and Protobuf): For well-defined RPC interfaces and stronger contracts between services.
  • Flipr: For feature flagging, controlled rollouts, and configuration management.
  • M3: A metrics platform for observability and Grafana dashboards, complemented by Nagios for alerting.
  • Merckx (later Jaeger): For distributed tracing across services, using Kafka for instrumentation data.
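Among the resilience features listed for TChannel/Hyperbahn, circuit breaking is the easiest to illustrate. The sketch below is a generic circuit breaker, not Uber's implementation: after a threshold of consecutive failures it "opens" and fails fast, then allows a probe call once a cool-down window elapses. The `CircuitBreaker` class name and its default thresholds are illustrative assumptions.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    fail fast while open, and allow a probe after a cool-down window."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: fall through and allow one probe call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        else:
            # Success closes the circuit and resets the failure count.
            self.failures = 0
            self.opened_at = None
            return result
```

Failing fast this way keeps a struggling downstream service from being hammered by retries, which is the same goal the article attributes to TChannel's built-in rate limiting and fault tolerance.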

Database Scaling and Specialization

A major scaling challenge arose from their single PostgreSQL database, which became a bottleneck for performance, scalability, and availability, especially for rapidly growing trip data. To address this existential threat, Uber developed Schemaless, an append-only, sparse three-dimensional persistent hash map built on MySQL, similar to Google's Bigtable. This model facilitated horizontal scaling by partitioning rows across shards, crucial for handling massive trip data volumes. For other data needs, Cassandra was adopted.
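The "sparse three-dimensional map" model can be made concrete with a toy in-process sketch. This is a stand-in only: real Schemaless persists cells in MySQL tables, and the `SchemalessSketch` name and API below are invented for illustration. Cells are addressed by (row key, column name, ref key), written once, never updated in place, and rows are partitioned across shards by hashing the row key.

```python
import zlib
from collections import defaultdict

class SchemalessSketch:
    """Toy model of an append-only, sparse three-dimensional map:
    (row_key, column_name, ref_key) -> immutable JSON-like body.
    New versions are appended under a higher ref_key, never overwritten."""

    def __init__(self, num_shards=16):
        self.num_shards = num_shards
        # One dict per shard: {(row_key, column_name): [(ref_key, body), ...]}
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def _shard_for(self, row_key):
        # Rows are partitioned across shards by a stable hash of the row key.
        return self.shards[zlib.crc32(row_key.encode()) % self.num_shards]

    def put(self, row_key, column_name, ref_key, body):
        # Append-only: a write adds a new (ref_key, body) version to the cell.
        self._shard_for(row_key)[(row_key, column_name)].append((ref_key, body))

    def get(self, row_key, column_name):
        """Return the body with the highest ref_key (the latest version)."""
        cell = self._shard_for(row_key)[(row_key, column_name)]
        return max(cell, key=lambda kv: kv[0])[1] if cell else None
```

The append-only discipline is what makes the model friendly to high write throughput: writes never contend on an existing cell, and adding capacity means adding shards rather than scaling a single relational instance vertically.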

💡

Architectural Insight: Data Specialization

The decision to move from a single relational database to specialized data stores (Schemaless for trips, Cassandra for others) highlights a critical system design principle: matching the data store technology to the specific access patterns and scalability requirements of the data. Relational databases excel for transactional integrity, but distributed NoSQL solutions are often superior for massive scale and high write throughput, especially with append-only models.

Evolving Real-time Systems and Geospatial Matching

The original dispatch monolith, which handled both matching logic and proxying mobile traffic, was split into a Real-Time API Gateway (RTAPI) and a rewritten dispatch service. RTAPI, built with Node.js, provided a flexible real-time layer for mobile requests. The new dispatch system was designed for complex matching scenarios, incorporating advanced optimizations and a geospatial index using Google's S2 library (later H3) for sharding by city. To run these Node.js services statefully, Uber developed Ringpop, a gossip-protocol-based system that shares geospatial and supply-positioning data efficiently across the cluster.
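The pairing of cell-based geospatial indexing with consistent hashing can be sketched as follows. Two loud caveats: a flat lat/lng grid stands in for the hierarchical cells that S2/H3 actually provide, and `HashRing` is a generic consistent-hash ring with virtual nodes, not Ringpop's actual membership protocol.

```python
import bisect
import hashlib

def cell_id(lat, lng, resolution=0.01):
    """Quantize a coordinate onto a flat grid cell (a crude stand-in for
    S2/H3 hierarchical cells); resolution is in degrees per cell."""
    return (int(lat // resolution), int(lng // resolution))

class HashRing:
    """Consistent hashing in the Ringpop style: keys map to the next node
    clockwise on the ring, so adding or removing a node only moves the
    keys on the arcs adjacent to it."""

    def __init__(self, nodes, replicas=100):
        # Each node appears `replicas` times as a virtual node to smooth load.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def owner(self, key):
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

# Hypothetical usage: route a rider's position to the worker owning its cell.
ring = HashRing(["dispatch-1", "dispatch-2", "dispatch-3"])
cell = cell_id(37.7749, -122.4194)   # a cell covering part of San Francisco
owner = ring.owner(cell)             # the dispatch worker owning that cell
```

The design payoff is locality: all supply and demand updates inside one geographic cell land on the same worker, so matching can run against in-memory state instead of a shared database.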

