This article details Uber's architectural evolution from a LAMP stack monolith to a globally scaled microservices platform, addressing challenges in real-time data processing, concurrency, and data storage. It highlights key technological decisions and custom-built tools for managing the complexity inherent in massive distributed systems.
Uber's scaling journey began with a simple LAMP stack in 2009, which quickly hit its limits under concurrency and real-time demands. The first step toward scalability was adopting Node.js for real-time dispatch and Python for business logic, creating a 'two monolith' architecture backed by MongoDB/Redis and PostgreSQL. This setup supported initial growth and team separation, but the 'API' monolith eventually became a bottleneck as features accumulated and deployments conflicted.
Around 2013, Uber embraced a microservice architecture to support its next phase of rapid growth. This pattern allowed a dedicated service per domain, each potentially using different languages and databases. While solving feature conflicts and enabling faster development, microservices introduced significant operational complexity, which Uber mitigated with several custom-built internal tools.
A major scaling challenge arose from their single PostgreSQL database, which became a bottleneck for performance, scalability, and availability, especially for rapidly growing trip data. To address this existential threat, Uber developed Schemaless, an append-only, sparse three-dimensional persistent hash map built on MySQL, similar to Google's Bigtable. This model facilitated horizontal scaling by partitioning rows across shards, crucial for handling massive trip data volumes. For other data needs, Cassandra was adopted.
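To make the data model concrete, below is a minimal sketch of an append-only, three-coordinate cell store in the spirit of Schemaless. Everything here (`CellKey`, `SchemalessSketch`, `shardFor`) is illustrative rather than Uber's actual API; the real system persists cells in MySQL tables spread across many shards.

```typescript
// Illustrative sketch only -- not Uber's actual Schemaless API.
// A cell is immutable and addressed by three coordinates:
// (rowKey, columnName, refKey). Writes append new versions; reads
// resolve to the highest refKey for a (rowKey, columnName) pair.

interface CellKey {
  rowKey: string;     // entity id, e.g. a trip UUID; also the shard key
  columnName: string; // logical grouping of fields, e.g. "BASE", "STATUS"
  refKey: number;     // monotonically increasing version number
}

class SchemalessSketch {
  // "rowKey/columnName/refKey" -> opaque JSON body
  private cells = new Map<string, unknown>();

  // Append-only write: a new version never overwrites an old one.
  put(key: CellKey, body: unknown): void {
    const id = `${key.rowKey}/${key.columnName}/${key.refKey}`;
    if (this.cells.has(id)) {
      throw new Error("cells are immutable; bump refKey to write a new version");
    }
    this.cells.set(id, body);
  }

  // Read the latest version of a (rowKey, columnName) pair.
  getLatest(rowKey: string, columnName: string): unknown {
    let best = -1;
    let value: unknown;
    for (const [id, body] of this.cells) {
      const [r, c, ref] = id.split("/");
      if (r === rowKey && c === columnName && Number(ref) > best) {
        best = Number(ref);
        value = body;
      }
    }
    return value;
  }

  // Horizontal scaling: rows hash to one of N MySQL shards by rowKey,
  // so trip data spreads evenly across machines.
  static shardFor(rowKey: string, shardCount: number): number {
    let h = 2166136261; // FNV-1a over the row key, for illustration
    for (let i = 0; i < rowKey.length; i++) {
      h = Math.imul(h ^ rowKey.charCodeAt(i), 16777619) >>> 0;
    }
    return h % shardCount;
  }
}
```

Note how an 'update' is expressed in this model: the writer appends a new cell with a higher refKey, and reads resolve to the latest version while older versions remain available, which is what makes the append-only design friendly to high write throughput.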
Architectural Insight: Data Specialization
The decision to move from a single relational database to specialized data stores (Schemaless for trips, Cassandra for others) highlights a critical system design principle: matching the data store technology to the specific access patterns and scalability requirements of the data. Relational databases excel at transactional integrity, but distributed NoSQL solutions are often superior for massive scale and high write throughput, especially with append-only models.
The original dispatch monolith, which both executed matching logic and acted as a proxy, was split into a Real-Time API Gateway (RTAPI) and a rewritten dispatch service. RTAPI, built with Node.js, provided a flexible real-time layer for mobile requests. The new dispatch system was designed for complex matching scenarios, incorporating advanced optimizations and a geospatial index built on Google's S2 library (later H3) to shard by city. For stateful Node.js services, Uber developed Ringpop, a gossip-protocol-based system, to share geospatial and supply positioning data efficiently across the distributed fleet.
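To illustrate how these pieces could compose, here is a minimal sketch assuming a hypothetical `cellIdFor` helper standing in for an S2/H3 cell index (in production this would be a library call such as h3-js's `geoToH3`) and a `Ring` class in the spirit of Ringpop, not its actual API: driver positions snap to geospatial cells, cell ids hash onto a consistent-hash ring, and the first node clockwise from the hash owns that cell's supply data.

```typescript
// Illustrative sketch -- not Ringpop's actual API. A consistent-hash ring
// assigns each geospatial cell to exactly one node, so that node can hold
// the cell's supply (driver) positions in memory. Gossip (not shown) keeps
// the membership list in sync across nodes.

// Stand-in for an S2/H3 cell index: here we simply snap lat/lng to a
// coarse grid; a real system would use a geospatial library.
function cellIdFor(lat: number, lng: number, resolution = 10): string {
  const f = 2 ** resolution;
  return `${Math.floor(lat * f)}:${Math.floor(lng * f)}:${resolution}`;
}

// Simple FNV-1a hash, for illustration.
function fnv1a(s: string): number {
  let h = 2166136261;
  for (let i = 0; i < s.length; i++) {
    h = Math.imul(h ^ s.charCodeAt(i), 16777619) >>> 0;
  }
  return h >>> 0;
}

class Ring {
  // Sorted ring positions -> node address; each node gets several virtual
  // points so load spreads evenly when nodes join or leave.
  private points: Array<{ pos: number; node: string }> = [];

  addNode(node: string, virtualPoints = 100): void {
    for (let v = 0; v < virtualPoints; v++) {
      this.points.push({ pos: fnv1a(`${node}#${v}`), node });
    }
    this.points.sort((a, b) => a.pos - b.pos);
  }

  // The first point clockwise from the key's hash owns the key.
  ownerOf(key: string): string {
    const h = fnv1a(key);
    const hit = this.points.find((p) => p.pos >= h) ?? this.points[0];
    return hit.node;
  }
}

// Route a driver position update to the node owning its geospatial cell.
const ring = new Ring();
["dispatch-1", "dispatch-2", "dispatch-3"].forEach((n) => ring.addNode(n));
const cell = cellIdFor(37.7749, -122.4194); // San Francisco
const owner = ring.ownerOf(cell);           // e.g. "dispatch-2"
```

The appeal of this design for stateful dispatch workers is that when a node joins or leaves, only the cells whose ring interval changes move to a new owner, so most in-memory supply state stays put.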