📈 High Scalability · March 26, 2024

Hotstar's Real-time Emoji and Voting System Architecture

This article details Hotstar's journey in building an in-house, scalable system for real-time emoji reactions and live voting, moving away from a third-party service. It highlights architectural decisions around asynchronous processing, message queuing with Kafka, and stream processing with Spark to handle billions of user interactions during live events.


Hotstar, a major streaming platform, faced the challenge of capturing and displaying billions of real-time emoji reactions and votes during live events. Initially relying on a third-party service, they encountered performance, stability, and cost issues, leading to the decision to build an in-house solution. This case study demonstrates how a highly interactive social feed component can be scaled for massive concurrent user engagement.

Key Design Principles for High-Scale Interactive Features

  • **Scalability:** The system was designed for horizontal scalability using load balancers and auto-scaling groups, enabling it to handle fluctuating traffic from millions of simultaneous users.
  • **Decomposition:** Breaking the system into independent, smaller components allowed for individual scaling and easier management of specific functionalities.
  • **Asynchronous Processing:** Crucial for high concurrency, asynchronous processing prevents resource blocking, ensuring that client requests are handled efficiently without waiting for heavy backend operations to complete.

Asynchronous API Handling and Message Queues

Client-submitted emojis arrive via an HTTP API. To maintain low latency and prevent client connections from being held open, heavy processing is offloaded to an asynchronous pipeline. Hotstar leverages a custom data platform, Knol (built on Kafka), for high-throughput, low-latency, and highly available message queuing.

💡 Synchronous vs. Asynchronous Writes to Queue

The article highlights a critical trade-off: synchronous writes guarantee data persistence but introduce higher latency, while asynchronous writes (buffering locally before flushing to Kafka) offer extremely low latency but carry a small risk of data loss if not handled robustly. Hotstar prioritized low latency for emojis, accepting the rare data loss possibility, while acknowledging that transactional data would require a synchronous approach.

For asynchronous message production, Golang's Goroutines and Channels are used. Messages are buffered in a channel, and a background Goroutine periodically flushes them to Kafka. Configurations like flush interval (e.g., 500ms) and max messages per request (e.g., 20,000) are tuned for optimal performance.
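The buffering pattern described above can be sketched as follows. This is a simplified illustration, not Hotstar's code: `flushToKafka` is a stand-in for a real Kafka producer call, and the types and names are hypothetical; only the mechanism (a channel as buffer, a goroutine flushing on a 500ms ticker or when the batch hits its cap) follows the article.

```go
package main

import (
	"fmt"
	"time"
)

// Emoji is a simplified stand-in for the payload being enqueued.
type Emoji struct {
	UserID string
	Symbol string
}

// flushToKafka stands in for the real Kafka producer call; here it only
// reports the batch size.
func flushToKafka(batch []Emoji) {
	fmt.Printf("flushed %d messages\n", len(batch))
}

// startProducer drains messages from a channel and flushes them when either
// the flush interval elapses or the batch reaches maxBatch messages.
func startProducer(in <-chan Emoji, flushInterval time.Duration, maxBatch int, done chan<- struct{}) {
	go func() {
		batch := make([]Emoji, 0, maxBatch)
		ticker := time.NewTicker(flushInterval)
		defer ticker.Stop()
		for {
			select {
			case msg, ok := <-in:
				if !ok { // channel closed: flush remainder and exit
					if len(batch) > 0 {
						flushToKafka(batch)
					}
					close(done)
					return
				}
				batch = append(batch, msg)
				if len(batch) >= maxBatch {
					flushToKafka(batch)
					batch = batch[:0]
				}
			case <-ticker.C:
				if len(batch) > 0 {
					flushToKafka(batch)
					batch = batch[:0]
				}
			}
		}
	}()
}

func main() {
	in := make(chan Emoji, 1024)
	done := make(chan struct{})
	startProducer(in, 500*time.Millisecond, 20000, done)
	for i := 0; i < 5; i++ {
		in <- Emoji{UserID: "u1", Symbol: "🔥"}
	}
	close(in)
	<-done // prints: flushed 5 messages
}
```

The trade-off discussed earlier is visible here: messages sitting in the batch slice when the process dies are lost, which is the data-loss risk Hotstar accepted in exchange for latency.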

Real-time Stream Processing with Spark

To aggregate emoji data and derive audience mood in near real-time, Hotstar implemented a Spark streaming job. This job consumes data from Kafka, computes aggregates over small micro-batches (e.g., 2 seconds), and writes the computed results to another Kafka queue. Spark was chosen for its strong support for micro-batching and aggregations, along with robust community support.
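Spark expresses this aggregation declaratively over each micro-batch. As a conceptual sketch only (in Go, for consistency with the rest of this page, and assuming a micro-batch is simply a slice of emoji events consumed from Kafka), the per-window computation reduces to a count per emoji:

```go
package main

import "fmt"

// aggregateBatch counts occurrences of each emoji in one micro-batch,
// mirroring what the Spark job computes per ~2-second window before
// writing the result to the output Kafka queue.
func aggregateBatch(events []string) map[string]int {
	counts := make(map[string]int)
	for _, e := range events {
		counts[e]++
	}
	return counts
}

func main() {
	batch := []string{"🔥", "😂", "🔥", "👍", "🔥"}
	fmt.Println(aggregateBatch(batch)) // map[: counts per emoji]
}
```

In the real system Spark also handles partitioned parallelism, checkpointing, and backpressure across Kafka partitions, which is why a framework was chosen over hand-rolled consumers.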

The final step involves a Python-based Kafka consumer that processes the aggregated data, normalizes it, and sends the top (most popular) emojis to Hotstar's in-house real-time PubSub infrastructure. This PubSub system then delivers the updates to client applications, enabling dynamic emoji animations and real-time social feed updates.
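The normalize-and-rank step might look like the sketch below (again in Go rather than the Python the article describes, and with hypothetical names; the article does not specify the normalization formula, so normalizing counts to shares of the total is an assumption):

```go
package main

import (
	"fmt"
	"sort"
)

// topEmojis normalizes raw counts into shares of the window total and
// returns the n most popular emojis, roughly what gets published to PubSub.
func topEmojis(counts map[string]int, n int) []string {
	total := 0
	for _, c := range counts {
		total += c
	}
	type scored struct {
		emoji string
		share float64
	}
	ranked := make([]scored, 0, len(counts))
	for e, c := range counts {
		ranked = append(ranked, scored{e, float64(c) / float64(total)})
	}
	// Sort by descending share of the window's traffic.
	sort.Slice(ranked, func(i, j int) bool { return ranked[i].share > ranked[j].share })
	if n > len(ranked) {
		n = len(ranked)
	}
	top := make([]string, n)
	for i := 0; i < n; i++ {
		top[i] = ranked[i].emoji
	}
	return top
}

func main() {
	fmt.Println(topEmojis(map[string]int{"🔥": 500, "😂": 300, "👍": 200}, 2)) // [🔥 😂]
}
```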

Kafka · Spark Streaming · Go · Asynchronous Processing · Real-time Data · Scalability · Microservices · System Architecture
