Pinterest engineered a significant upgrade to their ads lightweight ranking system by migrating two-tower models to GPU serving. This shift enabled the adoption of a more complex MMOE-DCN architecture, improving prediction accuracy and efficiency. The article details the architectural evolution, optimizations for GPU training, and the observed performance gains in both offline and online metrics.
Pinterest's ads recommendation system utilizes a lightweight ranking stage to efficiently filter candidate ads before more complex downstream models process them. This stage is critical for balancing prediction accuracy with serving latency. Historically, their two-tower models for engagement prediction, which compute Pin (ad) embeddings offline and query (user) embeddings in real-time, were served entirely on CPUs. The recent migration to GPU serving marks a significant evolution in their machine learning infrastructure.
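The two-tower split described above can be sketched as follows. This is a minimal NumPy illustration of the serving pattern, not Pinterest's actual implementation: the tower functions, feature dimensions, and corpus size are all hypothetical, but the key idea is faithful to the article — the ad tower runs offline in batch, while only the cheap user tower and a dot product run at request time.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 64  # shared embedding dimension for both towers (illustrative)

def pin_tower(pin_features: np.ndarray) -> np.ndarray:
    """Stand-in for the offline ad (Pin) tower: project features to EMB_DIM."""
    w = rng.standard_normal((pin_features.shape[-1], EMB_DIM)) * 0.1
    return np.tanh(pin_features @ w)

def query_tower(user_features: np.ndarray) -> np.ndarray:
    """Stand-in for the real-time query (user) tower."""
    w = rng.standard_normal((user_features.shape[-1], EMB_DIM)) * 0.1
    return np.tanh(user_features @ w)

# Offline: embed the whole candidate-ad corpus once and cache the result.
ad_corpus = rng.standard_normal((1000, 128))
pin_embeddings = pin_tower(ad_corpus)          # shape (1000, EMB_DIM)

# Online: embed the incoming request, then score every candidate with a
# single matrix multiply -- no heavy tower runs over the corpus at serving time.
user = rng.standard_normal((1, 96))
query_embedding = query_tower(user)            # shape (1, EMB_DIM)
scores = query_embedding @ pin_embeddings.T    # shape (1, 1000)
top_k = np.argsort(-scores[0])[:10]            # candidates passed downstream
```

Because candidate scoring reduces to a dot product against precomputed embeddings, the lightweight ranking stage stays cheap regardless of how complex either tower becomes.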
The core of the architectural upgrade involves transitioning from a Multi-Task Multi-Domain (MTMD) model to a more sophisticated Multi-gate Mixture-of-Experts (MMOE) with Deep & Cross Networks (DCN) design. The MTMD model relied on domain-specific modules, whereas the MMOE architecture effectively handles multi-domain and multi-task challenges without explicit domain modules by employing multiple 'experts' with MLP gating. Each expert within their MMOE model incorporates both full-rank and low-rank DCN layers, allowing for deeper feature interactions while managing model complexity.
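The DCN layers inside each expert can be sketched with the standard cross-layer recurrence. The code below is a hypothetical NumPy illustration (dimensions and initialization are made up): the full-rank layer computes x_next = x0 * (W x + b) + x, and the low-rank variant factors W as U Vᵀ, trading some expressiveness for far fewer parameters, which is how the article describes managing model complexity while keeping deep feature interactions.

```python
import numpy as np

rng = np.random.default_rng(1)
D, R = 32, 4  # feature dimension and low rank (illustrative values)

def cross_layer(x0, x, W, b):
    # Full-rank cross layer: explicit feature interaction with the input x0.
    return x0 * (W @ x + b) + x

def low_rank_cross_layer(x0, x, U, V, b):
    # Low-rank variant: W approximated as U @ V.T, shrinking the parameter
    # count for this layer from D*D to 2*D*R.
    return x0 * (U @ (V.T @ x) + b) + x

x0 = rng.standard_normal(D)                 # layer input (e.g. expert input)
W = rng.standard_normal((D, D)) * 0.1       # full-rank weights
U = rng.standard_normal((D, R)) * 0.1       # low-rank factors
V = rng.standard_normal((D, R)) * 0.1
b = np.zeros(D)

# Stack one full-rank and one low-rank cross layer, as in a mixed-rank expert.
x1 = cross_layer(x0, x0, W, b)
x2 = low_rank_cross_layer(x0, x1, U, V, b)
```

The residual term (`+ x`) keeps each layer a refinement of the previous one, so cross layers can be stacked for higher-order interactions without losing the original signal.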
Why MMOE-DCN?
MMOE is particularly effective in multi-task learning scenarios where different tasks (e.g., click prediction, conversion prediction) share some underlying features but also require task-specific modeling. By using multiple experts and a gating mechanism, it allows the model to learn both shared and task-specific patterns more effectively than a single-expert approach. DCN layers, on the other hand, are designed to capture explicit and implicit feature interactions, which are crucial for the high-dimensional sparse data typical of recommendation systems.
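The gating mechanism described above can be illustrated with a small NumPy sketch. This is an assumed, simplified MMOE with plain MLP experts (the article's experts additionally contain DCN layers); all dimensions and weights are illustrative. Each task has its own gate, which produces a softmax over the shared experts, so tasks like click and conversion prediction mix the same expert pool in different proportions.

```python
import numpy as np

rng = np.random.default_rng(2)
D, H, N_EXPERTS, N_TASKS = 32, 16, 4, 2  # e.g. 2 tasks: click, conversion

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Shared experts: each a tiny two-layer MLP (illustrative stand-in).
expert_w1 = rng.standard_normal((N_EXPERTS, D, H)) * 0.1
expert_w2 = rng.standard_normal((N_EXPERTS, H, H)) * 0.1
# One MLP gate per task, emitting a softmax distribution over experts.
gate_w = rng.standard_normal((N_TASKS, D, N_EXPERTS)) * 0.1

def mmoe(x):
    # Run every shared expert on the same input: shape (N_EXPERTS, H).
    experts = np.stack([
        np.maximum(x @ expert_w1[e], 0) @ expert_w2[e]
        for e in range(N_EXPERTS)
    ])
    outputs = []
    for t in range(N_TASKS):
        gate = softmax(x @ gate_w[t])   # per-task mixing weights over experts
        outputs.append(gate @ experts)  # weighted expert mix for this task
    return outputs  # one H-dim representation per task head

task_reprs = mmoe(rng.standard_normal(D))
```

Because the gates are learned per task, no hand-built domain modules are needed: the gating itself routes standard and shopping traffic, or click and conversion objectives, to the experts that serve them best.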
The increased complexity and size of the new MMOE-DCN model, coupled with large training datasets, necessitated significant optimizations to keep training efficient.
These optimizations were crucial in achieving the reported 5-10% reduction in offline loss and substantial improvements in online metrics like Cost-Per-Click (CPC) and Click-Through Rate (CTR). The segmentation of standard and shopping ad scenarios, along with training on relevant data, further reduced loss and doubled model iteration speed, highlighting the importance of data strategy alongside model and infrastructure improvements.