Meta Engineering · February 9, 2026

Meta's Backend Aggregation (BAG) for Gigawatt-Scale AI Clusters

Meta's Backend Aggregation (BAG) is a super spine network layer that interconnects thousands of GPUs across multiple data centers and regions, enabling gigawatt-scale AI clusters such as Prometheus. It provides the high-capacity, resilient interconnect needed to pool these distributed GPUs into a single compute resource for AI workloads. To reach this scale reliably, the design relies on modular hardware, bandwidth-aware routing, and carefully chosen topologies.

The article details Backend Aggregation (BAG), a critical networking component in Meta's strategy to build and operate very large AI clusters. BAG is a centralized, Ethernet-based super spine network layer. Its primary job is to interconnect multiple spine-layer fabrics across data centers and regions. This lets Meta pool thousands of GPUs into a single logical, gigawatt-scale AI cluster such as Prometheus.

The Role of Backend Aggregation in Mega AI Clusters

BAG acts as the aggregation point between regional networks and Meta's backbone, which is essential for building "mega" AI clusters. It is engineered for very high bandwidth: inter-BAG capacity reaches the petabit range (e.g., 16-48 Pbps per region pair). Because BAG layers are distributed regionally, they can interconnect tens of thousands of GPUs. This addresses the challenge of scaling compute beyond a single data center or geographic location.
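The following sketch puts the petabit figures in perspective. It converts the quoted 16-48 Pbps per region pair into a count of 800G links and a lower bound on chassis, using the 432x800G port count cited for J3-based chassis. The sketch makes several simplifying assumptions:

- Every link runs at full 800G line rate.
- MACsec and framing overhead are ignored.
- The quoted capacity is treated as per-direction, since ports are full duplex.

Real chassis also reserve ports for L2-facing links, so actual chassis counts would be higher.

```python
import math

# Assumptions for illustration (not from the article): every inter-BAG link
# runs at full 800G line rate, MACsec/framing overhead is ignored, and the
# quoted region-pair capacity is per direction (ports are full duplex).
LINK_GBPS = 800
PORTS_PER_CHASSIS = 432  # 432x800G ports per J3-based modular chassis


def links_for(capacity_pbps: float) -> int:
    """Minimum number of 800G links that carry `capacity_pbps` per direction."""
    capacity_gbps = capacity_pbps * 1_000_000  # 1 Pbps = 1,000,000 Gbps
    return math.ceil(capacity_gbps / LINK_GBPS)


def chassis_lower_bound(links: int) -> int:
    """Chassis needed on EACH side if every port faced the remote region.

    This is a loose lower bound: real chassis also need L2-facing ports.
    """
    return math.ceil(links / PORTS_PER_CHASSIS)


for pbps in (16, 48):
    n = links_for(pbps)
    print(f"{pbps} Pbps -> {n} links, at least {chassis_lower_bound(n)} chassis per side")
```

Even the low end of the range implies tens of thousands of 800G links per region pair. This is why the later design choices around cabling, striping, and failure domains matter so much.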

Interconnection Topologies and Fabrics

BAG layers are distributed across regions and connect to different kinds of L2 fabrics, such as the Disaggregated Scheduled Fabric (DSF) and the Non-Scheduled Fabric (NSF). Inter-BAG connectivity uses either a planar or a spread topology:

- Planar simplifies management but concentrates failure domains, because each plane connects only to its matching remote plane.
- Spread improves path diversity and resilience by distributing links across multiple BAG switches and planes.

Oversubscription is also managed deliberately (e.g., 4.5:1 from L2 to BAG) to balance scale against cross-region performance.
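The 4.5:1 figure can be read as a ratio: capacity entering an L2 fabric from below versus capacity leaving it toward BAG. This sketch uses a hypothetical fabric size and assumes 800G uplinks. It computes the BAG-facing capacity and link count implied by that ratio. It also computes the worst-case share of line rate each source would get if all traffic headed to BAG at once.

```python
import math

LINK_GBPS = 800  # assumption: BAG-facing uplinks are 800G


def bag_uplink_gbps(l2_downlink_gbps: float, ratio: float = 4.5) -> float:
    """Capacity an L2 fabric needs toward BAG at a given oversubscription ratio."""
    return l2_downlink_gbps / ratio


def uplinks_needed(uplink_gbps: float) -> int:
    """Number of 800G uplinks required to provide `uplink_gbps`."""
    return math.ceil(uplink_gbps / LINK_GBPS)


def worst_case_share(ratio: float = 4.5) -> float:
    """Fraction of line rate each source gets if ALL traffic heads to BAG at once."""
    return 1 / ratio


# Hypothetical L2 fabric with 1,152 Tbps of capacity facing the layers below it.
down = 1_152_000
up = bag_uplink_gbps(down)
print(f"{up:.0f} Gbps toward BAG over {uplinks_needed(up)} links; "
      f"worst-case share {worst_case_share():.1%}")
```

The ratio only bites when cross-region traffic peaks simultaneously. Most traffic staying inside the L2 fabric is what makes such a ratio acceptable.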

Design Consideration: Topology Choice

The choice between planar and spread topologies illustrates a common system-design trade-off: simplicity versus resilience. Planar is easier to manage but less fault-tolerant. Spread is more resilient at the cost of more complex setup and operation. Architects must weigh the two based on how critical the workload is and what the operations team can realistically support.
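A toy model makes the planar/spread trade-off concrete. It assumes:

- P equal planes per region.
- A fixed number of inter-region links per local plane.
- A single remote plane failure.

The functions are hypothetical illustrations, not Meta's actual port map.

```python
# Toy model: P equal planes per region, `links` inter-region links per local
# plane, one remote plane fails. Hypothetical; not Meta's actual port map.


def planar(planes: int, links: int) -> list[tuple[int, int]]:
    """Every link from local plane p lands on remote plane p."""
    return [(p, p) for p in range(planes) for _ in range(links)]


def spread(planes: int, links: int) -> list[tuple[int, int]]:
    """Links from each local plane are striped round-robin across remote planes."""
    return [(p, i % planes) for p in range(planes) for i in range(links)]


def surviving(topology: list[tuple[int, int]], failed_remote: int) -> dict[int, int]:
    """Links still up per local plane after `failed_remote` goes down."""
    alive = {local: 0 for local, _ in topology}
    for local, remote in topology:
        if remote != failed_remote:
            alive[local] += 1
    return alive


for name, build in (("planar", planar), ("spread", spread)):
    topo = build(4, 8)
    print(name, surviving(topo, failed_remote=0),
          "distinct switch pairs:", len(set(topo)))
```

Both topologies lose 8 of 32 links. Under planar, local plane 0 loses every inter-region path and must be drained. Under spread, every plane keeps 6 of its 8 links. The price is four times as many distinct switch pairs to cable and peer.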

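Key Technical Components

Modular chassis with Jericho3 (J3) ASICs: High port capacity (432x800G ports per chassis) and room to scale.
eBGP with link bandwidth attributes: Enables Unequal-Cost Multipath (UCMP), which weights traffic toward each next hop in proportion to its available bandwidth. This gives efficient load balancing and graceful handling of link failures.
MACsec: Secures BAG-to-BAG links.
Deep-buffer switches: Used on the longer BAG-to-BAG cable runs. There, the data still in flight after a PFC pause grows with cable length, and deep buffers absorb it so lossless congestion control (e.g., PFC) keeps working. This is crucial because the NSF switches at the L2 edge have shallow buffers.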
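This sketch illustrates UCMP driven by a link bandwidth attribute. Each next-hop BAG switch advertises its available bandwidth, and traffic is split in proportion to it rather than evenly. Two simplifications apply:

- Bandwidth is shown as integer Gbps; the real attribute is carried as a BGP extended community.
- The switch names are hypothetical.

```python
from functools import reduce
from math import gcd

# Simplification: advertised bandwidth as integer Gbps. In real deployments it
# travels in a BGP extended community; next-hop names here are hypothetical.


def ucmp_weights(link_bw_gbps: dict[str, int]) -> dict[str, int]:
    """Smallest integer weights proportional to each next hop's bandwidth."""
    active = {nh: bw for nh, bw in link_bw_gbps.items() if bw > 0}
    if not active:
        return {}
    common = reduce(gcd, active.values())
    return {nh: bw // common for nh, bw in active.items()}


def traffic_share(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of traffic sent to each next hop under the given weights."""
    total = sum(weights.values())
    return {nh: w / total for nh, w in weights.items()}


# Three next-hop BAG switches, each with 16 x 800G; bag-b has lost 8 links.
advertised = {"bag-a": 16 * 800, "bag-b": 8 * 800, "bag-c": 16 * 800}
weights = ucmp_weights(advertised)
print(weights, traffic_share(weights))
```

With plain ECMP, bag-b would still receive a third of the traffic despite having half the capacity of its peers. Proportional weights send it only a fifth.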

Resilience and Failure Mitigation

The BAG network design addresses resilience through port striping, IP addressing schemes, and failure-domain analysis at several levels (BAG, data hall, and power distribution). Blackholing is one key risk: traffic is attracted to a switch that can no longer deliver it. To mitigate it, affected BAG planes are drained and route aggregation is made conditional, keeping availability high even at this scale.
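The article does not spell out the exact condition Meta uses. This minimal sketch therefore assumes a simple rule. A BAG switch advertises an aggregate only while two things hold:

- The switch is not drained.
- At least a threshold fraction of the aggregate's component prefixes are reachable. The threshold is a hypothetical `min_fraction` parameter.

Otherwise the switch withdraws the aggregate, so peers stop sending it traffic it would blackhole. The prefixes come from the IPv6 documentation range.

```python
import ipaddress


def advertise_aggregate(aggregate: str, components: list[str], reachable: set[str],
                        min_fraction: float = 1.0, drained: bool = False) -> bool:
    """Decide whether this BAG switch should advertise `aggregate`.

    Hypothetical rule: withdraw if the switch is drained, or if fewer than
    `min_fraction` of the component prefixes are currently reachable.
    """
    if drained:
        return False  # a drained plane stops attracting traffic entirely
    agg = ipaddress.ip_network(aggregate)
    parts = [ipaddress.ip_network(c) for c in components]
    if not parts or not all(p.subnet_of(agg) for p in parts):
        raise ValueError("components must be non-empty and inside the aggregate")
    up_nets = {ipaddress.ip_network(r) for r in reachable}
    up = sum(1 for p in parts if p in up_nets)
    return up / len(parts) >= min_fraction


comps = [f"2001:db8:{n}::/48" for n in (100, 101, 102, 103)]
print(advertise_aggregate("2001:db8:100::/46", comps, set(comps)))      # all up
print(advertise_aggregate("2001:db8:100::/46", comps, set(comps[:3])))  # one missing
```

A strict threshold of 1.0 withdraws the aggregate as soon as any component is lost. A lower threshold tolerates partial loss but risks blackholing the missing prefixes.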
