OpenAI opens up MRC networking design for large AI training clusters

AMD data center image used for OpenAI MRC networking coverage.
MRC networking

OpenAI has published a new networking design called MRC and contributed it through the Open Compute Project, arguing that multi-path routing and simpler control planes can make giant GPU clusters more resilient and efficient.

# OpenAI opens up MRC networking design for large AI training clusters

## Opening summary

OpenAI has published a detailed look at MRC, short for Multipath Reliable Connection, a networking design the company says helps giant AI training clusters stay productive even when congestion or hardware failures would normally stall synchronized GPU workloads. The company says MRC is already in use across some of its largest NVIDIA GB200 systems and is being contributed through the Open Compute Project so partners and other builders can work from the same design.

## Main article

The core pitch is that conventional single-path networking becomes too fragile and too wasteful at the scale of modern frontier training. OpenAI says MRC spreads a single transfer across many paths, reacts to packet loss or failure in microseconds, and reduces reliance on dynamic routing by using source-routing techniques instead. In practical terms, the goal is not just peak bandwidth, but more predictable step times so expensive accelerators do not sit idle while the network recovers from routine failures.

OpenAI also argues that the design helps simplify the shape of very large clusters. By splitting high-bandwidth links across multiple planes and using path diversity aggressively, the company says it can reduce tiers, power draw, and component count while preserving resilience. AMD’s parallel post supports that framing, presenting MRC as a production-minded networking approach for AI infrastructure where congestion control, adaptability, and real-world operational behavior matter more than idealized benchmark numbers alone.

That makes the story notable beyond OpenAI itself. Frontier AI competition is increasingly constrained by how efficiently companies can turn giant pools of GPUs into a reliable training machine, and networking is now one of the places where that battle is being fought in the open. MRC is less about consumer-facing AI features than about the plumbing that decides whether the next generation of models can be trained economically at all.

## Why it matters

This matters because AI scaling bottlenecks are shifting from chips alone to the systems that connect and coordinate them. If OpenAI and its partners can make multi-hundred-thousand-GPU training fabrics more resilient with fewer power and routing penalties, that changes both the cost curve and the operational tempo of frontier-model development.

## Source notes

- Verified against OpenAI’s official MRC post, including the protocol description, deployment claims, and OCP contribution language. - Verified against AMD’s companion post, which confirms MRC positioning as an open, production-oriented AI networking contribution and describes AMD’s implementation role. - The article keeps deployment and standardization claims scoped to what the two source documents actually say.

Sources: https://openai.com/index/mrc-supercomputer-networking/ · https://www.amd.com/en/blogs/2026/amd-advances-ai-networking-at-scale-with-mrc.html
SEO keyphrases: OpenAI MRC, AI supercomputer networking, Multipath Reliable Connection

Join the conversation