From SLI to NVLink: The Evolution of Multi-GPU Interconnect Technologies for Gaming and AI


From its inception, the GPU has been a "lone warrior," but as the demands of game graphics skyrocketed and compute-hungry workloads like AI and large models emerged, people gradually realized that one card just wasn't enough anymore!


Thus, figuring out how to make multiple GPUs "work as a team" became a crucial frontier in technological evolution.


The first to tackle this was 3dfx, which developed a technology called SLI (Scan-Line Interleave) that let two graphics cards render together. Unfortunately, the timing wasn't right: the company didn't survive, collapsing in 2000.



However, NVIDIA took over the technology along with 3dfx's assets and officially relaunched SLI (now standing for Scalable Link Interface) in 2004, allowing two GeForce 6800 Ultra cards to render games together with outstanding performance. It quickly became a hit among gamers.


But SLI was quite picky about its partners: it required two GPUs of the same model and a high-wattage power supply, and it suffered from noticeable communication latency and inflexible data sharing, since each card had to keep its own copy of the scene data in its own memory.


Seeing this, rival ATI (later part of AMD) introduced CrossFire to compete. It had one appealing advantage: it didn't force users to pair identical GPU models, which saved money. Unfortunately, its software experience was often criticized as less stable than NVIDIA's, and the setup was slightly more complicated.



In addition to this "multi-card collaboration," there was also a cult-favorite approach: dual-GPU graphics cards—soldering two GPUs onto the same board.


Space-saving and no bridge required: sounds perfect, doesn't it? Unfortunately, the heat output was staggering, earning these cards the nickname "desktop mini-suns"; they demanded powerful cooling and drove up electricity bills. Eventually, because of high costs and technical challenges, they came to be seen as "white elephant" products.



Although these technologies differed in approach, they shared the same goal: richer graphics and smoother gaming. And their limitations paved the way for the far more powerful NVLink to emerge.

So, why was NVLink created?



The root cause lies in the classic "memory wall" problem in the von Neumann architecture: CPUs compute at lightning speed, but memory access lags behind.


Especially after GPU performance surged by a thousandfold within eight years, the traditional CPU-centric interconnect methods simply couldn't keep up.




AI training requires massive computing power, which a single GPU simply can't handle—it takes hundreds or even thousands of GPUs working together.


Thus NVIDIA, hitting a bandwidth wall, turned to IBM, whose POWER CPUs held a bandwidth advantage at the time, and together the two companies developed the first generation of NVLink.



NVLink is not just a simple upgrade to SLI, but a complete overhaul of "how GPUs communicate." Compared to traditional PCIe, it has three standout features:


First, it supports mesh connectivity, allowing GPUs to link to one another directly over multiple point-to-point connections, which suits the complex data flows in data centers.


Second, it enables unified memory management, where multiple GPUs can share a memory pool without the need to constantly transfer data back and forth—particularly beneficial for large-scale model training.


Third, it offers ultra-low latency, as GPUs can directly read and write each other's memory without needing the CPU to "relay messages," significantly improving synchronization efficiency.
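To make the third point concrete, here is a minimal C++ sketch using the standard CUDA runtime peer-to-peer API. It assumes a machine with two GPUs (indices 0 and 1) that report peer capability to each other, for example a pair linked by NVLink; the buffer size is arbitrary and error handling is kept minimal.

```cpp
// Minimal sketch: direct GPU-to-GPU memory access with the CUDA runtime P2P API.
// Assumes two GPUs (indices 0 and 1) that report peer capability, e.g. linked by NVLink.
#include <cuda_runtime.h>
#include <stdio.h>

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);  // can GPU 0 access GPU 1's memory?
    cudaDeviceCanAccessPeer(&can10, 1, 0);  // and the reverse direction?
    if (!can01 || !can10) {
        printf("Peer access is not available between GPU 0 and GPU 1\n");
        return 0;
    }

    // Enable peer access in both directions (the flags argument must be 0).
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Allocate a 64 MiB buffer on each GPU.
    const size_t bytes = 64u << 20;
    float *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // Copy GPU 1 -> GPU 0 directly, without staging the data in host memory;
    // over NVLink this path avoids the CPU "relaying messages" entirely.
    cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);
    cudaDeviceSynchronize();
    printf("Copied %zu MiB directly from GPU 1 to GPU 0\n", bytes >> 20);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```

Once peer access is enabled this way, a kernel running on one GPU can even dereference pointers that live in the other GPU's memory, which is exactly the "no CPU relay" behavior described above.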



Since its debut in 2014, NVLink has evolved through five generations. Per-GPU bandwidth has surged from the initial 160GB/s to 1.8TB/s, and the number of NVLink links per GPU has grown from 4 to 18 (roughly 4 links × 40GB/s in the first generation versus 18 links × 100GB/s in the fifth).
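For readers who want to see these links on a real machine, the sketch below uses NVML, the monitoring library that ships with the NVIDIA driver, to count how many NVLink links are active on each GPU. It assumes an NVML version that exposes the NvLink query API; the file name and build line are only examples.

```cpp
// Sketch: count active NVLink links per GPU via NVML.
// Example build: g++ nvlink_links.cpp -o nvlink_links -lnvidia-ml
#include <nvml.h>
#include <stdio.h>

int main() {
    if (nvmlInit_v2() != NVML_SUCCESS) {
        printf("Failed to initialize NVML\n");
        return 1;
    }

    unsigned int deviceCount = 0;
    nvmlDeviceGetCount_v2(&deviceCount);

    for (unsigned int i = 0; i < deviceCount; ++i) {
        nvmlDevice_t dev;
        if (nvmlDeviceGetHandleByIndex_v2(i, &dev) != NVML_SUCCESS) continue;

        char name[NVML_DEVICE_NAME_BUFFER_SIZE] = {0};
        nvmlDeviceGetName(dev, name, sizeof(name));

        // Probe every possible link slot; slots that are absent or down are skipped.
        unsigned int active = 0;
        for (unsigned int link = 0; link < NVML_NVLINK_MAX_LINKS; ++link) {
            nvmlEnableState_t state;
            if (nvmlDeviceGetNvLinkState(dev, link, &state) == NVML_SUCCESS &&
                state == NVML_FEATURE_ENABLED) {
                ++active;
            }
        }
        printf("GPU %u (%s): %u active NVLink link(s)\n", i, name, active);
    }

    nvmlShutdown();
    return 0;
}
```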


Even better, while NVLink's bandwidth far surpasses PCIe's, it is also the more energy-efficient of the two.



High-speed interconnectivity alone is not enough; efficiently organizing multiple GPUs is also a challenge.


To address this, NVIDIA introduced the NVSwitch chip in 2018. Acting like a "GPU social hub," it enables all-to-all connectivity among 16 GPUs within a server, so every GPU can talk to every other directly.


Later, they also launched the standalone NVLink switch, connecting GPUs across multiple servers into a high-speed network.
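As a sketch of how software actually uses this fabric: NCCL, NVIDIA's collective-communication library, detects the NVLink/NVSwitch topology at runtime and routes collectives such as all-reduce over it when available. The example below assumes a single process driving every visible GPU in one server and sums a buffer across all of them; the buffer size is arbitrary.

```cpp
// Sketch: single-process all-reduce across every visible GPU using NCCL,
// which routes the traffic over NVLink / NVSwitch automatically when present.
// Example build: nvcc allreduce_sketch.cpp -o allreduce_sketch -lnccl
#include <cuda_runtime.h>
#include <nccl.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>

#define CUDA_CHECK(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)
#define NCCL_CHECK(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
    fprintf(stderr, "NCCL error: %s\n", ncclGetErrorString(r)); exit(1); } } while (0)

int main() {
    int ndev = 0;
    CUDA_CHECK(cudaGetDeviceCount(&ndev));

    const size_t count = 1 << 20;  // 1M floats per GPU (arbitrary size)
    std::vector<float*> sendbuf(ndev), recvbuf(ndev);
    std::vector<cudaStream_t> streams(ndev);
    std::vector<ncclComm_t> comms(ndev);

    // One buffer pair and one stream per GPU.
    for (int i = 0; i < ndev; ++i) {
        CUDA_CHECK(cudaSetDevice(i));
        CUDA_CHECK(cudaMalloc(&sendbuf[i], count * sizeof(float)));
        CUDA_CHECK(cudaMalloc(&recvbuf[i], count * sizeof(float)));
        CUDA_CHECK(cudaMemset(sendbuf[i], 0, count * sizeof(float)));
        CUDA_CHECK(cudaStreamCreate(&streams[i]));
    }

    // One NCCL communicator per GPU, all owned by this process.
    std::vector<int> devs(ndev);
    for (int i = 0; i < ndev; ++i) devs[i] = i;
    NCCL_CHECK(ncclCommInitAll(comms.data(), ndev, devs.data()));

    // Sum-reduce the buffers across all GPUs in one grouped call.
    NCCL_CHECK(ncclGroupStart());
    for (int i = 0; i < ndev; ++i) {
        NCCL_CHECK(ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                                 comms[i], streams[i]));
    }
    NCCL_CHECK(ncclGroupEnd());

    // Wait for completion, then clean up.
    for (int i = 0; i < ndev; ++i) {
        CUDA_CHECK(cudaSetDevice(i));
        CUDA_CHECK(cudaStreamSynchronize(streams[i]));
        ncclCommDestroy(comms[i]);
        CUDA_CHECK(cudaFree(sendbuf[i]));
        CUDA_CHECK(cudaFree(recvbuf[i]));
    }
    printf("All-reduce across %d GPU(s) complete\n", ndev);
    return 0;
}
```

Multi-server jobs instead launch one NCCL rank per GPU and exchange a ncclUniqueId to form the communicator, but the all-reduce call itself looks the same.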



With solid capabilities under its belt, NVIDIA has started selling "complete system packages."


In 2016, NVIDIA gifted OpenAI its first DGX-1 supercomputer, equipped with 8 GPUs interconnected via NVLink, which directly accelerated the development of early large-scale models.



The DGX is a "turnkey" solution, ideal for major clients who prefer hassle-free deployment.


On the other hand, the HGX is like "graphics card LEGO," allowing manufacturers to customize configurations, making it a favorite among large cloud providers.


Currently, the most powerful GPU system is the NVIDIA GB300 NVL72: equipped with 72 Blackwell Ultra GPUs and 36 Grace CPUs, it delivers 10x the inference performance of its predecessor and is fully liquid-cooled. Internally connected via fifth-generation NVLink, it boasts a staggering 130TB/s of aggregate bandwidth (roughly 72 GPUs × 1.8TB/s each), earning it the title of the "super engine" of AI computing.



Looking back, the evolution of GPU interconnectivity is a hardcore journey from "smooth gaming" to "supporting trillion-parameter large models."


The story of NVLink also teaches us: sometimes, no matter how strong an individual is, it's not as good as a team that connects well and communicates quickly.



·END·