Second-Gen Xe-HPC Accelerator to Succeed Ponte Vecchio
With ISC High Performance 2022 taking place this week in Hamburg, Germany, Intel is using the first in-person version of the event in 3 years to offer an update on the state of their high performance/supercomputer silicon plans. The big news out of the show this year is that Intel is naming the successor to the Ponte Vecchio accelerator, which the company is now disclosing as Rialto Bridge.
Rialto Bridge previously appeared on Intel’s roadmaps as “Ponte Vecchio Next”, and Intel’s GPU teams have been pipelining the development of Ponte’s successor even as the first large installation of Ponte itself (the Aurora supercomputer) is still being stood up. As part of the company’s 3 year (ish) roadmap that leads to CPUs and accelerators converging with the Falcon Shores XPU, Rialto Bridge is the part that will, if you’ll pardon the pun, bridge the gap between Ponte and Falcon, offering an evolution of Ponte’s design that makes use of newer technologies and manufacturing processes.
While Intel isn’t offering a fully detailed technical breakdown this early in the process, at a high level the company is talking a bit about specifications, as well as providing a render of the future chip that removes all doubt that it’s a Ponte successor, showcasing that it’s composed of dozens of tiles/chiplets in the same layout as Ponte. The biggest change that Intel is talking about today is that they’ll be expanding the total number of Xe compute cores from 128 on Ponte to a maximum of 160 on Rialto Bridge, presumably by increasing the number of Xe cores in each compute tile.
Absent any concrete details on the manufacturing side of things, Intel is at least confirming that Rialto will use newer manufacturing nodes for its construction, replacing Ponte’s current mix of TSMC N7 (Link Tile), TSMC N5 (Compute), and Intel 7 (Cache & Base) parts. The Intel 4 process is expected to come online this year, so using that to upgrade the Base and Cache tiles would make sense. Ideally, Intel would also like to jump ahead on process nodes for the compute tiles as well, possibly by using this opportunity to move production of those tiles to Intel 4, though we wouldn’t count out TSMC N4, either.
With that said, at the risk of reading too much into a single render, Rialto has one noticeable difference from Ponte when it comes to the compute tiles: whereas Ponte used pairs of compute tiles with a cache tile in between, Rialto at first glance would seem to be using monolithic slabs. This suggests that Intel has opted to integrate the Rambo cache on-die with the compute tiles, and that they’re willing to fab fewer, larger compute tiles. This does lend some credence to the idea that Intel is taking over compute tile production (since they already make the cache tiles), but we’ll have to see just what Intel announces later on.
Interestingly, Intel is also promising more I/O bandwidth for Rialto, though once again this is a very high-level (and unspecific) detail. Ponte is already one of the first products shipping with PCIe 5.0 connectivity, and with PCIe 6.0 hardware still a bit off, this may be more about on-chip bandwidth than off-chip bandwidth, or about the amount of bandwidth available between accelerators using Intel’s Xe Link interconnect.
HBM3 is also a shoo-in for Intel’s next-generation accelerator, given that it’s already going into accelerators shipping this year. HPC accelerators virtually live and die based on memory bandwidth, so we expect that it would be the first thing Intel looked at for Rialto. And it would be consistent with Intel’s awkwardly phrased “More GT/s”, since memory transfer rates are often measured in gigatransfers per second.
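To make the link between transfer rate and bandwidth concrete, here is a minimal sketch of the usual peak-bandwidth arithmetic: bus width in bytes multiplied by the transfer rate. The function name and the example inputs are ours, chosen purely for illustration; Intel has not disclosed Rialto Bridge’s memory configuration, so none of these figures should be read as a specification.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_gt_s: float) -> float:
    """Peak memory bandwidth in GB/s.

    Each transfer moves bus_width_bits / 8 bytes, so the peak is simply
    bytes-per-transfer multiplied by billions of transfers per second.
    """
    return bus_width_bits / 8 * transfer_rate_gt_s


# Hypothetical inputs for illustration only (not confirmed Intel specs):
# an 8192-bit aggregate HBM interface at two assumed per-pin transfer rates.
example_bus_width_bits = 8192
for assumed_rate_gt_s in (3.2, 6.4):
    bandwidth = peak_bandwidth_gb_s(example_bus_width_bits, assumed_rate_gt_s)
    print(f"{assumed_rate_gt_s} GT/s on {example_bus_width_bits}-bit bus: {bandwidth:.1f} GB/s")
```

At a fixed bus width, peak bandwidth scales linearly with the transfer rate, which is why a bump in GT/s is the most direct lever for feeding more compute cores.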
Finally, Intel is stating that Rialto will be based around a newer version of the Open Accelerator Module (OAM) socket specification, which is particularly notable since the next version of OAM has yet to be announced. Absent more details, the biggest differentiating factor appears to be supported power: while OAM 1.x allows for modules to draw up to 700 Watts, Intel is talking about doing up to 800 Watts on a Rialto module. Which, for better or worse, is consistent with the increase in power consumption for the highest performing versions of the next generation of HPC accelerators, and is a big factor in the shift to liquid and immersion cooling for high-end hardware.
Compute GPU Accelerator Comparison
AnandTech | Intel | Intel | NVIDIA |
Product | Rialto Bridge | Ponte Vecchio | H100 80GB |
Architecture | Xe-HPC | Xe-HPC | Hopper |
Transistors | ? | 100 B | 80 B |
Tiles (inc HBM) | 31? | 47 | 6 + 1 spare |
Compute Units | 160 | 128 | 132 |
Matrix Cores | 1280? | 1024 | 528 |
L2 / L3 | ? | 2 x 204MB | 50MB |
VRAM Capacity | ? | 128 GB | 80 GB |
VRAM Type | HBM3? | 8 x HBM2e | 5 x HBM3 |
VRAM Width | ? | 8192-bit | 5120-bit |
VRAM Bandwidth | ? | ? | 3.0 TB/s |
Chip-to-Chip Total BW | ? | 64 x 11.25 GB/s (4×16 90G SERDES) | 18 x 50 GB/s |
CPU Coherency | Yes | Yes | With NVLink 4 |
Manufacturing | ? | Intel 7, TSMC N7, TSMC N5 | TSMC N4 |
Form Factor | OAM 2.0 (800W) | OAM (600W) | SXM4 (400W*) |
Launch Date | Mid-2023 (Sampling) | 2022 | 2022 |
*Some custom deployments go up to 600W
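The chip-to-chip entries in the table are given as link counts times per-link rates, and the Rialto matrix core figure is an estimate. A short sketch of the arithmetic behind both follows; the helper names are ours, and the inputs are taken directly from the table rather than from any additional disclosure.

```python
def total_link_bandwidth_gb_s(link_count: int, per_link_gb_s: float) -> float:
    """Aggregate chip-to-chip bandwidth: number of links times per-link rate."""
    return link_count * per_link_gb_s


def matrix_cores_per_unit(matrix_cores: int, compute_units: int) -> float:
    """Matrix cores per compute unit, derived from the two table rows."""
    return matrix_cores / compute_units


# Link figures as listed in the comparison table.
print("Ponte Vecchio:", total_link_bandwidth_gb_s(64, 11.25), "GB/s")
print("H100 80GB:    ", total_link_bandwidth_gb_s(18, 50.0), "GB/s")

# The "1280?" estimate for Rialto Bridge corresponds to keeping Ponte's
# ratio of matrix cores per Xe core; that carry-over is an assumption.
print("Ponte ratio: ", matrix_cores_per_unit(1024, 128))
print("Rialto ratio:", matrix_cores_per_unit(1280, 160))
```

Summed this way, the table’s figures work out to 720 GB/s of aggregate link bandwidth for Ponte Vecchio versus 900 GB/s for H100.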
Overall, Intel is targeting a 30% increase in “application level” performance with Rialto Bridge. Which at first blush isn’t a huge gain, but it’s also for a part that’s coming out around a year after the original Ponte Vecchio. The 25% increase in the number of Xe cores means that most of this performance uplift should be delivered by the additional hardware as opposed to clockspeed changes, but since Intel is quoting real-world performance expectations as opposed to just theoretical throughput, we wouldn’t be too surprised if Rialto’s on-paper specifications were a bit richer still. Intel is also promising that Rialto should be more efficient than Ponte, which at face value is a reasonable claim since performance should be going up faster than power consumption.
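As a rough sanity check on that split, the sketch below separates the 30% target into the part attributable to core count and whatever is left over. It assumes, purely for illustration, that application performance scales linearly with the number of Xe cores, which real workloads rarely do; the function names are ours.

```python
def core_scaling(new_cores: int, old_cores: int) -> float:
    """Speedup attributable to core count alone, assuming linear scaling."""
    return new_cores / old_cores


def residual_gain(target_speedup: float, scaling_from_cores: float) -> float:
    """Remaining per-core speedup needed (clocks, memory, etc.) to hit the target."""
    return target_speedup / scaling_from_cores


ponte_cores = 128      # Xe cores on Ponte Vecchio
rialto_cores = 160     # maximum Xe cores on Rialto Bridge
target_speedup = 1.30  # Intel's stated "application level" target

from_cores = core_scaling(rialto_cores, ponte_cores)
leftover = residual_gain(target_speedup, from_cores)
print(f"From extra cores: {from_cores - 1:.0%}")
print(f"Left for clocks, memory and other changes: {leftover - 1:.0%}")
```

Under that linear-scaling assumption, only a low single-digit percentage of the uplift would need to come from anything other than the added cores.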
Per Intel’s roadmap, the plan is to have Rialto Bridge start sampling in mid-2023. Given Intel’s troubles getting Ponte Vecchio out on time (you still can’t get it unless you’re Aurora), this would be a surprisingly quick turnaround time for Intel. At the same time, since these are pipelined designs with a very strong architectural similarity, ideally Intel won’t experience nearly as many teething problems with Rialto as they have with Ponte. But as always, we’ll see what actually happens next year when Intel is closer to delivering their next accelerator.
All Roads Lead to Falcon Shores
With the addition of Rialto Bridge to Intel’s HPC plans, the company’s current silicon roadmap looks like the following:
Both the HBM-equipped Xeon and HPC accelerator lines are set to merge in 2024 with Intel’s first flexible XPU, Falcon Shores. Falcon Shores was first announced at Intel’s winter investor meeting earlier this year, and will be Intel’s first product that takes high-performance CPU and GPU tiles to their logical conclusion by allowing for a configurable number of each tile type. As a result, Falcon Shores encompasses not only mixed CPU/GPU designs, but also (relatively) pure CPU and GPU designs, which is why it’s the successor to both Intel’s HPC CPUs and HPC GPUs.
For today’s event, Intel isn’t offering any further details on Falcon Shores, so the company is still talking about targeting 5x increases in everything from energy efficiency to compute density and memory bandwidth. How they intend to accomplish that, other than relying on their planned packaging and shared memory technologies, remains to be seen. But this update does offer a better picture of where Falcon Shores will fit into Intel’s product roadmaps, by providing a look at how the current HBM-Xeon and Xe-HPC products will merge into it.
Ultimately, Falcon Shores remains Intel’s power play for the HPC industry. The company is betting that being able to deliver a tightly integrated (but still tiled and flexible) experience with a single API for all will be what gives them an edge in the HPC market, putting them ahead of traditional GPU-based accelerators. And, if they can deliver on these plans, then 2024 is shaping up to be a very interesting year in the high-performance computing industry.