With Nvidia’s RTX 50-series launch at CES earlier this month and all the subsequent reviews of the RTX 5090, especially in-depth ones like our Dave’s, laying out all of the specs, everyone knows that the biggest Blackwell chip is one seriously large GPU. And thanks to a brand new die shot of the processor, we can now feast our eyes on all those shaders and cache.
Creating a detailed die shot of any processor isn’t easy. It takes many failed attempts, involving numerous cracked chips and skin burns, to perfect the process. If that chip just so happens to be an Nvidia GB202, the GPU powering the GeForce RTX 5090, then there are a few other barriers to overcome, namely getting your hands on one and being willing to sacrifice a $2,000+ graphics card for the sake of a picture.
GB202 die shot / RTX 5090 die shot. Thanks to @ASUS Tony 俞元麟; chip by @万扯淡; die shot by @Kurnalsalts. Photo 1: GB202 die shot. Photo 2: AD102 vs GB202, full pixel. (Posted on X, January 25, 2025: pic.twitter.com/pny7bvCs5j)
Enter Tony Yu, general manager of Asus China and all-round top chap, to save the day, sharing a high-resolution image of the GB202 die that X user Kurnal managed to get hold of and then helpfully labelled with all the key parts (via Tom’s Hardware).
While the image itself doesn’t reveal any major surprises, as Nvidia has stuck with the same fundamental design layout for many years now, it does show that the engineers had to make some interesting decisions in order to get everything to fit inside the die’s physical dimensions.
For instance, if you look at the GB202 and compare it to the AD102 (the RTX 4090’s GPU), you can see that all of the logic blocks for the NVENC video encoders and decoders have moved from the bottom to the very middle of the chip.
The reason for this is twofold: firstly, the GB202 sports three encoders and two decoders, to the AD102’s two and one respectively, and secondly, it has an aggregated 512-bit memory bus. If Nvidia had kept the NVENC blocks all at the bottom, then fitting them in alongside those 16 physical memory interfaces (PHYs) would have made the die very long/tall. Perhaps too tall.
Something else we can clearly see is all that L2 cache in the very centre of the die. Where AMD uses a fast but complex multi-level cache hierarchy, Nvidia takes a simpler approach, resulting in a main L1 cache for each SM (Streaming Multiprocessor) and then a hulking L2 (last-level) cache, as well as some smaller ones dotted about in the SMs.
Other than that, the design is pretty straightforward. The full die comprises 12 GPCs (Graphics Processing Clusters), each sporting its own ‘Raster Engine’, also known as a ROPs cluster. Each GPC is organised into eight TPCs (Texture Processing Clusters) and inside each of those, you’ll find two SMs. These house 128 CUDA cores apiece, for a grand total of 24,576 shader units.
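If you want to check that shader tally yourself, here’s a minimal Python sketch that multiplies out the hierarchy described above. The unit counts are the ones quoted in this article for the full die; the constant and function names are purely illustrative and don’t come from any Nvidia tool or API.

```python
# Unit counts for the full GB202 die, as quoted in the article.
GPCS = 12                # Graphics Processing Clusters on the full die
TPCS_PER_GPC = 8         # Texture Processing Clusters in each GPC
SMS_PER_TPC = 2          # Streaming Multiprocessors in each TPC
CUDA_CORES_PER_SM = 128  # CUDA cores (shader units) in each SM


def full_die_counts(gpcs=GPCS, tpcs_per_gpc=TPCS_PER_GPC,
                    sms_per_tpc=SMS_PER_TPC,
                    cores_per_sm=CUDA_CORES_PER_SM):
    """Multiply down the GPC -> TPC -> SM -> CUDA core hierarchy."""
    tpcs = gpcs * tpcs_per_gpc
    sms = tpcs * sms_per_tpc
    cuda_cores = sms * cores_per_sm
    return {"tpcs": tpcs, "sms": sms, "cuda_cores": cuda_cores}


if __name__ == "__main__":
    for name, value in full_die_counts().items():
        print(f"{name}: {value:,}")
```

That works out to 96 TPCs and 192 SMs, which is where the 24,576 figure comes from.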
But for all its massiveness, it isn’t the biggest chip Nvidia has ever stuffed into a gaming graphics card. It is in terms of transistor and shader count, but not in terms of physical dimensions. With an area of 750 mm², the GB202 is 23% larger than the AD102 (609 mm²) and 19% larger than the GA102 (RTX 3090, 628 mm²).
However, it’s 0.5% smaller than the TU102 (754 mm²), the GPU in the RTX 2080 Ti, and 8% smaller than the GV100 (815 mm²). The latter, based on the Volta architecture, isn’t really a gaming GPU, but for a while, Nvidia marketed it as such. The Titan V was very much the ‘RTX 5090’ of its era (2017), not least because of its $2,999 price tag.
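Those percentages are just relative differences in die area, and they’re easy to reproduce. The sketch below uses only the areas quoted in this article; the dictionary and function names are made up for illustration.

```python
# Die areas in mm^2, as quoted in the article.
DIE_AREAS_MM2 = {
    "GB202": 750,  # RTX 5090
    "AD102": 609,  # RTX 4090
    "GA102": 628,  # RTX 3090
    "TU102": 754,  # RTX 2080 Ti
    "GV100": 815,  # Titan V
}


def percent_difference(area, baseline):
    """How much larger (+) or smaller (-) `area` is than `baseline`, in percent."""
    return (area - baseline) / baseline * 100


def compare_to(reference="GB202", areas=DIE_AREAS_MM2):
    """Size of the reference die relative to every other die in the table."""
    ref_area = areas[reference]
    return {
        name: percent_difference(ref_area, area)
        for name, area in areas.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, diff in compare_to().items():
        direction = "larger" if diff > 0 else "smaller"
        print(f"GB202 is {abs(diff):.1f}% {direction} than {name}")
```

Note that the baseline is always the chip being compared against, so ‘23% larger than the AD102’ means the difference divided by the AD102’s area, not the GB202’s.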
A bit over 800 mm² is about as large as a single die can go, due to the reticle limit, and the GB202 isn’t all that far off. Whether the next generations of GPUs are this big remains to be seen, but if this is the last hurrah for monstrously huge, monolithic chips in gaming graphics cards, before switching to tiled or stacked chiplets, then at least we have this lovely die shot to stare at and try to spot where the RT and Tensor cores might be amidst the sea of transistors.