Today brings AMD’s second big news event of the month (third if you count their record earnings call this week), and one that is in many ways more interesting than the Ryzen 5000 lineup announcement earlier this month.
While the Ryzen announcement was a continuation of the last several years of the CPU market, with AMD finally climbing atop the mountain (allegedly; we still have to wait until next week for third-party benchmarks), the Radeon announcement is far more interesting because the Radeon Technologies Group has a…well, a bad recent history.
AMD (and ATI before their acquisition) was a strong player in the graphics market for decades. In the nineties, ATI made excellent cards under the Rage branding, then competed with and often beat Nvidia in the GPU space of the early 2000s with the Radeon lineup – that branding, much like GeForce, continuing from that era through to today. What happened to AMD’s graphics division amounts to a story of shifting priorities, mismanagement, and a rebuilding phase that saw the division starved of research and development funds while the CPU division rebuilt.
And well, Ryzen has shown that AMD’s CPU team is back to its Athlon-era form, firing on all cylinders and producing compelling products that trade blows with Intel and sometimes even come out on top. With that has come a cash infusion from happy customers at every tier – DIY enthusiast spending, workstation and datacenter customers, all the way up to a large number of supercomputer contracts for AMD silicon, both CPU and GPU.
AMD’s gaming GPU lineup has been hampered since around 2011 by the use of Graphics Core Next, or GCN. Around the era when AMD was winning efficiency battles with Nvidia, the company chose to standardize on GCN and moved to smaller silicon dies, usually producing two silicon designs targeted at their best-selling market segments and then offering cut-downs for segmentation (a standard practice in all computer chips) or, at the high end, dual-GPU solutions via either dual-GPU cards (like the Radeon HD 6990) or Crossfire configurations with multiple cards. In theory it was smart, but it also left a huge loophole – Nvidia could simply produce larger chips and fight back at the high end, a segment with fewer buyers, but one whose halo effect colors comparisons all the way down the stack. This is, essentially, what has doomed AMD for much of the last decade. Nvidia almost always wins the high-end, high-price segment; people then assume the same is true all the way down the stack (when benchmarks clearly show it isn’t), buy Nvidia anyway, and Nvidia wins. AMD had some bangers in this era too (the Fury cards were excellent, and the top of that stack could win against Nvidia’s GTX 980), but Nvidia always had a higher card in reserve – this is how the Titan lineup started, in fact.
For much of the last five years, AMD has basically bowed out of competing at the high end. Most of their graphics sales are mid-range parts (the entire Polaris lineup of the RX 480 > 580 > 590 was this, as is the current RX 5700 XT), punctuated by rare attempts at fighting the high end which, plagued with problems, failed to win (Vega 64, Radeon VII). Those high-end competitors have fascinating stories I don’t want to clutter this post with, so you should definitely look at some stories and videos on them!
Today’s announcement is supposed to mark AMD’s return to high-end competition. At the end of the Ryzen event earlier this month, AMD CEO Dr. Lisa Su showed some benchmarks and gameplay footage of current titles running on an unnamed Radeon RX 6000 card, and the numbers provided were great. For the most part, the card was neck and neck with the GeForce RTX 3080 that just recently launched, which is great news. But the rumor mill has been churning out some interesting news, and it is worth discussing all of it, as these parts have not leaked nearly as much as Nvidia’s Ampere cards did.
RDNA2 – The First Fully New GPU Architecture from AMD in A Decade: Yes, last year’s RX 5700 XT and company were based on RDNA, a redesign of the company’s aging GCN. GCN was a GPU design built for compute horsepower that happened to be capable of gaming, and because of that, it ran into several limitations. Its origin in the “no big dies” era of Radeon meant that it didn’t scale well past 64 compute units (nerd talk: the GCN scheduler issues instructions in 64-thread “wavefronts,” but its SIMD units are only 16 lanes wide, so each instruction occupies a SIMD unit for four cycles, making full utilization problematic), which is why AMD’s cards have for years slammed into this barrier and been constrained by it (the Fury lineup had already reached 64 CUs in 2015, and no GCN part since, Vega 64 included, has gone beyond it!). RDNA made some massive tweaks to scheduling and the GPU frontend to do away with many of GCN’s utilization issues, but held off on other, larger changes.
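To make the wavefront/SIMD mismatch concrete, here is some napkin math in Python. The numbers (64-thread GCN wavefronts on 16-lane SIMDs, 32-thread RDNA waves on 32-lane SIMDs) reflect the widths described above; this is an illustrative sketch of issue cadence, not a model of a real scheduler.

```python
import math

def issue_cycles(wavefront_size: int, simd_width: int) -> int:
    """Cycles one SIMD unit spends executing a single instruction
    for every thread in a wavefront."""
    return math.ceil(wavefront_size / simd_width)

# GCN: 64-thread wavefronts on 16-lane SIMD units, so every
# instruction ties up a SIMD for four cycles.
gcn_cycles = issue_cycles(wavefront_size=64, simd_width=16)

# RDNA: native 32-thread waves on 32-lane SIMD units, so an
# instruction completes its issue in a single cycle.
rdna_cycles = issue_cycles(wavefront_size=32, simd_width=32)

print(gcn_cycles, rdna_cycles)  # 4 1
```

That four-cycle cadence is a big part of why keeping a wide GCN chip fully fed was so difficult.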
RDNA2 is supposed to have larger changes in store – being the first RDNA design to exceed the 64 CU count and show what the new architecture is truly capable of is a big part of that, but it also adds features like AMD’s ray-tracing solution and some strategic silicon use to better manage memory bandwidth (more on both in a moment). Let’s start with CUs, however…
Radeon RX 6000 Cards Have How Many CUs?: The magic numbers for the top two cards AMD is set to announce are 80 CUs and 72 CUs. Both represent a threshold-crossing moment for Radeon, as they finally break free of the limits of GCN and are able to mix solid architectural efficiency gains with more raw power behind them. That news on its own is just a pinned point for later, so let’s move on.
Raytracing from AMD: AMD finally gets into the real-time ray-tracing/hybrid rendering scene with the Radeon RX 6000 cards. AMD’s solution is a mix of methodologies, sitting between Nvidia’s RTX family of technologies (dedicated hardware for everything, with raytracing sequestered into its own segment of the GPU die) and the original Microsoft DXR implementation from spring 2018 (shader cores doing all of the raytracing math with no other helping hand). AMD supposedly has hardware BVH acceleration to work out which objects a ray hits and where along its trajectory, but then relies on the shader cores to compute what that means for scene composition. This has a major advantage over Nvidia’s RTX solution (tensor and RT cores both take up a lot of GPU die space that has no impact on rasterization performance but is crucial to RTX’s functioning) but also a disadvantage (when the AMD solution needs to finish scene composition, it has to engage shader resources, which reduces the shader units available for other graphics work). The rumors back up this idea – in titles where AMD has a performance edge over the RTX 3080, it loses in raytracing. Real-time ray-tracing is still a niche feature, both because hardware support has been slow to roll out and reach high enough adoption rates, and because developers are not very keen on the work of implementing it alongside full traditional rasterization (Microsoft’s Xbox Series X presentation at this year’s Hot Chips had some quotes to this effect). Nonetheless, it is good that AMD has done it and can now claim support for it – having it in next-gen consoles as well as more PCs only increases the potential for it to be widely supported.
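For a feel of what “hardware BVH acceleration” actually speeds up: BVH traversal is mostly a long chain of ray-versus-bounding-box tests like the classic slab test below. This is a generic textbook sketch, not AMD’s implementation – the point is that the hit/miss test is cheap, fixed math (the part a dedicated unit can chew through), while deciding what a hit looks like on screen is the shading work that, on RDNA2, still lands on the regular shader cores.

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: does a ray intersect an axis-aligned bounding box?
    This per-node check is the kind of work a BVH traversal unit
    accelerates; shading what the ray hits is left to shaders."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if not (lo <= o <= hi):
                return False  # ray parallel to this slab and outside it
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    # Hit only if the entry point comes before the exit point,
    # and the box isn't entirely behind the ray origin.
    return t_near <= t_far and t_far >= 0.0

# Ray from the origin along +x against a unit-ish box ahead of it:
print(ray_hits_aabb((0, 0, 0), (1, 0, 0), (2, -1, -1), (3, 1, 1)))  # True
```

A real traversal runs thousands of these tests per ray while walking the tree, which is why doing it in fixed-function hardware instead of shader ALUs is such a win.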
Memory System – 16 GB, 16 GB, 12 GB of GDDR6, but What Bus Width?: The rumor mill points to AMD rolling out three SKUs during the event – a top-end card, one slightly below that, and an upper-midrange card. The top two cards are said to be equipped with 16 GB of GDDR6 at the fastest speed currently available (16 Gbps) on a 256-bit bus, giving those cards an effective memory bandwidth of 512 GB/s, while the bottom of the three gets a 192-bit memory bus at the same GDDR6 speed, resulting in 384 GB/s of bandwidth. This isn’t inherently good or bad, but it is somewhat limiting. Nvidia left these speeds behind in the Turing era at the high end, and the RTX 2080 Ti, 3080, and 3090 all offer substantially more bandwidth. AMD’s GCN GPUs were always particularly memory bandwidth starved – it is why AMD was the first company to ship an HBM-equipped graphics card to consumers – so this decision is puzzling. However, smart memory management technologies (texture compression, shader caching, etc) can help, as can one other thing AMD has done for memory here…
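The bandwidth figures above fall out of simple arithmetic: bus width in bits, divided by eight to get bytes, times the per-pin data rate. A quick sketch (the 320-bit/19 Gbps line is the RTX 3080’s GDDR6X configuration, included for comparison):

```python
def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bits / 8 bits per
    byte) x per-pin data rate in Gbps."""
    return bus_width_bits / 8 * data_rate_gbps

print(gddr_bandwidth_gbs(256, 16))  # 512.0 GB/s -- rumored top two cards
print(gddr_bandwidth_gbs(192, 16))  # 384.0 GB/s -- rumored third card
print(gddr_bandwidth_gbs(320, 19))  # 760.0 GB/s -- RTX 3080, GDDR6X
```

Which makes the gap plain: on raw GDDR numbers alone, the rumored top Radeon would be giving up roughly a third of the 3080’s bandwidth.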
Infinity Cache: AMD loves the word “Infinity” in their branding so very much. Based on a patent application and an examination of the most recent AMD Linux drivers, the broad strokes of this feature are visible. With the RX 6000 series, AMD has tucked 128 MB of cache on-die to assist the memory subsystem. How exactly it works is still a bit unknown right now, but we can suss out the basics. Cache on a CPU or GPU holds recently accessed data, with whole methodologies designed around how that cache is filled, emptied, and otherwise used. For the Infinity Cache specifically, the idea seems to be that it sits outside the GPU’s lower-level execution caches (L1, L2) but before the external GDDR6 memory. The theory, if it holds, is that bandwidth-dependent or frequently utilized data could be placed into this cache instead of main memory, letting the card’s memory controllers touch the GDDR6 less often, as it is both lower bandwidth and higher latency. Off the top of my head, it could serve as a framebuffer or texture memory, or enable some fast and cheap anti-aliasing (a form of which was done with the eDRAM attached to the ATI GPU in the Xbox 360 over a decade ago!), but a big use that pops to mind for me is ray-tracing. Ray-tracing is memory bandwidth hungry, but more specifically, latency-sensitive. If AMD wants their RT solution to be a contender, it would go a long way to use the Infinity Cache for BVH calculation data, ray intersection results, and all of the necessary math that spawns out from that. Nvidia has been specifically feeding more bandwidth to their current-gen Ampere cards in part because RTX demands it, but with a fully external memory solution, latency will remain a problem. A far faster, very low-latency local cache that could be used for that data would make a world of difference.
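To see why a big on-die cache could paper over a narrow bus, consider a simple blended-bandwidth model: traffic that hits the cache is served at cache speed, and only the misses fall through to GDDR6. Everything here except the 512 GB/s GDDR6 figure is a made-up illustration – AMD has not disclosed the cache’s bandwidth or typical hit rates, so the 1500 GB/s and 50% numbers are purely hypothetical placeholders.

```python
def effective_bandwidth_gbs(hit_rate: float, cache_bw_gbs: float,
                            dram_bw_gbs: float) -> float:
    """Blended bandwidth when a fraction of memory traffic is served
    by an on-die cache and the rest falls through to external DRAM.
    A toy model: real behavior depends on access patterns."""
    return hit_rate * cache_bw_gbs + (1 - hit_rate) * dram_bw_gbs

# 512 GB/s of GDDR6 from the rumored 256-bit bus; the 1500 GB/s
# cache bandwidth and 50% hit rate are hypothetical illustrations.
print(effective_bandwidth_gbs(0.5, 1500, 512))  # 1006.0
```

Even under these invented numbers, the point stands: a modest hit rate into a much faster local pool can more than double what the GDDR6 alone delivers, with far better latency for things like BVH data on top.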
Marketing Spin – The Middle of the Three Cards To Be Announced Was Compared to the GeForce RTX 3080?: This one is particularly spicy, but the word is this – AMD specifically did not name the card used for the 3080 comparison at the end of their Ryzen event because it was the middle of the three SKUs pending announcement, and they wanted a competitive edge. Now, reasoning-wise, I don’t fully buy that (both companies do corporate intelligence work-ups, and I would fully expect Nvidia to be at least somewhat aware of what is coming), but I do think it makes a big impression on gamers. Showing a card competing with the 3080 is something a lot of us, myself included, didn’t necessarily expect, since the Radeon group has made having expectations into more opportunities for disappointment. However, by being tight-lipped about it this time, they’ve built a sort of hype that they can match the 3080 and do well. If this turns out to be true, then the CU counts mentioned way above put AMD into a far better competitive scenario than we dared imagine – the midrange of the three cards trading blows with the 3080, and a card with around 10% more CUs theoretically capable of doing the same with the 3090. That is the real competitive dream – if AMD shows up today with a full-stack solution capable of matching or beating Nvidia in every class including at the top, then they stand a much better chance of really landing with enthusiasts this generation!
Insane Clock Speeds: The consoles have provided wind-up for this, because Microsoft’s 52 CU solution with RDNA 2 runs at around 1.8 GHz steady state, while the PS5’s 36 CU solution runs up to a blistering 2.23 GHz! A lot of folks speculated that this wouldn’t neatly translate to the desktop because these are custom solutions with far fewer CUs, but at the same time, a desktop graphics card alone can burn nearly as much power as either console does as a complete system! Naturally, as that speculation has grown, reports from add-in board partners have started to paint a picture of insane clock speeds, with one partner supposedly able to clock the 72 CU card to 2.57 GHz! For comparison, the Ampere GPUs under ideal boost conditions can barely scrape against the 2 GHz wall, with only a few partner models able to clock past that. Clock speed itself is not as relevant as it was in the golden years of my youth, since what each chip can accomplish in a single cycle varies wildly, but it is worth pointing out that if AMD’s card can match or beat the 3080/3090 at around PS5 clock speeds, then partner cards with better thermal solutions reaching 2.5 GHz+ should be capable of even more (depending on how well the design scales with frequency – the answer from the PS5 being “very well”).
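Standard napkin math ties the CU counts and clocks together: peak FP32 throughput is shader count times two operations per cycle (a fused multiply-add) times clock. The clocks plugged in below are the rumored/reported figures from this post, so treat the outputs as theoretical ceilings, not benchmarks.

```python
def gpu_tflops(cus: int, clock_ghz: float, shaders_per_cu: int = 64) -> float:
    """Peak FP32 TFLOPS: shaders x 2 ops/cycle (FMA) x clock in GHz.
    64 shaders per CU is the RDNA layout."""
    return cus * shaders_per_cu * 2 * clock_ghz / 1000

# Rumored 80 CU card at roughly PS5-like clocks:
print(round(gpu_tflops(80, 2.2), 1))   # 22.5
# 72 CU card at the reported 2.57 GHz partner overclock:
print(round(gpu_tflops(72, 2.57), 1))  # 23.7
```

Raw TFLOPS notoriously don’t compare cleanly across architectures (Ampere counts its FP32 differently), but within one architecture they show how much headroom those clocks buy.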
So overall? The rumors seem pretty valid from what I can tell, just looking at where AMD would need to land to be competitive and the kinds of things they could do to set themselves apart. What I need to see today for my potential purchasing decision is the stuff the rumors don’t cover – a more robust software ecosystem to match Nvidia’s, developer support on PC, but most importantly, I want AMD to openly and honestly tackle drivers and discuss the ways in which they’ve made strides there.
Given the dire availability situation on the Nvidia Ampere cards, and the expectation that they’ll continue like that into 2021, if AMD launches a competitive product with good pricing and availability that won’t have drivers hardlocking and blackscreening a system every so often, well, that is going to be a strong hand to play.