Sidenote: The GeForce RTX 3090, 8K Gaming, and The Coming GPU Wars

Topic the First: The Disappointing/Interesting Launch of the RTX 3090 and 8K Gaming Viability

Nvidia’s RTX 3090 launches today, at a staggering $1,500 cost. It is a massive triple-slot GPU with 24 GB of fast GDDR6X VRAM, and I’ll admit, I wanted one.

The past tense is the operative part here.

Nvidia has clearly had a difficult time marketing the Ampere lineup that became the RTX 30-series cards. I speculated last week, after the initial launch of the 3080, that had Nvidia not been threatened with competition from AMD, they’d have released the 3080 as a 3080 Ti marked up $500 for no discernible reason and delivered a dud generational leap just as bad as the Turing RTX 20-series was, with the 3090 then being a “Titan” sold at $2,500 like the last-gen Titan and offering marginally more performance.

Well, while the review embargo for hard data is still 5 hours away as of this writing, Nvidia has done a few weird things with regard to the 3090.

First, they sampled the cards, along with $30,000 LG OLED TVs capable of 8K, to two YouTube channels – MKBHD and Linus Tech Tips. Both, to their credit, clearly disclosed that the video was sponsored and avoided overt editorializing in Nvidia’s favor. To Nvidia’s credit, they clearly made something cool – both channels showed real 8K gameplay with high or even maxed-out settings, and not every title required Nvidia’s DLSS scaling to accomplish this! However, the challenge with this demo is this – if the only displays capable of showing your product at its best cost 20x what the video card does, then it is clearly not within reach of most consumers. The card itself is pricey for a video card, but it is also cheaper than the last-generation Titan, so it has that going for it.

Secondly, though, Nvidia today released unlabeled comparative slides of the Titan RTX, the 3080, and the 3090, and confirmed something that many of us following the news already knew – that at the standard resolutions most of us can afford, the 3090 only represents a 10-15% performance uplift over the 3080, and around 50% over the Titan RTX. Chinese reviewers TecLab posted results comparing unnamed GPUs by price in Nvidia’s new lineup, which were obviously a reference to the 3080 and 3090, and their hard numbers showed the same thing.

That leads to an interesting question, then – what is the appeal of this card? Professional benchmark leaks like Blender show a better uplift, at around 20% faster render times, but that still doesn’t line up with a 114% price increase for most hobbyists – prosumer types might be able to justify that, but most cannot. For me, this is the point where I got off the 3090 hype train and onto the 3080 one – although I’m waiting for the rumored 20 GB versions coming in November.
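To put rough numbers on that mismatch, here is a quick back-of-the-envelope sketch. The $699 and $1,499 launch prices are my assumption for the comparison (they are what the 114% figure implies), and the uplift values are just the leaked percentages quoted above, not measured data.

```python
# Back-of-the-envelope price vs. performance comparison.
# Assumption: launch prices of $699 (3080) and $1,499 (3090).
# Assumption: uplift figures are the rough leaked numbers quoted in the text.

PRICE_3080 = 699.0
PRICE_3090 = 1499.0


def percent_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


def relative_value(perf_uplift: float) -> float:
    """Performance per dollar of the 3090 relative to the 3080 (1.0 = equal)."""
    perf_ratio = 1.0 + perf_uplift
    price_ratio = PRICE_3090 / PRICE_3080
    return perf_ratio / price_ratio


price_jump = percent_increase(PRICE_3080, PRICE_3090)
print(f"Price increase: {price_jump:.0f}%")

scenarios = [
    ("gaming, low estimate", 0.10),
    ("gaming, high estimate", 0.15),
    ("Blender leak", 0.20),
]
for label, uplift in scenarios:
    print(f"{label}: {relative_value(uplift):.2f}x the performance per dollar")
```

Under those assumptions, the 3090 lands at a little over half the performance per dollar of the 3080 in every scenario, which is the gap I can’t talk myself past.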

So then Nvidia, knowing this, chose to take an interesting tack that relates back to the sponsored videos mentioned above. Knowing that the card can’t offer a sufficient increase over the 4K performance of the 3080, and that it only looks worse the lower the resolution goes or the higher the clock speeds on the 3080 go (to a point, it seems, at least), they set the 3090 apart by making a push for 8K gaming, as unrealistic as it is.

Now, I’m of two minds on this tactic. On one hand, it is obvious that Nvidia wanted to hold this data back but was largely forced to act by a myriad of leaked benchmarks casting a shadow on the 3090 launch. Given the impact those leaks could have when coupled with a release-day review embargo that prevents informed decision-making, Nvidia opted to push harder in the 8K direction by admitting that the 3090 is not a card for your average 4K gamer, or even for someone really trying to push 4K hard. This ran counter to their marketing around the memory increase to 10 GB on the 3080, as they let that tidbit sit and allowed people to extrapolate how much better 4K could be with 24 GB of even faster memory. On the other hand, I think most people seriously considering the 3090, myself included, knew that it wasn’t going to be much more than 20% better at best, considering the number of additional CUDA cores over the 3080, a lack of applications or games needing that much shader power, lower clock speeds compared to its lower-tier kin, and the fact that Nvidia shared no marketing benchmarks at the unveil on September 1st – not even a relative ranking of the card.

But is 8K gaming even worth it? Spoiler alert – not today. Here’s the deal – I’m a techie, and even though I realize that 8K is impractical for nearly everyone in 2020, I still believe that it is cool, and the fact that Nvidia even got a card to reach 8K at 60 FPS in some titles with higher-tier settings is awesome. However, 8K60 isn’t going to succeed, for a couple of simple reasons. Firstly, it is just far too expensive to justify for 99% of the planet. Hell, in my Xbox Series S analysis, I posited that the console made sense because a majority of Americans don’t even have a 4K display, and that should tell you everything about what adoption rates look like for the next step beyond it.

Secondly, we’re in an era where gamers have come to expect smooth motion alongside high visual fidelity. There is a growing number of high refresh rate 4K panels in the wild, and nearly every gamer I know who has bought a new monitor in the last 3 years has bought one with a refresh rate higher than 60 Hz. My main display, purchased in 2018, is a 100 Hz panel, and I’m looking at moving to a 240 Hz one! This often hasn’t come at the cost of visual fidelity, because the slow hardware in the last-gen consoles (their SoCs were designed during the dark times of AMD) has kept game requirements modest, so it has been a best-of-both-worlds scenario – slowly increasing visual fidelity with much higher frame rates, to the point that both next-gen platforms use HDMI 2.1 and target 4K120 as the standard (with the poor Xbox Series S scaling up a 1440p120 render target to match). It is so much the standard that Nvidia GPUs from Turing onward and AMD GPUs from Navi forward have technologies designed to reduce bottlenecks and workloads where possible, most notably Variable Rate Shading, but also DLSS on the Nvidia side and the less-featured Radeon Image Sharpening on the AMD side.
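For a sense of scale, here is a small sketch comparing raw pixel throughput at the targets discussed above. Pixels per second is only a crude proxy for GPU load (it ignores shading complexity, memory bandwidth, and upscaling like DLSS), and I’m assuming the standard UHD resolutions of 3840x2160 for 4K and 7680x4320 for 8K.

```python
# Raw pixels-per-second at the render targets discussed above.
# Assumption: "4K" and "8K" mean the UHD resolutions listed here.
# Note: pixel throughput is a crude proxy for GPU workload, nothing more.

RESOLUTIONS = {
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}


def pixels_per_second(resolution: str, fps: int) -> int:
    """Number of pixels that must be produced each second."""
    width, height = RESOLUTIONS[resolution]
    return width * height * fps


targets = [("1440p", 120), ("4K", 60), ("4K", 120), ("8K", 60)]
baseline = pixels_per_second("4K", 60)

for resolution, fps in targets:
    rate = pixels_per_second(resolution, fps)
    print(f"{resolution}@{fps}: {rate / 1e9:.2f} Gpixels/s "
          f"({rate / baseline:.2f}x 4K60)")
```

By this measure 8K60 asks for four times the pixels of 4K60 and twice those of 4K120, so the hardware budget that buys 8K60 could instead go a long way toward the high refresh rates gamers are actually buying monitors for.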

Given that, 8K just doesn’t make sense for today, and thus, it would be difficult to recommend the RTX 3090 to anyone but the richest gamer or the most dedicated Blender user. For me, I’ll be waiting on the RTX 3080 20 GB models!

Topic the Second: Raytracing Models in GeForce and Radeon GPUs and Why AMD Might Win

Real-time raytracing, or hybrid rendering, became a reality in 2018 with Nvidia’s Turing architecture and the RTX 20-series cards based on it. Nvidia did what they typically do with new technologies expected to become standard – they raced to be first, pushed their own solution, and leveraged their financial and relational resources to push developers toward using their methodology. The “RT” in “RTX” has been one of the least-leveraged parts of the technology, but it is generally the most impressive in terms of comparisons to current lighting technologies and the increase in visual fidelity.

Nvidia’s solution was born from a desire to repurpose hardware, though. While the purpose-made RT cores were new to Turing, the Volta datacenter GPU seen in the Titan V bore the first generation of Nvidia’s Tensor Cores. Designed to accelerate AI, machine learning, and deep learning workloads (hence their introduction in a datacenter product), they created an interesting scenario for Nvidia. Nvidia could have continued to branch their silicon designs sharply from Volta forward, keeping Tensor Cores in the datacenter line while increasing CUDA core count solely in the desktop gaming lineup, or they could take their square peg and find a round hole to mash it into.

Thus, we got Tensor Cores on desktop. Rather than build a more customized gaming card, or just not use them on desktop, Nvidia had a solution and found a few problems that it could handle – namely, AI denoising of ray-traced images and deep-learning-based resolution scaling that accounts for motion vectors without smearing, smudging, or other awful artifacts. They then built RT cores around that, didn’t have to make them particularly strong since they don’t need to resolve the scene to a noiseless ground truth, and were able to unify the product stack. The only exception is now in the opposite direction – A100 Ampere datacenter GPUs do not have raytracing hardware, but do have Tensor Cores and make strong use of them, in addition to a boatload of CUDA cores with the same improvements as the gaming cards (being able to issue FP32+INT32 or 2xFP32 in the same cycle).

The challenge this poses for desktop gaming and real-time raytracing is that the process has a bottleneck. Nvidia’s own Turing whitepaper (yes I read it, I know I’m a nerd, shut up already) points out the dilemma – Turing was made to be able to concurrently perform the RT operations and both integer and floating point shading, but before a frame can be finalized for output, it must be sent to the Tensor Cores to be denoised and processed if using RTX technologies (this applies to DLSS as well). The end result, as is evident from benchmarks of both RTX 20 and 30-series cards, is that RTX runs much slower than standard rendering – in both generations, a performance hit of close to 50%. It also means that if you are buying the cards to play non-RTX games, there is a substantial portion of the GPU silicon that you are never using.

Nvidia is trying to fix that by offering other technologies outside of graphics that can put these functional blocks to better use. RTX Voice and the full Nvidia Broadcast suite seem like oddballs, until you realize that Nvidia is once again enlisting Tensor Core acceleration for a workload that wouldn’t conventionally be the GPU’s purview. The same goes for the upcoming RTX IO – it is bizarre on paper, but it makes sense when you consider that such a technology allows the GPU to be better utilized without a negative impact on gaming performance, and that in the case of RTX IO it can actually help. Nvidia is mostly shut out of consoles after their dismal two-act saga with the original Xbox and the PS3, so other than their Tegra SoC in the Switch, they have to adapt to console innovations. PS5 and XSX both have accelerated SSD technology that allows, in particular, fast streaming of texture and other graphical data from an NVMe SSD. Rather than allow the PC as a platform (and thus their only gaming foothold) to fall behind, Nvidia is using RTX IO to replicate this functionality specifically, and rather cleverly.

But when RDNA2-based graphics cards ship from AMD late this year, they will also have raytracing. AMD’s approach is rather clever in its own right and is interesting for the contrast it presents. Based on a 2017 patent and the August 2020 Hot Chips conference, where Microsoft presented a detailed breakdown of the inner workings of RDNA2 via their Xbox Series X panel, AMD’s approach is to meet somewhere between Nvidia and a full shader-based software implementation. AMD agrees with Nvidia that BVH acceleration requires dedicated hardware, and the Xbox Series X panel helpfully shows that AMD has added a ray-tracing core to each compute unit in their GPU design, which performs the ray-triangle intersection math used to determine lighting output. However, AMD’s design then hands the results off to the shader cores to finish the work, rather than using additional dedicated denoising hardware to do the job.
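If “ray-triangle intersection math” sounds abstract, here is a minimal sketch of the textbook Möller–Trumbore test in plain Python. To be clear, this is the standard software algorithm and not a description of what AMD’s (or Nvidia’s) hardware actually implements; the function and helper names are my own, made up for illustration.

```python
# Textbook Moller-Trumbore ray-triangle intersection.
# Assumption: this is the generic algorithm, NOT the vendors' hardware design.
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_triangle_intersect(
    origin: Vec3, direction: Vec3, v0: Vec3, v1: Vec3, v2: Vec3,
    eps: float = 1e-9,
) -> Optional[float]:
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:          # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det   # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None   # ignore hits behind the ray origin


# A unit triangle in the z=0 plane, and a ray fired straight down at it.
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
miss = ray_triangle_intersect((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle)
print(hit, miss)
```

It is a handful of dot and cross products per test, which is exactly the kind of small, fixed, massively repeated math that makes sense to bake into dedicated hardware, while everything after the hit (deciding what the surface should look like) stays with the shaders.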

This approach, at first, flummoxed me, and I thought – “that’s going to have slowdowns!” But then I realized that was a silly objection, because Nvidia’s solution, with two full sets of dedicated hardware, still slows frame performance by as much as half anyway! That was the ticket for me to start thinking about the advantages and disadvantages of each. Nvidia’s approach, as documented above, doesn’t need beefy RT cores because they need to do just enough math to get to a noisy image with barely enough detail, at which point the AI denoising happens and produces the rest of the image. AMD’s approach, in theory, will need stronger and more fully-featured ray-tracing cores, because while it could do similar AI denoising on the shader cores, Tensor Cores offer far higher raw compute performance for complex matrix mathematics than a shader core does, at the cost of being highly specialized for that work. What I think is great about what AMD appears to be doing is that by not cluttering the RDNA2 GPU die with multiple logical blocks, some of which only serve specific workloads, they instead appear to have made a much larger die with more compute units (80-86 are the rumors for the top RDNA2 chips, while the Xbox Series X has 52 enabled and the PS5 has 36). It seems that the answer for raytracing performance is that it will be lower than standard rasterization, but hopefully by a similar margin to RTX, if not a smaller one.

The Xbox system architect who presented at Hot Chips, for his part, seemed to think that raytracing will remain a niche feature for developers, who are still targeting traditional rasterization methods and have been pushing to get a GPU that accelerates those at no performance cost.

I think there is a certain beauty to both ray-tracing methods, in a weird, science-y way. Nvidia figured it out first because they were able to find a way to make an imperfect image and polish it up so that you couldn’t tell. AMD wants to make a seemingly more “correct” image and do so within means that allow the GPU to better balance supporting the feature while not dedicating too much hardware real estate to it. The full details of how it works on PC remain to be seen until, probably, AMD’s RDNA 2 announcement on 10/28/2020. Based on the DirectX Raytracing whitepaper (read that too!), though, it seems that the base model for raytracing it provides does not call for AI denoising or any special processing, instead using a different scene structure to ensure objects are loaded and called quickly for BVH traversal processing, and then using specific shader types to generate the effects called for by the results of the ray-triangle intersection work.
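For anyone wondering what “BVH traversal” means in practice, here is a toy sketch: a tree of bounding boxes, a ray-versus-box “slab” test, and a loop that skips any branch the ray cannot touch. This is a generic illustration under my own assumptions. The `Node` class and function names are hypothetical, and real acceleration structures are built and laid out by the driver and hardware in ways this does not attempt to model.

```python
# Toy bounding volume hierarchy (BVH) traversal with a ray/box slab test.
# Assumption: a generic illustration; Node, ray_hits_box and traverse are
# hypothetical names, not any API or vendor's hardware layout.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    box_min: Vec3
    box_max: Vec3
    children: List["Node"] = field(default_factory=list)
    payload: Optional[str] = None  # set on leaves; stands in for triangles


def ray_hits_box(origin: Vec3, direction: Vec3,
                 box_min: Vec3, box_max: Vec3) -> bool:
    """Slab test: intersect the ray with each axis-aligned pair of planes."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must already be inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[str]:
    """Collect leaf payloads whose boxes the ray touches, pruning the rest."""
    hits: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.box_min, node.box_max):
            continue  # whole subtree is skipped without further tests
        if node.payload is not None:
            hits.append(node.payload)
        stack.extend(node.children)
    return hits


left = Node((-2.0, -1.0, -1.0), (-1.0, 1.0, 1.0), payload="left")
right = Node((1.0, -1.0, -1.0), (2.0, 1.0, 1.0), payload="right")
scene = Node((-2.0, -1.0, -1.0), (2.0, 1.0, 1.0), children=[left, right])

result = traverse(scene, (1.5, 0.0, 5.0), (0.0, 0.0, -1.0))
print(result)
```

The point of the structure is the `continue`: every box the ray misses removes everything inside it from consideration, so only the surviving leaves ever reach the more expensive ray-triangle test and the shaders that act on its results.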

The late fall and winter of this year are shaping up to be very interesting for computer graphics!
