I have 3 small tech stories I wanted to talk about, but none of them felt like a post of their own (although I did start a longish draft on the cache configuration of the RTX 40-series GPUs!). So today, let’s take a look at these 3 stories – the new boosting methodology on Ryzen 7000 CPUs, Intel’s ARC A770/750, and that RTX 40 cache info.
Hot Hot Hot: Ryzen 7000 Boosts To Temp Target First
Ryzen 7000 series CPUs launched this week, featuring AMD’s new Zen 4 core architecture and solid performance gains over the 5000-series parts that preceded them (the naming scheme is getting weird – even-thousand numbers are apparently reserved for mobile now). The most interesting detail to me so far isn’t the new platform, new socket, or the performance in isolation – it’s how the CPU gets that performance, and how the way it does so might cause trouble in the hands of an individual user.
Most boost methodologies that claim any level of “smart”-ness for CPUs or GPUs juggle a triangle of factors they can max out to improve performance – the clockspeed ceiling of a given part, the power the chip can be supplied, and thermals – you don’t want the chip running too hot, now. In prior generations, AMD’s Ryzen treated temperature as essentially a limit of last resort, with the Precision Boost and XFR technologies (later rolled into a single branding term) pushing clockspeed as high as possible without overdrawing power or overheating the CPU. According to reviews, Ryzen 7000 inverts that: the new boost methodology first pushes the CPU up to a nice and toasty 95 degrees Celsius, then dials power and clockspeed into whatever ranges keep the CPU at that temperature.
I can actually kind of see the theoretical benefit here – temperature is normally the most variable of the three attributes a boosting algorithm juggles, so sure, why not pin down the most variable factor first and then dial in the remaining two? It means your CPU can always hit a maximum performance threshold based on your configuration, and the reviews I’ve looked through all tell a similar tale – clockspeed stays pretty steady in this state, so the CPU is very good at finding a performance target that meets that temperature and locking onto it. In practice, however, this guarantees a few undesirable outcomes.
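To make the “temperature first, then dial in the rest” idea concrete, here’s a toy control loop in Python. This is purely my own sketch of the general approach reviews describe – the names, numbers, and step sizes are made up, and AMD’s actual algorithm is far more sophisticated.

```python
# Hypothetical temp-target boost controller, NOT AMD's real algorithm.
# Values are illustrative placeholders.
TEMP_TARGET_C = 95.0   # thermal target the CPU boosts toward
CLOCK_MAX_MHZ = 5700   # silicon clockspeed ceiling
CLOCK_MIN_MHZ = 3000   # base clock floor
STEP_MHZ = 25          # adjustment per control tick

def boost_step(current_clock_mhz: float, temp_c: float) -> float:
    """One tick of a temp-first controller: push clocks up until the die
    reaches the target temperature, then hold there by nudging them down."""
    if temp_c < TEMP_TARGET_C:
        return min(current_clock_mhz + STEP_MHZ, CLOCK_MAX_MHZ)
    return max(current_clock_mhz - STEP_MHZ, CLOCK_MIN_MHZ)
```

The steady clockspeeds reviewers observed fall out naturally from a loop like this: once the die hits the target, the controller oscillates in a tight band around whatever clock your cooling can sustain.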
Firstly, most motherboards since the dawn of the feature use CPU temperature as the input for fan curves – if you set a CPU-based fan curve on Ryzen 7000, the second your CPU decides to boost, prepare for a wind tunnel! Most modern UEFI BIOSes let you use another sensor as the input instead, like the motherboard chipset or the GPU, so it’s fixable, but still annoying!
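A typical fan curve makes the problem obvious. The breakpoints below are invented, but they’re in the ballpark of common motherboard defaults – and with a CPU that boosts straight to 95 C, the curve pins the fans at 100% no matter how light the sustained load actually is.

```python
# Illustrative CPU-temp fan curve: (temperature C, fan duty %).
# Breakpoints are made up, but typical of motherboard defaults.
FAN_CURVE = [(30, 20), (50, 35), (70, 60), (85, 85), (95, 100)]

def fan_duty(temp_c: float) -> float:
    """Linearly interpolate fan duty between curve points."""
    if temp_c <= FAN_CURVE[0][0]:
        return FAN_CURVE[0][1]
    for (t0, d0), (t1, d1) in zip(FAN_CURVE, FAN_CURVE[1:]):
        if temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return FAN_CURVE[-1][1]  # at or past the last point: full blast
```

Feed this curve a 60 C reading and you get a civilized ~48% duty; feed it the 95 C boost target and you get 100% every time the CPU decides to stretch its legs.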
Secondly, this means that performance benchmarks are only repeatable up to a point, and that the end-user experience under heavy workloads is going to depend a lot on cooling. If you have a cheap standby heatsink you love (Hyper 212 fans, looking at you), you can still use it – at the cost of a bit of boosting headroom. How much “a bit” is depends on the cooling of your whole system and room as well, which shows exactly why temperature is such a variable target.
Lastly, there is real concern over the lifetime of the CPU given the high thermals. Heat is one of those things most PC enthusiasts stress over as a silent killer of hardware, and seeing a CPU sitting at 95 C while playing games or doing any serious task would set me on edge. I do wonder if holding a steady high temperature might actually be a benefit – most heat-related wear comes from heating/cooling cycles making materials expand and contract, and while a silicon CPU die isn’t necessarily going to have those issues (or at least it’s engineered not to), it still feels like a concern. I’d also worry about the implications for your system as a whole and your room environment – a 95 C CPU is going to pump a lot more heat into the ambient air, which will sit right above the GPU in most systems warming things up, and that hot air will then (hopefully with good airflow!) escape the case and heat up the air in your space.
On the plus side, the new parts have very precise boosting to stay at that target temp and seem to have pretty good performance even if the boost cannot go full throttle, but if you’re thinking about getting one – maybe get a 360mm closed-loop liquid cooler or a beastly heatsink like one from Noctua.
Bringing Back Balance, Says Intel Of Their New ARC Graphics Cards
Intel finally did it.
For the PC tech old-heads (yes, 2008 news is old now), you might remember discrete graphics cards being a business that Intel has had eyes on since they first started prototyping the failed, never-released Larrabee GPU back then. Since that flop, Intel hired up a fair bit of old graphics talent, including Raja Koduri from AMD and Tom Petersen from Nvidia, and set them to work on Intel Xe graphics, the project that would eventually split into two distinct paths – stronger integrated graphics from the 11th-generation Core CPUs onwards, and now ARC.
ARC has taken a tumultuous path to market, from seeming silence to uncomfortable stretches between announcements or even rumors, and Intel has even shipped cards already, like the DG1 – an Intel integrated graphics processor on its own PCIe card with some slow DDR4 memory for a framebuffer. The new ARC lineup, coming in October, aims to change the market with two parts – the A750 and A770.
Intel’s goal here is a fairly humble-seeming one, at least to start – address the mid-range market. To be fair, this is absolutely a great idea, as Nvidia and even AMD have started the process (or in Nvidia’s case, completed the process) of abandoning that market segment. The lowest Nvidia went in the RTX 30 series was the 3050, still a roughly $300 card with an increasingly bad value proposition, and while AMD has Radeon RX 6000 cards further down the stack, those make a lot of compromises to get there, leaving the once golden fields of the enthusiast mid-range barren and wanting.
Intel’s own performance numbers are the only ones currently available, so I hesitate a lot to lean on them. This is the company, after all, that thought it was a great idea to put out third-party benchmarks on the CPU side where Ryzen CPUs were tested with half their cores turned off! However, if they deliver around 90% of the performance they are promising in their presentations, then they might actually have a viable card or two on their hands, as the A770 is shown to be pretty close to the RTX 3060 while costing around $50-$80 less. Intel is probably the most ideal competitor in this space too, because while both Nvidia and AMD are design firms who have to outsource chip production to third-party foundry partners like TSMC or Samsung, Intel owns one of the most robust silicon foundry operations in the world and has started regaining ground on the process technology they lost in the years they spent wandering the desert of their overambitious 10nm process. They have much greater control over their own supply chain and the ability to dedicate wafers to making GPUs, where AMD and Nvidia both have to negotiate supply with a separate entity. Intel’s presentations call this out in vague terms – that they basically have the ability to absorb the kind of pricing pressure that exerts real influence on AMD and Nvidia (on top of both companies having a documented desire to drive up average selling prices).
While I’m not personally in the GPU market ATM (getting an RTX 3080 at MSRP in December 2020 was enough of a miracle for a few years!), I’m watching ARC with a bit more interest than I anticipated, largely because Intel has finally moved beyond just putting integrated graphics silicon on an expansion card, and if they keep up the performance trend, it could make GPUs much more interesting in the years to come. Or, at least, more affordable, and I think we’d all welcome that!
Nvidia Is Giving You More Cache For More Cash
As the dust settles on the initial announcement and unveiling of the RTX 40-series cards from Nvidia, one detail in particular has stood out to me – the vastly increased L2 cache on the GPU die. The RTX 3090 Ti had 6MB of total L2 cache on the GPU, where the RTX 4090 has…96MB. This increase is fascinating and paints an interesting picture.
Last generation, AMD’s big marketing push for the Radeon RX 6000 series was Infinity Cache, their branding term for a 128 MB on-die cache on the top-end parts (which was whittled down with each lower-tier silicon version). This cache was purported to have two benefits – it helped to mitigate the sting of a top-end card with a 256-bit memory bus by offering faster local memory within the GPU, and it was said to help raytracing performance substantially, as RT demands fast access to small pieces of data like a BVH (bounding volume hierarchy) to perform the math it needs. Marketing buzz aside, the Radeon RX 6000 cards did have great rasterization performance in spite of slower main memory than their competition, and the cards were pretty good for first-generation raytracing from AMD – in some games with RT shadows, they could actually even beat RTX 30-series cards!
Nvidia seems to be leaning on this cache in much the same way, without calling too much attention to it (wouldn’t want to give that engineering plaudit to AMD, after all). The RTX 4090 is no-compromises, with a full 384-bit wide memory bus and faster GDDR6X memory, but at the 4080 series (both of them), the memory bus is a big point of contention, with the 16 GB card having a 256-bit bus and the 12 GB having a paltry 192-bit bus. As a point of comparison, Nvidia has traditionally used a 192-bit bus on recent-generation xx60 GPUs. The narrower bus means lower memory bandwidth, with the 12 GB RTX 4080 having around two-thirds the bandwidth of the RTX 3080. But this cache may help mitigate that by a fair amount. On the RTX 4090, the cache will likely enable some pretty respectable increases in RT performance even before taking into account the increased raster hardware, improvements to the RT hardware, and generational technological updates. On both flavors of 4080, the cache will likely help reduce the negative impact of narrower memory buses, giving the GPU a small place to put frequently-used data.
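The “two-thirds the bandwidth” figure is simple arithmetic on the published specs – bus width in bits times the data rate per pin, divided by 8 to get GB/s:

```python
# Back-of-envelope memory bandwidth from published specs:
# bus width (bits) * data rate (Gbps per pin) / 8 = GB/s
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

rtx_3080    = bandwidth_gbs(320, 19)  # 760 GB/s (320-bit, 19 Gbps GDDR6X)
rtx_4080_12 = bandwidth_gbs(192, 21)  # 504 GB/s (192-bit, 21 Gbps GDDR6X)
print(rtx_4080_12 / rtx_3080)         # ~0.66 - roughly two-thirds
```

Faster 21 Gbps memory claws back some of what the 192-bit bus gives up, but not nearly all of it – hence the cache as a pressure valve.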
The positive impacts of cache in an architecturally robust design are not new to GPUs either. Microsoft used embedded GPU cache in both the Xbox 360 and the Xbox One. In the 360, a 10 MB eDRAM daughter die took a GPU that was already more powerful than the PS3’s and supercharged it, offering a small fast scratchpad for render targets and some logic to enable nearly performance-neutral anti-aliasing. In the Xbox One, 32 MB of embedded ESRAM was all that kept the GPU from being choked to death by the baffling decision to use low-bandwidth DDR3 as the shared system memory, and while it wasn’t quite enough in many respects (see the number of early Xbox One titles that rendered natively at 720p and upscaled as testament to that system’s poor design!), it did keep the Xbox One in relative comfort as the runner-up to the PS4, able to handle enough to be a steady generational companion to the better-kitted Sony hardware. AMD even now is experimenting with other cache-related improvements, with their stacked 3D V-Cache technology making the Ryzen 7 5800X3D one of the best CPUs for MMO gaming thanks to a daughter die of cache stacked atop the CPU die. Intel has been upping their integrated CPU caches, and AMD’s Ryzen 7000 parts already seem to be slated for a refresh in early 2023 with V-Cache versions of the just-launched lineup.
Until benchmarks come out, of course, nothing about the cache’s performance impact is certain, and I haven’t seen a way to do any A/B testing with and without it, but I’d suspect that a decent part of the generational uplift Nvidia is promising with the RTX 40 series comes down to better-optimized memory handling, in which the cache would play a pivotal role.
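As a crude mental model of why a big L2 matters here (my own toy math, not anything from Nvidia): every request the cache absorbs is a request the DRAM bus never has to carry, so a hit rate of h lets a given bus sustain demand of roughly bus_bandwidth / (1 - h) – assuming the cache itself never becomes the bottleneck, which real hardware makes no such promise about.

```python
# Toy model: a cache with hit rate h filters off a fraction h of memory
# requests, so DRAM only carries (1 - h) of the total demand.
def sustainable_demand_gbs(dram_bw_gbs: float, hit_rate: float) -> float:
    """Total request bandwidth a DRAM bus can absorb, given a cache that
    services hit_rate of requests. Ignores cache bandwidth limits."""
    return dram_bw_gbs / (1 - hit_rate)

# With a (hypothetical) 50% hit rate, the 12 GB 4080's 504 GB/s bus
# behaves like roughly double that to the shader cores:
print(sustainable_demand_gbs(504, 0.5))  # 1008.0
```

The real hit rate per workload is exactly the unknown that benchmarks will settle, but the model shows how even a modest one can paper over a narrow bus.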