Sidenote: The Upcoming Ryzen 4th Generation Announcement and What I’d Like To See

With the GPU market in relative stasis until Nvidia finishes releasing the Ampere graphics cards, and with AMD not set to announce anything, much less release it, until late October, the CPU race is where the enthusiast in me has turned with nearly full attention (minus waiting patiently for 3090 benchmarks to see what I want to buy).

The best tech story of the last few years has been the comeback of AMD in the CPU market. In 2011, two things happened that caused a huge shift in the market. First, AMD, after being competitive and even winning large chunks of the 2000s CPU war, released the Bulldozer architecture as the FX CPUs, which was an utter failure and bombed in the market. Performance even regressed in some cases compared to the prior Phenom II generation, which was unexpected and made the series look awful. Compounding this, Intel launched their Sandy Bridge architecture, offering massive improvements on their prior-generation CPUs. It was a one-two knockout punch that almost led AMD to bankruptcy, as they were fighting losing battles on multiple fronts, and it took then-CEO Rory Read engineering a deal to get AMD intellectual property into the PS4 and Xbox One to save the company from the brink.

What went wrong? Simple – Fusion.

After buying GPU designer ATI for a massive amount of cash, AMD’s goal was to align to a heterogeneous system architecture (HSA), with much more sharing of work between the CPU and GPU in particular. Their vision was to take the floating point computational power of a GPU and pair it with an integer-strong CPU, with both sharing the total system work. Intel, being a CPU-primary company with no strong GPU options in their stack, did not adopt it, and Nvidia, in the opposite boat from Intel, also did not adopt it, so when AMD built around it alone, it made their products worse for a generation. FX CPUs were built with more integer performance than floating point, with the “8 core” claim resting on the CPUs having 8 integer units, while each pair of integer units shared a single floating point unit. With AMD as the only hardware team in town supporting the HSA concept, the products sold individually to meet those needs failed, and failed hard. Products where the idea could bear fruit did better, which is why AMD’s “APU” processors from this time are pretty decent, and the game console SoCs AMD makes to this day benefit from some implementation of this idea, although as time has moved on, little of the load-sharing idea remains.

Ryzen, in development starting in 2012, aimed to bring AMD back to a simple, standard CPU model, and to make it cost-effective by building a product stack on a very small number of core components. In the first two Ryzen generations, the entire lineup was built on a single chip. Called “Zeppelin,” the core silicon of all Ryzen and Epyc parts was an 8-core die with a dual-channel DDR4 memory controller and a handful of PCI-Express lanes. You want a 32-core server CPU? Put 4 of those on a package. A strong 8-core gaming CPU? Throw a good sample of that die into a desktop CPU form-factor. Need a cheaper part? Just laser-cut the non-functioning cores off each die in a symmetrical manner and you can have anything from 4 cores up to 32 cores, all built on the same silicon.

The Ryzen 3rd gen parts complicated this slightly by expanding the product stack to 3 chip designs. Now, AMD produces a CCD – a “core chiplet die” which has 8 CPU cores and their cache but nothing else – and two variants of an IO die, a separate chip which contains memory controllers, PCI-Express lanes, and the Infinity Fabric interconnect logic that connects the chiplets in a CPU package together. With this, while the external package of Ryzen looks the same, the inside is vastly different. Desktop CPUs with 4 to 8 cores use two chips – a single CCD and an IO die. The desktop Ryzen parts with 12 and 16 cores? Those use 3 – 2 CCDs and a single IO die. Threadripper and server Epyc CPUs? Those use a larger IO die with 8 memory channels and more PCI-Express lanes (Threadripper chips for sale to consumers have half of these resources available compared to the Pro and Epyc chips) and anywhere from 2 to 8 CCDs, allowing for up to 64 CPU cores.
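The chip counts above follow from simple arithmetic, which can be sketched as below. This is only an illustration: the function name is made up for this post, and it returns the minimum number of CCDs a core count needs, which is an assumption (nothing stops a real product from spreading fewer active cores over more CCDs).

```python
import math

CORES_PER_CCD = 8  # each CCD carries 8 CPU cores, per the description above


def min_chips_in_package(core_count: int) -> dict:
    """Return the minimum CCD count plus the single IO die for a core count.

    Hypothetical helper for illustration; it assumes cores are packed onto
    as few CCDs as possible.
    """
    if core_count < 1:
        raise ValueError("core_count must be positive")
    ccds = math.ceil(core_count / CORES_PER_CCD)
    return {"ccds": ccds, "io_dies": 1, "total_chips": ccds + 1}


for cores in (6, 8, 12, 16, 64):
    print(cores, min_chips_in_package(cores))
```

Running it reproduces the pattern in the text: two chips up to 8 cores, three chips for 12 and 16 cores, and 8 CCDs plus the IO die for 64 cores.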

AMD’s design is incredibly smart for a handful of reasons. It is cheaper to produce and less reliant on the whims of silicon yield. It lets them take the parts that don’t scale well (the IO die) and buy them from their spun-off silicon manufacturing arm, meeting their legal requirements to continue business with that unit, while making the CPU dies with cutting-edge TSMC manufacturing tech. And it allows them to scale up or down in a lot of interesting ways.

However, there are design issues in Ryzen that cause some concern and sometimes allow Intel CPUs to outperform them. Let’s discuss those before talking about the rumors for the new generation!

CCD and CCX Design: AMD’s Ryzen design is modular and scalable, which lets AMD push CPUs to higher core counts rapidly and optimize production by using a single CPU die in each lineup to improve yields and reduce costs. There is a problem with this, however – latency. In a current Ryzen CPU, the CPU cores are divided into groupings called CCXs. On the die, each CCX is 4 cores, and products that aren’t even multiples of 8 usually disable one or two cores per CCX to get to those numbers (a 6-core Ryzen is two 3-core CCXs, the 12-core Ryzens are four 3-core CCXs, and all of the 4-core variants except the Ryzen 3 3300X use two 2-core CCXs). This creates a problem when an application needs more cores than it can get from a single CCX. At that point, any need for resources or responses from a core on a different CCX means that the data has to travel out of the home CCX, across the Infinity Fabric, over to the other CCX, and then back again. Physically, this is a longer journey, so while the cores in a single CCX can access each other quickly, responsiveness slows down whenever this trip is invoked. AMD has worked to improve this each generation, and the Infinity Fabric clock generally matches your RAM speed (meaning there is a larger performance incentive on Ryzen to buy faster DDR4), so you can influence this slightly, but it remains a hindrance when Intel’s ring bus design in their desktop CPUs is lower-latency.
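To get a feel for how often that longer trip can come up, here is a minimal sketch that counts how many pairs of cores share a CCX and how many pairs sit on different CCXs for the layouts listed above. The function is a made-up illustration, and it assumes every pair of cores is equally likely to talk to each other, which real software does not guarantee.

```python
from itertools import combinations


def cross_ccx_pairs(ccx_sizes):
    """Count core pairs that share a CCX versus pairs that must cross the fabric.

    ccx_sizes is a list of active cores per CCX, e.g. [3, 3] for a 6-core part.
    Returns a (same_ccx_pairs, cross_ccx_pairs) tuple.
    """
    # Label every core with the index of the CCX it lives on.
    core_to_ccx = [ccx for ccx, size in enumerate(ccx_sizes) for _ in range(size)]
    same = cross = 0
    for a, b in combinations(range(len(core_to_ccx)), 2):
        if core_to_ccx[a] == core_to_ccx[b]:
            same += 1
        else:
            cross += 1
    return same, cross


layouts = {
    "4-core as 2+2": [2, 2],
    "4-core as single CCX (3300X)": [4],
    "6-core as 3+3": [3, 3],
    "8-core as 4+4": [4, 4],
    "12-core as 3+3+3+3": [3, 3, 3, 3],
}
for name, sizes in layouts.items():
    same, cross = cross_ccx_pairs(sizes)
    print(f"{name}: {same} same-CCX pairs, {cross} cross-CCX pairs")
```

Under that assumption, more than half of the possible core pairs on an 8-core 4+4 part sit on opposite CCXs, while a single-CCX part like the 3300X has none.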

Clock Speeds: Ryzen is, by design, a slower clocking CPU. Intel, at this point, has been releasing halo CPUs in their mainstream lineup that break 5 GHz for around 3 years. Ryzen, by contrast, could only hit around 4 GHz at launch in 2017. The current generation products get far closer, with rated boost clocks as high as 4.7 GHz. There is a snag, however. Intel’s boost is a “dumb” boost – short of the Thermal Velocity Boost (TVB) in the i9 10th generation lineup, Intel simply sets a table of boost multipliers for the CPU based on how many cores are loaded, and the motherboard you socket that CPU into defines how long it maintains that specified boost. Most motherboard manufacturers on desktop set the time limit to infinity. As long as the CPU isn’t sitting idle with nothing to do, or on the verge of cooking itself to death, it will run at maximum boost clocks, and if the current i9 CPUs are under Intel’s temperature threshold for TVB, they will run at that speed instead. This means that Intel’s CPUs boost to higher clock speeds and maintain that boost for far longer, as any half-competent system build will cool them far beyond spec, even for TVB in the new i9s.
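The table-lookup behavior described above can be sketched roughly like this. Every number here is hypothetical and chosen only for illustration (the multipliers, the bonus, and the temperature threshold are not taken from any Intel datasheet), and the function name is my own.

```python
# Hypothetical boost table: loaded core count -> multiplier of a 100 MHz base clock.
# Illustrative values only, not real specifications for any CPU.
BOOST_TABLE = {1: 51, 2: 51, 3: 50, 4: 50, 5: 49, 6: 49, 7: 48, 8: 48}
BASE_CLOCK_MHZ = 100
TVB_BONUS_MHZ = 100    # hypothetical extra clock when running cool enough
TVB_TEMP_LIMIT_C = 70  # hypothetical temperature threshold for that bonus


def boost_mhz(active_cores: int, temp_c: float, has_tvb: bool = False) -> int:
    """Look up the boost clock for a given number of loaded cores.

    There is no time limit in this model, mirroring motherboards that set
    the boost duration to infinity.
    """
    if active_cores not in BOOST_TABLE:
        raise ValueError("active_cores must be between 1 and 8 in this sketch")
    clock = BOOST_TABLE[active_cores] * BASE_CLOCK_MHZ
    # The only "smart" part: a fixed bonus when under the temperature threshold.
    if has_tvb and temp_c < TVB_TEMP_LIMIT_C:
        clock += TVB_BONUS_MHZ
    return clock


print(boost_mhz(active_cores=1, temp_c=60))
print(boost_mhz(active_cores=8, temp_c=60, has_tvb=True))
print(boost_mhz(active_cores=8, temp_c=80, has_tvb=True))
```

The point of the sketch is how little state is involved: the clock depends only on how many cores are loaded and, with TVB, on one temperature check.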

Ryzen’s boost, on the other hand, is smart! AMD used to define Ryzen boost as two separate technologies – Precision Boost and XFR (eXtended Frequency Range). Precision Boost would cap the CPU at a box-printed maximum speed provided operating conditions were favorable – if the box says a Ryzen 7 2700X maxes out at 4.3 GHz, that is the speed Precision Boost will try to get to and stay at. XFR (originally the reason for the X branding in some CPUs) would use analytics to determine how much power delivery headroom your motherboard had and how much thermal headroom the CPU currently had, and then use any gaps between available capacity and current utilization of these resources to dial up the clockspeed further, in 25 MHz increments in the last iteration. This would lead to my 2700X, for example, boosting through both technologies as high as 4.35 GHz under ideal conditions.
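As a rough sketch of the 25 MHz stepping, assume the power and thermal headroom has already been converted into a number of spare MHz. That conversion is the hard part and is not modeled here; the function, its parameters, and the 50 MHz cap are hypothetical, with the cap picked only so the output matches the 4.35 GHz I saw on my 2700X.

```python
XFR_STEP_MHZ = 25  # increment size mentioned above


def xfr_clock_mhz(rated_boost_mhz: int, headroom_mhz: int, max_extra_mhz: int) -> int:
    """Round usable headroom down to whole 25 MHz steps above the rated boost.

    headroom_mhz stands in for the power/thermal analysis (an assumption);
    max_extra_mhz is a hypothetical cap on how far past the box speed to go.
    """
    usable = max(0, min(headroom_mhz, max_extra_mhz))
    steps = usable // XFR_STEP_MHZ
    return rated_boost_mhz + steps * XFR_STEP_MHZ


# 2700X example from the text: 4.3 GHz rated boost.
print(xfr_clock_mhz(rated_boost_mhz=4300, headroom_mhz=60, max_extra_mhz=50))
print(xfr_clock_mhz(rated_boost_mhz=4300, headroom_mhz=40, max_extra_mhz=50))
print(xfr_clock_mhz(rated_boost_mhz=4300, headroom_mhz=0, max_extra_mhz=50))
```

With plenty of headroom the result lands two steps above the box speed; with less headroom it settles one step up, and with none it stays at the rated boost.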

However, AMD combined these technologies in the third gen and lists a boost spec that combines both. The end result is that for many high-end CPU owners, like me with my own 12-core 3900X, you may never see the printed boost speed from the box, and indeed I haven’t – 4.55 GHz out of a possible 4.6 GHz. The other issue is that the “smart”-ness of AMD boost technology means the CPUs often don’t hit high boosts or stay at high boosts for long. In order to save power and thermal headroom, the CPUs will bounce all over the place, reaching peak speeds for as little as fractions of a second before shifting back down. When they do reach high sustained clocks, they’ll taper off much faster than an Intel part. The end result? Ryzen users will see what feels like a failure of the CPU to reach a proper boost frequency, will see it boost less often, for less time, and to lower clockspeeds, and this can often feel like performance left on the table. At least the second generation fixed the “all or nothing” nature of Precision Boost so that it flows up and down instead of zipping to top boost and then dropping back to stock!

IPC: This one isn’t actually a problem, but there is a perception that it is, depending on who you ask. Most data suggests that AMD’s current-gen Ryzen parts have higher IPC (instructions per clock) than Intel’s current CPUs. Not by a ton, but enough. However, as AMD’s design seemingly pushes clockspeeds less, IPC has to carry the performance gain on its own, and that gain has to be noteworthy enough for an upgrader or new purchaser to take the plunge. The problem here is that AMD’s IPC lead over Intel does not exceed the clockspeed advantage Intel has, which means Intel squeaks out a win, and in some cases (gaming and lightly-threaded apps), they look dominant when compared to AMD. IPC is not a universal measure, as different types of math work can change the complexion of the contest. AMD doubled Ryzen’s floating point capabilities for the third-gen CPUs, which is where the bulk of the performance gain came from and is what has allowed them their lead there. Most games are floating-point heavy, and this is why Ryzen third-gen was such a huge upgrade for gamers over the prior parts. For AMD to be perceived as winning this race by all observers, they need to inch up clockspeeds again while making a similarly good leap forward in IPC.
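The tradeoff can be put into back-of-the-envelope numbers by treating single-threaded performance as IPC multiplied by clock speed, which is a first-order simplification that ignores memory, latency, and workload mix. The IPC and clock figures below are hypothetical, picked only to show the shape of the problem.

```python
def relative_performance(ipc: float, clock_ghz: float) -> float:
    """First-order single-thread throughput: instructions per clock times clock."""
    return ipc * clock_ghz


def breakeven_ipc_advantage(fast_clock_ghz: float, slow_clock_ghz: float) -> float:
    """Fractional IPC lead the slower-clocked chip needs just to tie."""
    return fast_clock_ghz / slow_clock_ghz - 1.0


# Hypothetical inputs: a 5% IPC lead at 4.6 GHz against a baseline IPC at 5.1 GHz.
amd = relative_performance(ipc=1.05, clock_ghz=4.6)
intel = relative_performance(ipc=1.00, clock_ghz=5.1)
needed = breakeven_ipc_advantage(fast_clock_ghz=5.1, slow_clock_ghz=4.6)

print(f"AMD {amd:.2f} vs Intel {intel:.2f}")
print(f"IPC lead needed to tie at these clocks: {needed:.1%}")
```

With these made-up inputs, a 5% IPC lead is not enough to cover a clock gap of roughly 11%, which is the situation described above in miniature.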

With all of that on the table, the next-gen Ryzen design has some big opportunities it can fill. The reason I am writing this is that, going by rumors and by data from various AMD presentations to investors and at conferences over the last year and change, AMD has done a lot to capitalize on their strengths while filling in some of those opportunities. Let’s discuss, point by point:

Core Design and Latency: AMD is moving to a stronger design that uses the physical separation of components within their CPUs as the logical unit of measure, instead of imposing the launch-era artificial boundary. Instead of 4-core CCX designs with two CCXs per CCD and added latency within the same die, AMD is consolidating to an 8-core CCX design, with each physical core chiplet being a CCX unto itself. This means that there is vastly reduced latency when moving beyond 4 cores, less need for the higher core count CPUs to lean on the Infinity Fabric, and, because the shared resources per CCX are being opened up, each CPU core has access to a larger level 3 cache, which now effectively doubles in size for any given core. Smaller software tweaks (working with Microsoft to include Ryzen-aware scheduling in Windows, doing the same on Linux, further generational improvements to the Infinity Fabric) will couple with this to further improve performance, which should make fourth-generation Ryzen CPUs far better for gaming. This is something you can witness today, as the Ryzen 3 3300X uses a single 4-core CCX instead of a symmetrical 2+2 design, and it is vastly better at gaming than even higher-clocked 4-core Ryzens for it.
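The cache doubling falls straight out of the layout change, as this small sketch shows. It assumes 32 MB of L3 per CCD (the third-gen desktop figure as I understand it) and assumes the total stays the same in the new design, which is my assumption and not a confirmed spec; the function names are made up for the example.

```python
L3_PER_CCD_MB = 32  # assumption: third-gen figure, held constant for the comparison
CORES_PER_CCD = 8


def l3_reachable_per_core_mb(ccx_per_ccd: int) -> float:
    """L3 a core can reach without leaving its CCX, with the CCD's L3 split evenly."""
    return L3_PER_CCD_MB / ccx_per_ccd


def cores_sharing_l3(ccx_per_ccd: int) -> int:
    """Number of cores that sit behind the same slice of L3."""
    return CORES_PER_CCD // ccx_per_ccd


for label, ccx_count in (("two 4-core CCXs per CCD", 2), ("one 8-core CCX per CCD", 1)):
    print(
        f"{label}: {l3_reachable_per_core_mb(ccx_count):.0f} MB reachable, "
        f"shared by {cores_sharing_l3(ccx_count)} cores"
    )
```

The total cache on the die does not change in this model; what changes is that every core can reach all of it, and all 8 cores can reach each other without a trip over the fabric.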

Clock Speed Consistency: Because AMD is not changing the manufacturing process for fourth-gen Ryzen parts, increases to clock speed will be meager if offered at all. Instead, rumors suggest that AMD has been working to improve clock stability, tuning the Precision Boost algorithm to work towards higher sustained clockspeeds such that if the box says 4.6 GHz, you should get there, but also should see the CPU at higher overall clockspeeds more of the time, with less ping-ponging and inconsistency. It’s unknown how fully AMD would get there, but there are a lot of levers within the Precision Boost mechanism that could be used to achieve that result (less aggressive temperature management when in safe parameters, using more of the rated power envelope given to the CPU under stock settings, reaching total socket power more of the time), and coupled with the changes to core design, this will deliver for gamers – higher clocking CPUs with reduced latency between cores leading to far fewer slowdowns or traffic jams while waiting on data to roundtrip all over the CPU package.

IPC Improvements: This one is firmly speculation, but multiple leakers have been talking about how the new Ryzen design has improvements to integer performance that can yield up to a 10-15% performance improvement, similar to the IPC gain offered from doubling floating-point performance in the third-gen CPUs. IPC is highly workload dependent, but AMD has been focused on ensuring floating-point performance has kept pace, as its neglect was the primary failure point of the FX series processors. Now that Ryzen has a firm foothold and solid FP performance, however, a shift in focus to improve integer workloads is a way that AMD can further increase performance in a less-drastic fashion. I’m dangerously close on this point to Dunning-Kruger territory, so I’ll conclude by simply saying that a quick look around Google for what percentage of a modern CPU’s workload is integer versus floating point was inconclusive, and suggests both still matter for different types of work, so the improvement is welcome all the same!

Overall, with all of this in consideration, I am really excited for this new generation of CPUs from AMD. Since Ryzen launched, it has been the only thing I have personally bought, bought for my wife, or recommended and built for friends, because while yes, technically Intel’s highest-end consumer chip is very slightly faster at gaming, Ryzen is better at most other workloads – video rendering, 3D rendering, various system operations, and especially multitasking. My current streaming encoding on Twitch is a full CPU encode, which I can run with a better x264 profile for higher quality while still using less than 20% of the CPU, all while running the game and encoding and uploading the video data to Twitch in real time. The changes that AMD is rumored and confirmed to be making, taken together, point to a real inflection point in the modern CPU wars. At a time when Intel isn’t on track to offer anything new until next year, which will see their first CPU core redesign in 5 years, AMD may actually grab the gaming performance crown. It would do so just as the new-generation consoles, also using Ryzen technology, ensure that games are built for the AMD model of computing more so than the Intel one.

Fascinating times in technology we live in.
