The AMD Ryzen 9 3950X Review: 16 Cores on 7nm with PCIe 4.0 - AnandTech

Deciding between building a mainstream PC and a high-end desktop has historically been very clear cut: if budget is a concern and the main interest is gaming, then a user typically looks to the mainstream. Otherwise, if a user is looking to do more professional high-compute work, then they look at the high-end desktop. Over the course of AMD’s recent run of high-core count Ryzen processors that line has blurred. This year, that line has disappeared. As recently as 2016, mainstream CPUs topped out at four cores; today they top out at sixteen.

Does anyone need sixteen cores? Yes.

Does everyone need sixteen cores? No.

There are two fundamental drivers for most PC builders: cost and performance. Users who want a gaming machine are going to put their dollars in what gives them the best gaming performance. Users that want to edit video are going to look at content creation focused hardware. For those in the business world, the added incentive of extra performance is being able to offset or amortize those costs with an improved work rate. For the video editor needing a week per video, if they can spend 40% more to cut the render time in half, then it can pay off over a short period of time.

As we move through 2019, users are doing more with their systems. Even at the low end, users might have double monitors where they game and watch their favourite streamer at the same time. High end users might reserve certain cores for different tasks, ensuring that there’s always some horsepower for the high-throughput tasks or virtual machines. Even though processors became ‘multi-core’ over a decade ago, we all as users are only recently adjusting how we do things to be more parallel, and the hardware is coming up to match our demands.

To that end, AMD’s Ryzen processors have been timely. The first generation mainstream Ryzen hardware in 2017 was a breath of fresh air in a market that had become sufficiently stale to be unexciting. With the color drained, AMD’s Ryzen enabled up to eight cores on a single CPU, and at the time aimed to throw its weight against Intel’s hardware in the class above. The new architecture didn’t push ahead on day one clock for clock, but it enabled a different paradigm at an obscenely reasonable price point.

Enter round 2, and Zen 2. Earlier this year AMD pushed again, this time putting 12 cores in the market for the same price as 8, or what had been the 4-core price point only three years prior. In three years we had triple the cores for the same price, and these cores also have more raw performance. The frequency wasn’t as high as the competition, but this was offset by that raw clock-for-clock throughput and ultimately where the competition now offered eight cores, AMD offered 12 at a much lower power consumption to boot.

Today is round 2 part 2: taking that same 12-core processor, and adding four more cores (for a 50% increase in price), and not only going after the best consumer processor Intel has to offer, but even the best high-end desktop processor. This is AMD squeezing Intel’s product portfolio like never before. What exactly is mainstream, anyway?

AMD’s new Ryzen 9 3950X has a suggested retail price of $749. For that AMD is advertising sixteen of its latest Zen 2 cores built on TSMC’s 7nm process, running at a 3.5 GHz base frequency and a 4.7 GHz single-core turbo frequency. The TDP of the chip is rated at 105 watts and it has 24 PCIe 4.0 lanes as well as dual memory channels that support up to 128 GB of DDR4-3200.

AMD 'Matisse' Ryzen 3000 Series CPUs
AnandTech      Cores/Threads  Base Freq (GHz)  Boost Freq (GHz)  L2 Cache  L3 Cache  PCIe 4.0  Chiplets (IO+CPU)  TDP   Price (SEP)
Ryzen 9 3950X  16C / 32T      3.5              4.7               8 MB      64 MB     16+4+4    1+2                105W  $749
Ryzen 9 3900X  12C / 24T      3.8              4.6               6 MB      64 MB     16+4+4    1+2                105W  $499
Ryzen 9 3900   12C / 24T      3.1              4.3               6 MB      64 MB     16+4+4    1+2                65W   OEM
Ryzen 7 3800X  8C / 16T       3.9              4.5               4 MB      32 MB     16+4+4    1+1                105W  $399
Ryzen 7 3700X  8C / 16T       3.6              4.4               4 MB      32 MB     16+4+4    1+1                65W   $329
Ryzen 5 3600X  6C / 12T       3.8              4.4               3 MB      32 MB     16+4+4    1+1                95W   $249
Ryzen 5 3600   6C / 12T       3.6              4.2               3 MB      32 MB     16+4+4    1+1                65W   $199
Ryzen 5 3500X  6C / 6T        3.6              4.1               3 MB      32 MB     16+4+4    1+1                65W   OEM
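
The table also lets us check what each core costs at list price. The following is a minimal Python sketch using only the retail SKUs and prices listed above (OEM-only parts have no listed price and are left out); the dictionary and function names are illustrative.

```python
# Retail SKUs from the table above: name -> (cores, suggested price in USD).
# OEM-only parts (3900, 3500X) have no listed price and are omitted.
MATISSE = {
    "Ryzen 9 3950X": (16, 749),
    "Ryzen 9 3900X": (12, 499),
    "Ryzen 7 3800X": (8, 399),
    "Ryzen 7 3700X": (8, 329),
    "Ryzen 5 3600X": (6, 249),
    "Ryzen 5 3600": (6, 199),
}


def price_per_core(cores, price):
    """Suggested price divided by core count, in USD per core."""
    return price / cores


for name, (cores, price) in MATISSE.items():
    print(f"{name:14s} ${price_per_core(cores, price):.1f} per core")
```

On these list prices the 3950X works out to roughly $47 per core against roughly $42 for the 3900X, so the extra four cores carry a small per-core premium.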

It wasn’t too long ago that this price range used to be the realm of AMD’s high-end desktop Threadripper processors, which started at 8 cores and went up to 32 cores. AMD is now shifting that paradigm as well, with this 16-core chip being at $749, and AMD’s next generation Threadripper 3000 processors starting at 24 cores at $1399. When AMD CEO Dr. Lisa Su was asked earlier this year what would happen given the drive to more cores for the mainstream processors, her response was ‘as Ryzen goes up, Threadripper goes up-up’. This is the realization of that.

It is worth noting that the price is likely to be higher at retail initially, as demand is expected to be high and stock levels haven’t been defined – given the popularity of the 12-core chip, it would seem that users wanting the mainstream platform always want the best.

Going AM4: The Battle with Motherboards

When the AM4 platform was first launched, technically with pre-Zen hardware, it supported four cores. The same platform now goes all the way up to sixteen cores, which is no small task. The flip side of this comes down to motherboard support: some AM4 motherboards were not designed with high-power sixteen core processors in mind. Some motherboards built on the AM4 socket were for the budget market, and will struggle when it comes to this 16-core part.

AMD has attempted to at least segment its AM4 market a little. Only the latest AM4 chipset, the X570 chipset, has official support for the Ryzen 3000-series PCIe 4.0 connections. In order to enable the PCIe 4.0 lanes on the processor as qualified by AMD, users will have to purchase an X570 motherboard, otherwise these lanes will run at half speed (PCIe 3.0) in non-X570 motherboards.
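
To put "half speed" into numbers, here is a short Python sketch of the theoretical link bandwidth for each generation. It uses the published per-lane signalling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b line encoding both generations share; it ignores protocol overhead, so real transfers will be lower. The function name is illustrative.

```python
# Per-lane signalling rates in GT/s; both generations use 128b/130b encoding.
GT_PER_S = {"3.0": 8.0, "4.0": 16.0}
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * lanes


for gen in ("3.0", "4.0"):
    print(f"PCIe {gen} x16: {link_bandwidth_gbs(gen, 16):.2f} GB/s")
    print(f"PCIe {gen} x4:  {link_bandwidth_gbs(gen, 4):.2f} GB/s")
```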

The quality of the motherboard is likely to affect turbo frequencies as well. AMD’s turbo algorithms are influenced in part by the ability of the power delivery to push current through from the power supply. We are seeing X570 motherboards range from $170 all the way up to $999. This isn’t saying that doubling the cost of the motherboard will double the ability to turbo, but as seen with the previous Ryzen 3000 series chips, the motherboard choice (as well as the cooling it uses) will matter.

All the X570 motherboards we’ve tested recently are up to the task of taming the Ryzen 9 3950X. Here’s a list of what we’ve tested:

Users looking at motherboards have to find the right mix of capacity, cost, and features. We did a visual inspection of all 35+ launch models.

Toe-to-Toe: Intel Core i9-9900KS / Core i9-9980XE / Core i9-10980XE

With the mainstream and high-end desktop markets now seemingly merging, there are many angles from which to compare competitive parts from Intel and AMD. If we compare purely on PCIe lanes, then we might put the Core i9-9900KS (8 cores) up against the 3950X (16 cores), although there is a big price difference ($513 vs. $749). If we compare on pricing, the nearest processor to the 3950X would be either the 9900KS (mainstream) or the Core i9-10940X ($729); however, while the 3950X has more cores than either, it doesn’t have as many PCIe lanes or memory channels as the 10940X. If we go for core count, then Intel’s sixteen-core Core i9-9960X would be the obvious candidate, although this CPU is a lot more expensive (until Intel reduces the price) and is technically an X299 processor, so it has more PCIe lanes and memory channels.

Unlocked CPU Pricing and Select Others
AMD (MSRP Pricing)    Cores  AnandTech   Cores  Intel* (OEM Pricing)
-                     -      $900-$999   18/36  Core i9-10980XE ($979)
-                     -      $800-$899   -      -
Ryzen 9 3950X ($749)  16/32  $700-$799   14/28  Core i9-10940X ($784)
-                     -      $600-$699   12/24  Core i9-10920X ($689)
-                     -      $500-$599   10/20  Core i9-10900X ($590)
-                     -      $500-$599   8/16   Core i9-9900KS ($513)
Ryzen 9 3900X ($499)  12/24  $400-$499   8/16   Core i9-9900K/F ($488)
Ryzen 7 3800X ($399)  8/16   $350-$399   8/8    Core i7-9700K/F ($374)
Ryzen 7 3700X ($329)  8/16   $300-$349   -      -
-                     -      $250-$299   6/6    Core i5-9600K ($262)
Ryzen 5 3600X ($249)  6/12   $200-$249   -      -
Ryzen 5 3600 ($199)   6/12   Below $200  4/4    Core i3-9350K ($173)
*Intel quotes OEM/tray pricing. Retail pricing will sometimes be $20-$50 higher.

There is no easy comparison between any of the processors. AMD is pushing the boundaries of the mainstream dual channel memory processor regime, and Intel doesn't have an equivalent in that space. Intel can match it in the high-end desktop space, but therein lie other issues with PCIe lane counts and memory channel support disparity between the two, as well as Intel’s current retail options being high-priced variants. Intel’s published next generation hardware is set to be launched sometime in November, and with it a number of price cuts; however, given the known differences between Intel’s current and Intel’s next generation processor line, the performance gain is not expected to be particularly big.

For regular readers, we have covered the discrepancy in how different companies ascribe the Thermal Design Power to their product lines:

While Intel’s TDP represents the internal power measured for long and sustained high performance (also motherboard dependent), AMD’s metric is more akin to actual thermal cooling requirements for a given cooler rating. That being said, the power consumption of AMD’s first and second generation Ryzen processors has often been parallel to the TDP rating on the box, with the CPU levelling out to the TDP value as we load up the cores with a high energy workload.

For example, here’s our 16-core 1950X data. The Threadripper 1950X is a 180 W chip, and we saw the cores take a total of 134 W.

Here’s our Ryzen 7 2700X data.

This 105 W TDP processor was only recording 86W across the cores at full load.

It’s worth noting that our data is primarily to do with the total power consumed by the cores. There are other power factors at play, such as the Infinity Fabric, the DRAM controller, the PCIe controller, and any other IO, which might add up to the power of the overall package. The maximum power available to a processor should be the package, of which the cores take up most of the sum.

With Ryzen 3000 and Zen 2, AMD’s attachment to TDP was not as clinical as its first two generations of hardware. In our Ryzen 7 3700X review, with the 8-core processor, we saw this:

The Ryzen 7 3700X is a 65 W processor, and yet we can see that the cores total up to 74 W by themselves, with the rest of the chip taking another 16 W or so, totalling 90 W for the whole chip. This is perhaps indicative of one of two things: either Intel’s turbo policy was creating 95 W TDP chips that consumed 160 W in turbo modes and AMD believed it had headroom, or pushing these new chips to the edge required a little more power.

With the Ryzen 9 3900X, with 12 cores, we saw the same thing again.

Despite this being a 105 W TDP chip, the cores at full load saw 122 W peak, with the rest of the chip getting ~24 W, making for an overall 146 W power draw (as measured by the processor internally).

This would all indicate that Zen 2 has a different strategy to the previous Zen chips when it comes to how AMD is rating its TDP. If we saw the same thing with the Ryzen 9 3950X, then it pretty much confirms the hypothesis.

At its peak, the 3950X draws 137 W for the cores when 10 cores are loaded. The chip as a whole hits ~144-145W at that level, well above the 105 W TDP rating on the box. This is partly why AMD is recommending a large liquid cooler for this chip. Under Intel’s definition, the TDP rating is a guarantee for the power consumption at base frequency, although most Intel processors can go above that frequency and stay within the power. We might be seeing something similar here with AMD now.
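
To make the shift between generations explicit, the Python sketch below divides the measured figures quoted in this section by each chip's TDP. All numbers are the ones given in the text above; `None` marks a package figure the text does not give, and the 3950X package value uses the upper end of the quoted ~144-145 W range.

```python
# (TDP, peak cores-only power, peak package power) in watts, as quoted above.
MEASURED = {
    "Threadripper 1950X": (180, 134, None),
    "Ryzen 7 2700X": (105, 86, None),
    "Ryzen 7 3700X": (65, 74, 90),
    "Ryzen 9 3900X": (105, 122, 146),
    "Ryzen 9 3950X": (105, 137, 145),  # package quoted as ~144-145 W
}


def ratio_to_tdp(power, tdp):
    """Measured power as a multiple of the rated TDP."""
    return power / tdp


for name, (tdp, cores, package) in MEASURED.items():
    line = f"{name:18s} cores/TDP = {ratio_to_tdp(cores, tdp):.2f}"
    if package is not None:
        line += f", package/TDP = {ratio_to_tdp(package, tdp):.2f}"
    print(line)
```

On these figures the first and second generation parts keep their cores below the TDP value, while all three Zen 2 parts draw roughly 1.4 times their TDP at the package level.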

It is worth noticing that when up to two cores are loaded, we see each core getting around 18 W of power, but when all the cores are loaded, we are seeing between 6.9 W and 7.6 W. This is compared to the 12-core 3900X, which has about 17.5 W per core initially, and falls down to 10 W per core. AMD is trying to get a higher single core frequency from the 16-core hardware, so by giving more power when a single core is loaded, this might help.
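
The per-core range at full load also bounds the cores-only total, which can be compared against the 137 W peak seen with 10 cores loaded. A small Python sketch of that arithmetic, using only the figures quoted above (the function name is illustrative):

```python
CORES = 16
PER_CORE_W = (6.9, 7.6)  # per-core range with all cores loaded, from the text
PEAK_CORES_W = 137       # cores-only peak, observed with 10 cores loaded


def all_core_bounds(cores, per_core_range):
    """Lower and upper bound on total core power if every core sits in the range."""
    low, high = per_core_range
    return cores * low, cores * high


low, high = all_core_bounds(CORES, PER_CORE_W)
print(f"All-core cores-only power: between {low:.1f} W and {high:.1f} W")
print(f"Below the 10-core peak by {PEAK_CORES_W - high:.1f} W to {PEAK_CORES_W - low:.1f} W")
```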

One other thing to note is where the peak power is observed. We kind of already saw this on the Ryzen 9 3900X in that review, where the peak power of the chip happened when 10 cores were loaded, not the full 12 cores. The difference between the two was minimal, but we’re seeing this on a larger scale with the Ryzen 9 3950X.

When looking at both the cores-only power and the CPU total power, we get a peak with this processor when 10 cores are loaded. This would indicate a 3+2+3+2 mix on the CCXes, which is perhaps an inflection point when current densities start getting much higher and per-core power has to be reduced to ensure everything is still working optimally. The power differential between 10-core use and 16-core use is almost 20W, so users that don’t always use all the cores all the time might exhibit good per-thread performance up to 10 core workloads.

Speaking of frequencies, this has been a touchy topic of late. We have seen with recent news and testing that some users are not observing peak single core frequencies of their Ryzen processors. As we explained in our deep dive of the issue, part of it comes down to the fact that AMD’s turbo policies for Zen 2 are different to Intel: only one core in a set is likely to turbo up to the highest frequency, whereas Intel’s Turbo Boost 2.0 mandates that all cores should hit peak turbo. The other part of it is the testing methodology, but also the fact that the ACPI standards at the OS level can indicate a turbo on a shorter time scale than software can record, ultimately giving users a smeared out version of that turbo value. Then there are other things, like BIOS versions and Windows power plans.
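
The smearing effect can be illustrated with a toy model in Python. It assumes a monitoring tool reports the time-weighted average frequency over its polling window, and that the core sits at its boost frequency for a short burst and at a lower frequency for the rest of the window. The burst length and both frequencies below are illustrative assumptions, not measurements.

```python
def averaged_reading(boost_mhz, sustained_mhz, boost_ms, window_ms):
    """Time-weighted mean frequency over one polling window.

    Crude model: the core runs at boost_mhz for boost_ms and at
    sustained_mhz for the remainder of the window.
    """
    if not 0 <= boost_ms <= window_ms:
        raise ValueError("boost_ms must lie within the window")
    boost_share = boost_ms / window_ms
    return boost_mhz * boost_share + sustained_mhz * (1 - boost_share)


# Illustrative values only: a 1 ms burst at 4700 MHz, otherwise 4450 MHz.
for window_ms in (1, 100, 1000):
    reading = averaged_reading(4700, 4450, 1, window_ms)
    print(f"{window_ms:4d} ms window -> {reading:.1f} MHz")
```

Under this model the peak is only visible when the polling window is about as short as the burst itself; a longer window pulls the reading towards the lower frequency.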

With our Ryzen 9 3950X, the on-the-box single core turbo frequency is listed as 4.7 GHz. We tested using the ASRock X570 Taichi motherboard, a very high-end product, using Windows 10 v1909 on AGESA 1004B, on both the High Performance (HP) power plan and the Ryzen High Performance (RHP) power plan. For peak single core frequencies, we were able to see 4525 MHz on the HP plan, and 4650 MHz on the RHP plan. This latter value is pretty much on the button for the on-the-box turbo value (I’m sure some people will disagree about those 50 MHz).

These values on the RHP power plan were very instantaneous, as when we put a consistent single thread load on the core, the frequencies very quickly came down.

On the Ryzen High Performance power plan, our sustained single core frequency dropped to 4450 MHz. In these tests, we use an affinity mask to limit how many cores are active while we run POV-Ray, and take the reading about 30 seconds into the benchmark, which allows a core to experience a form of heat soak and reach a reliable current density. This is also how we reached the 18 W per core value for 1-2 core loading in the graphs above, indicating that in order to get a sustained 4.7 GHz single core frequency, AMD would need to drive around 21-24W to the core in order to get that value. It is very likely that the CPU can hit those high numbers, for microseconds at a time, as per the ACPI/CPPC2 stack, but for any user doing per-second or per 100ms monitoring, they’re not likely to see it.
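
For readers unfamiliar with affinity masks, the Python sketch below builds a bitmask that selects one logical processor per physical core. It assumes SMT siblings are numbered adjacently (logical processors 0 and 1 on core 0, 2 and 3 on core 1, and so on), which is an assumption about enumeration rather than a guarantee, and the text does not say exactly which logical processors were selected in our runs.

```python
def affinity_mask(physical_cores, threads_per_core=2):
    """Bitmask with one bit set per core, on that core's first logical processor.

    Assumes SMT siblings are enumerated adjacently (an assumption, see above).
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (core * threads_per_core)
    return mask


for n in (1, 2, 10, 16):
    print(f"{n:2d} cores -> 0x{affinity_mask(n):X}")
```

On Windows, a hexadecimal mask of this form is what the `start /affinity` command accepts when launching a process.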

Within this frequency graph though, we can see that the frequency beyond 3 cores has segments. Between 3 cores and 8 cores loaded, we get 4225 MHz to 4125 MHz (100 MHz range), and even at all cores loaded, we’re seeing 3875 MHz, well above the 3500 MHz base frequency listed on the box.

So to answer the question at the top of this page: is the 105 W TDP accurate? If you think it equates to power consumption, then no.

In our full review, we are testing the Ryzen 9 3950X on both the HP and RHP power plans.

As per our processor testing policy, we take a premium category motherboard suitable for the socket, and equip the system with a suitable amount of memory running at the manufacturer's maximum supported frequency. This is also typically run at JEDEC subtimings where possible. It is noted that some users are not keen on this policy, stating that sometimes the maximum supported frequency is quite low, or faster memory is available at a similar price, or that the JEDEC speeds can be prohibitive for performance. While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds - this includes home users as well as industry who might want to shave off a cent or two from the cost or stay within the margins set by the manufacturer. Where possible, we will extend our testing to include faster memory modules either at the same time as the review or a later date.

Test Setup
AMD Ryzen 3000: AMD Ryzen 9 3950X, AMD Ryzen 9 3900X
Motherboard: ASRock X570 Taichi 2.50 (AGESA 1004B)
CPU Cooler: Kraken S62
DRAM: Corsair Vengeance RGB 4x8 GB DDR4-3200
GPU: Sapphire RX 460 2GB (CPU Tests), MSI GTX 1080 Gaming 8G (Gaming Tests)
PSU: Corsair AX860i
SSD: Crucial MX500 2TB
OS: Windows 10 1909

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.

Hardware Providers
  • Sapphire RX 460 Nitro
  • MSI GTX 1080 Gaming X OC
  • Crucial MX200 + MX500 SSDs
  • Corsair AX860i + AX1200i PSUs
  • G.Skill RipjawsV, SniperX, FlareX
  • Crucial Ballistix DDR4
  • Silverstone Coolers
  • Silverstone Fans

Our System Test section focuses significantly on real-world testing, user experience, with a slight nod to throughput. In this section we cover application loading time, image processing, simple scientific physics, emulation, neural simulation, optimized compute, and 3D model development, with a combination of readily available and custom software. For some of these tests, the bigger suites such as PCMark do cover them (we publish those values in our office section), although multiple perspectives is always beneficial. In all our tests we will explain in-depth what is being tested, and how we are testing.

All of our benchmark results can also be found in our benchmark engine, Bench.

Application Load: GIMP 2.10.4

One of the most important aspects of user experience and workflow is how fast a system responds. A good test of this is to see how long it takes for an application to load. Most applications these days, when on an SSD, load fairly instantly, however some office tools require asset pre-loading before being available. Most operating systems employ caching as well, so when certain software is loaded repeatedly (web browser, office tools), it can be initialized much quicker.

In our last suite, we tested how long it took to load a large PDF in Adobe Acrobat. Unfortunately this test was a nightmare to program for, and didn’t transfer over to Win10 RS3 easily. In the meantime we discovered an application that can automate this test, and we put it up against GIMP, a popular free open-source photo editing tool, and the major alternative to Adobe Photoshop. We set it to load a large 50MB design template, and perform the load 10 times with 10 seconds in-between each. Because of caching, the first 3-5 results are often slower than the rest and the time to cache can be inconsistent, so we take the average of the last five results to show CPU processing on cached loading.
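
The "average of the last five" step is simple to express in code. A minimal Python sketch follows; the function name and the sample load times are hypothetical and only illustrate the calculation.

```python
from statistics import mean


def cached_load_time(times, keep_last=5):
    """Mean of the final `keep_last` samples, discarding earlier cache-warming runs."""
    if len(times) < keep_last:
        raise ValueError("not enough runs")
    return mean(times[-keep_last:])


# Hypothetical load times in seconds for 10 consecutive runs.
runs = [6.1, 5.4, 4.9, 4.3, 4.2, 4.0, 4.1, 4.0, 3.9, 4.0]
print(f"Cached load time: {cached_load_time(runs):.2f} s")
```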

AppTimer: GIMP 2.10.4

As a raw single threaded test, we see Intel's high 5.0 GHz CPUs near the top. The Ryzen 3700X and Ryzen 3900X beat the 3950X here by small margins, perhaps due to memory traffic or the complexity of dealing with more cores in the system. However the Ryzen 9 3950X sails past Intel's HEDT chips.

3D Particle Movement v2.1: Brownian Motion

Our 3DPM test is a custom built benchmark designed to simulate six different particle movement algorithms of points in a 3D space. The algorithms were developed as part of my PhD, and while they ultimately perform best on a GPU, they provide a good idea of how instruction streams are interpreted by different microarchitectures.

A key part of the algorithms is the random number generation – we use relatively fast generation which ends up implementing dependency chains in the code. The upgrade over the naïve first version of this code solved for false sharing in the caches, a major bottleneck. We are also looking at AVX2 and AVX512 versions of this benchmark for future reviews.

For this test, we run a stock particle set over the six algorithms for 20 seconds apiece, with 10 second pauses, and report the total rate of particle movement, in millions of operations (movements) per second. We have a non-AVX version and an AVX version, with the latter implementing AVX512 and AVX2 where possible.
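
For a feel of what "particle movements per second" means, here is a single-threaded Python sketch of a random walk in 3D. It is not one of the six 3DPM algorithms and not the benchmark's code: it uses a generic uniform-direction unit step and a fixed step count rather than a 20-second timer, and all names are illustrative.

```python
import math
import random
import time


def random_unit_step(rng):
    """Uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def run_particles(n_particles, n_steps, seed=0):
    """Move each particle n_steps unit steps; return (positions, movements, seconds)."""
    rng = random.Random(seed)
    positions = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    start = time.perf_counter()
    for pos in positions:
        for _ in range(n_steps):
            dx, dy, dz = random_unit_step(rng)
            pos[0] += dx
            pos[1] += dy
            pos[2] += dz
    elapsed = time.perf_counter() - start
    return positions, n_particles * n_steps, elapsed


positions, moves, seconds = run_particles(1000, 100)
print(f"{moves / seconds / 1e6:.2f} million movements per second")
```

Each step draws two random numbers, so the generator sits on the critical path of the inner loop in much the same way the text describes for the real benchmark.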

3DPM v2.1 can be downloaded from our server: 3DPMv2.1.rar (13.0 MB)

3D Particle Movement v2.1

For some simple math without AVX acceleration, the 3950X piles on the core performance and IPC to give our best results, above and beyond what the Core i9-9980XE can provide for less power at under half the cost.

3D Particle Movement v2.1 (with AVX)

However, this is one benchmark where ratcheting in AVX2 and AVX512 really helps. There's no escaping the Intel HEDT family here, but on AVX2 mode AMD wins the best of the rest.

Dolphin 5.0: Console Emulation

One of the popular requested tests in our suite is to do with console emulation. Being able to pick up a game from an older system and run it as expected depends on the overhead of the emulator: it takes a significantly more powerful x86 system to be able to accurately emulate an older non-x86 console, especially if code for that console was made to abuse certain physical bugs in the hardware.

For our test, we use the popular Dolphin emulation software, and run a compute project through it to determine how close to a standard console system our processors can emulate. In this test, a Nintendo Wii would take around 1050 seconds.

The latest version of Dolphin can be downloaded from https://dolphin-emu.org/

Dolphin 5.0 Render Test

Dolphin is another ST test, and Intel's 4.7+ GHz family are ahead of AMD here. The 3700X is a smidgen ahead of the 3950X, perhaps due to having only one chiplet rather than two.

DigiCortex 1.20: Sea Slug Brain Simulation

This benchmark was originally designed for simulation and visualization of neuron and synapse activity, as is commonly found in the brain. The software comes with a variety of benchmark modes, and we take the small benchmark which runs a 32k neuron / 1.8B synapse simulation, equivalent to a Sea Slug.


Example of a 2.1B neuron simulation

We report the results as the ability to simulate the data as a fraction of real-time, so anything above a ‘one’ is suitable for real-time work. Out of the two modes, a ‘non-firing’ mode which is DRAM heavy and a ‘firing’ mode which has CPU work, we choose the latter. Despite this, the benchmark is still affected by DRAM speed a fair amount.

DigiCortex can be downloaded from http://www.digicortex.net/

DigiCortex 1.20 (32k Neuron, 1.8B Synapse)

DigiCortex likes memory channels, and so Intel's HEDT chips win here. Again we see the 3700X beating the 3950X, likely due to the available bandwidth per core being higher and more cores not making much of a difference in performance.
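
The bandwidth-per-core point can be quantified with theoretical peak numbers. The Python sketch below assumes dual channel DDR4-3200 as in our test setup, with 64-bit (8 byte) channels; these are peak figures, not measured bandwidth, and the function name is illustrative.

```python
def peak_dram_bandwidth_gbs(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak in GB/s: transfers per second x bytes per transfer x channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


total = peak_dram_bandwidth_gbs(3200, channels=2)
for name, cores in (("Ryzen 7 3700X", 8), ("Ryzen 9 3950X", 16)):
    print(f"{name}: {total / cores:.1f} GB/s per core out of {total:.1f} GB/s")
```

With the same two memory channels feeding twice as many cores, each core on the 3950X has half the peak bandwidth available to a core on the 3700X.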

y-Cruncher v0.7.6: Microarchitecture Optimized Compute

I’ve known about y-Cruncher for a while, as a tool to help compute various mathematical constants, but it wasn’t until I began talking with its developer, Alex Yee, a researcher from NWU and now software optimization developer, that I realized that he has optimized the software like crazy to get the best performance. Naturally, any simulation that can take 20+ days can benefit from a 1% performance increase! Alex started y-cruncher as a high-school project, but it is now at a state where Alex is keeping it up to date to take advantage of the latest instruction sets before they are even made available in hardware.

For our test we run y-cruncher v0.7.6 through all the different optimized variants of the binary, single threaded and multi-threaded, including the AVX-512 optimized binaries. The test is to calculate 250m digits of Pi, and we use the single threaded and multi-threaded versions of this test.
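
As a small-scale illustration of what computing digits of Pi involves, the Python sketch below uses Machin's formula with arbitrary-precision integers. This is not y-cruncher's method or code; it is a textbook approach chosen because it fits in a few lines, and it would be far too slow for 250m digits.

```python
def arctan_inv(x, unity):
    """arctan(1/x) scaled by `unity`, via the alternating Taylor series in integers."""
    term = unity // x
    total = term
    x_squared = x * x
    n = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_digits(digits, guard=10):
    """Return '3' followed by `digits` decimal places of Pi as a string."""
    unity = 10 ** (digits + guard)
    # Machin: pi / 4 = 4 * arctan(1/5) - arctan(1/239)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)


print(pi_digits(20))
```

The guard digits absorb the truncation error from the integer divisions, so the digits that are returned are not affected by rounding in the last place.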

Users can download y-cruncher from Alex’s website: http://www.numberworld.org/y-cruncher/

y-Cruncher 0.7.6 Single Thread, 250m Digits
y-Cruncher 0.7.6 Multi-Thread, 250m Digits

y-Cruncher is another piece of software that can use AVX-512, but AMD still comes very close. For single threaded AVX2, the 5.0 GHz CPUs from Intel have a small lead, but in multi-threaded AVX2 the 16 cores with Zen 2 allow AMD to power through Intel's mainstream offerings by 25%.

Agisoft Photoscan 1.3.3: 2D Image to 3D Model Conversion

One of the ISVs that we have worked with for a number of years is Agisoft, who develop software called PhotoScan that transforms a number of 2D images into a 3D model. This is an important tool in model development and archiving, and relies on a number of single threaded and multi-threaded algorithms to go from one side of the computation to the other.

In our test, we take v1.3.3 of the software with a good sized data set of 84 x 18 megapixel photos and push it through a reasonably fast variant of the algorithms, but one that is still more stringent than our 2017 test. We report the total time to complete the process.

Agisoft’s Photoscan website can be found here: http://www.agisoft.com/

Agisoft Photoscan 1.3.3, Complex Test

This variable threaded workload shows the power of AMD's 16 Zen 2 cores at a high frequency. Despite 5.0 GHz all-core turbo being on the 9900K, only having 8 cores lets it down here. Intel's HEDT line of processors just don't have the per-core performance to keep up.

Rendering is often a key target for processor workloads, lending itself to a professional environment. It comes in different formats as well, from 3D rendering through rasterization, such as games, or by ray tracing, and invokes the ability of the software to manage meshes, textures, collisions, aliasing, physics (in animations), and discarding unnecessary work. Most renderers offer CPU code paths, while a few use GPUs and select environments use FPGAs or dedicated ASICs. For big studios however, CPUs are still the hardware of choice.

Corona 1.3: Performance Render

An advanced performance based renderer for software such as 3ds Max and Cinema 4D, the Corona benchmark renders a generated scene as a standard under its 1.3 software version. Normally the GUI implementation of the benchmark shows the scene being built, and allows the user to upload the result as a ‘time to complete’.

We got in contact with the developer who gave us a command line version of the benchmark that does a direct output of results. Rather than reporting time, we report the average number of rays per second across six runs, as the performance scaling of a result per unit time is typically visually easier to understand.

The Corona benchmark website can be found at https://corona-renderer.com/benchmark

Corona 1.3 Benchmark

Intel's HEDT chips are quite good at Corona, but if we compare the 3900X to the 3950X, we still see some good scaling.

Blender 2.79b: 3D Creation Suite

A high profile rendering tool, Blender is open-source allowing for massive amounts of configurability, and is used by a number of high-profile animation studios worldwide. The organization recently released a Blender benchmark package, a couple of weeks after we had narrowed our Blender test for our new suite, however their test can take over an hour. For our results, we run one of the sub-tests in that suite through the command line - a standard ‘bmw27’ scene in CPU only mode, and measure the time to complete the render.
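
Timing a command-line render can be sketched in a few lines of Python. The wrapper below is generic; the Blender arguments use the documented `-b` (background, no GUI) and `-f` (render a single frame) options, while the scene file name is a placeholder and the exact invocation used for our results is not given in the text. Whether the render runs on the CPU depends on the scene's own settings.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def blender_render_argv(blend_file, frame=1, blender="blender"):
    """Argument list for a background (-b) render of one frame (-f)."""
    return [blender, "-b", blend_file, "-f", str(frame)]


# Placeholder file name; point it at a local copy of the scene to run a render:
# seconds = time_command(blender_render_argv("bmw27_cpu.blend"))
print(blender_render_argv("bmw27_cpu.blend"))

# Smoke test of the timer with a command that is always available.
print(f"{time_command([sys.executable, '-c', 'pass']):.3f} s")
```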

Blender can be downloaded at https://www.blender.org/download/

Blender 2.79b bmw27_cpu Benchmark

AMD is taking the lead in our blender test, with the 16-core chips easily going through Intel's latest 18-core hardware.

LuxMark v3.1: LuxRender via Different Code Paths

As stated at the top, there are many different ways to process rendering data: CPU, GPU, Accelerator, and others. On top of that, there are many frameworks and APIs in which to program, depending on how the software will be used. LuxMark, a benchmark developed using the LuxRender engine, offers several different scenes and APIs.

In our test, we run the simple ‘Ball’ scene on the C++ code path, in CPU mode. This scene starts with a rough render and slowly improves the quality over two minutes, giving a final result in what is essentially an average ‘kilorays per second’.

LuxMark v3.1 C++

Despite using Intel's Embree engine, again AMD's 16-cores easily win out against Intel's 18-core chips, at under half the cost.

POV-Ray 3.7.1: Ray Tracing

The Persistence of Vision ray tracing engine is another well-known benchmarking tool, which was in a state of relative hibernation until AMD released its Zen processors, at which point both Intel and AMD were suddenly submitting code to the main branch of the open source project. For our test, we use the built-in benchmark for all-cores, called from the command line.

POV-Ray can be downloaded from http://www.povray.org/

POV-Ray 3.7.1 Benchmark

POV-Ray ends up with AMD 16-core splitting the two Intel 18-core parts, which means we're likely to see the Intel Core i9-10980XE at the top here. It would have been interesting to see where an Intel 16-core Core-X on Cascade would end up for a direct comparison, but Intel has no new 16-core chip planned.

With the rise of streaming, vlogs, and video content as a whole, encoding and transcoding tests are becoming ever more important. Not only are more home users and gamers needing to convert video files into something more manageable, for streaming or archival purposes, but the servers that manage the output also manage data and log files with compression and decompression. Our encoding tasks are focused around these important scenarios, with input from the community for the best implementation of real-world testing.

Handbrake 1.1.0: Streaming and Archival Video Transcoding

A popular open source tool, Handbrake is the anything-to-anything video conversion software that a number of people use as a reference point. The danger is always on version numbers and optimization, for example the latest versions of the software can take advantage of AVX-512 and OpenCL to accelerate certain types of transcoding and algorithms. The version we use here is a pure CPU play, with common transcoding variations.

We have split Handbrake up into several tests, using a Logitech C920 1080p60 native webcam recording (essentially a streamer recording), and convert it into two types of streaming formats and one for archival. The output settings used are:

  • 720p60 at 6000 kbps constant bit rate, fast setting, high profile
  • 1080p60 at 3500 kbps constant bit rate, faster setting, main profile
  • 1080p60 HEVC at 3500 kbps variable bit rate, fast setting, main profile
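
The bit rates above translate directly into output size. The Python sketch below computes the video payload per minute of footage for each setting; it ignores audio and container overhead, and for the variable bit rate HEVC setting it treats 3500 kbps as the average. The function name is illustrative.

```python
def video_megabytes_per_minute(kbps):
    """Video payload in MB (10^6 bytes) for one minute at the given bit rate."""
    bits = kbps * 1000 * 60
    return bits / 8 / 1e6


settings = (
    ("720p60 x264, 6000 kbps", 6000),
    ("1080p60 x264, 3500 kbps", 3500),
    ("1080p60 HEVC, 3500 kbps", 3500),
)
for label, kbps in settings:
    print(f"{label}: {video_megabytes_per_minute(kbps):.2f} MB per minute")
```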

Handbrake 1.1.0 - 720p60 x264 6000 kbps Fast
Handbrake 1.1.0 - 1080p60 x264 3500 kbps Faster
Handbrake 1.1.0 - 1080p60 HEVC 3500 kbps Fast

7-zip v1805: Popular Open-Source Encoding Engine

Out of our compression/decompression tool tests, 7-zip is the most requested and comes with a built-in benchmark. For our test suite, we’ve pulled the latest version of the software and we run the benchmark from the command line, reporting the compression, decompression, and a combined score.

It is noted in this benchmark that the latest multi-die processors have very bi-modal performance between compression and decompression, performing well in one and badly in the other. There are also discussions around how the Windows Scheduler is implementing every thread. As we get more results, it will be interesting to see how this plays out.

Please note, if you plan to share out the Compression graph, please include the Decompression one. Otherwise you’re only presenting half a picture.

7-Zip 1805 Compression

7-Zip 1805 Decompression

7-Zip 1805 Combined

Again, AMD's 16-core Zen 2 hardware is breezing past Intel's 18-core Skylake-Refresh family. Even with the added frequency that Cascade Lake will bring, it would be hard to see it able to topple AMD here.

WinRAR 5.60b3: Archiving Tool

My compression tool of choice is often WinRAR, having been one of the first tools a number of my generation used over two decades ago. The interface has not changed much, although the integration with Windows right click commands is always a plus. It has no in-built test, so we run a compression over a set directory containing over thirty 60-second video files and 2000 small web-based files at a normal compression rate.

WinRAR is variable threaded but also susceptible to caching, so in our test we run it 10 times and take the average of the last five, leaving the test purely for raw CPU compute performance.
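As a rough illustration of the averaging described above, the following Python sketch discards the early, cache-affected runs and averages only the last five. It is not the actual test script: the function name `steady_state_average` and the timing values are hypothetical and chosen purely for demonstration.

```python
from statistics import mean


def steady_state_average(run_times, keep_last=5):
    """Average only the final `keep_last` timings, treating earlier runs as warm-up."""
    if len(run_times) < keep_last:
        raise ValueError("need at least keep_last runs")
    return mean(run_times[-keep_last:])


# Hypothetical timings in seconds for ten consecutive runs (illustrative only).
# The first runs are slower because the files are not yet cached.
timings = [41.0, 36.5, 33.0, 31.5, 30.5, 30.0, 30.2, 29.8, 30.1, 29.9]

print(f"Reported time: {steady_state_average(timings):.1f} s")
```

Averaging only the tail of the run list means the reported number reflects the state where the data set is already cached, which is the point of taking the last five runs.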

WinRAR 5.60b3

As a variable threaded workload, WinRAR also probes memory performance. Both the 3700X and 3800X beat the 3950X here.

AES Encryption: File Security

A number of platforms, particularly mobile devices, are now offering encryption by default with file systems in order to protect the contents. Windows based devices have these options as well, often applied by BitLocker or third-party software. In our AES encryption test, we used the discontinued TrueCrypt for its built-in benchmark, which tests several encryption algorithms directly in memory.

The data we take for this test is the combined AES encrypt/decrypt performance, measured in gigabytes per second. The software does use AES instructions on processors that offer hardware acceleration, however not AVX-512.

AES Encoding

Our AES benchmark seemed a bit off - I would suggest we're being memory limited here, but the Ryzen 9 3900X scores a lot higher than the 3950X. More investigation needed.

While more the focus of low-end and small form factor systems, web-based benchmarks are notoriously difficult to standardize. Modern web browsers are frequently updated, with no recourse to disable those updates, and as such there is difficulty in keeping a common platform. The fast paced nature of browser development means that version numbers (and performance) can change from week to week. Despite this, web tests are often a good measure of user experience: a lot of office work today revolves around web applications, particularly email and office apps, but also interfaces and development environments. Our web tests include some of the industry standard tests, as well as a few popular but older tests.

We have also included our legacy benchmarks in this section, representing a stack of older code for popular benchmarks.

WebXPRT 3: Modern Real-World Web Tasks, including AI

The company behind the XPRT test suites, Principled Technologies, has recently released the latest web-test, and rather than attach a year to the name have just called it ‘3’. This latest test (as we started the suite) has built upon and developed the ethos of previous tests: user interaction, office compute, graph generation, list sorting, HTML5, image manipulation, and even goes as far as some AI testing.

For our benchmark, we run the standard test which goes through the benchmark list seven times and provides a final result. We run this standard test four times, and take an average.

Users can access the WebXPRT test at http://principledtechnologies.com/benchmarkxprt/webxprt/

WebXPRT 3 (2018)

WebXPRT 2015: HTML5 and Javascript Web UX Testing

The older version of WebXPRT is the 2015 edition, which focuses on a slightly different set of web technologies and frameworks that are in use today. This is still a relevant test, especially for users interacting with not-the-latest web applications in the market, of which there are a lot. Web framework development is often very quick but with high turnover, meaning that frameworks are quickly developed, built upon, and used, and then developers move on to the next. Adjusting an application to a new framework is a difficult, arduous task, especially with rapid development cycles. This leaves a lot of applications as ‘fixed-in-time’, and relevant to user experience for many years.

Similar to WebXPRT 3, the main benchmark is a sectional run repeated seven times, with a final score. We repeat the whole thing four times, and average those final scores.

WebXPRT15

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which is an aggregated test over a series of JavaScript frameworks to do three simple things: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously apply them from different coding angles.

Our test goes through the list of frameworks, and produces a final score indicative of ‘rpm’, one of the benchmark's internal metrics. We report this final score.

Speedometer 2

Google Octane 2.0: Core Web Compute

A popular web test for several years, but now no longer being updated, is Octane, developed by Google. Version 2.0 of the test performs the best part of two-dozen compute related tasks, such as regular expressions, cryptography, ray tracing, emulation, and Navier-Stokes physics calculations.

The test gives each sub-test a score and produces a geometric mean of the set as a final result. We run the full benchmark four times, and average the final results.
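To make the scoring concrete, here is a minimal Python sketch of that two-stage process: a geometric mean over the sub-test scores within one run, then a plain average of the final results over several runs. The function names and the sub-test scores are hypothetical, and this is not Octane's own scoring code.

```python
import math
from statistics import mean


def geometric_mean(scores):
    """Geometric mean computed via logarithms; all scores must be positive."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def averaged_final_score(runs):
    """Geometric mean within each run, then an arithmetic mean across the runs."""
    return mean([geometric_mean(run) for run in runs])


# Hypothetical sub-test scores for four full benchmark runs (illustrative only).
runs = [
    [41000, 52000, 38000],
    [40500, 51800, 38400],
    [41200, 52300, 37900],
    [40900, 51900, 38100],
]

print(f"Reported score: {averaged_final_score(runs):.0f}")
```

Using a geometric mean inside each run stops a single sub-test with very large absolute numbers from dominating the final result.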

Google Octane 2.0

Mozilla Kraken 1.1: Core Web Compute

Even older than Octane is Kraken, this time developed by Mozilla. This is an older test that does similar computational mechanics, such as audio processing or image filtering. Kraken seems to produce a highly variable result depending on the browser version, as it is a test that is keenly optimized for.

The main benchmark runs through each of the sub-tests ten times and produces an average time to completion for each loop, given in milliseconds. We run the full benchmark four times and take an average of the time taken.

Mozilla Kraken 1.1

3DPM v1: Naïve Code Variant of 3DPM v2.1

The first legacy test in the suite is the first version of our 3DPM benchmark. This is the ultimate naïve version of the code, as if it was written by a scientist with no knowledge of how computer hardware, compilers, or optimization works (which in fact, it was at the start). This represents a large body of scientific simulation out in the wild, where getting the answer is more important than it being fast (getting a result in 4 days is acceptable if it’s correct, rather than sending someone away for a year to learn to code and getting the result in 5 minutes).

In this version, the only real optimization was in the compiler flags (-O2, -fp:fast), compiling it in release mode, and enabling OpenMP in the main compute loops. The loops were not configured for function size, and one of the key slowdowns is false sharing in the cache. It also has long dependency chains based on the random number generation, which leads to relatively poor performance on specific compute microarchitectures.

3DPM v1 can be downloaded with our 3DPM v2 code here: 3DPMv2.1.rar (13.0 MB)

3DPM v1 Single Threaded

3DPM v1 Multi-Threaded

x264 HD 3.0: Older Transcode Test

This transcoding test is super old, and was used by Anand back in the day of Pentium 4 and Athlon II processors. Here a standardized 720p video is transcoded with a two-pass conversion, with the benchmark showing the frames-per-second of each pass. This benchmark is single-threaded, and between some micro-architectures we seem to actually hit an instructions-per-clock wall.

x264 HD 3.0 Pass 1

3DPM v1 Multi-Threaded

GeekBench4: Synthetics

A common tool for cross-platform testing between mobile, PC, and Mac, GeekBench 4 is an ultimate exercise in synthetic testing across a range of algorithms looking for peak throughput. Tests include encryption, compression, fast Fourier transform, memory operations, n-body physics, matrix operations, histogram manipulation, and HTML parsing.

I’m including this test due to popular demand, although the results do come across as overly synthetic, and a lot of users often put a lot of weight behind the test due to the fact that it is compiled across different platforms (although with different compilers).

We record the main subtest scores (Crypto, Integer, Floating Point, Memory) in our benchmark database, but for the review we post the overall single and multi-threaded results.

Geekbench 4 - ST Overall

Geekbench 4 - MT Overall

Although different from most of the other commonly played MMO or massively multiplayer online games, World of Tanks is set in the mid-20th century and allows players to take control of a range of military based armored vehicles. World of Tanks (WoT) is developed and published by Wargaming who are based in Belarus, with the game’s soundtrack being primarily composed by Belarusian composer Sergey Khmelevsky. The game offers multiple entry points including a free-to-play element as well as allowing players to pay a fee to open up more features. One of the most interesting things about this tank based MMO is that it achieved eSports status when it debuted at the World Cyber Games back in 2012.

World of Tanks enCore is a demo application for a new and unreleased graphics engine penned by the Wargaming development team. Over time the new core engine will be implemented into the full game, upgrading the game's visuals with key elements such as improved water, flora, shadows, and lighting, as well as other objects such as buildings. The World of Tanks enCore demo app not only offers up insight into the impending game engine changes, but allows users to check system performance to see if the new engine runs optimally on their system.

Charts: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings

Upon arriving on PC, Final Fantasy XV: Windows Edition was given a graphical overhaul as it was ported over from console, the fruits of Square Enix's successful partnership with NVIDIA, with hardly any hint of the troubles during Final Fantasy XV's original production and development.

In preparation for the launch, Square Enix opted to release a standalone benchmark that they have since updated. Using the Final Fantasy XV standalone benchmark gives us a lengthy standardized sequence to record, although it should be noted that its heavy use of NVIDIA technology means that the Maximum setting has problems - it renders items off screen. To get around this, we use the standard preset which does not have these issues.

Square Enix has patched the benchmark with custom graphics settings and bugfixes to be much more accurate in profiling in-game performance and graphical options. For our testing, we run the standard benchmark with a FRAPS overlay, taking a 6-minute recording of the test.

Charts: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings

Next up is Middle-earth: Shadow of War, the sequel to Shadow of Mordor. Developed by Monolith, whose last hit was arguably F.E.A.R., Shadow of Mordor returned them to the spotlight with an innovative NPC rival generation and interaction system called the Nemesis System, along with a storyline based on J.R.R. Tolkien's legendarium, all running on a highly modified version of the engine that originally powered F.E.A.R. in 2005.

Using the new LithTech Firebird engine, Shadow of War improves on the detail and complexity, and with free add-on high-resolution texture packs, offers itself as a good example of getting the most graphics out of an engine that may not be bleeding edge.

Charts: Average FPS at IGP, Low, Medium, and High settings

Strange Brigade is set in Egypt in 1903 and follows a story which is very similar to that of the Mummy film franchise. This particular third-person shooter is developed by Rebellion Developments, which is more widely known for games such as the Sniper Elite and Alien vs Predator series. The game follows the hunt for Seteki the Witch Queen, who has arisen once again, and the only ‘troop’ who can ultimately stop her. Gameplay is cooperative centric with a wide variety of different levels and many puzzles which need solving by the British colonial Secret Service agents sent to put an end to her reign of barbarism and brutality.

The game supports both the DirectX 12 and Vulkan APIs and houses its own built-in benchmark which offers various options up for customization including textures, anti-aliasing, reflections, draw distance and even allows users to enable or disable motion blur, ambient occlusion and tessellation among others. AMD has boasted previously that Strange Brigade is part of its Vulkan API implementation offering scalability for AMD multi-graphics card configurations.

Charts (two sets): Average FPS and 95th Percentile at IGP, Low, Medium, and High settings

The highly anticipated iteration of the Grand Theft Auto franchise hit the shelves on April 14th 2015, with both AMD and NVIDIA in tow to help optimize the title. GTA doesn’t provide graphical presets, but opens up the options to users and extends the boundaries by pushing even the hardest systems to the limit using Rockstar’s Advanced Game Engine under DirectX 11. Whether the user is flying high in the mountains with long draw distances or dealing with assorted trash in the city, when cranked up to maximum it creates stunning visuals but hard work for both the CPU and the GPU.

For our test we have scripted a version of the in-game benchmark. The in-game benchmark consists of five scenarios: four short panning shots with varying lighting and weather effects, and a fifth action sequence that lasts around 90 seconds. We use only the final part of the benchmark, which combines a flight scene in a jet followed by an inner city drive-by through several intersections followed by ramming a tanker that explodes, causing other cars to explode as well. This is a mix of distance rendering followed by a detailed near-rendering action sequence, and the title thankfully spits out frame time data.
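Frame time data like this is what the Average FPS and 95th Percentile charts are derived from. The Python sketch below shows one common way to reduce a frame time log to those two figures. It rests on assumptions that the text does not confirm: frame times are in milliseconds, the percentile uses the nearest-rank method, and the 95th percentile frame time is converted back into a frame rate. The function names and sample data are hypothetical.

```python
def average_fps(frame_times_ms):
    """Average frame rate: total frames divided by total elapsed time."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_95_fps(frame_times_ms):
    """Frame rate at the 95th percentile frame time (nearest-rank method).

    95% of frames were rendered at least this quickly, so the value
    describes the slow tail rather than the typical frame.
    """
    ordered = sorted(frame_times_ms)
    # Nearest rank, 1-based: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return 1000.0 / ordered[rank - 1]


# Hypothetical frame time log in milliseconds (illustrative only):
# mostly 10 ms frames with a couple of 20 ms stutters.
frame_times = [10.0] * 18 + [20.0] * 2

print(f"Average FPS:     {average_fps(frame_times):.1f}")
print(f"95th percentile: {percentile_95_fps(frame_times):.1f}")
```

Reporting both numbers matters because a few long frames barely move the average but show up clearly in the percentile figure.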

There are no presets for the graphics options on GTA, allowing the user to adjust options such as population density and distance scaling on sliders, but others such as texture/shadow/shader/water quality from Low to Very High. Other options include MSAA, soft shadows, post effects, shadow resolution and extended draw distance options. There is a handy option at the top which shows how much video memory the options are expected to consume, with obvious repercussions if a user requests more video memory than is present on the card (although there’s no obvious indication if you have a low end GPU with lots of GPU memory, like an R7 240 4GB).

Charts: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings

The latest title in Ubisoft's Far Cry series lands us right into the unwelcoming arms of an armed militant cult in Montana, one of the many middles-of-nowhere in the United States. With a charismatic and enigmatic adversary, gorgeous landscapes of the northwestern American flavor, and lots of violence, it is classic Far Cry fare. Graphically intensive in an open-world environment, the game mixes in action and exploration.

Far Cry 5 does support Vega-centric features with Rapid Packed Math and Shader Intrinsics. Far Cry 5 also supports HDR (HDR10, scRGB, and FreeSync 2). We use the in-game benchmark for our data, and report the average/minimum frame rates.

Charts: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings

Aside from keeping up-to-date on the Formula One world, F1 2017 added HDR support, which F1 2018 has maintained; beyond that, any newer versions of Codemasters' EGO engine should find their way into the F1 series. Graphically demanding in its own right, F1 2018 keeps a useful racing-type graphics workload in our benchmarks.

We use the in-game benchmark, set to run on the Montreal track in the wet, driving as Lewis Hamilton from last place on the grid. Data is taken over a one-lap race.

Charts: Average FPS and 95th Percentile at IGP, Low, Medium, and High settings

There are many attributes by which a processor is examined. Absolute performance is the one that grabs the headlines – either by way of single core performance, or the chip performance as a whole, most often applied to an arbitrary benchmark. Aside from raw by-the-numbers performance, we also look at popular workloads, where the CPU plays a critical role. Gaming, on the other hand, is where the CPU is still important but often secondary, and so we end up analyzing the relationship between the CPU and the GPU. Beyond performance, we have cost to consider, and if a product is priced appropriately for the market it aims for.

For performance, the expected result was what we observed: AMD’s 3950X knocks down walls that only a couple of years ago seemed impenetrable. When compared to its immediate rivals, the Ryzen 9 3950X smashes through several of our tests published here, such as Photoscan, Blender, Handbrake, and 7-zip, while CineBench R20 and SPEC in our benchmark database also have some strong numbers.

Because it is competing against Intel’s high-end Skylake Refresh desktop platform (comparing equal core count, rather than price), it can’t compete when AVX-512 is in play, or memory bandwidth is the limiting factor. At some level having this many cores requires extra memory bandwidth, and dual channel isn’t going to cut it. This is one of the downsides of moving ‘mainstream’ up to a HEDT price point, even if it does come with more cores.

There are some other minor points to note. If we compare single threaded performance, despite AMD’s Zen 2 having a general IPC advantage, the Core i9-9900KS is still running at 5.0 GHz for sustained single threaded work, which is still 7-15% higher than the Ryzen 9 3950X, and as a result it does pull out ahead in a number of ST tests as well as in low resolution (CPU-bound) gaming. At higher resolution gaming, most of the CPUs in our test perform within a fraction of each other.

We did test the Ryzen 9 3950X with both the normal Windows 'High Performance' (HP) power plan, and the 'Ryzen High Performance' (RHP) power plan that comes with the chipset drivers. Overall the RHP showed higher frequencies and 2-3% more performance than the HP, for about +4W on peak power consumption. Now we're in a quandary as to whether to continue with the RHP - it's technically not out of the box performance, however we still have to install chipset drivers with every test, and the RHP is part of the package. We recommend users use the RHP either way, however it would be helpful if Microsoft recognized a Ryzen 3000 series processor and automatically offered / enabled the RHP in the power settings.

Another key point in our testing was power. Using our power consumption test that outputs the processor's internal power metrics, we saw a considerable inflection in overall CPU power consumption when more than 10 cores were loaded. At this level, the two chiplets are over half occupied, so the desire to balance power over cores and reduce frequency occurs. Normally we expect the power to equalize at that peak level across all the cores, however something else is happening on the chip that means a lower overall power level is found. This is perhaps current density on the Zen 2 cores and chiplets becoming a factor, although we are talking about moving from 10W per core to 12W per core, which isn't a considerable jump, but might just be enough of a voltage jump to become a concern at some level. We technically saw this on the Ryzen 9 3900X as well, with the 10-core power consumption being a few watts higher than the 12-core result. But as that chip only has 12 cores, it wasn't much of a concern.

The Ryzen 3000 family also marks a turning point regarding AMD's use of official TDP ratings. We're seeing power consumption values higher than the TDP on the box, where before with Ryzen 1000 and Ryzen 2000 there was some good parity. Note, this isn't as bad as the difference between Intel's TDP-on-the-box and the actual power consumption of its turbo modes, and likely AMD is seeing this market exploitation of TDP vs Turbo as a potential avenue for performance. Ideally we should be getting several numbers on the box explaining this, but as of yet I've not seen any desire to do so. As we move into 280W 3rd Gen Threadripper processors, we'll be inspecting those to see if it changes for those as well.

The High-Level Holistic View: Ryzen Wins

In order to put all of our data into a conclusion that was more meaningful, we took the geometric mean of a cross section of our benchmarks. For the cross section of benchmarks, we covered a range of ST and MT tests, and we plotted a number of chips as a function of price.

A geometric mean (or average) is different to a standard average: e.g. if two benchmarks score 100 and 500 on one chip, but 250 and 350 on the other, both chips have the same standard average of 300. But, looking at the data, on the second chip the second test is 30% lower while the first test is 2.5x higher. Ultimately the second chip might overall be better: both chips are equal in a standard average, but the geometric mean recognizes that 2.5x difference vs only a 30% drop and adjusts the value accordingly.
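The two-chip example above can be checked numerically. The short Python sketch below uses the same four scores; the variable and function names are ours for illustration and do not reflect how the review's charts were actually produced.

```python
import math
from statistics import mean

# Scores from the example: two benchmarks on each of two chips.
chip_a = [100, 500]
chip_b = [250, 350]


def geometric_mean(scores):
    """n-th root of the product of n positive scores."""
    return math.prod(scores) ** (1.0 / len(scores))


for name, scores in (("Chip A", chip_a), ("Chip B", chip_b)):
    print(f"{name}: standard average = {mean(scores)}, "
          f"geometric mean = {geometric_mean(scores):.1f}")

# Standard averages are identical (300 for both chips), while the
# geometric means are about 223.6 for chip A and 295.8 for chip B.
```

The geometric mean works on ratios rather than absolute differences, so chip B's 2.5x gain in the first test outweighs its 0.7x result in the second.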

We get this graph:

In terms of absolute performance across our benchmark range, the Ryzen 9 3950X has the lead. This metric also puts the 3900X above the 9900KS, because despite the 5.0 GHz all-core on 8-cores, moving to 12-core and 16-core at almost the same performance per core gives more of an advantage in our test suite's MT-heavy workloads. As we move to the more expensive HEDT chips, the 16-core and 18-core from Intel, then even with strong AVX-512 performance, it’s not enough to offset other areas.

For users who want the best without going to the high-end desktop, the Ryzen 9 3950X has the best overall score out of all the chips we’ve ever tested.

If you thought the 3900X didn't have enough power, the 3950X is your answer.

Looking forward, we’re going to have an interesting time with the upcoming HEDT chips from both companies, as users who need HEDT will need something more than what the AM4 has to offer. AMD is set to launch its 3rd Generation Threadripper platform on the 25th, and Intel's Cascade Lake-X CPUs are also due out later this month. Stay tuned for those reviews.

https://www.anandtech.com/show/15043/the-amd-ryzen-9-3950x-review-16-cores-on-7nm-with-pcie-40

2019-11-14 14:00:00Z