PCI-Express 4.0 vs 3.0 Video Card Performance

Always look at the date when you read an article. Some of the content in this article is most likely out of date, as it was written on November 30, 2020. For newer information, see our more recent articles.

Introduction

PCI-Express has been the standard for connecting video cards and other expansion devices inside of computers for many years now, and several generations of the technology have now passed. With each of those generations, the amount of data that can be transferred over the PCIe connection has increased. How much impact does that have on modern video cards? Is there any benefit to running a PCIe 3.0 card in a 4.0 slot, or loss if using a 4.0 card in a 3.0 slot? We frequently get questions like these during the consultation process with new customers, so I thought it would be worth taking some time to test in order to better answer these queries.


Test Methodology

In order to test the impact of PCI-Express bandwidth on performance, we are going to look at two video cards in a system where we can control the PCIe slot generation within the BIOS. Why two cards? Because one is natively PCIe 3.0 (NVIDIA’s Titan RTX) while the other uses PCIe 4.0 (their GeForce RTX 3090). Both have 24GB of memory, to avoid the amount of onboard VRAM affecting anything, and are effectively the top performing card from their respective GPU families. To minimize the impact of the CPU, we went with the top-end of AMD’s new Ryzen 5000 series, the 5950X, installed on a Gigabyte X570 AORUS ULTRA motherboard which has a BIOS setting for selecting which PCIe version is used.

PCIe Slot Configuration Options in Gigabyte X570 AORUS ULTRA Motherboard BIOS
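If you want to double-check which PCIe generation and link width a card has actually negotiated (for example, after changing a BIOS setting like the one above), NVIDIA's nvidia-smi utility can report it. Here is a minimal sketch in Python, assuming the NVIDIA driver (which includes nvidia-smi) is installed and the tool is on the system PATH:

```python
# Query the negotiated PCIe link generation/width for each NVIDIA GPU via nvidia-smi.
# Assumes the NVIDIA driver is installed and nvidia-smi is on the PATH.
import subprocess

def pcie_link_status() -> None:
    fields = [
        "name",
        "pcie.link.gen.current",
        "pcie.link.gen.max",
        "pcie.link.width.current",
        "pcie.link.width.max",
    ]
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(fields)}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    for line in result.stdout.strip().splitlines():
        name, gen_cur, gen_max, width_cur, width_max = [f.strip() for f in line.split(",")]
        print(f"{name}: currently PCIe Gen{gen_cur} x{width_cur} "
              f"(maximum supported: Gen{gen_max} x{width_max})")

if __name__ == "__main__":
    pcie_link_status()
```

One caveat: GPUs typically drop their PCIe link to a lower generation when idle to save power, so the "current" values are most meaningful when read while the card is under load.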

Here are the full specifications of the system we used for this testing:

With this hardware configuration, we tested each of the two video cards in each of the four PCIe Slot Configuration settings (Gen1 through Gen4). Most of the questions we get from prospective customers center around PCIe Gen3 and Gen4, but by going further back with our tests we can get a better picture of how PCIe bandwidth impacts video card performance. For example, PCIe Gen2 on a full x16 size slot (which these video cards were using) is roughly equivalent in bandwidth to PCIe Gen3 x8, and that is a common setting for motherboards to use when running multiple video cards on chipsets that don’t have a massive number of PCIe lanes available. Likewise, PCIe Gen1 at x16 should be comparable to PCIe Gen3 at x4 – and PCIe Gen3 at x16 is on par with PCIe Gen4 at x8.
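To put rough numbers on those pairings, here is a quick back-of-the-envelope calculation. This is only a sketch using the published per-lane transfer rates and line encoding for each generation, so the figures are theoretical one-direction peaks rather than real-world throughput:

```python
# Approximate theoretical one-direction PCIe bandwidth by generation and lane count.
# Gen1/Gen2 use 8b/10b line encoding; Gen3/Gen4 use 128b/130b.
TRANSFER_RATE_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}   # giga-transfers per second, per lane
ENCODING_EFFICIENCY = {1: 8/10, 2: 8/10, 3: 128/130, 4: 128/130}

def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Theoretical usable bandwidth in GB/s for one direction of a PCIe link."""
    bits_per_second = TRANSFER_RATE_GT_S[gen] * 1e9 * ENCODING_EFFICIENCY[gen]
    return bits_per_second * lanes / 8 / 1e9  # bits -> bytes -> GB

for gen, lanes in [(1, 16), (2, 16), (3, 4), (3, 8), (3, 16), (4, 8), (4, 16)]:
    print(f"PCIe Gen{gen} x{lanes}: ~{pcie_bandwidth_gb_s(gen, lanes):.1f} GB/s")
```

That works out to roughly 4 GB/s for Gen1 x16 and Gen3 x4, roughly 8 GB/s for Gen2 x16 and Gen3 x8, and roughly 16 GB/s for Gen3 x16 and Gen4 x8, which is where the equivalences above come from.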

Finally, on the software side, we used a handful of benchmarks across three main types of applications:

  • OctaneBench, Redshift Demo, and V-Ray Next Benchmark for GPU based rendering
  • PugetBench for DaVinci Resolve Studio and NeatBench for post-production (video editing)
  • Unigine Superposition (at a couple of different resolutions) for game engines

Benchmark Results

Here are the results of our testing, split into galleries by application type, with some analysis after each set of charts:

GPU Based Rendering Engines


None of these rendering benchmarks show much difference in performance between the various generations of PCI-Express. There is a slight curve in Redshift, with about an 8-second slowdown from PCIe Gen4 to Gen1 on the RTX 3090 and 5 seconds on the Titan RTX. V-Ray Next shows nothing outside the test’s margin of error, and while there is a small drop on OctaneRender it is only around 2% (so that may well be within the margin of error too).

It is worth remembering the way that GPU rendering works, though: scene data is sent to the card over the PCIe connection, and then the processing is all done on the video card(s), then the resulting image is sent back to the system to be displayed and/or saved. The speed of the PCIe bus is going to impact how quickly the data can be moved back and forth, but won’t impact the actual computations happening on the card. That probably explains why we see so little impact from the older versions of PCI-Express in this test.
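As a rough illustration of why that is (the scene size and render time below are hypothetical numbers chosen for the example, not measurements from these benchmarks), even a fairly large scene only takes a handful of seconds to move across the bus, which is small next to a render that runs for several minutes:

```python
# Rough illustration: time to upload a (hypothetical) 10 GB scene to the GPU
# over different PCIe x16 links, compared against a multi-minute render.
# Bandwidth values are approximate theoretical one-direction x16 figures.
SCENE_SIZE_GB = 10          # hypothetical scene size, not from the benchmarks above
RENDER_TIME_S = 300         # hypothetical 5-minute render

PCIE_X16_GB_S = {"Gen1": 4.0, "Gen2": 8.0, "Gen3": 15.8, "Gen4": 31.5}

for gen, bw in PCIE_X16_GB_S.items():
    transfer_s = SCENE_SIZE_GB / bw
    share = 100 * transfer_s / (transfer_s + RENDER_TIME_S)
    print(f"{gen} x16: ~{transfer_s:.1f} s to upload the scene "
          f"({share:.1f}% of total job time)")
```

Even on PCIe Gen1 x16 the upload is only a few seconds, so a faster bus barely moves the total job time in this kind of workload.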

There are also some notable exceptions to this, which are simply outside the purview of these benchmarks. For example, some rendering engines support “out of core memory” – which is where some of the scene data is stored in main system memory if there isn’t enough dedicated video memory on the card(s) themselves. In that situation, there would be a lot more data being transmitted over PCI-Express, throughout the rendering process, and thus the speed of that connection would be a lot more important.

Post Production


Post-production makes much more interactive use of the video card than rendering does, so here we see large performance differences across the various PCI-Express generations. DaVinci Resolve shows a steady drop from PCIe Gen4 down to Gen1 on the RTX 3090, while the Titan RTX has effectively no difference between Gen4 and Gen3, but then drops when using Gen2 and Gen1.

NeatBench, which tests the Neat Video noise reduction algorithm’s performance, shows an even larger reduction when using the older version of PCI-Express… to the point where an RTX 3090 on Gen1 is only providing half the speed of the same card running on Gen4. Again, though, the Titan RTX doesn’t benefit from Gen4 vs Gen3 – presumably, because it is a Gen3 card itself, so even if the system it is in is capable of the newer PCIe Gen4 speeds the Titan is stuck at Gen3.

Game Engines


We don’t generally test game performance here at Puget Systems, as so many other outlets already look at that subject in great depth, but I thought I would see whether Unigine Superposition showed any differences across the different PCIe speeds. It did not. There is a small, sub-2% difference – similar to what we saw with OctaneBench – but even if that is being caused by the generational difference, it is so small that it would not be noticeable when playing games. Again, this is likely due to the way this benchmark works: if all of the data for the test scenes can fit in the video card’s memory, then once it is loaded up at the start the speed of the PCIe bus will no longer matter. Real gaming would see a different usage pattern, but even there I suspect that PCIe Gen4 vs Gen3, at least, would have no measurable performance impact.

Perhaps we can revisit this with more of a focus on game development in the future, as my colleague Kelly Shipman has been doing amazing work on testing the Unreal Engine.

Conclusion

For applications where data is constantly traveling across the PCI-Express bus, we can see that the generational bandwidth differences do have a very measurable impact on real-world performance. The best examples of that in the tests we conducted for this article were those looking at post-production & video editing, which exhibited substantial gains moving up from PCIe Gen1 to Gen2, moderate gains from Gen2 to Gen3, and then a small boost from Gen3 to Gen4 on the RTX 3090 (which is, itself, a Gen4 card). The Titan RTX, a Gen3 card, did not show a difference between running on PCIe Gen3 vs Gen4.

Other programs where data is only sent across PCIe before and after a long calculation did not see that sort of difference, however. At least within the manufacturer benchmarks we utilized, there was at best a small gain when using the latest Gen3 and Gen4 speeds – but definitely nothing like what we saw with video editing.

In the end, though, the PCI-Express Gen1 and Gen2 comparisons are mostly academic. Virtually all modern motherboards are going to run at PCIe Gen3 or Gen4, and if running a single video card then they pretty much all will offer full x16 lane support as well.

This gets a little trickier when running multiple video cards, which is common for some of these professional workloads, because while the PCIe generation isn’t going to drop, the number of lanes available per slot/card definitely can. PCIe Gen3 at x8 lanes is going to be roughly on par with PCIe Gen2 at a full x16 lanes, and Gen3 x4 is close to Gen1 at x16… so depending on your exact motherboard and GPU configuration it is entirely possible to end up with lower bandwidth per card. The good news is that this looks like it will have little negative impact on GPU based rendering, which is one of the places where having a lot of video cards can really shine – but if you are working with video editing or some other application that depends on sending a lot of data back and forth to the graphics card(s), then it is a good idea to ensure that your system is providing the most bandwidth possible over PCI-Express.

Does putting a PCIe Gen3 video card in a Gen4 slot improve performance?

No. If the graphics card itself is PCIe 3.0, then putting it in a faster 4.0 slot will not provide any benefit, since the link will operate at Gen3 speeds.

Does putting a PCIe Gen4 video card in a Gen3 slot reduce performance?

In some applications, yes – there can be a small performance drop when running a PCI-Express 4.0 capable card in a system/slot that is only using PCIe 3.0. We did not find any impact for gaming or GPU-based rendering, but we did measure a small decline (less than 5%) with video editing in DaVinci Resolve and a little bit larger drop (~10%) with noise reduction in Neat Video.