Hacked Up: The Vertex Streaming Hack

Update: An issue with the Nvidia drivers kept buffer storage from being used properly on Windows: the Windows driver does not report its version. Because the Linux driver reports its version correctly, the Dolphin devs assumed the Windows driver would do the same, and used a version check to make sure ARB_buffer_storage was only used on drivers that actually support it. As a result, even the latest drivers that support the extension failed the version check, and Dolphin didn't use buffer_storage on Windows. Thanks to an unrelated bug discovered later, the lack of buffer_storage wasn't detected in the initial testing. As of 4.0-722 the version check has been removed and buffer storage is confirmed to be working correctly.

Unfortunately, that version check was there for a reason. Some driver versions tell Dolphin that they support buffer_storage, but actually don't. Any user caught in that situation will encounter a black screen. If you do, update your drivers.
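The trade-off between the two checks can be sketched in a few lines. This is only an illustration, not Dolphin's code: the function names, the minimum-version constant and the example version strings are assumptions made for this sketch. It shows how a check that treats an unreported driver version as "too old" turns the extension off even on drivers that support it.

```python
import re
from typing import Optional, Set, Tuple

# Assumed for illustration: the first Nvidia release with ARB_buffer_storage
# (the article cites 332.21).
MIN_NVIDIA_DRIVER = (332, 21)


def parse_driver_version(gl_version: str) -> Optional[Tuple[int, int]]:
    """Pull a (major, minor) driver version out of a version string.

    Returns None when the string carries no driver version at all.
    The string layout used here is an assumption for this example.
    """
    match = re.search(r"NVIDIA (\d+)\.(\d+)", gl_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def use_buffer_storage_strict(extensions: Set[str], gl_version: str) -> bool:
    """Version-gated check: an unreported version counts as unsupported."""
    if "GL_ARB_buffer_storage" not in extensions:
        return False
    version = parse_driver_version(gl_version)
    return version is not None and version >= MIN_NVIDIA_DRIVER


def use_buffer_storage_trusting(extensions: Set[str]) -> bool:
    """Extension-only check: trusts whatever the driver advertises."""
    return "GL_ARB_buffer_storage" in extensions
```

With a version string that omits the driver version, the strict check refuses the extension while the trusting check accepts it. The trusting check is also the one that can be fooled by a driver that advertises the extension without really supporting it.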

One of the constant struggles in modern emulation is the battle between performance and accuracy. Throughout Dolphin's history, developers have added various tricks to get more performance out of computers - one of them being the Vertex Streaming Hack, formerly known as Hacked Buffer Upload. It drastically improves OpenGL performance on Nvidia GPUs. However, as of 4.0-615, the Vertex Streaming Hack has been removed from Dolphin.

Vertex Streaming Hack -
Uses unsafe operations to speed up vertex streaming in OpenGL. There are no known problems on supported GPUs, but it will cause severe stability and graphic issues otherwise.

Explaining why it was removed requires a little history lesson on why the hack existed in the first place. The Vertex Streaming Hack emulates an OpenGL extension called "ARB_buffer_storage". Basically, ARB_buffer_storage makes GPU memory management much more flexible. For example, many buffer objects do not change often during rendering. ARB_buffer_storage makes OpenGL aware of this tendency, allowing it to make assumptions for a performance boost. It also eliminates overhead for buffer updates and allows better control over the buffer overall. Dolphin emulates a console that shares memory between the CPU and GPU, whereas our computers have discrete GPU memory and discrete RAM. So Dolphin has to do a lot of WAITING: wait to upload to the buffer, render, wait until the render is done, wait to upload to the buffer again, repeat. Having this additional level of control over the buffer really helps Dolphin speed up these cycles, giving a very nice speed boost for the OpenGL backend. There was a little problem though: there was no official way to do this.
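To get a feel for the waiting described above, here is a toy model in Python. It makes no OpenGL calls and is not how Dolphin is implemented. The function name, the one-tick write cost and the fixed GPU latency are all assumptions made for illustration. It only models one idea: if the CPU can cycle through several buffer segments instead of reusing a single one, it rarely has to wait for the GPU to finish with the previous upload. In real OpenGL, that kind of scheme is typically built from a persistently mapped buffer plus fence syncs.

```python
def count_stall_ticks(draws: int, segments: int, gpu_latency: int) -> int:
    """Count the ticks the CPU spends waiting for a buffer segment to free up.

    draws:       number of upload-then-render cycles to simulate
    segments:    how many independent regions the streaming buffer has
    gpu_latency: ticks the GPU needs after an upload before it is done
                 reading that segment (hypothetical, fixed for simplicity)
    """
    now = 0                    # current CPU time in ticks
    free_at = [0] * segments   # tick at which each segment may be reused
    stalls = 0
    for draw in range(draws):
        slot = draw % segments
        if free_at[slot] > now:
            # The GPU is still reading this segment: the CPU has to wait.
            stalls += free_at[slot] - now
            now = free_at[slot]
        now += 1                           # CPU writes vertex data (1 tick)
        free_at[slot] = now + gpu_latency  # GPU finishes with it later
    return stalls


if __name__ == "__main__":
    for segments in (1, 2, 4):
        print(segments, count_stall_ticks(8, segments, 3))
```

In this model a single segment forces a wait on every cycle after the first, while enough segments to cover the GPU latency remove the waits entirely. Real drivers and hardware are far less regular than this, so treat the numbers as an illustration of the shape of the problem rather than a measurement.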

In OpenGL 4.3, which was the latest version of OpenGL when the GLSL rewrite was merged and "hacked buffer upload" was put into Dolphin, ARB_buffer_storage did not exist. Technically it was in development as an extension at the time, written as a proposed additional feature for OpenGL 4.4, but it was not public yet. Fortunately, someone had already beaten OpenGL to it: AMD. AMD noticed this potential optimization and created their own extension, “AMD_pinned_memory”, to take advantage of it. Degasus realized that this could be applied to Nvidia users with a little black magic, and created the Vertex Streaming Hack. The hack was awful and drew criticism from other developers, and it created many issues for Dolphin:

  • The Vertex Streaming Hack relied on specific driver behavior, and caused severe problems on anything that didn't support it. The only way to know if it worked or not was to try it, and the only way to prevent it from running on unsupported platforms was to hardcode vendor and version checks. Keeping track of all the specific driver/OS/version combos was a mess.
  • The hack was strictly forbidden by the OpenGL spec, using code that was known to cause crashes. The code made no logical sense, much like "dividing by zero", but it worked somehow and the devs had no good alternatives, so they rolled with it.
  • At any point Nvidia might decide to change how their driver manages memory and break the VSH. This is far from ideal, since we want our software to still be usable years from now. Also, there's always a delay between the point where something breaks and the point where it gets reported to us, so a problem may be introduced and not be caught for some time.
  • The hack works fine in 99% of cases on supported hardware, but there are a number of games which actually show glitches with the VSH. Because of its "close but not quite perfect"-ness, in almost every case where a user reported a VSH-related issue, they assumed it was Dolphin's fault and overlooked the hack. It was especially problematic on Linux: flickering, broken polygons, etc. were very common.

It was hardly ideal, but it worked. With the Vertex Streaming Hack and AMD_pinned_memory in the OpenGL GLSL rewrite, the OpenGL backend went from a slow, accuracy-focused backend to one that performed comparably to the highly inaccurate D3D9 backend. This move allowed D3D9 to be cut, for a sizeable improvement to the codebase quality, without sacrificing speed or accuracy. We owe a lot to the Vertex Streaming Hack.

But time changes everything. The extension was approved and added into OpenGL 4.4, released in July of 2013. However, it took a little while for GPU driver developers to support all of its features. In its GeForce 332.21 drivers, released very recently, Nvidia now supports ARB_buffer_storage. With an official way to do the same job, the Vertex Streaming Hack was no longer needed, so it was removed.

There are a few caveats though. ARB_buffer_storage relies on OpenGL 4 features to work, so it’s only usable on OpenGL 4 generation cards. OpenGL 3 cards, which previously were able to run the Vertex Streaming Hack, will not be able to use ARB_buffer_storage. That means the GeForce 8000, 9000, 200, and 300 series are losing the Vertex Streaming Hack and do not have anything to replace it. Considering how dated they are in terms of performance in PC gaming, just consider this another nudge to upgrade.
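As a rough sketch of what this means in practice, a backend could choose its vertex upload path from whatever extensions the driver advertises. The enum, the function name and the priority order below are assumptions made for illustration, not Dolphin's actual logic. Only the two extension strings are real OpenGL names.

```python
from enum import Enum
from typing import Set


class UploadPath(Enum):
    BUFFER_STORAGE = "ARB_buffer_storage"
    PINNED_MEMORY = "AMD_pinned_memory"
    PLAIN = "plain buffer updates"


def pick_upload_path(extensions: Set[str]) -> UploadPath:
    """Pick the fastest upload path the driver claims to support.

    The ordering is a guess for this example: prefer the official
    extension, then AMD's vendor extension, then the slow generic path.
    """
    if "GL_ARB_buffer_storage" in extensions:
        return UploadPath.BUFFER_STORAGE
    if "GL_AMD_pinned_memory" in extensions:
        return UploadPath.PINNED_MEMORY
    return UploadPath.PLAIN
```

A card whose driver advertises neither extension ends up on the plain path, which is the situation the older GeForce series are in once the hack is gone.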

Let’s stress this again: ARB_buffer_storage was only added in Nvidia's GeForce 332.21 drivers, which are brand new. Any user with a card that supports ARB_buffer_storage who moves to a new development build without updating their drivers first will see a performance drop. The new drivers are required for this to work. Basically:

Nvidia users should update their Nvidia GPU drivers!*

*Some GeForce 400 and 500 series (Fermi) cards have an ongoing crash issue with Nvidia driver versions above 314.22. A user reported this to the devs but multiple testers have been unable to confirm it, so it appears to only affect a small percentage of Fermi users. Users with those cards should try the new graphics drivers, and if they encounter problems, roll back to Nvidia’s 314.22 drivers and use development builds before 4.0-615. Hopefully Nvidia will correct this error soon. UPDATE: This issue was corrected by the Nvidia 334.89 release drivers.

To AMD GPU owners: none of this really affects you. AMD will support ARB_buffer_storage fairly soon, and we are not sure yet how this will affect AMD_pinned_memory. But there is no reason for AMD to remove a working extension, so AMD will likely support both AMD_pinned_memory and ARB_buffer_storage.

Thanks to neobrain and degasus for help with the technical aspects of this article, and JMC47 and neobrain for proofreading and suggestions.

You can continue the discussion in the forum thread of this article.
