In light of the recent announcements by NVIDIA and AMD in support of Linux for their graphics drivers, we would like to share with the world some of the experience we had developing our open source project, Dolphin, a GameCube and Wii emulator for Windows, Linux, Mac and recently Android.
At the beginning of this year, after the successful release of Dolphin 3.5, Markus Wick (degasus) and Ryan Houdek (Sonicadvance1) started working on a rewrite of Dolphin's OpenGL backend in order to be compliant with the OpenGL ES 3.0 standard. While this rewrite was needed for other reasons (it provides the foundations for very cool optimizations), compatibility with mobile devices and the future Android port of the emulator (now in beta) was one of the key goals. This rewrite was merged into the main Dolphin codebase a few months back and has since been used by tens of thousands of Dolphin users, either on OS X and Linux where it is the only viable graphics backend, or on Windows where it is available alongside our D3D11 graphics backend.
Sadly, using recent, advanced OpenGL features also meant we got to discover how bad some graphics drivers actually are at doing their job. It turns out very few applications use some parts of the OpenGL standard we need to rely on to accurately emulate a GameCube GPU. More than that, on Android, OpenGL ES 3.0 support is extremely recent and only a couple of applications on the Play Store use ES 3.0 features.
Here is basically our hall of shame of graphics drivers, sorted by the number of issues we found, how hard it is to report issues to the company and how many bugs were actually fixed.
Excellent - NVIDIA
We had no occasion to test NVIDIA's response time to driver issues because we actually never had any issue with NVIDIA's OpenGL implementation, either on Windows or on Linux. Before the OpenGL backend rewrite, we actually had one bug happening only on OS X machines running NVIDIA graphics cards, but it’s unclear whether it was due to an undefined behavior on our side or a bug in NVIDIA’s drivers. It basically works great on the subset of OpenGL we use. We could almost have called NVIDIA "perfect" for our use case here, but their drivers actually have a few minor issues that are not directly bugs:
- No support for client-side buffer storage (until their very latest beta driver version): because of how emulation works, Dolphin needs client-side buffer storage to be fast. This feature is added to OpenGL by the ARB_buffer_storage extension (part of OpenGL 4.4). AMD drivers started supporting this feature long ago through the AMD_pinned_memory extension. In Dolphin, we have an optional horrible hack that allows us to bypass the lack of this extension by copying data into unmapped buffers (ugghhh), which actually works for our purpose but makes us very sad. We could probably blame the lack of support for this feature on the Khronos group instead, but AMD didn’t wait for Khronos to support this use case.
- Licensing for the NVIDIA Graphics SDK that is incompatible with GPL licensed applications (including Dolphin), coupled with bad analysis of GPU performance requirements. The NVIDIA driver often detects Dolphin as an application which requires little GPU performance, which we assume is due to how we have to operate for accurate emulation (a lot of stalling on GPU->CPU transfers, leading to a very short average command queue length). We could use the NVIDIA Graphics SDK to force the driver to give us a high performance profile, but the GPL forbids us from doing so, leading to degraded performance for some NVIDIA users. If relicensing the SDK is impossible, looking for an exported symbol in our binary, as is currently done for NvOptimusEnablement, sounds like a fairly good idea to us.
- Getting support or technical answers is difficult for open source developers. Every time we have tried asking NVIDIA about some slowness in their implementation (GL calls serializing in the driver), we received no answer from their official forums. When we reported our NVIDIA + OS X rendering issue, we also received no answer. The NVIDIA official forums also go offline for extended periods of time, making it hard to rely on them for support and/or documentation.
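For readers unfamiliar with it, the NvOptimusEnablement mechanism mentioned in the list above works by exporting a global variable from the executable; on Windows Optimus systems the NVIDIA driver looks that symbol up to decide whether to use the discrete GPU. The following is a minimal C++ sketch of that existing convention, not code taken from Dolphin. The `EXAMPLE_EXPORT` macro and the `requestsDiscreteGpu` helper are names made up for this illustration, and no performance-profile symbol is assumed to exist.

```cpp
#include <cassert>
#include <cstdint>

// EXAMPLE_EXPORT is a hypothetical macro for this sketch: the symbol only
// needs to be exported on Windows, where the Optimus driver looks for it.
#if defined(_WIN32)
#define EXAMPLE_EXPORT __declspec(dllexport)
#else
#define EXAMPLE_EXPORT
#endif

extern "C" {
// A non-zero value asks the driver to run this process on the NVIDIA GPU.
// The name and C linkage matter, because the driver searches for this
// exact unmangled symbol in the executable's export table.
EXAMPLE_EXPORT std::uint32_t NvOptimusEnablement = 0x00000001;
}

// Hypothetical helper, only here so the sketch has something to check.
bool requestsDiscreteGpu() {
    return NvOptimusEnablement != 0;
}
```

The appeal of this approach for a GPL project is that it needs no SDK headers or libraries at all: the hint is plain data in the binary.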
Given that our experience with mobile drivers has been… less than good (this article will talk about that in detail later), we are really curious to see how good NVIDIA drivers for the Tegra 4 SoC are. We couldn’t get hold of a device powered by Tegra 4 yet (they are expensive!), but we would love to see how Dolphin performs on them, given that the bottleneck for Dolphin on Android is now mostly GPUs and their drivers.
Good - Mesa
Surprisingly, open source drivers (Mesa) were far from the worst offenders: while one could assume they receive less QA and less professional testing, it turns out that we only found a few bugs in Mesa, all related to a single feature that only recently gained support, in Mesa 9.0/9.1. More than that, all of the bugs we found in Mesa were promptly fixed after we reported them through the correct channel (freedesktop.org bug tracker). Communication with driver developers was always good, which is standard in the open source world: when asking on IRC, Mesa developers were very excited about having an application using their new, shiny feature.
<anholt> a consumer of ubos! that's exciting. <anholt> basically, we got them working, and said "when a workload shows up, we can optimize"
Here are the four bugs that we found in Mesa drivers, all related to handling of a feature called Uniform Buffer Objects (UBO), and all fixed in stable versions of the drivers:
- Wrong GLSL built-in functions results with UBO arguments, fixed in Mesa 9.1.1, 7 days after we reported the bug.
- Dithering effects with UBOs, centroid and i965, fixed in Mesa 9.2.0.
- Mesa bug causing black screen on r600 (radeon), patch provided 10 minutes after we reported the issue on IRC, merged the same day and released in a stable version 2 weeks later (9.1.3).
- Nouveau black screen issue with UBOs larger than 64KB, debugged by one of our developers, merged 1 day after the bug was reported and released in a stable version a week later (9.1.6).
Our use of UBOs also led to a big redesign of the implementation of that feature in i965: Eric Anholt, a Mesa/Intel developer, rewrote most of the UBO data fetching code, leading to performance and accuracy improvements with Dolphin.
In the Mesa developers’ defense: when we started using UBOs with Mesa, they were a very new, just released feature and very few applications actually used them enough to cause these bugs. There were no Piglit tests for UBO offsets, and maybe that test suite should be expanded a bit more to cover less used features. Once again, we do not expect professional QA, and were delighted when the Mesa developers took our bug reports seriously and promptly came up with proper fixes and/or debugging steps.
Our main problem now is that Linux distributions are very slow to upgrade and do not provide an easy way to upgrade Mesa without upgrading the rest of the system (unlike on Windows). This means our application is still broken for users of LTS releases of some Linux distributions.
OpenGL version support is also not as good as we would like it to be, but we can’t really complain about open-source drivers, especially for NVIDIA GPUs (nouveau gets basically 0 support from NVIDIA). We consider ourselves lucky to have open-source drivers in a good enough shape to run heavy applications like Dolphin properly, especially when compared with the state of some proprietary drivers.
We are now also using Mesa's software renderer LLVMpipe extensively for one of our Continuous Integration projects, and were not able to find any visible bug yet that was not already fixed in unreleased versions (the only bug we found was a crash due to wrong AVX detection, and had been fixed 2 weeks before we noticed this issue).
Good - Intel HD Graphics on Windows
Intel integrated graphics chipsets are usually not a good match for Dolphin: while the GPU usage of the emulator can be decreased significantly, even our minimal profile is usually too slow to run on Intel IGPs. However, we are not aware of any bugs in the OpenGL implementation present in Intel’s IGP drivers on Windows impacting Dolphin. Unfortunately, the supported feature set is also very reduced: Intel HD3000 IGPs, the most common IGPs used with Dolphin according to a quick sampling of our forums users, only support up to OpenGL 3.1 and Shader Model 4.1 on Windows (OpenGL 3.3 is supported on latest versions of OS X).
A weird thing about Intel OpenGL drivers concerns Intel Ironlake IGPs. These IGPs support everything needed for OpenGL 3, except for MSAA support. Because of this one missing feature, Intel decided to not implement any OpenGL 3 features for Ironlake, making it impossible for us to support this particular IGP.
We are also kind of disappointed that IGP manufacturers don’t push more for OpenGL extensions related to Unified Memory, given that they would be the ones to benefit the most from it. Intel currently only provides the INTEL_map_texture extension, which could be useful for texture mapping but provides no support for buffer mapping. AMD seems to be leading this effort in the console world, and it’s very likely this will come back to the PC through OpenGL extensions at some point. Since the GameCube and the Wii are Unified Memory architectures, being able to do the same on PC would help us a lot and potentially make Intel IGPs usable with Dolphin.
Mediocre - AMD
We had the most issues with AMD when using their proprietary graphics driver on Linux, fglrx/Catalyst. A lot of issues that do not happen on Windows are present on Linux, sometimes with a very visible effect in our emulator.
One of the most widespread issues we had with AMD drivers actually corrupts textures when an application asks the driver to create mipmaps. Here is what it looks like on a very simple textured cube demo for the Wii, running in Dolphin:
This happens when creating a GL_UNSIGNED_SHORT_5_6_5 texture and running glGenerateMipmap. Our first complaints about this bug started more than 2 years ago. A thread was started on the AMD developer forums, only to be ignored and deleted when AMD moved to a new developer forums system a year later. To this day we are not sure if this bug was ever fixed, but due to changes in the way Dolphin emulates mipmapping, this is not an issue for us anymore.
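To make the expected result concrete, here is a small CPU-side sketch of what one mipmap level of a 5-6-5 texture could look like, assuming a plain 2x2 box filter. OpenGL does not mandate a particular filter for glGenerateMipmap (a box filter is only the recommended default), so this is a reference for sanity-checking driver output under that assumption, not a description of what any driver runs. The names `pack565` and `downsample565` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 8-bit channels into the GL_UNSIGNED_SHORT_5_6_5 layout:
// red in bits 15..11, green in bits 10..5, blue in bits 4..0.
std::uint16_t pack565(unsigned r, unsigned g, unsigned b) {
    assert(r <= 255 && g <= 255 && b <= 255);
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Build the next mipmap level by averaging each 2x2 block per channel.
// Assumes even dimensions to keep the sketch short.
std::vector<std::uint16_t> downsample565(const std::vector<std::uint16_t>& src,
                                         std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == width * height);
    const std::size_t halfWidth = width / 2;
    const std::size_t halfHeight = height / 2;
    std::vector<std::uint16_t> dst(halfWidth * halfHeight);
    for (std::size_t y = 0; y < halfHeight; ++y) {
        for (std::size_t x = 0; x < halfWidth; ++x) {
            unsigned r = 0, g = 0, b = 0;
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const unsigned p = src[(2 * y + dy) * width + (2 * x + dx)];
                    r += (p >> 11) & 0x1F;
                    g += (p >> 5) & 0x3F;
                    b += p & 0x1F;
                }
            }
            // Average the four texels, rounding to nearest.
            r = (r + 2) / 4;
            g = (g + 2) / 4;
            b = (b + 2) / 4;
            dst[y * halfWidth + x] = static_cast<std::uint16_t>((r << 11) | (g << 5) | b);
        }
    }
    return dst;
}
```

With this filter, a 2x2 black and white checkerboard reduces to the single mid-gray texel 0x8410. Output from a driver that differs wildly from such a reference, as in the corruption described above, points at a driver bug rather than a filter choice.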
The quality of the code in the userland AMD driver looks horrible from the outside: running valgrind on a program that uses the AMD driver produces a large number of errors (ioctls using uninitialized structures, accesses to uninitialized memory). In some error cases, instead of reporting an error to the caller, the AMD driver will simply call exit(123) and kill the whole application. This kind of issue impacted SDL 1.3: calling XCloseDisplay caused the driver to exit. A workaround was put in place later in SDL 2.0 to avoid this problem, which should never have happened in the first place. Fun fact: this bug was found while writing a minimal program that reproduces the mipmapping issue…
But bugs don’t only happen on fglrx: the Windows AMD driver also has a few major bugs. AMD supports a form of client-side buffer storage that would be extremely useful for Dolphin. It is exposed via the AMD_pinned_memory extension. Using AMD_pinned_memory with Vertex Buffers or Uniform Buffers works perfectly, but trying to use it with Index Buffers starts rendering random polygons. Because of this issue, we had to stop using AMD_pinned_memory for Index Buffers, leading to decreased performance for AMD users of our OpenGL backend.
To this day we’re still not sure how to report fglrx bugs to AMD: we haven’t seen developers reply to bug reports on their official forums, and while there is an unofficial bug tracker for fglrx issues it does not seem to be looked at by AMD developers and keeps accumulating new issues.
On the other hand, AMD is making steps that will help Dolphin a lot in the future: pioneering Unified Memory Access for graphics APIs, and working on Mantle, a new API that exposes more low-level GPU features to applications. If these last two improvements come together, it could potentially make AMD GPUs the best platform for high-gen console emulation.
Bad - ARM/Mali
Let’s start talking about mobile GPU drivers. For us open-source developers, this is where things start becoming a nightmare:
- Only binary blobs, no good documentation about the architecture
- Drivers limiting performance of the devices when the GPU could perform faster
- Phone manufacturers and operating systems developers bragging about support of features their hardware could have had for the last 2 years (OpenGL ES 3.0) if drivers were not so bad
- Very few applications using advanced OpenGL features, thus making Dolphin reveal how broken some of the GLES3 features are.
- Driver developers that seem to be doing less QA than open-source teams like Mesa
The Mali GPUs, made by ARM, are not the worst offenders. In fact, Dolphin on Android now runs properly on Mali devices, after a lot of time spent working around driver bugs and general slowness.
Mali has developer forums where bugs can be reported. However, our experience when reporting bugs was “yes, we know this is broken” followed by a lack of replies. Even Google seems to be working around Mali issues in Chromium, suggesting that even they probably cannot get ARM to fix their bugs or implement missing features advertised as supported.
Here is our current list of bugs present with Mali drivers:
- glBufferSubData and glMapBufferRange stall the GPU driver and cause extreme slowness that does not happen on most other drivers. This is an issue that was independently found by Google and worked around.
- glBufferSubData creates copies of data and fails to free them, causing out of memory after a few seconds of usage of Dolphin. Again, that issue was independently found by Google and documented.
- Clearing color with glClear kills performance, as if the driver waits for completion of the clear when it should be mostly asynchronous. Sometimes glClear takes more than 1/60th of a second (a full frame!) to complete, making it completely unusable. This has been reported by other Android developers too: on the Ouya forums or on StackOverflow.
- The Mali shader compiler does not support global shader variables which aren’t constant. This is a violation of the OpenGL ES 3.0 standard, but an easy fix on our side.
Working around the glBufferSubData issues required us to change our vertex streaming code to a slower solution, which was a significant effort. Luckily, it later turned out that this change doesn’t only work around Mali issues (as you will see below).
These are not just minor issues like small rendering artifacts or slight performance issues. Here, basic features like glClear are completely unusable, and we can’t really understand why developers are ok with this.
We are following the LIMA project very closely. LIMA people are reverse-engineering the Mali driver blob and working on their own driver for the hardware. In a year of work, they managed to create a driver supporting a few 3D demos, including Quake 3 Arena which already runs at a higher framerate than the official Mali driver. Their main issues currently involve the shader compiler, which requires a full understanding of the ISA of Mali GPUs. We have good hopes that they will make the situation better on Mali devices in the next few years - and less good hopes that ARM will actually start providing decent drivers for their devices.
Horrible - Qualcomm/Adreno
Where do we even start? First, a good thing: Qualcomm and Mali definitely seem to be sharing some code in their graphics drivers. It’s only unfortunate that this code is handling glBufferSubData and glMapBufferRange. That’s right: the Mali bugs causing these two functions to be slow and unusable (out of memory condition after a few seconds of runtime) are also present in the Qualcomm drivers for Adreno devices! At least our vertex streaming changes to support Mali devices were useful for more than one driver.
But that’s not all:
- The shader compiler reports an “internal error” when dynamically indexing a Uniform Buffer Object. Reported to Qualcomm, acknowledged as a bug, still not fixed to this day: bug report.
- Support for the centroid attribute in OpenGL ES 3.0 is broken. Simply not working: when using it, our output is a white screen. Reported to Qualcomm 3 months ago, confirmed as a bug, no fix yet: bug report.
- Shader info log related functions completely mess up the length values, always returning a length of 0 for shader compilation info and always truncating info messages to 1023 characters even when a larger buffer is provided. This does not help with debugging, especially when their shader compiler floods the log with meaningless warnings about the creation of a macro. Reported to Qualcomm 3 months ago, no replies from anyone: bug report.
- Looking at the decompiled command stream for Adreno devices, support for ARB_draw_elements_base_vertex should be trivial. It is actually supported by open source drivers for the Adreno devices. The only reply from Qualcomm about this was 6 months ago, basically saying “maybe”.
- glBlitFramebuffer to a hardware buffer rotates the output of the buffer by 90°. You just can’t invent this kind of thing. How does that even happen? Meanwhile, other developers report basic glBlitFramebuffer calls crashing on Adreno.
- Some of the shaders generated by Dolphin crash the Adreno shader compiler. We are still debugging this issue so we don’t have any specifics, but disabling Uniform Buffer Objects support in our shader generator fixes this issue, leading us to think a memory corruption issue in the Adreno shader compiler is at fault here.
- Until recently (V43), the Adreno kernel-land driver had a size limit on their instruction buffer, killing applications when their user-land driver generated instruction streams that were too long. This required people to root their devices to disable this size limit in order to run our application. See our bug report for more information.
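As an aside on the info log bug in the list above: the usual pattern is to query GL_INFO_LOG_LENGTH with glGetShaderiv, allocate that many bytes, and then call glGetShaderInfoLog. When the driver always reports a length of 0, that pattern yields an empty log. One possible mitigation, sketched below without any OpenGL dependency, is to hand the driver a zero-filled buffer of fixed size and recover the text by looking for the terminating NUL whenever the reported length is not usable. This is an illustrative sketch under that assumption, not Dolphin's actual workaround; `recoverInfoLog` is a hypothetical helper name, and it cannot undo the truncation to 1023 characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: extract the log text from a buffer that the driver
// filled, without trusting a length the driver may have reported as 0.
// `buffer` is expected to have been zero-filled before the GL call.
std::string recoverInfoLog(const std::vector<char>& buffer, int reportedLength) {
    // Use the reported length only when it is positive and fits the buffer.
    if (reportedLength > 0 &&
        static_cast<std::size_t>(reportedLength) <= buffer.size()) {
        return std::string(buffer.begin(), buffer.begin() + reportedLength);
    }
    // Otherwise fall back to the first NUL terminator (or the whole buffer
    // if the driver did not terminate the string).
    const auto end = std::find(buffer.begin(), buffer.end(), '\0');
    return std::string(buffer.begin(), end);
}
```

In a real renderer the buffer would be the one passed to glGetShaderInfoLog, and `reportedLength` the value written through its length pointer.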
We are still investigating more bugs that only happen with the Qualcomm/Adreno drivers and lead to nice, trippy graphics rendering and device crashes.
Looking at the Qualcomm developer forums, we are far from the only ones getting issues with Adreno drivers and OpenGL ES 3.0. This recent message complains about crashes in Unity 3D applications. Another older message relates to out of memory conditions and random results when indexing tables in a vertex shader.
Luckily, we were helped by the Freedreno developers when porting Dolphin to Android Qualcomm devices. Like LIMA, Freedreno is an open source driver for Adreno devices, in a mature enough state to run large applications like XBMC on top of an open source stack. Once again, Rob Clark from Freedreno did not get any help from Qualcomm: to provide a better driver for everyone, he had to reverse engineer the blob provided by Qualcomm and write his own shader compiler backend on top of the Mesa infrastructure. One person, working mostly alone, produced better quality drivers than a whole team working at Qualcomm. And because nobody seems to care about these issues, his work is not used by any major phone manufacturer or in any Android version that we know of. Rob was more helpful than the Qualcomm team when we were confronted with Adreno issues, and we really thank him for his support.
Unknown - PowerVR
PowerVR currently does not support OpenGL ES 3.0 in publicly released software/hardware, which makes it impossible for us to test how well their drivers support Dolphin. We're hoping to learn more about this when we finally get ES 3.0 support for PowerVR.
One of the reasons we wrote this post is to give some attention to the extremely bad state of mobile GPU drivers. If people want mobile to become a serious contender for graphics intensive applications, they will have to fix these issues, and it looks more and more like Qualcomm and ARM will not be able to develop proper drivers in the future or support newer versions of graphics APIs quickly. NVIDIA stepping into the mobile world might just be the best thing that has happened for mobile graphics developers; while Dolphin cannot run properly on Tegra 3 devices because of one single missing feature (24-bit depth buffer - Tegra 3 is limited to 16/20-bit depth), we are hoping we can get our hands on a Tegra 4 device in the future and that it will work as well as NVIDIA drivers on Windows and Linux.
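To give a sense of scale for the depth buffer limitation mentioned above, the following sketch counts how many distinct values a fixed-point depth buffer of a given width can hold, and shows two nearby depths collapsing to the same value at 16 bits while staying distinct at 20 and 24 bits. It assumes a simple linear quantization of a normalized depth in [0, 1] and ignores how a projection distributes depth values; `depthLevels` and `quantizeDepth` are illustrative names, not part of any API.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct values an N-bit fixed-point depth buffer can store.
constexpr std::uint32_t depthLevels(unsigned bits) {
    return std::uint32_t{1} << bits;
}

// Quantize a normalized depth in [0, 1] to an N-bit integer, rounding to
// the nearest representable value.
std::uint32_t quantizeDepth(double depth, unsigned bits) {
    assert(depth >= 0.0 && depth <= 1.0);
    assert(bits >= 1 && bits <= 24);
    const double maxValue = static_cast<double>(depthLevels(bits) - 1);
    return static_cast<std::uint32_t>(depth * maxValue + 0.5);
}
```

A 16-bit buffer has 65,536 levels, a 20-bit buffer 1,048,576, and a 24-bit buffer 16,777,216. Two surfaces at depths 0.25 and 0.25 + 1/1048576 land on the same 16-bit value, so their ordering is lost, while at 24 bits they remain 16 steps apart.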
AMD recently announced that they wanted to support Linux more, and to give them some credit they have released a lot of documentation about their GPUs' internals (at least compared to NVIDIA) and contribute a lot to open source drivers (being the largest Gallium3D contributors). But so far, NVIDIA drivers are just plain better on Linux. We cannot wait to see what Mantle, the new graphics API from AMD, will bring to the table. While we currently recommend that our users avoid AMD CPUs and APUs because of their terrible single-core performance (see this PCSX2 benchmark), bringing good Unified Memory Access support to their APUs could improve Dolphin performance a lot and reduce the large amount of time we spend waiting for copies and streaming vertices.
Thanks to Ryan Houdek for creating the initial list of bugs I used to write this article.
Thanks to Tony Wasserka, Markus Wick, Matthew Parlane and Paul Olszewski for helping proof-read and fact-check this article.