Dolphin Progress Report: January 2015

Progressreportheader-January2015.jpg


Let's kick off the new year with a bang! January will finally let Dolphin answer the question that gets asked every progress report: "Does Rogue Squadron work yet?"


Rogue Squadron II: Rogue Leader in 1080p 60 fps with Dolphin


Thanks to a ton of work from the staff, tons of testing from the forum users, hardware tests, and newcomers and veterans alike, Star Wars Rogue Squadron II: Rogue Leader and Star Wars Rogue Squadron III: Rebel Strike are both playable and completable in Dolphin at long last.

Considering just how many big changes were merged and how much work was done, that may not even be the biggest news of the month. So hold tight, and please enjoy this month's Notable Changes!


Notable Changes


4.0-4963 - Faster Memory Management Unit (Part 2) by Fiora

A few months ago Fiora as much as doubled the performance of MMU games through improvements like the "far code" cache, implementing paired loads/stores in MMU mode, and a few other tweaks. Despite all of her optimizations, and those of other developers, MMU mode remained a very demanding feature.

Developers attacked the problem again over the past two months, with a goal of reducing MMU overhead as much as possible. While there are quite a few MMU games, the goal was to get Rogue Squadron 2/3 near full speed on current hardware before they were playable.

There were many changes revolving around two basic ideas: eliminate as much of the possible impact of the MMU on all code in the game that didn't actually use those features, and shorten the address translation code path, from memory loads and stores to their associated page table lookups, as much as possible. skidau started by fixing block linking to work in MMU mode, and then magumagu extended this by improving fastmem to support MMU too.
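To make the second idea concrete, here is a minimal sketch of why a short translation path matters: a small cache of recent translations (a TLB) sits in front of the slow page table lookup, so most loads and stores never reach the slow path. This is an illustration only, not Dolphin's JIT code; the `TinyMMU` class, its dictionary-based page table, and the `walks` counter are all hypothetical names invented for this example.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TinyMMU:
    """Toy address translator: a TLB in front of a slow page table lookup."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.tlb = {}                 # cache of recent translations
        self.walks = 0                # how many times the slow path ran

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        ppn = self.tlb.get(vpn)
        if ppn is None:
            # Slow path: consult the page table, then remember the result.
            # A KeyError here stands in for a page fault exception.
            self.walks += 1
            ppn = self.page_table[vpn]
            self.tlb[vpn] = ppn
        return (ppn << PAGE_SHIFT) | offset


mmu = TinyMMU({0x7E000: 0x00123})
first = mmu.translate(0x7E000ABC)    # slow path, fills the TLB
second = mmu.translate(0x7E000010)   # same page, served from the TLB
```

In this toy model the second access skips the page table entirely, which is the kind of saving the real patches chase on every emulated load and store.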

Fiora then painstakingly assembled roughly two dozen MMU-related patches, including optimized paired loads and stores, exception checking, and TLB lookups, while fixing a number of bugs that could cause random crashes in MMU titles.

The overall performance improvement of Fiora's "Faster MMU 2" was roughly 80% in the Rogue Squadron titles, with significant gains in other MMU titles. Combined with the performance improvements from magumagu and skidau, some MMU titles are nearly twice as fast as just a month ago!


fastermmu2.svg


The MMU performance improvements have also inspired magumagu to start work on a variety of large-scale, much-needed changes to unify and correct Dolphin's memory handling -- which we hope may lead to the other two titles in the Disney Trio of MMU Destruction* actually working.

*The Disney Trio of MMU Destruction includes Toy Story 3, Cars 2, and Disney Infinity. skidau got Toy Story 3 booting, meaning that only two titles remain before the trio has fallen.


4.0-5024 - Support Overriding the Emulated CPU Clock by Fiora

This is one of those nifty enhancements that has been talked about for a few months now. By overriding the clockrate of the GameCube/Wii CPU, users can affect games in quite a few ways.


Original intended use

  • Variable Framerate Games - Some games support variable framerates, such as The Sims series, Gauntlet Dark Legacy, Spyro the Dragon: Enter The Dragonfly, Crash Bandicoot: Wrath of Cortex, The Last Story and many others. Depending on the CPU load, they will swap between 20, 30, and even 60 fps. By giving the GameCube/Wii more processor horsepower, Dolphin can now allow these games to run at their maximum possible framerate at all times.

Spyro: Enter the Dragonfly running at 60 FPS


Most 30 FPS games are not variable framerate titles, and won't run at a higher framerate even if the CPU is overclocked to 400% without some kind of game specific patch/hack. Anyone willing to try to make some should go for it, as the difference we've seen already is immense!
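As a rough mental model of why the override helps variable framerate games but not fixed 30 FPS ones, consider a game that picks the highest framerate whose per-frame cycle budget covers its workload. The sketch below is a simplification under stated assumptions: `achievable_fps`, the workload figure, and the 60/30/20 target list are hypothetical and not taken from any real game or from Dolphin's code; 486 MHz is the nominal GameCube CPU clock.

```python
GEKKO_HZ = 486_000_000  # nominal GameCube CPU clock


def achievable_fps(work_cycles_per_frame, clock_scale=1.0, targets=(60, 30, 20)):
    """Return the highest target framerate whose cycle budget fits the work.

    A fixed 30 FPS game would correspond to targets=(30,), which is why
    overclocking alone cannot push such a game past 30.
    """
    clock = GEKKO_HZ * clock_scale
    for fps in targets:
        if work_cycles_per_frame <= clock / fps:
            return fps
    # Nothing fits: the game is stuck at its lowest target (and slows down).
    return targets[-1]


stock = achievable_fps(10_000_000)                          # 100% clock
overclocked = achievable_fps(10_000_000, clock_scale=1.5)   # 150% clock
locked = achievable_fps(10_000_000, clock_scale=4.0, targets=(30,))
```

With these made-up numbers the variable framerate game moves from 30 to 60 FPS at 150%, while the locked game stays at 30 even at 400%.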


Other discovered uses

  • Cycle Accuracy Issues - Dolphin is not a cycle accurate emulator, so sometimes its emulation of the GC/Wii CPU is not accurate enough for a game. Several games have videos that depend very precisely on IPC (Instructions Per Clock), such as The Legend of Zelda: Ocarina of Time Master Quest's promotional videos. By overclocking or underclocking the processor, users can avoid hangs caused by Dolphin's CPU speed inaccuracies. In the future, the goal is to make Dolphin's CPU emulation accurate enough that no hack is needed to make any game run correctly. In fact, there are already plans for this, so anyone interested in the actual implementation would likely be able to jump right in and help. Underclocking can even help with weird glitches like the sun flickering in Dualcore with The Legend of Zelda: The Wind Waker.

  • Speedhack - In some games, this feature can be used as a speedhack. Lowering the clockrate of the emulated CPU reduces the demands the emulated Wii places on the user's machine, making it much easier to emulate the game without slowdown. While the actual game may run worse (choppier in some games, slower in others), it can be preferable to the audio stuttering and unpredictability that occur when the host CPU is too slow to emulate the game.

Fire Emblem: Path of Radiance running at full speed on a Nexus 9 with CPU Clock Override


4.0-5061 - Allow Locked L1 DMA to Write to the EFB by magumagu

This is a relatively self-contained fix, but it's definitely worth noting as it allows several games to boot properly. Several games based on Nickelodeon properties used this specific method to render cartoon cutscenes, which inevitably ended in failure as Dolphin didn't emulate this at all. With the fix, things will render properly, but because of how slow it is to emulate EFB Pokes currently, people playing these games may want to disable this feature by enabling "Skip EFB Access to CPU" in order to get into the game faster.

With a feature that lets Dolphin write more than one pixel per poke, it should be possible to make these videos work fullspeed; it's just a matter of implementing it.


4.0-5124 - Texture Pooling by degasus

A year ago degasus merged an amazing optimization called texture pooling. Texture pooling is a cache for unused texture objects. Allocating and freeing these resources isn't an easy task, especially for OpenGL. As such, people would notice that refraction effects in games like Metroid Prime would bring OpenGL to its knees, while only mildly bothering D3D.

In the old system, if the texture cache entry didn't match, Dolphin would free it and create a new one. With texture pooling, Dolphin doesn't free the texture but instead pushes it into a pool. And before creating a new one, it first checks the pool to see if a suitable texture already exists. Texture pooling resulted in absolutely massive speedups in games that hit this bottleneck.
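The idea can be sketched in a few lines. This is a hedged illustration of the general pooling technique, not Dolphin's texture cache: the `TexturePool` class, its `(width, height, levels)` key, and the `allocate` callback are assumptions made for the example.

```python
from collections import defaultdict


class TexturePool:
    """Keeps unused texture objects around so they can be reused."""

    def __init__(self, allocate):
        self.allocate = allocate        # expensive call that creates a texture
        self.free = defaultdict(list)   # (width, height, levels) -> unused textures
        self.allocations = 0            # how often the expensive call ran

    def acquire(self, width, height, levels=1):
        key = (width, height, levels)
        if self.free[key]:
            # A texture of the same shape is waiting in the pool: reuse it.
            return self.free[key].pop()
        self.allocations += 1
        return self.allocate(*key)

    def release(self, texture, width, height, levels=1):
        # Instead of freeing the texture, park it for later reuse.
        self.free[(width, height, levels)].append(texture)


pool = TexturePool(lambda w, h, levels: object())
a = pool.acquire(64, 64)
pool.release(a, 64, 64)
b = pool.acquire(64, 64)   # comes back out of the pool, no new allocation
```

A game that constantly recreates same-sized textures, as refraction effects tend to, would pay for one allocation instead of one per frame in this model.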

Unfortunately, without the necessary cleanups and work done, it caused crashes and other issues and had to be reverted despite its huge potential as a performance enhancement. Now, after several texture cache cleanups and a much more carefully coded patch, texture pooling is back and better than ever. This is an absolutely massive speedup in some games; in Harvest Moon: A Wonderful Life this can amount to a 1000% speedup during nighttime sequences!


TexturePoolingPerf.svg


In most games, there will be more moderate speedups. The Metroid Prime series will see great improvements on a lot of their special effects. It seems that almost every game benefits from the increase in efficiency to the tune of 5 - 15%.


4.0-5143 - Vertex Loader JIT X86-64 by Tilka

Dolphin's Vertex Loader was one of the obvious bottlenecks that seemed like low-hanging fruit. Fiora showed significant performance gains a few months ago through basic optimizations of our existing Vertex Loader, but it was still a very primitive sort of JIT; it was well known that it needed a proper rewrite for ideal performance.

Once degasus did the necessary cleanup and preparation, he passed the task of rewriting the x86-64 Vertex Loader on to Tilka. After a few weeks of struggling and a few odd regressions, Dolphin's brand new Vertex Loader JIT was merged.

What this does is more efficiently convert vertices passed from the emulated GPU into a format usable by the host GPU; meaning less CPU overhead. How big of a benefit mostly depends on if, and how much, the game was bottlenecked on the vertex loader, but in some areas on Rogue Squadron II the game can be up to 50% faster in Vertex Loader-limited scenes, like the ship bay with 10,000+ polygon ships!
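For a sense of what "converting vertices" means, here is a minimal sketch of one such conversion: big-endian 16-bit fixed-point positions turned into floats. The real JIT emits specialized machine code for each vertex format; this Python version only shows the arithmetic, and the function name `decode_s16_positions` and its parameters are hypothetical.

```python
import struct


def decode_s16_positions(raw, components=3, frac_bits=0):
    """Decode big-endian signed 16-bit fixed-point positions into float tuples."""
    count = len(raw) // 2
    values = struct.unpack(f">{count}h", raw[:count * 2])
    scale = 1.0 / (1 << frac_bits)   # fixed-point to float scale factor
    floats = [v * scale for v in values]
    # Group the flat list into one tuple per vertex, dropping any partial tail.
    return [
        tuple(floats[i:i + components])
        for i in range(0, len(floats) - components + 1, components)
    ]


raw = struct.pack(">6h", 256, -512, 128, 0, 64, -64)
vertices = decode_s16_positions(raw, components=3, frac_bits=8)
```

Doing this per component in a generic loop is exactly the sort of overhead a format-specific JIT avoids.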


VertexLoaderJITPerf.svg


The one catch is that the Vertex Loader JIT relies on SSSE3, so only SSSE3-supporting CPUs (Core 2 and newer for Intel, Bulldozer and newer for AMD) will benefit from the speedup.


4.0-5190 - Emulated GameCube Keyboard Support by skidau

This one is pretty self-explanatory; users can configure their keyboards to type in GameCube games and Homebrew that support the GameCube Keyboard Controller, such as Phantasy Star Online. Another awesome, obscure GameCube peripheral emulated in Dolphin! Unfortunately, GameCube Controller Adapters, including Native GameCube Controller Support, will not allow you to plug in and use a GameCube Keyboard, since the adapters do not transfer serial input directly. The one exception to this would be the Raphnet Adapter, since it could convert it into ordinary keyboard presses, but it is currently unknown if it will work properly. If anyone finds out, please let us know in the comment thread.


Phantasy Star Online I & II with Working Keyboard


4.0-5205 - Make Paletted Textures Less Broken on EFB Copies to Texture by mimimi

Sometimes, when digging through the code, some funny little problems can be found. In this case mimimi realized that the texture cache for paletted textures was completely broken. This meant that when using emulated framebuffer copies to texture, they would be a garbled mess. When properly sending the emulated framebuffer copies to the emulated RAM, the texture cache would have to be disabled or else the textures would not detect that they needed to update.

What caused this oversight? Technically, it actually spawned from a speedhack from ancient times that made paletted textures a lot faster in Dolphin. But, users who have already updated may have noticed that mimimi's quick fix doesn't cause any performance regressions. It turns out that the previously mentioned Texture Pooling merge prevents the slowdown that this merge would have caused!


ztpmap-efb2texbroken

"Hey! Listen! Check out the... nevermind, you're on your own!"

ztpmap-efb2tex

While it's faint, the map now works with EFB to Texture!



targetingcomputerbroken

"Luke... you turned off your targeting computer, is something wrong?" Yes, yes there is.

rs2targetingworkingefb2tex

Still not quite right, but it works!



ztpmap-efb2ram

Twilight Princess mini-map with EFB2RAM set.

Sonic Colors

Star Wars Rogue Squadron II Targeting Computer in EFB2RAM.



If that wasn't good enough, by fixing how they were handled, EFB copies to RAM will no longer need safe texture cache for paletted textures! This isn't a complete solution though - all it's doing is reinterpreting the EFB copy as a paletted texture. In one unfortunate case, this missing functionality breaks games like Dragon Ball Z: Budokai Tenkaichi 3 in EFB copies to texture. Anyone experienced in graphics programming could likely write a GPU decoder for this and fix not only that, but also get Twilight Princess, Rogue Squadron, and the rest of the affected games working perfectly with paletted textures without needing the expensive EFB copies to RAM option.


4.0-5225 - Light Attenuation Fixes by NanoByte011

Dolphin's lighting code is not one of its bright spots. It's to the point where previous attempts to sort out what it was doing and compare it to how the console works left the coder dismayed to the point of not wanting to mess with it. NanoByte011, being relatively new to the project, did not realize this and ended up solving a lot of Dolphin's weird lighting problems while reshuffling Dolphin's light attenuation code.

fifoci-lighting

FifoCI shows off the exact difference that this change brings to Mario Power Tennis.

It's really hard to say how much this actually fixes. While there are a few big examples where known issues were fixed, a majority of Dolphin's lighting problems were relatively minor and hard to notice. There could be hundreds of games that perform more like their hardware counterparts.


4.0-5234 - Improve Custom Texture Handling by degasus

A feature that often goes overlooked by users is that Dolphin has the ability to dump textures from games, so users can modify them and then reload them into the game through the "load custom textures" option. By doing this, users can create all kinds of textures, but by far the most popular use of this is for high definition texture packs. By placing these texture packs in the load directory and enabling the option, Dolphin can greatly enhance the visual fidelity of the game in question.

The most expansive HD Texture Pack to date is for Xenoblade Chronicles. While the texture pack is a massive work of art, the people behind it were experiencing problems. Namely, Dolphin's way of handling paletted textures was insane; sometimes there would be thousands of duplicates of the same texture, and for the HD texture to be guaranteed to work, they'd have to replace every single one. degasus does away with that design issue and adds a bunch of new enhancements to make HD Texture Packs easier to make and use. Hopefully with these changes, Dolphin will see many more custom texture packs in the future.

Do note that compatibility with older texture packs will be broken by this. Dolphin will currently convert old format custom textures into new ones as they are loaded if an INI setting is enabled, but that functionality WILL be removed eventually. All users actively working on or managing custom texture packs are advised to convert their texture packs to the new format. Details are available on the forums.


4.0-5279 - Add zfreeze Emulation to Hardware Backends by neobrain, phire, and NanoByte011

zfreeze is a notable feature of the GameCube/Wii GPU with no real equivalent on modern PC GPUs. It can "freeze" the depth value for pixels in a polygon to an arbitrary reference plane. The intended use for this was to combat z-fighting, but it ended up being used in a variety of ways by different games. While this sounds like something that should be fairly easy to emulate, it definitely isn't. A limited understanding of the feature, on top of limitations in what Dolphin can do with OpenGL and D3D, made it a nightmare to even comprehend how to tackle the feature.
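The "reference plane" idea can be sketched with a little geometry: take a plane in screen space, then compute every pixel's depth from that plane instead of from the polygon actually being drawn. The code below is a simplified illustration under assumptions; deriving the plane from a three-vertex reference triangle and the function names `reference_plane` and `frozen_depth` are choices made for this example, not a description of the hardware or of Dolphin's implementation.

```python
def reference_plane(v0, v1, v2):
    """Return (x0, y0, z0, dz/dx, dz/dy) for the plane through three (x, y, z) points."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    e1 = (x1 - x0, y1 - y0, z1 - z0)
    e2 = (x2 - x0, y2 - y0, z2 - z0)
    # Plane normal from the cross product of the two edges.
    nx = e1[1] * e2[2] - e1[2] * e2[1]
    ny = e1[2] * e2[0] - e1[0] * e2[2]
    nz = e1[0] * e2[1] - e1[1] * e2[0]
    if nz == 0:
        raise ValueError("reference triangle is edge-on or degenerate")
    return x0, y0, z0, -nx / nz, -ny / nz


def frozen_depth(plane, x, y):
    """Depth at pixel (x, y) taken from the reference plane, ignoring the polygon's own z."""
    x0, y0, z0, dzdx, dzdy = plane
    return z0 + dzdx * (x - x0) + dzdy * (y - y0)


plane = reference_plane((0, 0, 0.5), (10, 0, 0.6), (0, 10, 0.7))
z = frozen_depth(plane, 5, 5)   # any polygon drawn at (5, 5) gets this depth
```

Because every polygon drawn while frozen reads its depth from the same plane, two overlapping polygons land on identical depth values at each pixel, which is the property the uses described below rely on.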

It has gotten to the point where tackling zfreeze has gotten personal for many developers. For years, it has taunted the Dolphin developers as this seemingly impossible-to-emulate feature that breaks some very popular titles. Not even the software renderer had a working implementation! Many attempts were made to properly emulate it, hack it, or work around it in a way that would make the feature less of a stopping point, but nothing succeeded.

The first partially successful attempt came from neobrain in 2012. His zfreeze branch actually got Rogue Leader's skybox to work in certain situations, but attempting to fix any of the other titles immediately broke Rogue Squadron II. He indefinitely put the project on hold in order to write hardware tests, but never got around to it and eventually lost interest in emulating the feature.

While his branch may have been left in the past, the desire to play Rogue Squadron 2 (and 3, once it started booting in Dolphin) never left Dolphin's userbase. One of the most asked questions after every single progress report posting was "Is Rogue Squadron playable yet?" Eventually, phire came up with a hack to at least make the Rogue Squadron games work correctly with zfreeze. This sacrificed compatibility with all other zfreeze titles, and wasn't ever considered for an actual build, but nonetheless planted the seeds of curiosity.

phire's project was to make a set of hacks that would work with every single situation that zfreeze was used for in various titles. During this time, he ran into several different uses for zfreeze.


Combating zfighting on Decals

Used by: Mario Power Tennis, Super Mario Strikers


Without zfreeze, zfighting rules the day.

Proper zfreeze allows all the decals to sit flat without zfighting.



zfreeze was designed as a way to eliminate zfighting when rendering decals, as an alternative to hacks like OpenGL's glPolygonOffset(), but game developers never really used it for that. phire suspects that it was either just too expensive, since it requires a new drawcall for every set of decals on a different triangle and developers could manually bias vertices instead, or that it was poorly documented for developers.

Going through the list of fifologs for testing, there are exactly two known games (Mario Power Tennis and Super Mario Strikers) which use zfreeze in the intended decal rendering mode. Super Mario Strikers uses it for rendering the shadows onto the field while Mario Power Tennis uses it for rendering the tennis court lines. But Mario Power Tennis also uses other zfreeze-based tricks for its shadows, so Super Mario Strikers is the only game that purely uses zfreeze in its intended manner.


Depth Override

Used By: Rogue Squadron II/III, Mario Golf: Toadstool Tour, Blood Omen 2: Legacy of Kain


Star Wars Rogue Squadron II

The polygons for the skybox actually sit only a short distance away from the ship, which is why it shows up in front of the building.

Star Wars Rogue Squadron II

Even though the skybox is closer than the building, it's forced to be rendered behind everything through zfreeze.



Most famously used by Rogue Squadron's skyspheres. These skyspheres are rendered extremely close to the player but use zfreeze to override the depth and project it behind all other objects to the zfar plane. This is essentially the same as putting "depth = 1.0" in a fragment shader (which is what phire's first hack did), except that in the GameCube this is done in the triangle setup and early z culling still happens. Factor 5 used this method because putting the skysphere in the distance would take up a huge chunk of the zbuffer range (due to Factor 5 using Hardware Anti-aliasing, they were limited to a 16bit zbuffer) and rendering the skysphere first with zbuffer disabled would cause too much overdraw.

The other games that do this don't seem to have a reason for it. Mario Golf: Toadstool Tour was likely done by the same people who did Mario Power Tennis, which would explain why they'd use the feature. But they only used it for the main menu. Blood Omen 2, on the other hand, uses it not only for the skybox, like Rogue Squadron 2/3, but also for the pause menu, like Mario Golf. This completely broke plans for making hacks that could suit each game and meant a real implementation would be necessary.


Stencil Shadowing

Used By: Most EA sports games, some EA racing games, Mario Power Tennis, and likely others


Need For Speed: Hot Pursuit 2

Without zfreeze the nature of how the shadow is drawn is revealed.

Need for Speed: Hot Pursuit 2

With zfreeze, it looks like a normal stencil shadow



Shadows are one of the harder things in 3D graphics. Many methods have been developed for dynamic shadows over the years and they all have various trade-offs. The choice of a shadowing technique depends a lot on the capabilities/performance of the hardware. Doom 3's famous stencil volume shadows produce the best looking results for sharp shadows, but modern hardware isn't optimized for the excessive stencil operations that stencil volume shadows require. Most modern games use shadow maps, but their resolution is limited, often leading to pixelated shadows.

EA sports games use pure projection shadows - the shadow object is projected onto the floor in software, which is easy because the floor of sports games is completely flat, and rendered on the floor. This works fine if the developers want a pure black shadow, but generally they want an alpha blended shadow, which means they have to prevent zfighting and double darkening when polygons overlap.

But the GameCube doesn't have a stencil buffer to handle those situations. Instead these games enable zfreeze, which ensures that each pixel on the screen will always have an identical depth in the zbuffer if rendered to twice. The game then changes the depth compare method from the usual "less than or equal" to "less than", so each pixel of the shadow can only possibly be drawn once. This essentially creates a 1bit stencil buffer in the depth buffer.
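A toy simulation makes the trick easier to see. The sketch below counts how many times each pixel gets darkened when two overlapping shadow polygons are drawn with the same frozen depth. It is an illustration under assumptions: `darken_counts` is a hypothetical helper, depth is a single constant rather than a plane, and the frozen depth is assumed to sit slightly nearer than the floor so the first shadow pixel passes a strict "less than" test.

```python
def darken_counts(floor_depth, frozen_depth, shadow_polygons, strict_less=True):
    """Count how often each pixel is darkened when drawing overlapping shadow polygons.

    Each polygon is a set of (x, y) pixels; all of them share frozen_depth.
    """
    pixels = {p for poly in shadow_polygons for p in poly}
    depth = {p: floor_depth for p in pixels}   # zbuffer after drawing the floor
    hits = {p: 0 for p in pixels}
    for poly in shadow_polygons:
        for p in poly:
            if strict_less:
                passes = frozen_depth < depth[p]
            else:
                passes = frozen_depth <= depth[p]
            if passes:
                depth[p] = frozen_depth        # depth write marks the pixel as "done"
                hits[p] += 1                   # one layer of alpha blended shadow
    return hits


polys = [{(0, 0), (1, 0)}, {(1, 0), (2, 0)}]   # the two polygons overlap at (1, 0)
once = darken_counts(0.80, 0.79, polys, strict_less=True)
twice = darken_counts(0.80, 0.79, polys, strict_less=False)
```

With "less than", the overlapping pixel is darkened once because the second polygon's identical depth fails the test; with "less than or equal" it is darkened twice, which is the double darkening the games are avoiding.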


Fighting the Problems Head On

With all of these different uses, there wasn't going to be a single hack that worked for every game, and some games were unfixable due to using multiple forms of zfreeze. Hacks would lead to partial support at best. However, phire still had hope that zfreeze could be emulated properly, so while polishing up his hacks for master, he also kept trying to think of actual implementations that might work.

Completely against having a hack, neobrain formulated plans about how to tackle zfreeze without stooping to game specific hacks. He had already seen numerous failed attempts at zfreeze, and knew that the most important step would be properly understanding it. Yet all the hard work and planning in the world can't account for pure serendipity.

NanoByte011, in an effort to get some kind of zfreeze implemented, ported neobrain's old zfreeze branch forward and made a pull request. Even though it had been tested before, phire was shocked to see the EA shadows somewhat working in the branch and quickly began working from that as his new base.

Within modern Dolphin, the zfreeze branch showed more of its potential to actually emulate the feature. phire and NanoByte011 worked together on zfreeze from this new base, solving impossibility after impossibility. NanoByte011 dealt the final death blow to the mystery when he solved the zfighting in Mario Power Tennis' gimmick courts. Once EarlyZ was taken into consideration, all remaining issues with zfreeze disappeared.

Thanks to the hard work of all of these developers and many others, zfreeze is now emulated properly in Dolphin with no known drawbacks.


4.0-5390 - Perspective Divide Line-width Coordinates Before Comparing Angle by Armada651

Factor 5's games will not let any emulation bugs get by them at all. Armada's previous line-width rewrite enabled almost every title that used line-width to work perfectly in OpenGL just as it used to in D3D.

The major exception was, of course, both Star Wars Rogue Squadron games for the GameCube. For some reason, even with all the hardware tests and work done to make sure things were perfect, they still managed to find a way to be broken. Many of the line-width objects, including the wireframe weapon models and tow cables, would be missing many of their lines depending on the angle of the object or the camera. As the fix was merged at the last minute, it was not included in the video at the top of this article.

The fix turned out to be an oversight in the original line-width implementation for D3D; Dolphin needed to perspective divide the coordinates it uses to decide in which direction to grow the lines. By doing this, not only does it fix Star Wars Rogue Squadron II and III, but Dolphin will also more accurately display lines in other games using this feature, such as Star Fox Assault where the difference was so small that only an automated program like FifoCI could catch the difference.
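Why the divide matters can be shown with two endpoints at different depths. The sketch below decides which way to widen a line by comparing its horizontal and vertical extent, with and without dividing by w first. It is a simplified illustration: `grow_direction`, the `(x, y, w)` tuples, and the 45 degree cutoff are assumptions for this example rather than Dolphin's shader code, and w is assumed to be positive.

```python
def grow_direction(p0, p1, perspective_divide=True):
    """Pick the axis along which to widen a line between two (x, y, w) endpoints.

    A mostly horizontal line is widened vertically, and vice versa.
    """
    def screen(p):
        x, y, w = p
        # Clip-space x and y only become screen positions after dividing by w.
        return (x / w, y / w) if perspective_divide else (x, y)

    (x0, y0), (x1, y1) = screen(p0), screen(p1)
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    return "vertical" if dx >= dy else "horizontal"


near, far = (1.0, 0.0, 1.0), (4.0, 2.0, 4.0)
wrong = grow_direction(near, far, perspective_divide=False)
right = grow_direction(near, far, perspective_divide=True)
```

On screen this line runs straight up (both endpoints sit at x = 1 after the divide), so it should be widened horizontally; comparing the raw coordinates instead makes it look mostly horizontal and widens it along its own length, which would plausibly make it seem to vanish.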


Star Wars Rogue Squadron II

The incorrect linewidth resulted in broken lines.

Star Wars Rogue Squadron II

But now the lines are rendered properly.



And as always

Thanks to all of the developers, forum staff, wiki editors, and users for their continued support to the project and allowing it to have such a strong start to the new year!

You can continue the discussion in the forum thread of this article.
