My Career – Part 13: DirectFlip

DirectFlip is a feature which allows the DWM to:

Determine if the entire screen contains only content from a DirectX app that’s DirectFlip compatible.
When true, the DWM will display the application’s buffer directly rather than performing an additional rendering pass.

This was not particularly hard to implement, but after getting it to work I realized that we had a fundamental architectural incompatibility. To understand the issue, I must draw some timing diagrams and give a brief explanation of how display hardware works.

Generally speaking, display hardware has to continually re-read the image from memory and send it to the monitor. The rate at which this occurs is called the “refresh rate” and common values are 60 Hz (i.e. 60 times per second), 75 Hz, or 120 Hz. Each time the hardware starts reading the image is called a VSYNC. When you tell the display hardware to change the address that it reads the image from (an action that we call “flip”), the address change takes effect at the start of the next VSYNC.
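
To put numbers on that, here is a small Python sketch of the "flip waits for the next VSYNC" rule. The function names are made up for illustration, and it assumes VSYNCs land at exact multiples of the refresh period starting from time zero, which real hardware only approximates.

```python
import math

def refresh_period_ms(refresh_hz):
    """Length of one refresh period in milliseconds."""
    return 1000.0 / refresh_hz

def flip_takes_effect_ms(request_ms, refresh_hz):
    """Time of the first VSYNC strictly after the flip was requested.

    Assumption: VSYNCs occur at 0, period, 2*period, ... (idealized timing).
    """
    period = refresh_period_ms(refresh_hz)
    return (math.floor(request_ms / period) + 1) * period

# At 60 Hz a flip requested 1 ms or 16 ms into a period lands on the same VSYNC.
print(round(flip_takes_effect_ms(1.0, 60), 2))   # 16.67
print(round(flip_takes_effect_ms(16.0, 60), 2))  # 16.67
# A flip requested just after that VSYNC has to wait for the following one.
print(round(flip_takes_effect_ms(17.0, 60), 2))  # 33.33
```

The point to take away is that how long a flip waits depends entirely on where in the period it was requested.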

Our scheduler ensures that the application can only render to a buffer that is not being displayed (otherwise you will see all types of weird artifacts). Hence, in a double buffer scenario (the most common for games), one buffer is displayed while the other buffer is being rendered to.

Before the DWM, the timing diagram for a full-screen game looked like this (the vertical lines represent VSYNCs):

Normal full-screen DirectX game timing prior to the DWM

With the DWM (but before DirectFlip), the timing diagram looked like this:

The same game after the DWM was added in Windows Vista

This needs a little explaining. The DWM wakes up shortly after the VSYNC and at that time, it:

Looks for the most recent buffer from the game.
It then indicates that it’s using that buffer, which prevents the game from writing to it.
It releases any other buffer that it was using, allowing the game to again write to it.
It then composes the final desktop to its own buffer and issues a flip to that buffer.

But from the above, you can see that the DWM causes extra GPU rendering to happen and each frame presented by the game must wait an extra VSYNC period to be displayed. These are the issues that DirectFlip solves.
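
For readers who prefer code to diagrams, the wake-up steps listed above can be sketched roughly as follows. This is a toy model and not the DWM's actual code: the class, its fields, and the two-buffer "dwm0"/"dwm1" naming are all hypothetical, and it assumes the compositor composes on every wake-up.

```python
class ToyCompositor:
    """Toy model of the per-wake-up steps; all names are hypothetical."""

    def __init__(self):
        self.game_queue = []  # buffers the game has finished, oldest first
        self.held = None      # game buffer the compositor is currently reading
        self.released = []    # buffers handed back to the game for writing
        self.flips = []       # (compositor buffer, source game buffer) pairs
        self.next_own = 0     # toggles between two compositor-owned buffers

    def on_wake(self):
        if self.game_queue:
            newest = self.game_queue[-1]   # 1. most recent buffer from the game
            stale = self.game_queue[:-1]
            self.game_queue.clear()
            if self.held is not None:
                stale.append(self.held)
            self.held = newest             # 2. mark it in use (game cannot write)
            self.released.extend(stale)    # 3. release any other buffer
        # 4. compose into the compositor's own buffer and flip to it
        own = f"dwm{self.next_own}"
        self.next_own ^= 1
        self.flips.append((own, self.held))

c = ToyCompositor()
c.game_queue.append("A")
c.on_wake()
c.game_queue.append("B")
c.on_wake()
print(c.released)  # ['A']
```

Note that step 4 is where the extra GPU pass and the extra VSYNC of latency come from: the game's buffer never reaches the screen directly.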

But when we implemented DirectFlip, what we saw was this:

Original DirectFlip behavior on the same game

We see that the DWM is no longer rendering or flipping to its own buffers (which is good), but why are the frames being displayed for 2 VSYNC periods rather than 1? This needs some explaining:

Time A: The DWM wakes up, sees that buffer A is now being displayed, so it releases buffer B back to the game, which then starts rendering to it.

Time B: The DWM wakes up, sees that buffer B is the most recent buffer, so it submits a flip. But since the flip only happens at the start of a VSYNC, it doesn’t take effect until the next VSYNC.

Time C: The DWM wakes up, sees that buffer B is being displayed, so it releases buffer A back to the game so it can start rendering to it.

Hopefully by this point, it is clear the problem is that the DWM always issues the flip after the VSYNC, which means that it cannot take effect until the next VSYNC. Hence, the game’s buffer is always displayed for 2 VSYNC periods rather than 1.

The only way to fix this is for the DWM to issue the flip before the VSYNC rather than after the VSYNC.
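
The Time A / B / C sequence can be reproduced with a tiny simulation. This is only a sketch under simplifying assumptions that are mine, not the real scheduler's: two buffers, one compositor wake-up shortly after every VSYNC, and a game that always finishes rendering before the next VSYNC. The function name is hypothetical.

```python
def original_directflip_latches(num_vsyncs):
    """Return the VSYNC indices at which a new game buffer reaches the screen."""
    displayed = "A"  # buffer currently being scanned out
    ready = None     # finished game buffer waiting for a flip
    pending = None   # buffer whose flip was issued but has not taken effect
    latches = []
    for vsync in range(num_vsyncs):
        # VSYNC: a previously issued flip takes effect now.
        if pending is not None:
            displayed, pending = pending, None
            latches.append(vsync)
        # Compositor wake-up, shortly AFTER the VSYNC.
        if ready is not None:
            # Time B: too late for this VSYNC, so the flip waits for the next.
            pending, ready = ready, None
        else:
            # Time A / C: the off-screen buffer goes back to the game, which
            # renders into it and queues it before the next VSYNC (assumed).
            ready = "B" if displayed == "A" else "A"
    return latches

print(original_directflip_latches(7))  # [2, 4, 6]
```

A new frame only lands on every second VSYNC, so at a 60 Hz refresh rate the game in this model would be held to 30 new frames per second.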

With this in mind, we asked the DWM team if there was any way that they could modify their architecture to accommodate a different behavior, but they said “absolutely not” – their scheduler was written by a person who had left their team, nobody left on their team really understood how their scheduler worked, and in their most recent attempt to tweak the scheduler behavior, everything broke in unexpected ways that nobody understood.

My management realized the importance of solving this, so they gave me a month to try to fix the DWM myself. One month later, I produced:

A very detailed document on how the DWM scheduler worked and why, including an explanation of why their previous minor tweaks caused the behaviors that they saw.
A working version of the DWM containing a new feature that I called “early wakeup mode”, which they allowed to become part of their official code base.

If a game or other application is using DirectFlip, the behavior was modified so that as soon as a new buffer is added to the DWM’s queue (which happens shortly after the game’s rendering to the buffer has finished), the DWM wakes up, does a few lightweight checks, and then issues the flip to the new buffer.
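
Here is a rough simulation of what that change does to the frame cadence. It is a sketch under my own simplifying assumptions rather than the real implementation: two buffers, a game that always finishes rendering before the next VSYNC, and the off-screen buffer still being released at the regular wake-up after each VSYNC. The function name is hypothetical.

```python
def early_wakeup_latches(num_vsyncs):
    """Return the VSYNC indices at which a new game buffer reaches the screen."""
    displayed = "A"  # buffer currently being scanned out
    pending = None   # buffer whose flip was issued but has not taken effect
    latches = []
    for vsync in range(num_vsyncs):
        # VSYNC: a previously issued flip takes effect now.
        if pending is not None:
            displayed, pending = pending, None
            latches.append(vsync)
        # Regular wake-up after the VSYNC: release the off-screen buffer
        # so the game can render into it (assumed to finish mid-period).
        back = "B" if displayed == "A" else "A"
        # Early wake-up: as soon as the game queues the finished buffer, the
        # compositor wakes again and issues the flip BEFORE the next VSYNC.
        pending = back
    return latches

print(early_wakeup_latches(7))  # [1, 2, 3, 4, 5, 6]
```

Because the flip is now issued before the VSYNC instead of after it, a new frame can reach the screen on every VSYNC rather than every second one.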

The new timing looks like this:

DirectFlip combined with early wake-up mode removed the GPU overhead normally added by the DWM, which helped immensely with power and performance (especially for low-end devices). It also did not require special hardware to operate.

The downsides were:

This only benefitted scenarios where every visible pixel on the screen belonged to the application. If you used a software cursor, ran in windowed mode, or had some other popup occur, DirectFlip was of no benefit.
While early wake-up mode solved the timing issues that resulted in throttling the game rendering, it did so at the expense of additional CPU utilization and thread hops.

These issues were fixed using multiplane overlays and independent flip.

NOTE: The HoloLens project (which came a few years later) required additional processing and stages to render their images. They heavily leveraged my “early wake-up mode” code to make it work.
