Code Change Request

# 13504


Christopher
Technical Support
StableBit DrivePool
Windows 8.1 (64 bit)
Public
Alex

* [D] [Issue #13504] Optimized some spin lock code that was causing a DPC_WATCHDOG_VIOLATION on Windows 8+ in one reported case.
Public
Alex

Ok, so this is going to be a complicated one. Here we go...

First of all, the crash was triggered by multiple concurrent directory listings, 8 of them at the same time.

The error was a DPC_WATCHDOG_VIOLATION, a bug check that is enabled by default starting with Windows 8. It makes sure that deferred procedure calls (DPCs) don't take too long to run, as that can impact the responsiveness of the system in general. Our driver doesn't deal with DPCs directly, but this bug check introduces another quantity that can affect us. In addition to tracking DPC run times, it also checks how long code has been running at IRQL_DISPATCH (the WDK calls this level DISPATCH_LEVEL).

What is IRQL_DISPATCH? Basically, while code runs at that level on a CPU core, the scheduler can't switch to any other thread on that core. Effectively, multitasking is off for that core. To keep the system responsive, developers are encouraged to minimize the amount of time spent in IRQL_DISPATCH. Starting with Windows 8, this is enforced with a hard time limit.

CoveFS avoids IRQL_DISPATCH unless it is absolutely necessary. One place where it is necessary is when processing I/O completion.

In the dump submitted for this case, it looks like 8 threads (on an 8-core CPU) were trying to run the completion code at the same time. Since only one thread can run that code at a time (for data synchronization purposes), the other 7 had to wait. The code that needs to run is a small loop that sets a few events. This should take microseconds, not milliseconds, and certainly not seconds.

I've done some googling on the issue, and it looks like at least one other programmer has experienced something similar and tracked it down to the KeSetEvent function. That is the exact function that was being called when the crash occurred. With that in mind, I've optimized the two functions that share this DISPATCH lock so that they never call KeSetEvent while in IRQL_DISPATCH.
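To illustrate the general pattern (not the actual CoveFS code), here is a minimal user-mode sketch in C. It assumes pthreads are available and uses a pthread mutex as a stand-in for the kernel spin lock, and a small mutex/condition-variable struct as a stand-in for a kernel event and KeSetEvent. All type and function names below (event_t, completion_queue_t, queue_complete_all, and so on) are hypothetical. The idea is simply: while holding the lock, only copy out the list of events that need signaling; release the lock; then signal them.

```c
/* User-mode analogy only. A pthread mutex stands in for the kernel spin
   lock, and event_set() stands in for KeSetEvent. All names are
   hypothetical and are not taken from the CoveFS source. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Stand-in for a kernel event object. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int             signaled;
} event_t;

static void event_init(event_t *e) {
    pthread_mutex_init(&e->mu, NULL);
    pthread_cond_init(&e->cv, NULL);
    e->signaled = 0;
}

/* Stand-in for KeSetEvent: wakes any waiters, so it may be slow. */
static void event_set(event_t *e) {
    pthread_mutex_lock(&e->mu);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cv);
    pthread_mutex_unlock(&e->mu);
}

static int event_is_set(event_t *e) {
    pthread_mutex_lock(&e->mu);
    int s = e->signaled;
    pthread_mutex_unlock(&e->mu);
    return s;
}

/* Shared completion state, protected by `lock` (the "spin lock"). */
typedef struct {
    pthread_mutex_t lock;
    event_t        *pending[MAX_PENDING];
    size_t          count;
} completion_queue_t;

static void queue_init(completion_queue_t *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->count = 0;
}

/* Registers an event to be signaled on completion. Returns 1 on success. */
static int queue_add(completion_queue_t *q, event_t *e) {
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count < MAX_PENDING) {
        q->pending[q->count++] = e;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Drains the pending list under the lock, then signals the events only
   after the lock has been released. Returns the number signaled. */
static size_t queue_complete_all(completion_queue_t *q) {
    event_t *to_signal[MAX_PENDING];
    size_t n;

    pthread_mutex_lock(&q->lock);
    n = q->count;
    assert(n <= MAX_PENDING);
    for (size_t i = 0; i < n; i++)
        to_signal[i] = q->pending[i];   /* cheap copy while locked */
    q->count = 0;
    pthread_mutex_unlock(&q->lock);

    for (size_t i = 0; i < n; i++)
        event_set(to_signal[i]);        /* potentially slow, lock not held */
    return n;
}
```

The trade-off is a small local copy of the pending list in exchange for a critical section that contains nothing but a few pointer copies, so the time spent holding the lock no longer depends on how long signaling takes.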

To me this really makes no sense in the end, because only 8 threads were waiting on that lock, not 8000 or 80000, so it should not have been an issue in the first place. This could still be a hardware issue where the CPU just locks up, or something else is locking up.

As a side note, Microsoft has a hotfix to disable this new feature for Windows Server 2012: http://support.microsoft.com/kb/2789962

I don't see anything for Windows 8.1 though.