Code Change Request

# 20851


Christopher
Technical Support
StableBit DrivePool
2.2.0.624
Windows Home Server 2011
Public
Alex

* [D] [Issue #20851] Very rarely, a pool part folder is reported to not exist, while a sub-folder under that folder does exist.
                     This causes duplication inheritance to get confused and to inherit the wrong value from the root. This issue 
                     was reported on Windows 7 and WHS 2011. It could not be reproduced.
Public
Alex

So I've done a code review and debugged the code a bit, and have come to the conclusion that what's happening is quite impossible :)

I don't think it's a bug in the duplication code. If anything, it's some kind of OS-level issue reading the tags from the disk itself.

One thing that I can say at this point about Lars' case is that we were looking for the problem in the wrong place. After closely examining his screenshot and all of the other info that he submitted, it's clear to me that the "\ServerFolders" tags ARE being read in correctly after all. In the "broken" state, the \ServerFolders are shown in a blue hue with "x1+" duplication. Blue means "Inherit" and x1 means that x1 is being inherited from the parent. Moreover, the "+" means multiple different duplication levels for sub-folders (this can only be read in from the tags on the disk). So, that translates into an "MI" tag that was read in successfully, and that's exactly what's on the disk, as reported.

The problem is what "\ServerFolders" is inheriting, not the folder itself. It's inheriting x1, when we can clearly see in the screenshot that the root has x2 duplication. So the million dollar question is, where did the x1 come from?

Following that line of reasoning led me to the fact that duplication tags are cached. So presumably, some time in the past, the root folder reported itself as having x1, and that x1 was propagated to the sub-folder \ServerFolders, and that fact was cached. Afterwards, the root folder somehow switched to being x2 duplicated, and everything else inherited the correct duplication level.

The problem with that sequence of events is that the duplication tag cache is cleared all at once, not selectively. So if the root was cached as x1, the only way that it can switch to x2 is if the entire cache was cleared and the tags were re-read back in from disk. This happens from time to time, depending on certain I/O. But here's the problem: if that happened, how come \ServerFolders still has the broken x1 inherited from the root? Should it not have been cleared and re-read as well, using the now-correct x2 duplication from the root?

In the end, the fact that we see multiple sub-folders inheriting different duplication values from the root folder makes this entire scenario impossible, given that the cache can only be cleared all at once.

Given the code at hand, here's the only thing that I can come up with that would cause this:
  1. "\" was queried for its duplication level.
  2. The duplication tag was not found on the root folder (which can happen for non-duplicated pools).
  3. The root directory itself was queried, and it did not exist.
  4. Since we're querying the duplication level of the root, we assume a MI duplication level. All roots default to MI in the absence of a duplication tag. They inherit x1 duplication from an imaginary parent directory.
  5. Since the root directory doesn't exist, we choose not to cache this fact. That's because this result is not definitive, as this folder can potentially exist on some other pool part (of course it's the root, so really it cannot).
  6. Normally we don't search sub-folders for tags any further, if the parent has no tag. But in this case, since this is the root (and an implicit assumption is made, that all root directories MUST exist), we continue on.
  7. We then query \ServerFolders for its duplication tag, and we find it. It says "MI", and we make it inherit the x1 duplication level from the root that we couldn't find.
  8. We check whether \ServerFolders exists and we cache the x1 duplication level that we just read in.

Later, something else tries to query the duplication level of some other folder and we repeat the process above, except that this time the root directory is found and the correct x2 duplication level is read in and cached, giving whatever other folder we're querying the correct inherited duplication count.
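
The sequence above can be modeled with a small Python sketch. This is not the covefs code; `FakePoolPart`, `query_level`, the `root_visible` switch, and the idea that an x2 root carries a plain "2" tag are all assumptions made for illustration. It models only a root and its immediate sub-folders, and it skips the "stop searching if the parent has no tag" rule from step 6.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ROOT = "\\"  # the pool root, i.e. a single backslash


@dataclass
class FakePoolPart:
    """Hypothetical stand-in for one pool part; not the covefs data model."""
    tags: Dict[str, str]       # folder path -> tag text, e.g. "MI" or "2"
    root_visible: bool = True  # False simulates step 3: the root "does not exist"

    def exists(self, path: str) -> bool:
        if path == ROOT:
            return self.root_visible
        return path in self.tags

    def read_tag(self, path: str) -> Optional[str]:
        if path == ROOT and not self.root_visible:
            return None  # step 2: no tag found on the root
        return self.tags.get(path)


def level_from_tag(tag: str, inherited: int) -> int:
    """An explicit digit wins; otherwise (e.g. "MI") the parent's level is inherited."""
    digits = "".join(ch for ch in tag if ch.isdigit())
    return int(digits) if digits else inherited


def query_level(part: FakePoolPart, cache: Dict[str, int], path: str) -> int:
    if path in cache:
        return cache[path]
    # The root inherits x1 from an imaginary parent (step 4);
    # every other folder in this model inherits from the root.
    inherited = 1 if path == ROOT else query_level(part, cache, ROOT)
    tag = part.read_tag(path)
    level = inherited if tag is None else level_from_tag(tag, inherited)
    # Steps 5 and 8: only cache results for folders that exist.
    if part.exists(path):
        cache[path] = level
    return level


def demo():
    part = FakePoolPart({ROOT: "2", "\\ServerFolders": "MI", "\\Other": "MI"},
                        root_visible=False)
    cache: Dict[str, int] = {}
    first = query_level(part, cache, "\\ServerFolders")   # root missing: x1 gets cached
    part.root_visible = True                              # the root "reappears"
    other = query_level(part, cache, "\\Other")           # root read correctly as x2
    stale = query_level(part, cache, "\\ServerFolders")   # still the cached x1
    cache.clear()                                         # whole cache cleared at once
    healed = query_level(part, cache, "\\ServerFolders")  # re-read, now x2
    return first, other, stale, healed
```

In this model, `demo()` shows the two sub-folders disagreeing about what they inherit until the whole cache is cleared, which is the symptom described and also the "self-heal".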

What circumstances would cause a root directory to not exist, and yet a subdirectory of that directory to exist? I don't know.

But this is the only thing that I can think of that fits all of the symptoms:
  • It leads to the exact problem that's being described, and it fits all of the symptoms exactly. I've gone through other possible scenarios but none of them fit the problem being described perfectly.
  • This would spontaneously "self-heal", as is being described too. The duplication tag cache is cleared periodically, depending on certain I/O and the tags are re-read every now and then.
  • The problem is deterministic, and it leads to the same exact problem every time. It only ever involves the root directory specifically.
  • It seems like an OS-level error reading in duplication tags from the disk. Which kind of makes sense, given that all of the problems that we've seen are happening on Windows 7 and WHS 2011 (which have the same kernel).

How can we fix this? I'm not sure at this point. I guess we can remove the assumption that all root directories must exist and treat this as a non-recoverable error.

Finally, I've yet to be able to reproduce any of this. It's just a theory that fits all the facts.
Public
Alex

Like you said, really strange.

Ok, so I'm going to provide a lot of details here on how duplication works and I'm making this public so feel free to link to it.

What Controls Folder Duplication?

Folder duplication in StableBit DrivePool is controlled almost entirely by our covefs.sys pooling file system driver. Duplication levels on folders are not "cached" (or remembered) by the service in any way, whether they are being read or set. So resetting the service or reinstalling should not affect folder duplication. Duplication levels are, however, cached by covefs. It works like this because covefs needs to know the duplication level of any file on the pool, in real-time, as those files are created. Because of this requirement, it makes sense to have the code that efficiently reads, caches, and updates the folder duplication level in one place, in the kernel, and have the service interact with that code.

Incidentally, this is the same interaction that's available from the dpcmd command line utility. It doesn't read or write the duplication level itself; instead, it sends special commands to the pooling file system driver (covefs) to change the duplication level.

How is the Folder Duplication Level Stored?

Each folder on the pool can conceptually have a duplication level associated with it. For example, it can be x2, x3, etc... That duplication level is stored in an alternate data stream on the folder itself.

For example, a pool part folder will have the following alternate data stream:
PoolPart.305d6edd-5f99-4c5e-9fa2-456d6d6fb5b3:DuplicationCount.Tag.CoveFs

You can actually open this up in notepad like this:

D:\>dir /a /r
 Volume in drive D has no label.
 Volume Serial Number is BE55-251F

 Directory of D:\
...
12/17/2015  05:10 PM    <DIR>          PoolPart.305d6edd-5f99-4c5e-9fa2-456d6d6fb5b3
                                     4 PoolPart.305d6edd-5f99-4c5e-9fa2-456d6d6fb5b3:DuplicationCount.Tag.CoveFs:$DATA
                                    48 PoolPart.305d6edd-5f99-4c5e-9fa2-456d6d6fb5b3:PoolId.Tag.CoveFs:$DATA
...
               1 File(s)              0 bytes
               7 Dir(s)  664,052,600,832 bytes free

D:\>notepad PoolPart.305d6edd-5f99-4c5e-9fa2-456d6d6fb5b3:DuplicationCount.Tag.CoveFs

For a root directory (which this is), you will see the level set to MI.
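
For anyone who prefers a script to notepad, here is a minimal read-only sketch in Python. It assumes Windows and an NTFS volume, where a named stream is addressed as "folder:stream"; the helper names `tag_stream_path` and `read_tag_bytes` are made up for this example, and I have not verified it against a live pool.

```python
from pathlib import PureWindowsPath

# Stream name exactly as it appears in the "dir /a /r" listing above.
TAG_STREAM = "DuplicationCount.Tag.CoveFs"


def tag_stream_path(folder: str, stream: str = TAG_STREAM) -> str:
    """Build the 'folder:stream' path that addresses an NTFS alternate data stream."""
    # PureWindowsPath drops any trailing backslash so the colon attaches to the folder name.
    return f"{PureWindowsPath(folder)}:{stream}"


def read_tag_bytes(folder: str) -> bytes:
    """Read the raw tag bytes without modifying them (Windows + NTFS only)."""
    with open(tag_stream_path(folder), "rb") as f:
        return f.read()
```

For example, `read_tag_bytes(r"D:\PoolPart.305d6edd-5f99-4c5e-9fa2-456d6d6fb5b3")` would be expected to return the 4 bytes shown in the listing. The stream is opened in binary mode on purpose, since the tag is binary data that only happens to look like text.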

Both M and I are special tags.
  • M - Sub-folders may have a different duplication level than this folder (this is an optimization so that we don't have to crawl the entire pool every time)
  • I - Indicates that this folder is inheriting its duplication level from the parent folder (in the case of a root folder, it's an imaginary folder x1 above it)

In the case of a x2 duplicated folder, the level may be simply indicated as 2. Now I really don't recommend editing these tags by hand. While they appear as ASCII, they are actually read as binary data and the special handling of M and I tags has to be correct. However, it's ok to delete these tags and let the system rebuild them if something goes very wrong with the directory structure of a pool part (such as file system directory index corruption).
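
To make the meaning of M, I, and a plain number concrete, here is a rough Python sketch that interprets a tag as it appears in notepad. It works purely off the ASCII appearance described above; the real on-disk layout is binary and not spelled out here, so `TagView`, `describe_tag`, and `effective_count` are illustrative names and the parsing rules are assumptions, not the driver's logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagView:
    subfolders_may_differ: bool    # "M": sub-folders may have a different level
    inherits: bool                 # "I": level comes from the parent folder
    explicit_count: Optional[int]  # e.g. 2 for a tag shown as "2", else None


def describe_tag(text: str) -> TagView:
    """Interpret the visible characters of a tag (assumed format, for illustration)."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return TagView(
        subfolders_may_differ="M" in text,
        inherits="I" in text,
        explicit_count=int(digits) if digits else None,
    )


def effective_count(tag: TagView, parent_count: int = 1) -> int:
    """Resolve the duplication count; a root's imaginary parent counts as x1."""
    if tag.inherits or tag.explicit_count is None:
        return parent_count
    return tag.explicit_count
```

With these assumptions, `describe_tag("MI")` on a root resolves to x1 through `effective_count`, and `describe_tag("2")` resolves to x2 regardless of the parent.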

Which leads us into...

How are Duplication Tags Repaired?

Every time the StableBit DrivePool background service starts up, it does a scan of all of the duplication tags on the pool and ensures that each one is "consistent". This process is intentionally separate from what the file system already does because it checks each pool part individually for consistency with all of the other pool parts on the pool.

If it finds an inconsistency, such as a duplication tag that's missing or incorrect, it will automatically repair all of the tags falling back to the highest duplication level detected in the case of a conflict.
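
The conflict rule can be sketched in a few lines of Python. This only illustrates "fall back to the highest level detected"; the function names and the dictionary of per-pool-part levels are hypothetical, and the real service works on the tags themselves rather than on plain integers.

```python
from typing import Dict, List, Optional


def repaired_level(levels_by_part: Dict[str, Optional[int]]) -> Optional[int]:
    """Pick the level a folder should end up with across all pool parts.

    None means the tag is missing on that pool part. On a conflict,
    the highest level that was detected wins.
    """
    found = [level for level in levels_by_part.values() if level is not None]
    return max(found) if found else None


def parts_needing_repair(levels_by_part: Dict[str, Optional[int]]) -> List[str]:
    """List the pool parts whose tag is missing or differs from the repaired level."""
    target = repaired_level(levels_by_part)
    return sorted(part for part, level in levels_by_part.items() if level != target)
```

For a folder tagged x2 on one pool part, x1 on another, and untagged on a third, this picks x2 and flags the other two parts for repair.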

So that, in a nutshell, is how the entire folder duplication system works, not considering file placement, which works alongside it. I'm not entirely sure what's going on here, but it seems quite clear to me that the duplication tags on those folders contain an incorrect level. If that's the case, this should be confirmed, and the information provided here can hopefully be of some help in that regard.