In the kernel, there is actually no requirement that the the directory listing has to be in any particular order. That's actually the job of the application making the call.
Because NTFS directory index entries are stored in a binary tree, they always happen to be listed in alphabetical order. So it would be nice if we could do that as well, but it's really not ideal because of the performance overhead it would introduce.
We would have to pre-buffer the entire directory listing, then sort it, and finally return parts of it to the caller upon request.
Instead we use a "fast directory listing" algorithm, and it works somewhat like this:
- Each disk in the pool is queried for a list of directories in parallel.
- As each disk starts to responds with blocks of data, we buffer that data internally and once we have enough of it, we respond to the caller.
- The caller may ask us for the "next" block of entries, and usually we can instantly come back with more from our internal buffer.
This all happens in the kernel and so it's really fast. It is however probably one of the most complicated pieces of code in DrivePool, and is really at the heart of how it performs its pooling magic.
Now you may say, this is all theoretical and today's computers are fast enough, why not just pre-buffer everything? Well, we had one case were someone was storing hundreds of thousands of files in a folder on the pool and even this method of listing directories was excessively slow. I had to re-tune the data structures used in the listing algorithm in order to support that kind of scenario.