Les Bell and Associates Pty Ltd
PO Box 297, Neutral Bay Junction NSW 2089 Australia
1 Cullen Street, Forestville NSW 2087 Australia
Tel: +61 2 9451 1144 Fax: +61 2 9451 1122
This message was originally posted by a Microsoft employee on the international Fidonet conference. The Microsoft employee explains its significance:
01/05/91 16:20:58
From: MARK RYLAND
To: ALL
Subj: HPFS INFO
The following message from Gordon Letwin appeared on an internal Microsoft forum in response to a discussion between me and others regarding Benny Ormson's HPFS utilities, in particular his HPFS defragging program (Gammatech Utilities for OS/2 - highly recommended). It is quite interesting and informative, so I got permission to post this publicly.
Take it away Gordon Letwin, OS/2 architect and HPFS/HPFS386 architect:
It's good that there's a third party defragger. As the following study shows, HPFS does a very good job of keeping stuff contiguous, so I agree that for those files which are fragmented just copying them, especially with the "size advisory" argument, will almost certainly defrag them. As per below, it's pretty rare that files are fragmented; I've only seen a few badly fragmented files. So for most people a defrag utility doesn't serve much purpose. If you have some files that you pound real heavily, or if you were unlucky and one of the rare files that's badly fragged is a heavily used one, a defragger would have occasional use. I suspect that the major role for the tool is to make people feel comfortable and in control and as if they're "tuning" their system; its actual bottom line contribution to performance probably isn't that important.
Here is a condensation of a study that I did re: HPFS fragmentation.
An analysis of my development machine: My machine is a 386/33 with a single 300 megabyte disk.
| Total disk space | Total files | Space used | Space free |
|---|---|---|---|
| 317 megs | 10300 | 285 megs | 30 megs |
This machine is a pretty tough example for a non-server box. I'm doing development on it and there's lots of activity. Plenty of background builds, compiles, etc. Lots of simultaneous editor activity from a variety of editors. Very large file hierarchies are created, expanded way up, shrunk way down, sometimes gradually, sometimes in a burst of activity. Big log files are written by one app while builds are being done simultaneously by other apps, etc. I also periodically run out of space and then prune back; usually it's the *older* files that are removed, not the newly created ones. So although this is just one (non-server) machine, I think that it makes a pretty good test case. It's been running HPFS for about 18 months and has never been "packed" or otherwise regularized.
First, an analysis of the # of extents needed to store the files. HPFS tracks the space allocated to files by keeping a list of contiguous extents. The extents can be of arbitrary size and can start on an arbitrary sector. HPFS can store up to 8 extents in the file's FNODE; this table is kept resident when the file is open. Should more than 8 extents be necessary, HPFS will allocate a B+TREE to keep track of an unlimited number of extents (during development we tested with tens of thousands of extents).
Files with only one extent are fully contiguous. Files with 8 or fewer extents are "nearly" contiguous: although not all of the file's sectors are adjacent on the disk surface, no extra disk reads are necessary for any access, because the entire extent list is RAM resident. Only in the case of an allocation B+TREE might non-data disk reads be necessary.
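The three categories described above can be written down as a small classifier. This is only a sketch of the terminology used in this post; the function and constant names are made up for illustration and are not taken from HPFS itself, and the label for zero-extent files is an assumption.

```python
# Illustrative only: names are hypothetical, not HPFS source identifiers.
FNODE_MAX_EXTENTS = 8  # the post states that up to 8 extents fit in the FNODE


def classify(extent_count: int) -> str:
    """Label a file by how its extent list is stored, per the text above."""
    if extent_count < 0:
        raise ValueError("extent count cannot be negative")
    if extent_count == 0:
        return "no extents"          # assumed label for files with nothing allocated
    if extent_count == 1:
        return "contiguous"          # one extent: all sectors adjacent
    if extent_count <= FNODE_MAX_EXTENTS:
        return "nearly contiguous"   # extent list still lives in the FNODE
    return "b+tree"                  # extent list spills into an allocation B+TREE


for n in (1, 2, 8, 9, 20000):
    print(n, classify(n))
```

The only boundary that matters for performance, according to the description, is the one between 8 and 9 extents, since that is where an extra non-data read can first occur.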
Of the 10300 files on my disk, I found:
| # of extents | # of files requiring this # of extents | % of total files on disk |
|---|---|---|
| 0 | 26 | 0.25 |
| 1 | 7997 | 77.64 |
| 2 | 1538 | 14.93 |
| 3 | 278 | 2.70 |
| 4 | 132 | 1.28 |
| 5 | 76 | 0.74 |
| 6 | 58 | 0.56 |
| 7 | 35 | 0.34 |
| 8 | 28 | 0.27 |
| B+TREE | 132 | 1.28 |
In other words:
78% of the files on my disk are contiguous
93% of the files take only 1 or 2 extents
99% of the files are "nearly" contiguous
1% required an allocation B+TREE
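For anyone who wants to check the rounding, the four summary figures can be recomputed from the table. The sketch below just transcribes the table's counts; the variable names are illustrative, and "nearly contiguous" is taken to mean every file without an allocation B+TREE.

```python
# Counts transcribed from the table above; "B+TREE" marks files with more than 8 extents.
extent_histogram = {
    0: 26, 1: 7997, 2: 1538, 3: 278, 4: 132,
    5: 76, 6: 58, 7: 35, 8: 28, "B+TREE": 132,
}

total_files = sum(extent_histogram.values())


def pct(count: int) -> float:
    """Percentage of all files represented by `count`."""
    return 100.0 * count / total_files


contiguous = pct(extent_histogram[1])
one_or_two = pct(extent_histogram[1] + extent_histogram[2])
btree = pct(extent_histogram["B+TREE"])
nearly_contiguous = 100.0 - btree  # every file whose extent list fits in the FNODE

print(total_files)                          # 10300
print(round(contiguous, 2))                 # 77.64
print(round(one_or_two, 2))                 # 92.57
print(round(nearly_contiguous, 2))          # 98.72
print(round(btree, 2))                      # 1.28
```

Rounded to whole percentages these give the 78%, 93%, 99% and 1% quoted above.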
Also, it's interesting to look further at those files which do require an allocation B+TREE. Those 132 files are typically very large, averaging 513923 bytes, for a total consumption among them of 68 megabytes of disk space. The fragmentation overhead due to these files being discontiguous is very low for sequential I/O since it takes a lot of time to read or write such large files anyway. As for random access, in all cases the allocation tree fits totally within one sector so there is at most a one sector hit for any random access and, if the file is being frequently accessed, there will be no hit since the allocation B+TREE sector will be in the cache.
Next, let's look at very large files. Of the 52 files I have larger than 500,000 bytes, 26 are "nearly" contiguous and 26 have allocation B+TREEs. For these files, the average extent is 304 sectors long, or 152K bytes. So even for very large files, there's a high degree of contiguity.
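The size figures in the last two paragraphs are simple products, shown here as a quick check. The 512-byte sector size is an assumption in this sketch, although it is the value implied by "304 sectors long, or 152K bytes".

```python
SECTOR_BYTES = 512  # assumed sector size; implied by "304 sectors ... 152K bytes"

# Files that need an allocation B+TREE
btree_files = 132
avg_file_bytes = 513923
total_bytes = btree_files * avg_file_bytes
print(total_bytes)               # 67837836
print(round(total_bytes / 1e6))  # 68 (decimal megabytes)

# Average extent length for files larger than 500,000 bytes
avg_extent_sectors = 304
avg_extent_bytes = avg_extent_sectors * SECTOR_BYTES
print(avg_extent_bytes)          # 155648
print(avg_extent_bytes // 1024)  # 152 (K, counting 1024 bytes per K)
```

Note that the 68 megabyte total matches when a megabyte is counted as a million bytes, while the 152K figure matches when K is counted as 1024 bytes.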
Finally, let's look at the free space on the disk to see how fragmented it is. This is of secondary importance since the real bottom line is how fragmented the files are; a poor file system can have contiguous free space but fail to use it effectively.
Of my 30 megabytes of free space:
| Free runs smaller than this many sectors | Represent this % of the free space |
|---|---|
| 10 | 10.33% |
| 25 | 20.32% |
| 42 | 30.44% |
| 67 | 40.00% |
| 124 | 50.13% |
| 208 | 60.20% |
| 411 | 70.48% |
| 642 | 80.74% |
| 919 | 90.66% |
In other words, only 10% of the free space is in blocks of less than 10 sectors. Half of the free space is in blocks of at least 124 sectors, or 62K. This number is very large compared to the file size distribution: only 8% of my files are that big, yet half of my free space is available in chunks bigger than that.
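A table like the one above could be produced from a list of free-run lengths along the following lines. This is a minimal sketch, not the tool used for the study: the function names are hypothetical, the sample run lengths are made up, and the 512-byte sector size is assumed (it matches "124 sectors, or 62K").

```python
SECTOR_BYTES = 512  # assumed sector size; consistent with "124 sectors, or 62K"


def share_below(run_lengths, threshold):
    """Percent of free sectors held in runs shorter than `threshold` sectors."""
    total = sum(run_lengths)
    if total == 0:
        return 0.0
    small = sum(n for n in run_lengths if n < threshold)
    return 100.0 * small / total


# (threshold in sectors, cumulative % of free space), transcribed from the table.
REPORTED = [
    (10, 10.33), (25, 20.32), (42, 30.44), (67, 40.00), (124, 50.13),
    (208, 60.20), (411, 70.48), (642, 80.74), (919, 90.66),
]


def threshold_for(percent, table=REPORTED):
    """Smallest tabulated run length whose cumulative share reaches `percent`."""
    for sectors, cumulative in table:
        if cumulative >= percent:
            return sectors
    return None


# Made-up free-run lengths in sectors, purely to exercise share_below().
made_up_runs = [5, 5, 40, 50, 100, 800]
print(share_below(made_up_runs, 10))       # 1.0
print(share_below(made_up_runs, 124))      # 20.0

half = threshold_for(50)
print(half, half * SECTOR_BYTES // 1024)   # 124 62
```

Reading the 50% row this way gives the 124-sector (62K) midpoint quoted in the text.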
As a followup to my previous analysis of my work HPFS disk, I analyzed my home HPFS disk. It's also a 300 meg disk, but it's only about 6 months old and it hasn't seen the active use that the work machine has.
A quick summary of the statistics:
| Total space | Total files | Space used | Space free |
|---|---|---|---|
| 327 megs | 10114 | 254 megs | 73 megs |
Of these files, 99% are stored in just one or two extents - i.e., nearly all files are fully contiguous or are stored in two contiguous pieces. Of the 10114 files, only 17 require more than 4 extents, and only 3 have an allocation B+TREE.
With regard to free space, only 5% of the free space is in runs of less than 1000 sectors (yes, 500K byte free runs). 90% of the free space is in runs of 6000 sectors or better. In other words, the free space is essentially contiguous.
Although this file system doesn't get the workout that my office machine does, it does see a reasonable degree of use, including the hosting of SLM trees.
I followed up with some studies of Microsoft project servers - big disks used for major projects like DOS 5. They showed the "super high" contiguity like my home machine. My development machine was by far the worst; I take this to mean that in spite of a lot of activity, the usage pattern on a server isn't worst case.
Page last updated: 5 Sep 1996