Disk fragmentation when dealing with many small files
- by Zorlack
Every day we generate about 3.4 million small JPEG files, and we delete about 3.4 million 90-day-old images. To date, we've dealt with this content by storing the images hierarchically. The hierarchy is something like this:
/Year/Month/Day/Source/
This hierarchy allows us to efficiently delete a full day's worth of content across all sources.
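For context, the retention cleanup works roughly like the sketch below (a minimal illustration; the D:\Images root and the zero-padded directory names are assumptions, not our actual layout):

import shutil
from datetime import date, timedelta
from pathlib import Path

ROOT = Path(r"D:\Images")   # hypothetical root directory
RETENTION_DAYS = 90

def day_dir(d: date) -> Path:
    # Map a date to its /Year/Month/Day/ directory (the Source dirs live beneath it).
    return ROOT / f"{d.year:04d}" / f"{d.month:02d}" / f"{d.day:02d}"

def purge_expired(today: date) -> None:
    # Removing the single day directory deletes that day's images
    # across all sources in one pass.
    expired = day_dir(today - timedelta(days=RETENTION_DAYS))
    if expired.exists():
        shutil.rmtree(expired)

purge_expired(date.today())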
The files are stored on a Windows 2003 server connected to a 14-disk SATA RAID 6 array.
We've started having significant performance issues when writing to and reading from the disks.
This may be due to the hardware itself, but I suspect disk fragmentation may be a culprit as well.
Some people have recommended storing the data in a database, but I've been hesitant to do that. Another thought was to use some sort of container file, like a VHD or something.
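To make the container idea concrete, the sketch below packs a day's worth of images into one big file plus a small index, so the filesystem sees one large sequential file per day instead of millions of tiny ones (just an illustration of the approach; pack_day and the .idx sidecar are my own invention, not an existing tool):

import json
from pathlib import Path

def pack_day(day_dir: Path, out: Path) -> None:
    # Concatenate every JPEG under day_dir into one container file,
    # recording (offset, length) per image in a JSON index alongside it.
    index = {}
    with open(out, "wb") as container:
        for jpeg in sorted(day_dir.rglob("*.jpg")):
            data = jpeg.read_bytes()
            index[str(jpeg.relative_to(day_dir))] = (container.tell(), len(data))
            container.write(data)
    out.with_suffix(".idx").write_text(json.dumps(index))

def read_image(container: Path, name: str) -> bytes:
    # Fetch one image back out of the container by name.
    index = json.loads(container.with_suffix(".idx").read_text())
    offset, length = index[name]
    with open(container, "rb") as f:
        f.seek(offset)
        return f.read(length)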
Does anyone have any advice for mitigating this kind of fragmentation?
Additional Info:
The average file size is 8-14 KB
Format information from fsutil:
NTFS Volume Serial Number : 0x2ae2ea00e2e9d05d
Version : 3.1
Number Sectors : 0x00000001e847ffff
Total Clusters : 0x000000003d08ffff
Free Clusters : 0x000000001c1a4df0
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x000000208f020000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x000000001e847fff
Mft Zone Start : 0x0000000002163b20
Mft Zone End : 0x0000000007ad2000
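For scale, here is a quick back-of-the-envelope from the numbers above (the 11 KB midpoint is my assumption; everything else comes from the figures in this post):

files_per_day = 3_400_000
avg_file_bytes = 11 * 1024                 # midpoint of the 8-14 KB range (assumed)
cluster_bytes = 4096                       # Bytes Per Cluster, from fsutil

clusters_per_file = -(-avg_file_bytes // cluster_bytes)    # ceiling division: 3 clusters
daily_write_gb = files_per_day * avg_file_bytes / 1024**3  # about 36 GB written (and deleted) per day

mft_bytes = 0x208F020000                   # Mft Valid Data Length, from fsutil
file_records = mft_bytes // 1024           # 1,024-byte records -> roughly 136 million file records

print(clusters_per_file, round(daily_write_gb, 1), f"{file_records:,}")

So each file spans only two to four clusters, and the volume churns through tens of gigabytes a day.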