I have a setup of about 20 Linux machines, each with about 30-150 gigabytes of customer data. Probably the size of data will grow significantly faster on some machines than others. These are virtual machines on a VMware vSphere cluster. The disk images are stored on a SAN system.
I'm trying to find a solution that would use disk space sparingly, while still allowing for easy growing of individual machines.
In theory, I would just create big disks for each machine and use thin provisioning. Each disk would grow as needed. However, it seems that a 500 GB ext3 filesystem with only 50 GB of data and quite a low number of writes still easily grows the disk image to eg. 250 GB over time. Or maybe I'm doing something wrong here? (I was surprised how little I found on the subject with Google. BTW, there's even no thin-provisioning tag on serverfault.com.)
Currently I'm planning to create big, thin-provisioned disks - but with a small LVM volume on them. For example: a 100 GB volume on a 500 GB disk. That way I could more easily grow the LVM volume and the filesystem size as needed, even online.
Now for the actual question:
Are there better ways to do this? (that is, to grow data size as needed without downtime.)
Possible solutions include:
Using a thin-provisioning friendly filesystem that tries to occupy the same spots over and over again, thus not growing the image size.
Finding an easy method of reclaiming free space on the partition (re-thinning?)
Something else?
A bonus question: If I go with my current plan, would you recommend creating partitions on the disks (pvcreate /dev/sdX1 vs pvcreate /dev/sdX)? I think it's against conventions to use raw disks without partitions, but it would make it a bit easier to grow the disks, if that is ever needed. This is all just a matter of taste, right?