Dear Diary,
today I’ve written a small benchmark utility to try to emulate NNTP server performance. A one-file-per-article spool has somewhat unusual performance characteristics, totally dominated by stat-ing and stuff.
So my little utility is a C program that recursively reads a real news spool, and then just discards the result. It’s extremely single-threaded, which isn’t typical of NNTP usage patterns, but otherwise it should be kinda ok. It’s on GitHub.
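A minimal sketch of that kind of reader, for illustration only. It is not the author's actual utility: the names spool_stats, read_and_discard and walk_spool are made up here, and it assumes plain POSIX opendir()/readdir() with files visited in whatever order readdir() returns them.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical counters; not taken from the real utility. */
struct spool_stats {
    unsigned long files;        /* regular files opened and read */
    unsigned long long bytes;   /* total bytes read and thrown away */
};

/* Read one file to the end and discard the data. */
static void read_and_discard(const char *path, struct spool_stats *st)
{
    char buf[65536];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return;                 /* unreadable files are skipped */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        st->bytes += (unsigned long long)n;
    close(fd);
    st->files++;
}

/* Walk dir recursively, single-threaded, in readdir() order.
 * Returns 0 on success, -1 if dir itself cannot be opened. */
static int walk_spool(const char *dir, struct spool_stats *st)
{
    char path[4096];
    struct dirent *de;
    struct stat sb;
    DIR *d;

    assert(dir != NULL && st != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (snprintf(path, sizeof path, "%s/%s", dir, de->d_name)
            >= (int)sizeof path)
            continue;           /* path too long for the buffer: skip */
        if (lstat(path, &sb) != 0)
            continue;           /* this is the stat-ing the post talks about */
        if (S_ISDIR(sb.st_mode))
            walk_spool(path, st);
        else if (S_ISREG(sb.st_mode))
            read_and_discard(path, st);
    }
    closedir(d);
    return 0;
}
```

Call walk_spool() on the spool root with a zeroed spool_stats, time the call, and divide files and bytes by the elapsed seconds to get numbers comparable to files per second and MB/s.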
To test, I copied a 26GB portion of the real Gmane news spool (3.3M files) over to three different partitions: one btrfs on the MegaRAID, one ext4 on the MegaRAID, and one ext4/btrfs on the spinning system disk, just to get a baseline.
(And always do echo 3 > /proc/sys/vm/drop_caches as root before testing anything, so that nothing gets served from the page cache.)
btrfs wastes a lot of room, though. What takes 32GB on ext4 takes 42GB on btrfs. But with max_inline=0 that shrinks to 36GB. Still kinda sucky.
Anyway, the results are, when reading files in readdir() order:
btrfs on ssd: 10600 files per second, 84MB/s
ext4 on ssd: 4460 files per second, 35MB/s
btrfs on spinning disk: 5030 files per second, 40MB/s
ext4 on spinning disk: 238 files per second (yes, I know. With noatime. Yes. Yes. Try it yourself.)
And when sorting the files in alphabetical order:
btrfs on ssd: 7800 files per second, 62MB/s
ext2 on ssd: 19200 files per second, 152MB/s
ext4 on ssd: 19100 files per second, 152MB/s
ext4 on spinning disk: 6100 files per second, 48MB/s
So two things stand out here:
1) ext4 is really sensitive to the order you read files
2) the LSI MegaRAID SAS 9265-8i is quite slow on small files
I mean, when reading large files, I get 1.2GB/s! This is bullshit! Where are my IOPSes! I want more IOPS!
Perhaps I should set the stripe size on the RAID to something smaller than the default, which is 128KB. I mean, the mean file size in the spool is 8K, which means that it’s probably reading a lot more than it has to.
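To put a number on that hunch, here is a back-of-the-envelope sketch. It assumes the worst case where every small-file read pulls in one full 128KB stripe element; whether the controller actually behaves like that is a guess, not something measured here. The function names are made up for illustration.

```c
#include <assert.h>

/* Worst-case read amplification: data fetched per read divided by
 * data wanted, assuming each read pulls one whole stripe element. */
static double read_amplification(double stripe_kb, double file_kb)
{
    assert(file_kb > 0);
    return stripe_kb / file_kb;
}

/* Average bytes per file implied by a result line (MB = 10^6 bytes). */
static double bytes_per_file(double mb_per_s, double files_per_s)
{
    assert(files_per_s > 0);
    return mb_per_s * 1e6 / files_per_s;
}
```

read_amplification(128, 8) comes out at 16, so under that assumption the array could be moving sixteen times more data than the files contain. As a sanity check, bytes_per_file(84, 10600) is roughly 7900 bytes, which agrees with the 8K mean file size.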
It has to!
Your workload with lots of random reads of small files probably doesn't really need the bandwidth of SATA3 anyway since you will only hit it with large sequential reads, so I suggest trying out the performance without the RAID card as well. Maybe soft RAID or BtrFS's built-in RAID might even be faster with just SATA2.
And even though I'm a Debian guy myself, I would actually suggest benchmarking the brand new FreeBSD 9 with the latest ZFS version in RAID-Z mode (again without the RAID card) as well. ZFS simply provides enough awesome to even bear using FreeBSD.