SQLIO vs. DiskSpd

In 2015, Microsoft formally announced that SQLIO has now been fully deprecated, in favor of DiskSpd. While that tool was great in the days of spinning rust (read: hard drives), SQLIO really had nothing to do with SQL Server. It was a storage benchmarking tool, much like IOmeter, CrystalDiskMark, ATTO, HDTune, and probably 20 or […]

Pure Storage

Dec 16, 2015

In 2015, Microsoft formally announced that SQLIO has now been fully deprecated, in favor of DiskSpd.

While that tool was great in the days of spinning rust (read: hard drives), SQLIO really had nothing to do with SQL Server. It was a storage benchmarking tool, much like IOmeter, CrystalDiskMark, ATTO, HDTune, and probably 20 or 30 more.

SQLIO vs. DiskSpd?

Nowadays, Microsoft is advocating for the use of DiskSpd. Both SQLIO and DiskSpd are great for performing basic plumbing tests, where you are trying to find the maximum data throughput you will obtain from your system. For a more realistic simulation of your production environment, though, you will need to use something else. Mimicking your actual workload with DiskSpd will be a very, very time consuming task, if at all possible. The reason is that SQL Server never actually uses a fixed IO block size. It can issue IOs as tiny as a few hundred bytes (typically your sector size for transaction log flushes), or a large as several megabytes in size (the case of backups and restores). You can certainly try to kick off multiple DiskSpd threads at different IO block sizes and see what kind of behavior you’d get from your system, but in the end this continues to be just a synthetic test.

In data reducing arrays like the Pure Storage FlashArray, there’s nothing like testing with your actual workload, and looking at how your actual production IO patterns behave on your target system.

Distributed Replay

So, what’s a great tool for this? You might wonder. My favorite, by far, is Distributed Replay. It might not be the friendliest tool in the world (take note, Microsoft!) but it certainly is one of the best (if not _the_ best) tools for determining whether a target system is suitable for a given workload. Note how I didn’t specifically say that it’s great for testing storage subsystems; it is, but that’s one of the many scenarios where you would use Distributed Replay. You should use every time you do things like:

Upgrade to a new build of SQL Server, to make sure you won’t be suffering from issues like query optimizer plan regressions.
Move to new hardware, or are working on consolidation of workloads onto new infrastructure.
Plan to implement Always On Availability Groups, or even old fashioned Database Mirroring, to understand how your workload will behave under log replication.
Want to test the effect of configuration settings on your workload, for example: forced parameterization, database compatibility mode, recovery modes, trace flags, etc.

Happy testing!