One of my favorite myths out there is the one that says that the optimal IO block size for SQL Server is 64 kilobytes. If you believed that until now, sorry — it’s just not true. The myth stems from the fact that the best practice (from Microsoft) says that you should use an NTFS allocation unit size of 64 kilobytes for your database partitions/volumes – with some exceptions, like FILESTREAM/FILETABLE stores. Some folks confused the two things, and thus the myth came to be. So let me clarify it one more time: IO block size and NTFS allocation unit size are two very different things.
So, what’s SQL Server’s IO block size, then? You might be wondering. The truth is that there isn’t just one – it varies, wildly, depending on the operation performed and the disk structure associated with it.
Before we move on to discussing specifics, here’s a link to our SQL Server best practices – we talk a little bit about NTFS allocation units there.
A basic fact about transaction log IO: writes are hardened (meaning, persisted) immediately, every time they’re issued, unless you’re using the delayed durability feature. If you don’t know about delayed durability, please go ahead and read up on it before you start thinking that it can be used as an optimization and promptly shoot yourself in the foot.
Another fact about transaction log writes: they’re sector-aligned. Which means that they can be either 512 bytes, or 4096 bytes in some cases (4Kn drives, some PCIe cards) – whatever the size of a sector on your system is. On Pure Storage FlashArrays, the sector size is 512 bytes.
Does this mean that SQL will issue only sector-size writes for transaction log IOs? No, absolutely not. As you can imagine, this would be very inefficient – causing tons of round trips to storage on a busy system. There is a structure called a log buffer, that can bundle log data in it so that it’s persisted all at once in an operation known as a log flush. These log flushes can be up to 60 kilobytes in size. But they really could be anywhere in between. There is a set criteria for when a log flush takes place, as detailed in this blog post by my friend Tim Chapman [Blog|Twitter], a SQL Server Premier Field Engineer (PFE) at Microsoft. Relevant excerpt:
There are 3 things that cause SQL Server to need to flush a log buffer to disk (to an adjoining log block in the transaction log).
- A transaction commits and that transaction has an active log record in a log buffer. This will cause all log records in those buffer (again, up to 60K worth of transaction log record for each log buffer) to be flushed to the log file.
- We hit the 60K size limit for the log buffer and must flush it to disk.
- We have data pages that we must write to disk and there are log records related to those pages. We must write those log records to the log file BEFORE we write the data pages to the data files (this supports our Write Ahead Logging (WAL) protocol).
Now data file IO is going to be a lot different than the case of the transaction log. As you know, SQL Server has two main ways of writing to the database files: one is checkpoint, which can issue IOs of anywhere from 64KB to 1MB (up from 256KB max), and the other is the lazy writer process (similar to checkpoint in IO characteristics). Both have similar access patterns: entire pages (of 8 kilobytes in size), or entire extents (groups of 8 contiguous pages, meaning 64 KB). On the reads side of the house, pretty much the same thing – with the exception of read-ahead operations, which can be up to 512 kilobytes in size, depending on the edition of SQL Server that you’re using.
There are other ways in which data could be pushed into the data files, like bulk loads, for example. These are also multi-page operations – 256KB in size.
Great. Now we understand a bit more about SQL Server IO block sizes.
But…what about backups? And restores? We need to look at these, too!
It turns out that for SQL Server backups and restores, the engine will pick a different IO block size depending on the media – if you’re backing up to a “disk” target or a file share, SQL will go ahead and pick a 1MB IO block size.
SQL Server will issue even more kinds of different IOs for other, special structures. A typical example of this is ColumnStore indexes – which can actually issue IOs of up to 8MB in size! And just like that example, there’s more cases to consider. Initialization of files, In-Memory OLTP data persistence, blah, blah. SQL Server is a huge product, and as such has many, many features that have different I/O access patterns.
If you’re hungry for more on this topic, I suggest you go over Bob Ward’s presentation at PASS Summit 2014 on SQL Server I/O. It’s a brain-hurt inducing 3 hour talk packed with tons of details, but trust me, it doesn’t get much more low level than that 🙂
Now that I’ve discussed the multiple IO block sizes that SQL Server uses, in closing I’d like to remind you that the public performance characteristics of our FlashArray line of storage arrays are all currently expressed in 32KB-sized IOPS for a reason. As you can deduce from above, 4KB “hero” numbers are not really that relevant for databases. When comparing Pure Storage with other vendors for your SQL Server workloads, demand to understand what the behavior of the device will be when presented with different IO block sizes. Or, do what thousands of customers are doing today, and run your SQL Server databases on Pure Storage – with confidence.
Thanks for reading. In the next installment, we’ll pick up where we left off on my previous SQLIO/DiskSpd post – and we’ll talk about ways to do proper testing of data reducing arrays. Small spoiler: it takes a whole lot more than just running DiskSpd!
Until next time,