This doesn’t come up very often these days, but every once in a while it does, and every time I look for documentation on it there never seems to be any. (After writing this post I did find a forum post where my friend Drew answers it too.) Anyway, let’s quickly explain the situation with I/O operations.
Most block storage vendors these days tell customers to change the Round Robin I/O operation limit for their storage in ESXi from the default of 1,000 to 1. This makes ESXi switch logical paths for a given device after every I/O instead of after every 1,000. The reason I say this doesn’t come up much anymore is that in modern versions of ESXi (6.0 express patch+, 6.5 U1+ and 6.7+) we (Pure) have rules in ESXi that make sure this is set by default without any user configuration. Many other vendors do as well.
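To make the difference between a limit of 1,000 and a limit of 1 concrete, here is a small Python sketch. It is a simplified model I am assuming for illustration, not how the ESXi path selection plugin is actually implemented: it ignores path states, byte limits and pending I/O, and the function name `round_robin_paths` is made up.

```python
from itertools import islice


def round_robin_paths(paths, iops_limit):
    """Yield the path used for each successive I/O.

    Simplified model: stay on one path for `iops_limit` I/Os,
    then move to the next path, wrapping around at the end.
    """
    index = 0
    while True:
        for _ in range(iops_limit):
            yield paths[index]
        index = (index + 1) % len(paths)


# Path names borrowed from the first device in this post.
paths = [
    "vmhba3:C0:T3:L253",
    "vmhba3:C0:T2:L253",
    "vmhba1:C0:T4:L253",
    "vmhba1:C0:T3:L253",
]

# With a limit of 1, each I/O goes down the next path in turn.
first_six = list(islice(round_robin_paths(paths, 1), 6))
print(first_six)

# With the default limit of 1,000, the first 1,000 I/Os all share one path.
first_thousand = set(islice(round_robin_paths(paths, 1000), 1000))
print(first_thousand)
```

In this model a limit of 1 spreads even a short burst of I/O across every path, while a limit of 1,000 keeps a burst of that size on a single path.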
Anyway, when you use VMware tools to check whether a device is configured properly, the output can read differently depending on how the setting was applied.
So if I run the following command:
    esxcli storage nmp device list
I see two devices that have slightly different multipathing configurations (or so it seems):
    naa.624a93705ee86996f8334fa000011012
       Device Display Name: PURE Fibre Channel Disk (naa.624a93705ee86996f8334fa000011012)
       Storage Array Type: VMW_SATP_ALUA
       Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}{TPG_id=1,TPG_state=AO}}
       Path Selection Policy: VMW_PSP_RR
       Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0; lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
       Path Selection Policy Device Custom Config:
       Working Paths: vmhba3:C0:T3:L253, vmhba3:C0:T2:L253, vmhba1:C0:T4:L253, vmhba1:C0:T3:L253
       Is USB: false

    naa.624a937073e940225a2a52bb0002b7c5
       Device Display Name: PURE Fibre Channel Disk (naa.624a937073e940225a2a52bb0002b7c5)
       Storage Array Type: VMW_SATP_ALUA
       Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=AO}{TPG_id=0,TPG_state=AO}}
       Path Selection Policy: VMW_PSP_RR
       Path Selection Policy Device Config: {policy=rr,iops=1,bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
       Path Selection Policy Device Custom Config:
       Working Paths: vmhba4:C0:T2:L1, vmhba4:C0:T1:L1, vmhba2:C0:T2:L1, vmhba2:C0:T1:L1
       Is USB: false
Notice a difference? There isn’t much of one, except for this part: the first device says policy=iops and the second says policy=rr. But both say iops=1. What does that mean and how is it different?
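If you have more than a couple of devices, spotting this by eye gets tedious. As a rough sketch, the Python below pulls the `policy=` and `iops=` values out of saved `esxcli storage nmp device list` output. It assumes the usual layout where each device ID starts at column zero and its properties are indented underneath; the function name `psp_settings` and the trimmed sample text are mine, for illustration only.

```python
import re

# Trimmed sample of the device list output shown above (assumed layout:
# device ID unindented, properties indented beneath it).
DEVICE_LIST_SAMPLE = """\
naa.624a93705ee86996f8334fa000011012
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0; lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config:

naa.624a937073e940225a2a52bb0002b7c5
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=rr,iops=1,bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config:
"""


def psp_settings(text):
    """Map each device ID to its (policy, iops) pair."""
    settings = {}
    device = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # An unindented, non-empty line starts a new device entry.
            device = line.strip()
        elif device and "Path Selection Policy Device Config:" in line:
            match = re.search(r"policy=(\w+),iops=(\d+)", line)
            if match:
                settings[device] = (match.group(1), int(match.group(2)))
    return settings


for device, (policy, iops) in psp_settings(DEVICE_LIST_SAMPLE).items():
    print(device, policy, iops)
```

Both devices come back with an I/O limit of 1; only the policy label differs.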
Well let’s look at the devices in another way.
    [root@esxi-01:~] esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a93705ee86996f8334fa000011012
       Byte Limit: 10485760
       Device: naa.624a93705ee86996f8334fa000011012
       IOOperation Limit: 1
       Latency Evaluation Interval: 0 milliseconds
       Limit Type: Iops
       Number Of Sampling IOs Per Path: 0
       Use Active Unoptimized Paths: false

    [root@esxi-01:~] esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a937073e940225a2a52bb0002b7c5
       Byte Limit: 10485760
       Device: naa.624a937073e940225a2a52bb0002b7c5
       IOOperation Limit: 1
       Latency Evaluation Interval: 0 milliseconds
       Limit Type: Default
       Number Of Sampling IOs Per Path: 0
       Use Active Unoptimized Paths: false
Notice the Limit Type property: one says Iops and the other says Default. This is a little clearer.
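The `deviceconfig get` output is a simple list of `Key: Value` lines, so it is easy to read programmatically. A minimal Python sketch, assuming you have captured the command output as text (the helper name `parse_deviceconfig` is hypothetical):

```python
# Output of "deviceconfig get" for the second device, as shown above.
DEVICECONFIG_SAMPLE = """\
   Byte Limit: 10485760
   Device: naa.624a937073e940225a2a52bb0002b7c5
   IOOperation Limit: 1
   Latency Evaluation Interval: 0 milliseconds
   Limit Type: Default
   Number Of Sampling IOs Per Path: 0
   Use Active Unoptimized Paths: false
"""


def parse_deviceconfig(text):
    """Turn 'Key: Value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


config = parse_deviceconfig(DEVICECONFIG_SAMPLE)
print(config["Limit Type"], config["IOOperation Limit"])
```

Values are kept as strings here; convert `IOOperation Limit` with `int()` if you want to compare it numerically.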
What if I create a custom (a.k.a. user) SATP rule and then provision a new device?
    [root@esxi-01:~] esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=10" -e "FlashArray SATP Rule"
    [root@esxi-01:~] esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a93705ee86996f8334fa00002aff4
       Byte Limit: 10485760
       Device: naa.624a93705ee86996f8334fa00002aff4
       IOOperation Limit: 10
       Latency Evaluation Interval: 0 milliseconds
       Limit Type: Default
       Number Of Sampling IOs Per Path: 0
       Use Active Unoptimized Paths: false
Note that I set the I/O operation limit to 10 instead of 1 so I would know the device hit that rule. The Limit Type still says Default.
So: if you see Default (or policy=rr), you know the device was configured by an SATP rule, whether a default one or a custom one. If you see Iops (or policy=iops), you know someone manually changed that device.
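That rule of thumb is small enough to write down as code. The Python sketch below only encodes the observations from this post; the function name `config_origin` and the wording of the returned strings are my own, and any other limit type simply falls through as something this post does not cover.

```python
def config_origin(limit_type):
    """Guess how a device's Round Robin setting was applied.

    Accepts either the "Limit Type" value (Default / Iops) or the
    "policy=" value (rr / iops) from the device list output.
    """
    normalized = limit_type.strip().lower()
    if normalized in ("default", "rr"):
        return "SATP rule (default or custom)"
    if normalized == "iops":
        return "manually set on the device"
    return "not covered in this post"


for value in ("Default", "rr", "Iops"):
    print(value, "->", config_origin(value))
```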