For a long time, storage performance has lagged behind the leaps and bounds compute performance has made over the last ten years. With the adoption of NVMe (Non-Volatile Memory Express) as a standard for accessing flash storage, this is no longer true. We can now exploit the parallelism available in modern NVMe devices to achieve lower latency and greater performance.
With the launch of DirectFlash™ Fabric earlier in 2019, FlashArray//X™ is now capable of delivering those low latencies and performance gains in shared storage environments. Before NVMe over Fabrics (NVMe-oF), those wishing to benefit from NVMe storage needed to use direct-attached storage. This is not always ideal, as many applications and organizations depend on centralized storage with data services to reduce cost and complexity and to increase efficiency.
The purpose of this blog post is to provide the steps required to implement NVMe-oF using RDMA over Converged Ethernet (RoCE) on SUSE Linux Enterprise Server (SLES) 15 and subsequent releases.
An important item to note is that RoCE requires a lossless network: either global pause flow control or Priority Flow Control (PFC) must be configured on the network for smooth operation.
All of the steps below were implemented using Mellanox ConnectX-4 adapters.
System and software requirements:
Step 1. Install the following packages using the zypper package manager on the host.
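The package list itself does not survive in this copy of the post. At a minimum, the NVMe userspace tooling and device-mapper multipath are needed for the later steps; a sketch, with package names assumed from the standard SLES 15 repositories:

```shell
# Assumed package set (not taken from the original post):
#   nvme-cli        - provides the nvme command used in Steps 5 and 6
#   multipath-tools - provides multipathd, used in Step 6
#   rpm-build       - needed to rebuild the Mellanox OFED source RPM in Step 3
zypper install -y nvme-cli multipath-tools rpm-build
```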
Step 2. Configure multipathing on the host.
product "Pure Storage FlashArray"
path_selector "queue-length 0"
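Only those two attributes of the multipath configuration survive here. For context, a sketch of where they sit in /etc/multipath.conf — the vendor string matches the NVME vendor shown by the multipath output later in this post, but the surrounding stanza and its other attributes are assumptions, not values from the original:

```
devices {
    device {
        # "NVME" is the vendor reported for these devices by multipath -ll
        vendor          "NVME"
        product         "Pure Storage FlashArray"
        path_selector   "queue-length 0"
        path_grouping_policy multibus
    }
}
```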
Step 3. Build the Mellanox OFED package to get access to the QoS tool on the host.
tar -xvf /<path to package>/MLNX_OFED_SRC-4.6-<version>.tgz
In the decompressed folder, install the source RPM:
rpm -ivh /<path-to-decompressed-folder>/MLNX_OFED_SRC-4.6-<version>/SRPMS/mlnx-ofa_kernel-4.6-OFED.<version>.src.rpm
rpmbuild -bp /usr/src/packages/SPECS/mlnx-ofa_kernel.spec
Then use the mlnx_qos tool to enable PFC on priority 3 and set the trust mode to DSCP on both ports:
/usr/src/packages/BUILD/mlnx-ofa_kernel-4.6/source/ofed_scripts/utils/mlnx_qos -i eth6 --pfc=0,0,0,1,0,0,0,0
/usr/src/packages/BUILD/mlnx-ofa_kernel-4.6/source/ofed_scripts/utils/mlnx_qos -i eth7 --pfc=0,0,0,1,0,0,0,0
/usr/src/packages/BUILD/mlnx-ofa_kernel-4.6/source/ofed_scripts/utils/mlnx_qos -i eth6 --trust dscp
/usr/src/packages/BUILD/mlnx-ofa_kernel-4.6/source/ofed_scripts/utils/mlnx_qos -i eth7 --trust dscp
Step 4. Set the type of service (TOS) for the RoCE ports.
Run the following loop to set the TOS for all RDMA interfaces to 106:
for f in $(ls /sys/class/infiniband); do
    echo "setting TOS for IB interface:" $f
    mkdir -p /sys/kernel/config/rdma_cm/$f/ports/1
    echo 106 > /sys/kernel/config/rdma_cm/$f/ports/1/default_roce_tos
done
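As a sanity check on that value: the TOS byte carries a 6-bit DSCP field in its upper bits and two ECN bits in the lower two, so a TOS of 106 corresponds to DSCP 26. The arithmetic (not a command from the original post) can be confirmed directly in the shell:

```shell
TOS=106
# The upper six bits of the TOS byte are the DSCP value; the lower two are ECN.
echo "DSCP: $((TOS >> 2))"   # prints "DSCP: 26"
echo "ECN:  $((TOS & 3))"    # prints "ECN:  2"
```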
Step 5. Generate the NVMe Qualified Name on the host, then configure a host and connect volumes to it in the Pure Storage web GUI.
Run the following command to generate the NVMe Qualified Name (NQN) and save it for later use. An NQN serves the same purpose as an iSCSI Qualified Name (IQN) or a World Wide Name (WWN) for Fibre Channel.
nvme gen-hostnqn > /etc/nvme/hostnqn
Navigate to the Storage view and, in the Hosts tab, create a host. Once the host is created, navigate to its management view and, in the Host Ports section, select the vertical ellipsis and choose "Configure NQNs…".
Copy the value output by cat /etc/nvme/hostnqn into the dialog and press Add.
Connect the required volumes to this host.
Note the NVMe-RoCE ports and the IP addresses to connect to, listed in the Settings view under the Network tab.
Step 6. Load the required NVMe kernel modules and connect the FlashArray volumes using RoCE.
First load the nvme-core and nvme-rdma kernel modules:
modprobe nvme-core
modprobe nvme-rdma
Then discover the NQN for the NVMe-oF target at one of the NVMe-RoCE ports noted in the FlashArray GUI:
nvme discover -t rdma -a <IP address of FlashArray NVMe-RoCE port>
Take note of the subnqn in the returned text, as it is used when connecting to the storage array:
Discovery Log Number of Records 2, Generation counter 2
=====Discovery Log Entry 0======
For each port to connect to on the FlashArray run the following to connect to all volumes for the relevant host via multiple paths:
nvme connect -t rdma -a <IP Address> -s 4420 -n <subnqn value>
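In practice that command is repeated once per NVMe-RoCE port so that each volume gets multiple paths. A small wrapper loop might look like the following sketch; the addresses (from the RFC 5737 documentation range) and the subnqn are placeholders, not values from the original post:

```shell
# Hypothetical values - substitute the NVMe-RoCE port IPs noted in the
# FlashArray GUI and the subnqn returned by nvme discover.
TARGET_IPS="192.0.2.10 192.0.2.11"
SUBNQN="nqn.2010-06.com.purestorage:flasharray.example"

# Connect to every port so multipathd sees one path per target port.
for ip in $TARGET_IPS; do
    nvme connect -t rdma -a "$ip" -s 4420 -n "$SUBNQN"
done
```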
Ensure device-mapper multipath is enabled and check the devices that have been presented to it:
systemctl enable multipathd
systemctl start multipathd
multipath -ll
The devices connected will show up as below if configured correctly:
eui.004236a5adeeca4924a9377e000114de dm-6 NVME,Pure Storage FlashArray
size=1.0T features='0' hwhandler='0' wp=rw
Step 7. Set the best-practice parameters for the NVMe-oF connected devices.
Run the following commands to set queue management values for the connected block storage devices:
for d in dm-2; do
    echo 0 > /sys/block/$d/queue/add_random
    echo noop > /sys/block/$d/queue/scheduler
    echo 512 > /sys/block/$d/queue/nr_requests
    echo 2 > /sys/block/$d/queue/rq_affinity
done

for c in 0 1; do
    echo 0 > /sys/block/nvme${c}n1/queue/add_random
    echo 2 > /sys/block/nvme${c}n1/queue/rq_affinity
done
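These sysfs writes do not survive a reboot. One common way to make them persistent — a sketch, not part of the original post, and the filename and match rules are assumptions — is a udev rule that reapplies the settings whenever an NVMe namespace appears:

```
# /etc/udev/rules.d/99-pure-nvme.rules (hypothetical filename)
# Reapply the queue settings above for NVMe namespaces as they are added.
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="nvme*n1", \
    ATTR{queue/add_random}="0", ATTR{queue/rq_affinity}="2"
```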
And that is it! The devices can now be mounted and used the same as any others, with the added benefits of lower latency and the comprehensive data services and management tools offered by FlashArray™.