In this blog post, I would like to show you how to optimize an Apache Cassandra deployment on Pure Storage FlashArray//X. Apache Cassandra is an open-source, distributed, wide-column NoSQL database management system designed to handle large amounts of data across many servers, providing high availability with no single point of failure.
FlashArray//X is the first all-flash, 100% NVMe storage solution designed for all your apps, both mainstream enterprise and next-gen web-scale. It delivers up to 3PB effective in 6U with support for FC, iSCSI, and NVMe over Fabrics connectivity via DirectFlash™ technology. FlashArray//X has proven 99.9999% availability, which is very important for Apache Cassandra, itself known for high availability.
Best practices for Apache Cassandra
Let us now look at the best practices for Apache Cassandra deployment on FlashArray//X. The operating system used for the Apache Cassandra deployment was CentOS 7.5.
1. File system: XFS was used for both the Cassandra data and the commit logs.
Cassandra data: /var/log/cassandra/data -> XFS
Commit logs: /var/log/cassandra/commitlog -> XFS
2. Configuration of udev rules: The kernel's device manager needs to be configured as shown below. The most important parameters to change are nr_requests and scheduler. Please set these parameters for Pure Storage as follows:
# Use the noop scheduler for high-performance solid-state storage
echo noop > /sys/block/device_name/queue/scheduler
# nr_requests sets the maximum queue depth for read and write requests
# For commit logs (optimized for low latency):
echo 2 > /sys/block/device_name/queue/nr_requests
# For Cassandra data (optimized for asynchronous/burst I/O from periodic memtable flushes):
echo 1024 > /sys/block/device_name/queue/nr_requests
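The two settings can be wrapped in a small helper so that the commit log and data devices each get the right queue depth. This is only a sketch: the function name, the device names in the usage comment, and the SYSFS_ROOT override (there so the function can be tried against a scratch directory) are assumptions, not part of the original setup.

```sh
# Sketch: apply the scheduler and queue-depth settings to one block device.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
tune_block_device() {
    dev="$1"          # block device name, e.g. sdb (placeholder)
    nr_requests="$2"  # 2 for commit logs, 1024 for Cassandra data (values from this post)
    queue_dir="${SYSFS_ROOT:-/sys}/block/$dev/queue"

    if [ ! -d "$queue_dir" ]; then
        echo "no such queue directory: $queue_dir" >&2
        return 1
    fi
    echo noop > "$queue_dir/scheduler" || return 1
    echo "$nr_requests" > "$queue_dir/nr_requests" || return 1
}

# Example (run as root; device names are placeholders):
#   tune_block_device sdb 2      # commit log device
#   tune_block_device sdc 1024   # data device
```

Values written to /sys this way do not survive a reboot, which is why they are normally applied through udev rules.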
3. Multipathing on Pure Storage FlashArray//X: Multipathing needs to be configured to use the queue-length path selector for all PURE LUNs. This is done in /etc/multipath.conf.
The file contents of multipath.conf are shown here:
cassandraseed:~ # cat /etc/multipath.conf
path_selector "queue-length 0"
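A quick way to confirm that a node carries this setting is to search the configuration file for an uncommented path_selector line. The helper below is a minimal sketch; its name is hypothetical, and it only checks the text of the file, not what the multipath daemon has actually loaded.

```sh
# Sketch: succeed if the given multipath.conf sets the queue-length selector.
# Commented-out lines are ignored because the match is anchored at line start.
has_queue_length_selector() {
    conf="${1:-/etc/multipath.conf}"
    grep -Eq '^[[:space:]]*path_selector[[:space:]]+"queue-length 0"' "$conf"
}

# Example:
#   has_queue_length_selector && echo "queue-length selector is configured"
```

On a running system, the output of multipath -ll reports the policy in effect for each path group, which is the more direct check.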
4. Compression for Keyspaces: Turning off compression for Cassandra keyspaces produced the best results. It was also good in terms of data reduction on the array. With compression turned off, the data reduction on the FlashArray//X for Cassandra data was in the range of 2.4:1 to 3:1. So, with a replication factor of 3, Cassandra data was reduced by up to a factor of 3.
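In Cassandra, compression is a property of each table, so turning it off for a keyspace means altering the tables in it. The sketch below prints the CQL for one table. It assumes Cassandra 3.0 or later, where the option is written as {'enabled': 'false'}; the function name and the keyspace and table names are placeholders.

```sh
# Sketch: print the CQL that disables SSTable compression for one table.
# Assumes the Cassandra 3.0+ syntax; names passed in are placeholders.
disable_compression_cql() {
    keyspace="$1"
    table="$2"
    printf "ALTER TABLE %s.%s WITH compression = {'enabled': 'false'};\n" "$keyspace" "$table"
}

# Example (run against a node with cqlsh):
#   cqlsh -e "$(disable_compression_cql my_keyspace my_table)"
```

The new setting applies to SSTables written after the change; existing SSTables stay compressed until they are rewritten.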
5. Disable transparent huge pages: Apache Cassandra allocates memory based on 4K pages, so it is very important to disable transparent huge pages.
echo never > /sys/kernel/mm/transparent_hugepage/defrag
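To verify the setting, read the file back: the kernel lists all modes and puts the active one in square brackets. The helper below is a small sketch that extracts the bracketed value; the function name is hypothetical.

```sh
# Sketch: print the active transparent huge page mode from a sysfs file.
# The file content looks like: always madvise [never]
thp_active_mode() {
    file="${1:-/sys/kernel/mm/transparent_hugepage/defrag}"
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$file"
}

# Example:
#   thp_active_mode    # expected to print "never" after the command above
```

The sibling file /sys/kernel/mm/transparent_hugepage/enabled uses the same format and can be checked the same way.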
6. Disable swap: This is a very important setting for Cassandra, because swapping can result in very bad performance. An Apache Cassandra database has many replicas, so it is preferable for a replica to die quickly when memory is low rather than to start swapping. This preserves performance: Cassandra does not keep writing to a replica that has become slow due to swapping, and instead redirects to another node that has no memory issues.
sudo swapoff --all
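Whether swap is really off can be checked from /proc/swaps, which lists one active swap area per line under a header. The following is a minimal sketch with a hypothetical function name; it simply counts the non-header lines.

```sh
# Sketch: count active swap areas in a /proc/swaps-style file.
# The first line is a column header, so it is skipped.
active_swap_count() {
    file="${1:-/proc/swaps}"
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$file"
}

# Example:
#   [ "$(active_swap_count)" -eq 0 ] && echo "swap is disabled"
```

Note that swapoff only affects the running system. Swap entries left in /etc/fstab are activated again at the next boot, so they need to be removed or commented out as well.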
7. User limit configuration: Set the following resource limits (ulimits) for the Cassandra user account.
Set the nproc limit to 32768 in the /etc/security/limits.d/90-nproc.conf configuration file:
cassandra_user - nproc 32768
cassandra_user - memlock unlimited
cassandra_user - nofile 1048576
cassandra_user - as unlimited
Add the following line to /etc/sysctl.conf:
vm.max_map_count = 1048575
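After editing the limits file, it is easy to check that each entry is present and spelled correctly. The sketch below looks up one user/item pair in a limits.d-style file, whose columns are domain, type, item, and value; the function name is hypothetical.

```sh
# Sketch: print the configured value for one user/item pair in a
# limits.d-style file. If the pair appears more than once, the last one wins.
configured_limit() {
    file="$1"
    user="$2"
    item="$3"
    awk -v u="$user" -v i="$item" \
        '$1 == u && $3 == i { v = $4 } END { if (v != "") print v }' "$file"
}

# Example:
#   configured_limit /etc/security/limits.d/90-nproc.conf cassandra_user nproc
```

The sysctl change can be loaded without a reboot with sudo sysctl -p, and the limits take effect for new login sessions of the user, where ulimit -a shows the values in force.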
8. Disable CPU frequency scaling: CPU frequency scaling needs to be disabled for Apache Cassandra; otherwise, it causes significant performance loss.
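One common way to do this on Linux is to set the cpufreq governor of every core to performance. The following is a sketch under that assumption; the function name is hypothetical, and the optional argument exists only so the function can be tried against a scratch directory instead of the real /sys tree.

```sh
# Sketch: set every CPU's cpufreq governor to "performance".
set_performance_governor() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for gov in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without cpufreq support (common on virtual machines).
        [ -e "$gov" ] || continue
        echo performance > "$gov" || return 1
    done
    return 0
}

# Example (run as root):
#   set_performance_governor
```

As with other values written under /sys, this does not persist across reboots, so it would need to be reapplied at boot.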
For JVM and heap configuration, please follow the best practices published by Apache Cassandra or DataStax.