Best Practices for Deploying Apache Cassandra on FlashArray//X

Learn how to optimize Apache Cassandra deployment on Pure Storage FlashArray//X.



Apache Cassandra is an open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many servers, providing high availability with no single point of failure.

FlashArray//X is the first all-flash, 100% NVMe storage solution designed for all your apps, both mainstream enterprise and next-gen web-scale. It delivers up to 3PB effective capacity in 6U, with support for FC, iSCSI, and NVMe over Fabrics connectivity, via DirectFlash™ technology. FlashArray//X has proven 99.9999% availability, which matters for Apache Cassandra, itself known for high availability.

Best practices for Apache Cassandra on FlashArray//X

Let us now look at the best practices for deploying Apache Cassandra on FlashArray//X. The operating system used for the Apache Cassandra deployment was CentOS 7.5.

1. File System for Cassandra data and commit logs: XFS is the best file system for deploying Apache Cassandra's data and commit logs on FlashArray//X. The default XFS options were good enough for both Cassandra data and commit logs.

    Cassandra data: /var/log/cassandra/data -> XFS

    Commit logs: /var/log/cassandra/commitlog -> XFS
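
The layout above could be provisioned roughly as follows. This is a minimal sketch, not the procedure used for this article: the device paths in the example are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is only an illustration aid so the commands can be reviewed before anything is formatted. `mkfs.xfs` is called without extra options, in line with the default options described above.

```shell
#!/usr/bin/env bash
# Sketch only: the device names in the example are hypothetical.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create an XFS file system with default options and mount it.
prepare_xfs() {
  local device="$1" mount_point="$2"
  run mkfs.xfs "$device"
  run mkdir -p "$mount_point"
  run mount -t xfs "$device" "$mount_point"
}

# Example (prints the commands without touching any device):
#   DRY_RUN=1 prepare_xfs /dev/mapper/cassandra_data /var/log/cassandra/data
#   DRY_RUN=1 prepare_xfs /dev/mapper/cassandra_commitlog /var/log/cassandra/commitlog
```

Running with `DRY_RUN=1` first is a cheap safeguard, since `mkfs.xfs` destroys any existing data on the target device.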

2. Configuration of udev rules: The kernel's device manager needs to be configured as shown below. The most important parameters to change are nr_requests and scheduler. Set them for Pure Storage devices as follows:

# Use noop scheduler for high-performance solid-state storage

    echo noop > /sys/block/device_name/queue/scheduler

# nr_requests sets the maximum queue depth for read and write requests

For commit logs (optimized for low latency):

    echo 2 > /sys/block/device_name/queue/nr_requests

For Cassandra data (optimized for asynchronous, bursty I/O from periodic memtable flushes):

    echo 1024 > /sys/block/device_name/queue/nr_requests
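
Values written to /sys with echo are lost at reboot, which is why a udev rule is the usual way to make them stick. The sketch below writes such a rules file. Everything beyond the scheduler and nr_requests values is an assumption for illustration: the function name, the output path in the example, the `ENV{ID_VENDOR}=="PURE"` vendor match, and the kernel device names `sdb` and `sdc`. Kernel names are not guaranteed to be stable across reboots, so a production rule would more likely match on a stable attribute of each volume.

```shell
#!/usr/bin/env bash
# Sketch only: device names and the rules file path are hypothetical.

# Write a udev rules file that sets the scheduler for Pure devices and
# nr_requests for one commit log device and one data device.
write_queue_rules() {
  local out_file="$1" commitlog_dev="$2" data_dev="$3"
  cat > "$out_file" <<EOF
# noop scheduler for Pure Storage block devices (vendor match is an assumption)
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="noop"
# Commit log device: small queue for low latency
ACTION=="add|change", KERNEL=="${commitlog_dev}", SUBSYSTEM=="block", ATTR{queue/nr_requests}="2"
# Data device: deep queue for bursty memtable flushes
ACTION=="add|change", KERNEL=="${data_dev}", SUBSYSTEM=="block", ATTR{queue/nr_requests}="1024"
EOF
}

# Example:
#   write_queue_rules /etc/udev/rules.d/99-cassandra-queue.rules sdb sdc
#   udevadm control --reload-rules && udevadm trigger
```

After the rules are reloaded, reading back /sys/block/device_name/queue/nr_requests shows the value the kernel actually accepted.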

3. Multipathing on Pure Storage FlashArray//X: Multipathing needs to be set up to use the queue-length path selector for all PURE LUNs by configuring it in /etc/multipath.conf.

The file contents of multipath.conf are shown here:

cassandraseed:~ # cat /etc/multipath.conf

    devices {
        device {
            vendor                "PURE"
            path_selector         "queue-length 0"
            path_grouping_policy  multibus
            path_checker          tur
            fast_io_fail_tmo      10
            dev_loss_tmo          60
            no_path_retry         0
        }
    }
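
A quick sanity check can confirm that the configuration file really carries the PURE device entry and the queue-length selector before the multipath daemon is reloaded. This is a minimal sketch; the function name `check_pure_multipath` is made up for illustration, and the check only looks for the two strings rather than parsing the file.

```shell
#!/usr/bin/env bash
# Sketch only: a string-level check, not a real multipath.conf parser.

check_pure_multipath() {
  local conf="${1:-/etc/multipath.conf}"
  if ! grep -Eq 'vendor[[:space:]]+"PURE"' "$conf"; then
    echo "missing PURE device entry in $conf" >&2
    return 1
  fi
  if ! grep -Eq 'path_selector[[:space:]]+"queue-length 0"' "$conf"; then
    echo "missing queue-length path selector in $conf" >&2
    return 1
  fi
  echo "ok: $conf uses queue-length for PURE devices"
}

# Example:
#   check_pure_multipath /etc/multipath.conf && multipath -ll
```

`multipath -ll` then lists the active multipath topology, which is where the selector in use for each LUN can be confirmed.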

4. Compression for Keyspaces: Turning off compression for Cassandra keyspaces produced the best results. It also worked well for data reduction on the array: with compression turned off, data reduction on the FlashArray//X for Cassandra data was in the range of 2.4:1 to 3:1. In other words, with a replication factor of 3, the array reduced the Cassandra data by up to a factor of 3.
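
In Cassandra, compression is a table-level option, so turning it off for a keyspace means altering each table in that keyspace. The sketch below generates the statement for one table. It assumes the Cassandra 3.0 or later option syntax; the function name and the keyspace and table names in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Cassandra 3.0+ compression option syntax.

# Print the CQL statement that disables compression on one table.
disable_compression_cql() {
  local keyspace="$1" table="$2"
  printf "ALTER TABLE %s.%s WITH compression = {'enabled': 'false'};\n" "$keyspace" "$table"
}

# Example (my_keyspace and my_table are placeholders):
#   cqlsh -e "$(disable_compression_cql my_keyspace my_table)"
```

The change applies to SSTables written after the statement runs; SSTables already on disk keep their existing compression until they are rewritten.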

5. Disable transparent huge pages: Apache Cassandra allocates memory based on 4K pages, so it is very important to disable transparent huge pages.

echo never > /sys/kernel/mm/transparent_hugepage/defrag
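
To verify the setting, the kernel marks the active value of each transparent huge page knob with square brackets. The sketch below prints the active value of both the `enabled` and `defrag` files; the function names are made up for illustration, and the optional directory argument exists only so the check can be pointed at a copy of the files.

```shell
#!/usr/bin/env bash
# Sketch only: reports the current THP settings, changes nothing.

# Print the active (bracketed) value from a THP sysfs file.
thp_active_value() {
  sed -E 's/.*\[([^]]+)\].*/\1/' "$1"
}

# Report the active value of the enabled and defrag knobs.
check_thp() {
  local base="${1:-/sys/kernel/mm/transparent_hugepage}"
  local knob
  for knob in enabled defrag; do
    echo "$knob: $(thp_active_value "$base/$knob")"
  done
}

# Example:
#   check_thp
```

Like other values written under /sys, this setting does not persist across reboots unless it is reapplied at boot.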

6. Disable swap: This is a very important setting for Cassandra, because swapping can result in very poor performance. Apache Cassandra keeps multiple replicas of the data, so it is preferable for a replica to die quickly when memory is low rather than to start swapping. Cassandra then stops writing to the replica slowed down by swapping and redirects requests to another node that has no memory issues.

sudo swapoff --all
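
`swapoff` only lasts until the next reboot, because swap devices listed in /etc/fstab are activated again at boot. One way to make the change permanent is to comment out the swap entries there. The sketch below does that with `sed` and keeps a backup copy; the function name is hypothetical, and the pattern assumes conventional whitespace-separated fstab lines.

```shell
#!/usr/bin/env bash
# Sketch only: comments out active swap entries, leaving a .bak backup.

disable_swap_in_fstab() {
  local fstab="${1:-/etc/fstab}"
  # Prefix every uncommented line that has "swap" as a whole field with '#'.
  sed -E -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Example:
#   sudo swapoff --all
#   disable_swap_in_fstab /etc/fstab   # run as root
```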

7. User limit configuration: Set the following limits on the cassandra user account.

Add these entries, including the nproc limit of 32768, to the /etc/security/limits.d/90-nproc.conf configuration file:

cassandra_user - nproc 32768

cassandra_user - memlock unlimited

cassandra_user - nofile 1048576

cassandra_user - as unlimited

Add the following line to /etc/sysctl.conf:

vm.max_map_count = 1048575
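
Limits only take effect for processes started after the change, so it is worth checking what the running Cassandra process actually received. Linux exposes this in /proc/PID/limits. The sketch below extracts the soft limit for a named row of that file; the function name is hypothetical, and finding the process with `pgrep -f CassandraDaemon` is an assumption about how Cassandra was started.

```shell
#!/usr/bin/env bash
# Sketch only: reads limits, changes nothing.

# Print the soft limit for a given row of a /proc/<pid>/limits file.
soft_limit() {
  local limits_file="$1" name="$2"
  awk -v name="$name" 'index($0, name) == 1 {
    rest = substr($0, length(name) + 1)
    split(rest, fields, " ")
    print fields[1]
  }' "$limits_file"
}

# Example (the pgrep pattern is an assumption):
#   pid="$(pgrep -f CassandraDaemon | head -n 1)"
#   soft_limit "/proc/$pid/limits" "Max open files"
#   sudo sysctl -p                      # load /etc/sysctl.conf changes
#   sysctl vm.max_map_count             # confirm the value
```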

8. Disable CPU frequency scaling: CPU frequency scaling needs to be disabled for Apache Cassandra; otherwise it causes significant performance loss.
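
The article does not say which method was used to disable frequency scaling. One common approach on Linux is to set the cpufreq governor of every CPU to `performance`, sketched below. The function name is hypothetical, and the optional directory argument exists only so the loop can be tried against a copy of the sysfs tree.

```shell
#!/usr/bin/env bash
# Sketch only: sets the cpufreq governor to "performance" on every CPU
# that exposes one. Must run as root against the real sysfs tree.

set_performance_governor() {
  local cpu_root="${1:-/sys/devices/system/cpu}"
  local governor_file
  for governor_file in "$cpu_root"/cpu*/cpufreq/scaling_governor; do
    # Skip CPUs (or unmatched globs) without a cpufreq governor file.
    [ -f "$governor_file" ] || continue
    echo performance > "$governor_file"
  done
}

# Example:
#   set_performance_governor
```

As with the other /sys settings in this list, the governor is reset at reboot unless it is reapplied at boot time.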

For JVM and heap configuration, follow the best practices published by Apache Cassandra or DataStax.