Apache Cassandra with Pure Storage: what do you need to know? Cassandra is a NoSQL database that works well with Pure Storage FlashArray//X and Cloud Block Store. In this blog, I summarize the advantages that FlashArray//X and Cloud Block Store bring to Apache Cassandra deployments. The benefits are the same on either platform: data reduction, compaction space savings, and snapshot advantages.

Data Reduction

FlashArray//X and Cloud Block Store are known for their data reduction capabilities, which include compression, deduplication, zero detection, and thin provisioning. For Apache Cassandra with Replication Factor 3, the data reduction ratio is around 2.4:1 to 3.0:1. In other words, the Apache Cassandra data footprint shrinks by a factor of 2.4 to 3.0, which reduces overall storage consumption on AWS. These results were measured with compression turned off on the Cassandra tables.
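To make the ratio concrete, here is a minimal Python sketch of the arithmetic. The 30TB logical size is a hypothetical input chosen for illustration; only the 2.4 to 3.0 range comes from the measurements above.

```python
def reduced_footprint_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical capacity consumed after data reduction (logical size / ratio)."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


# Hypothetical cluster holding 30TB of logical data, replicas included.
logical_tb = 30.0
for ratio in (2.4, 3.0):
    physical_tb = reduced_footprint_tb(logical_tb, ratio)
    print(f"{ratio}:1 reduction -> {physical_tb:.1f}TB physical")
```

Actual ratios depend on the data set, so treat the output as an estimate rather than a guarantee.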

Compaction Storage savings

Compaction runs continuously on Apache Cassandra clusters. It merges SSTables (which are immutable once written to disk), pruning deleted data and merging disparate row data into new SSTables. This frees disk space and improves read performance. In EBS deployments, each Apache Cassandra node needs 50-100% more space allocated for compaction, because the process writes new SSTables before the old ones are removed. When Apache Cassandra is deployed on FlashArray//X or Cloud Block Store, thin provisioning means space is consumed only while compaction is actually running, which avoids pre-allocating that headroom on every node on AWS.

Consider an example comparing an Apache Cassandra deployment on EBS with one on Cloud Block Store. Both are 6-node Cassandra clusters with 5TB of data per node. The cluster deployed entirely on EBS needs 15TB of extra storage (50% headroom for compaction, which is 2.5TB on each node). For the cluster deployed on Pure Storage, thin provisioning reduces this to an additional 5TB, because compaction runs on at most two nodes at a time. This translates to large storage savings, and the savings grow as the cluster grows.
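The arithmetic in this example can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the model assumes that thin-provisioned headroom is consumed only by the nodes compacting at the same moment.

```python
def ebs_compaction_headroom_tb(nodes: int, data_per_node_tb: float,
                               headroom_fraction: float) -> float:
    """On EBS, every node pre-allocates its own compaction headroom."""
    return nodes * data_per_node_tb * headroom_fraction


def thin_compaction_headroom_tb(data_per_node_tb: float,
                                headroom_fraction: float,
                                concurrent_compactions: int) -> float:
    """With thin provisioning, only nodes compacting right now consume space."""
    return concurrent_compactions * data_per_node_tb * headroom_fraction


# Figures from the example: 6 nodes, 5TB each, 50% headroom, 2 nodes compacting.
ebs_extra = ebs_compaction_headroom_tb(6, 5.0, 0.5)
thin_extra = thin_compaction_headroom_tb(5.0, 0.5, 2)
print(f"EBS headroom:  {ebs_extra}TB")
print(f"Thin headroom: {thin_extra}TB")
print(f"Saved:         {ebs_extra - thin_extra}TB")
```

Because the EBS term scales with the node count while the thin-provisioned term scales only with the number of concurrent compactions, the gap widens as nodes are added.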

High Availability with ActiveCluster

Apache Cassandra is known for its high availability. FlashArray//X and Cloud Block Store also provide highly resilient enterprise storage, with Cloud Block Store delivering HA within a single Availability Zone on AWS. If your Apache Cassandra clusters need additional resiliency across multiple Availability Zones, ActiveCluster can be deployed with two Cloud Block Store instances. ActiveCluster is a fully symmetric active/active bidirectional replication solution that provides synchronous replication with zero RPO and zero RTO at the storage layer. ActiveCluster can be set up within a site or across multiple sites, enabling clustered hosts and applications to be deployed in resilient active/active datacenter configurations. If one instance of Cloud Block Store becomes unavailable, your Apache Cassandra cluster keeps running on the other Cloud Block Store instance as if nothing had happened.

Instant Cluster copy/backup using snapshots

FlashArray//X and Cloud Block Store snapshots are instantaneous and do not consume any additional space when they are taken. They can be used to make an instant copy of a Cassandra cluster for dev or test, or to take an instant backup of the Cassandra cluster. The traditional approach to backing up a Cassandra cluster is to use Cassandra snapshots. Cassandra snapshots are hard links to SSTable files; as compaction replaces the original SSTables, the snapshot keeps the old files on disk, which usually consumes 30-50% more space. With FlashArray//X and Cloud Block Store, snapshots for backups and cluster copies avoid that overhead.
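Hard-link snapshots end up consuming space because a hard link keeps the old file's blocks allocated after the original file name is deleted. The following Python sketch illustrates that file-system behavior with stand-in files. The file names are hypothetical and the code does not call Cassandra; deleting the file merely stands in for compaction removing an obsolete SSTable.

```python
import os
import tempfile


def snapshot_outlives_original(payload: bytes) -> bool:
    """Return True if a hard-linked 'snapshot' still holds the data
    after the original file name has been removed."""
    with tempfile.TemporaryDirectory() as tmp:
        sstable = os.path.join(tmp, "sstable-data.db")    # hypothetical name
        snapshot = os.path.join(tmp, "snapshot-data.db")  # hypothetical name

        with open(sstable, "wb") as f:
            f.write(payload)

        # A hard link adds a second name for the same blocks; nothing is copied.
        os.link(sstable, snapshot)
        shared_blocks = os.stat(sstable).st_nlink == 2

        # Stand-in for compaction deleting the obsolete file.
        os.remove(sstable)

        # The blocks stay allocated because the snapshot name still refers to them.
        with open(snapshot, "rb") as f:
            data_retained = f.read() == payload

        return shared_blocks and data_retained


print(snapshot_outlives_original(b"x" * 1024))
```

This is why the space held by Cassandra snapshots grows over time even though creating them is cheap.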

For a detailed walkthrough of the cluster copy process, see this blog post: https://blog.purestorage.com/purely-technical/copying-apache-vcassandra-cluster-running-on-vmfs-datastore-pure-storage-flasharray-x-snapshots/

Instant failed node replacement using snapshot

Another use case for FlashArray//X and Cloud Block Store is replacing a failed Cassandra node. Using Cloud Block Store snapshots, a node that is due for maintenance or has failed in an Apache Cassandra cluster can be replaced with a new healthy node. Without FlashArray//X and Cloud Block Store, Apache Cassandra administrators would need to either use the node repair process or copy the data from the failed node to a healthy new node. This is slow and painful, and it can also slow down your Apache Cassandra cluster considerably, which in turn affects your business transactions.

For details, see the blog post "Apache Cassandra Rapid Node Replacement Using Snapshots".


As seen above, running Apache Cassandra on FlashArray//X and Cloud Block Store brings both storage savings and operational benefits. Storage costs drop thanks to Pure Storage data reduction, compaction space savings, and snapshot-based backups of Cassandra clusters. Operationally, instant backup/recovery and instant replacement of Cassandra nodes save considerable effort, time, and resources.