Storage administrators and database administrators can take advantage of FlashArray™ data services to improve application availability. In this blog we will explore how to use FlashArray snapshots effectively with a MongoDB replica set.
Volume snapshots are immutable, point-in-time images of the contents of one or more volumes.
FlashArray User Guide.
All Pure Storage® FlashArray devices support volume snapshots. Because of the way volume snapshots are implemented, creating snapshots and managing their lifecycle do not impact an array’s performance. Snapshots are generally classified as crash consistent or application consistent. A crash consistent snapshot preserves the data as it exists at the moment of the snapshot, but the application is not aware that a snapshot is being created, so open files may not be properly protected. An application consistent snapshot ensures that all pending transactions and file system operations are completed at the time the snapshot is created, and therefore provides a reliable means of full data recovery.
MongoDB has been designed and developed with high availability features, which are provided by replica sets. A replica set duplicates database data across all replica set members. A production replica set should have at least three members: one member serves as the primary and the others as secondaries. By default, the primary is responsible for both read and write database operations. If the primary fails, the remaining members elect a new primary. MongoDB 4.0 supports up to fifty members in a replica set, of which up to seven can be voting members that take part in primary elections. Besides high availability, the replica set model provides great flexibility when software and hardware updates are required: nodes can be safely taken out of a replica set without impacting database performance and availability. Moreover, delayed replica set members can be used for point-in-time database recovery or for backups. Replica set members can be geographically dispersed and located in different data centers. An example of a five-node MongoDB replica set is shown in Figure 1.
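The membership limits and the delayed member described above can be sketched as a small configuration check. This is only an illustration: the host names, the set name `rs0`, and the helper functions are hypothetical, and the delayed member uses the `slaveDelay` field name from MongoDB 4.0 together with `priority: 0` and `hidden: true`.

```python
MAX_MEMBERS = 50        # replica set size limit quoted in the text
MAX_VOTING_MEMBERS = 7  # voting member limit quoted in the text


def build_config(set_name, hosts, delayed_host=None, delay_secs=3600):
    """Return a replica set configuration document as a plain dict.

    Host names and the one-hour delay are illustrative assumptions.
    """
    members = [{"_id": i, "host": h} for i, h in enumerate(hosts)]
    if delayed_host is not None:
        members.append({
            "_id": len(members),
            "host": delayed_host,
            "priority": 0,             # a delayed member must never become primary
            "hidden": True,            # keep application reads away from stale data
            "slaveDelay": delay_secs,  # field name used by MongoDB 4.0
        })
    return {"_id": set_name, "members": members}


def validate_config(config):
    """Return a list of problems; an empty list means the limits are respected."""
    members = config["members"]
    # Members vote by default, so a missing "votes" field counts as one vote.
    voting = [m for m in members if m.get("votes", 1) > 0]
    problems = []
    if len(members) > MAX_MEMBERS:
        problems.append(f"{len(members)} members exceeds the limit of {MAX_MEMBERS}")
    if len(voting) > MAX_VOTING_MEMBERS:
        problems.append(f"{len(voting)} voting members exceeds the limit of {MAX_VOTING_MEMBERS}")
    return problems


if __name__ == "__main__":
    hosts = [f"node{i}.example.com:27017" for i in range(1, 5)]
    config = build_config("rs0", hosts, delayed_host="node5.example.com:27017")
    print(len(config["members"]), validate_config(config))
```

A dict of this shape is what would be passed to `rs.initiate()` in the mongo shell; the validation step simply catches a layout that exceeds the limits before it is submitted.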
In most cases, MongoDB replica sets have been deployed on servers with Direct-Attached Storage (DAS). While using servers with DAS offers some benefits, this deployment model also brings challenges, such as the lack of integrated data services (data reduction and snapshots). The benefits of FlashArray data reduction are covered in the MongoDB: (Disk) Space The Final Frontier blog.
To see how volume snapshots can benefit MongoDB deployments on FlashArray, let us examine a replica set member failure or scheduled maintenance activity and the corresponding recovery scenario. In a MongoDB replica set, the primary node maintains the oplog (operations log), a rolling record of all operations that modify the data. Secondary nodes also hold a copy of the oplog, which is updated asynchronously from the primary. If a secondary node is unavailable for an extended period of time, oplog entries on the primary may be overwritten before they have been replicated to that secondary. Once the oplog has been overwritten, or the disk subsystem on the failed node has been corrupted or lost, the content of the entire database must be copied from another surviving node. Depending on the size of the database, the workload, and the network throughput, this process can be time consuming. During the replication process, the MongoDB replica set is at higher risk of failure should another node or nodes malfunction.
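The oplog reasoning above comes down to a timestamp comparison, sketched below. The function names are hypothetical, and the timestamps are assumed to have been read beforehand, for example from the oldest and newest oplog entries on the sync source (the mongo shell helper `rs.printReplicationInfo()` reports the same window).

```python
from datetime import datetime, timedelta


def oplog_window(first_entry, last_entry):
    """Time span covered by the oplog, from its oldest to its newest entry."""
    return last_entry - first_entry


def needs_full_resync(first_entry_on_source, last_applied_on_secondary):
    """True when the sync source no longer holds the entries the secondary needs.

    If the oldest entry still present on the source is newer than the last
    operation the secondary applied, the gap cannot be replayed from the
    oplog and the whole data set has to be copied again.
    """
    return last_applied_on_secondary < first_entry_on_source


if __name__ == "__main__":
    # Illustrative timestamps only, not measurements.
    oldest = datetime(2019, 1, 1, 0, 0)
    newest = datetime(2019, 1, 2, 0, 0)
    print(oplog_window(oldest, newest))
    print(needs_full_resync(oldest, datetime(2018, 12, 31, 23, 0)))
```

A secondary that stays down for less than the oplog window can catch up on its own; one that stays down longer falls into the full-copy case described above.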
This risk can be reduced if MongoDB data files are stored on FlashArray. With volume snapshots, a failed node can be recovered much faster than by copying database files across the network. A possible MongoDB replica set deployment on FlashArray is shown in Figure 2.
Combining MongoDB’s replica set high availability features with FlashArray snapshots makes the failed node recovery process simple and efficient. The steps outlined below provide general guidelines for recovering a MongoDB replica set node whose database disks (dbpath) are located on FlashArray volumes. See Figure 3 (animated).
Note: If the volume snapshot (step 3) is not required for longer-term retention, a volume copy with the overwrite option (purevol copy --overwrite) may be executed instead, eliminating step 5.
Figure 3 (animated).
Recovery was considered successful when the tested node reached the SECONDARY state. The node status can be obtained by executing the rs.status() command.
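The success check can be sketched as a small polling loop over the status document. This is a minimal sketch, not the procedure used in the tests: it assumes the document has the shape returned by rs.status() (a `members` list whose entries carry `name` and `stateStr`), and the function names, host name, timeout, and polling interval are all hypothetical.

```python
import time


def member_state(status, host):
    """Return the stateStr of the member named host, or None if it is not listed."""
    for member in status.get("members", []):
        if member.get("name") == host:
            return member.get("stateStr")
    return None


def wait_for_secondary(get_status, host, timeout_secs=600, poll_secs=5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until host reports SECONDARY and return the distinct states seen.

    get_status is any callable returning an rs.status()-style dict.
    Raises TimeoutError if the member is not SECONDARY within timeout_secs.
    """
    deadline = clock() + timeout_secs
    seen = []
    while True:
        state = member_state(get_status(), host)
        if not seen or seen[-1] != state:
            seen.append(state)  # record each state transition once
        if state == "SECONDARY":
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"{host} did not reach SECONDARY, last state: {state}")
        sleep(poll_secs)
```

With a driver such as PyMongo, `get_status` could be a callable that runs the `replSetGetStatus` admin command, which returns the same document as rs.status(); that wiring is left out here so the sketch stays free of external dependencies.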
In the Pure Storage lab, recovering a 230 GB idle (no workload) database on a secondary replica set member with an empty dbpath (no database files) took approximately eighteen minutes using the MongoDB replication mechanism. Recovering the same secondary node using FlashArray snapshots completed in less than a minute after the mongod process was started.
When the same test scenario was repeated with the database under load (50% reads, 50% updates), recovery of the node with an empty file system (dbpath) took approximately twenty-five minutes. When the identical node was restored from a FlashArray snapshot, the state change from STARTUP2 to SECONDARY occurred in less than a minute. See the table below.
|Scenario|MongoDB Replication|FlashArray Snapshot|Improvement|
|---|---|---|---|
|No Workload|18 minutes|1 minute|18x|
|Workload (50% reads, 50% updates)|25 minutes|1 minute|25x|
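The improvement factors in the table are simple ratios of the recovery times, and the same figures give a rough effective copy rate for native replication. The sketch below reproduces that arithmetic; it assumes decimal units (1 GB = 1000 MB) and that the database was about 230 GB in both runs, and the function names are hypothetical.

```python
def speedup(replication_minutes, snapshot_minutes):
    """Ratio of native replication time to snapshot recovery time."""
    return replication_minutes / snapshot_minutes


def effective_rate_mb_per_s(size_gb, minutes):
    """Average copy rate implied by moving size_gb in the given time."""
    return size_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    for label, minutes in (("no workload", 18), ("under load", 25)):
        print(label, speedup(minutes, 1), round(effective_rate_mb_per_s(230, minutes)))
```

Because the snapshot recoveries finished in less than a minute, the 18x and 25x figures are best read as lower bounds.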
All data services, such as snapshots, Snap to NFS, CloudSnap, data reduction, and synchronous and asynchronous replication, are included with all FlashArray models. By taking advantage of volume snapshots, MongoDB replica set instances can be recovered quickly, reducing node downtime without increasing network congestion or bandwidth requirements. Furthermore, adding a new replica set node is also greatly simplified and significantly faster than with native MongoDB replication.
As an additional benefit, by combining Snap to NFS or Purity CloudSnap™ with MongoDB delayed replica set members, FlashArray volume snapshots can be offloaded to other media, providing a convenient and inexpensive (lower storage cost) means of maintaining multiple point-in-time database recovery options.