string(7) "English"

Hey Pure! What’s Your Flavour of Cheesecake?

So I just came back from an amazing OpenStack summit in Austin, TX, where over 7,500 people attended to hear about the current and future direction of OpenStack. One of the new features in the Cinder Block Storage project, and part of the Mitaka release of OpenStack, is a new version of block storage replication for Cinder. As this is a pretty cool new feature, I thought I’d do a write-up (sorry, it’s a bit long) on it from a Pure Storage perspective.

Officially v2.1, but affectionately known as ‘Cheesecake’, this is a simple, single-use-case replication model. It is a starting point for much more advanced replication functionality, which we hope will have its next iteration in the Newton release and is code-named ‘Tiramisu’.

What we will look at here is Cheesecake, how to configure it, how it is implemented in the context of Pure Storage FlashArrays, and then how you can leverage recovery of your replicated data under the use case defined in the original OpenStack specification (https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html). This scenario assumes you find your array has been replaced by a smoking hole in the ground, or that your data center water-balloon fight went horribly wrong!!

Setting up Cheesecake

To enable the replication functionality in Cinder you need to provide a replication_device parameter in the vendor-specific backend stanza of the primary, or source, array. This parameter defines a single replication target array and has the minimum requirement of a backend name key/value pair, plus some vendor-specific values to define that remote backend. In the case of a Pure Storage FlashArray, you would provide three key/value pairs defining the remote, or target, array: backend_name, san_ip and api_token. For other backend arrays you will need to consult the specific vendor’s documentation.

An example of this parameter for a Pure Storage FlashArray would be:
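(A minimal sketch of the source array’s backend stanza; the stanza name, IP addresses and API tokens below are placeholders, so substitute values for your own arrays:)

    [puredriver-1]
    volume_backend_name = puredriver-1
    volume_driver = cinder.volume.drivers.pure.PureISCSIDriver
    san_ip = 10.10.10.10
    pure_api_token = <source array API token>
    # The replication target, defined by the three key/value pairs described above
    replication_device = backend_name:puredriver-2,san_ip:10.10.10.20,api_token:<target array API token>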

The Cheesecake specification allows for multiple replication_device parameters to be provided should you need, and your backend arrays can support, multiple replication targets for your volumes.

Within the Pure Storage implementation of Cheesecake there are a number of other parameters you can add into the source array stanza to define the frequency and retention period for the replication of your volumes to your remote array(s). More details on these can be found in the Pure Storage Mitaka Cinder Best Practice document available on the Pure Storage Community website (http://community.purestorage.com).
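(As a sketch, using option names from the Pure Storage Cinder driver; the values are only illustrative, so check the best practice document for the current names and defaults:)

    # Replicate volumes in the protection group every 15 minutes
    pure_replica_interval_default = 900
    # Keep all replicated snapshots on the target for 1 hour
    pure_replica_retention_short_term_default = 3600
    # Then keep 3 snapshots per day for a further 7 days
    pure_replica_retention_long_term_per_day_default = 3
    pure_replica_retention_long_term_default = 7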

Once you have configured your Cinder configuration file to have the replication device(s) defined and you have restarted the Cinder volume service, you need to define which volumes are actually going to be replicated by Cheesecake.

This is simple enough, in that it uses volume types. You need to create a volume type that will be associated with any volume that needs to be replicated, with the following extra-spec defined for the volume type:
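Per the Cheesecake specification, that extra-spec is:

    replication_enabled='<is> True'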

It is important to note that you MUST use the syntax exactly as above, including case, or the volume replication will not work.

Pure Implementation

Now that Cheesecake replication has been configured in Cinder, how is it actually implemented on your specific backend array?

Here we will look at the implementation method for a Pure Storage FlashArray replicating to a single target Pure Storage FlashArray. For other vendors’ implementations you will need to consult their documentation.

Before setting up Cheesecake replication, the source and target Pure Storage FlashArrays must already have been made aware of each other and be in a ‘connected array’ state. Refer to your Pure Storage User Guide for details on configuring this connection.

As soon as you (re)start the Cinder volume service with a replication_device parameter in your Pure backend stanza, the driver will immediately attempt to communicate with the defined remote Pure Storage FlashArray and check that the source and remote arrays are already connected from a replication perspective. At this point the Cinder driver will create a Pure Storage ‘protection group’ on the source array, called cinder-group, that is linked to the remote array. This group is initially empty, ready for volumes that need to be replicated to be added to it. The protection group inherits either the replication interval and retention parameters defined in the Cinder configuration stanza for the source array, or the default values if nothing was defined.

The ‘protection group’ will look like this in the Pure Storage GUI after initial creation:
[Screenshot: Pure Storage GUI showing the newly created cinder-group protection group]
(Notice that the array name is visible in the top right corner of the screenshot – keep checking this to make sure I am flipping between two different arrays and not just cheating!!)

with the detail on the actual group showing the remote array to which the volumes will be replicated:
[Screenshot: protection group detail showing the remote target array]

and the replication and retention schedule for the group:
[Screenshot: replication and retention schedule for the protection group]
From a CLI perspective this same group would be shown as:
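(A sketch using the Purity CLI; the exact output columns vary by Purity release, but listing the group shows its source and target arrays:)

    # List the Cinder-created protection group and the array it replicates to
    purepgroup list cinder-group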

So now that our replication base configuration is ready, we can start to replicate actual volumes; but, as mentioned previously, we must also create a volume type for replication with the appropriate extra-spec.

So let’s create a volume type:
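(The type name pure-replicated is just an illustration; use whatever naming convention suits your cloud:)

    cinder type-create pure-replicated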

and now assign the correct extra-spec to it and check it is OK:
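(Again assuming the illustrative type name from above:)

    # Set the replication extra-spec on the type; note the syntax is case-sensitive
    cinder type-key pure-replicated set replication_enabled='<is> True'
    # Confirm the extra-spec has been applied
    cinder extra-specs-list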

Now we can create a Cinder volume with the volume type for replication:
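(A sketch; the volume name and size are arbitrary:)

    # Create a 10 GB volume using the replicated volume type
    cinder create --name replicated-vol-1 --volume-type pure-replicated 10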

Notice that there is a parameter, replication_status, which is set to disabled. This is an artifact of a previous version of the replication code within Cinder and is not used for this release, so it can be safely ignored.

Now that we have created this volume, let’s look at the source Pure Storage FlashArray and see that the volume has been added to the ‘protection group’:
[Screenshot: source array protection group containing the newly created Cinder volume]

and if we look on the remote/target array you can see the replicated device copy of the source volume:
[Screenshot: target array showing the replicated copy of the source volume]

Performing a Failover

Now that we have our replicated volume on the target array, we need to actually use this replica in the event of a disaster where the source array is now a ‘smoking hole in the ground’ (not wanting to be too dramatic!). We are also assuming here that the target array is accessible from the same Nova availability zone as the source array, using the same storage protocol. The first thing to acknowledge is that the Cheesecake version of replication is not the most intelligent: it has no hooks into Nova to let the compute instances understand that their storage backend has failed and that IO from your application will go nowhere. We hope this will be improved upon in the Newton release, but let’s continue to look at the Mitaka implementation. We now need to issue a command to make the target array’s replicated volumes available for read/write to your Nova instances:
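(Using the backend host name from this environment:)

    cinder failover-host devstack@puredriver-1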

Where devstack@puredriver-1 is the value in the ‘host’ field from cinder service-list.

Note: There is an optional freeze-host command that can be used after the failover, which puts the host service into a read-only state from the control plane perspective. Data access for existing volumes is maintained, but all new control plane commands such as creates, deletes, extends, etc. will be rejected. The reasoning behind this is that it allows the cloud administrator time to evaluate the failover and ensure that the target array is capable of dealing with all the failed-over volumes and hasn’t, for example, become over-utilized. If the freeze command has been issued, there is an equivalent thaw-host command that can be run when the cloud administrator is ‘happy’ with the failed-over environment.
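(A sketch, using the same host name as the failover example:)

    # Put the failed-over host into a read-only control plane state
    cinder freeze-host devstack@puredriver-1
    # ...and release it again once you are satisfied with the failed-over environment
    cinder thaw-host devstack@puredriver-1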

We can check that our failover has completed successfully using the following command:
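(For example, listing the volume services together with their replication details:)

    cinder service-list --withreplication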

We can see here that we are now in a failed-over state and the active backend is the target array name.

What has happened on the remote Pure Storage FlashArray is that the latest replication snapshot has been converted into a full read-write volume, which we can see here:
[Screenshot: the latest replication snapshot promoted to a full read-write volume on the target array]

Notice that the volume name is still the same on the failover array as it was on the source array.

So what does this mean from a data recovery perspective and from continuing to use your Cinder service?

From a continued usage perspective, the Pure Storage driver is aware that the primary array is no longer available, and all requests to create volumes on that array will be proxy-forwarded to the failover array, even if that array is NOT managed by your OpenStack Cinder volume service. To allow the Pure Storage Cinder driver to correctly deal with incoming volume requests, however, we first need to re-enable the Cinder volume service for the primary Pure Storage backend, as the failover-host command will have disabled it. This is done using the following command:
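(Assuming the cinder-volume service host from earlier:)

    cinder service-enable devstack@puredriver-1 cinder-volume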

Once this is done all requests will be honored. For example, if I issue the following command (which would normally create a volume on the source array, because our default volume type points to the source backend):
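(A sketch; the volume name and size are arbitrary, and no replicated volume type is specified:)

    cinder create --name post-failover-vol 10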

The Pure Storage volume driver will redirect the API call to the failed-over array and create the volume there, even though the output of the create command suggests the volume was created on the source array. We can see the created volume on the target array here:
[Screenshot: the new volume created on the failed-over target array]

So what happens to the Nova compute instances where their backend storage has been failed over? This is where Cheesecake becomes a little clunky and manual intervention is required, although we are hoping this will be better managed in the Newton release.

Obviously the original backend array no longer exists, so there is a hanging ‘zombie’ volume and an associated connection to the array (either iSCSI or Fibre Channel). We need to disconnect the volume from the compute instance using the standard nova volume-detach command. Although this command will do everything it needs to do, it will actually come back with a failure, because it does not get the correct response when removing the connection from the compute node to the backend storage; for example, the iSCSI connection detach will fail as there is now no storage array.
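(A sketch; the server and volume IDs are placeholders:)

    nova volume-detach <server-id> <volume-id>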

If necessary you can force Cinder to realize that the volume is viable, but on a different backend by using the cinder reset-state command, but only use this if the source array is truly a ‘smoking hole’ and you aren’t just doing a test. In the test scenario given here the nova volume-detach will complete successfully.
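(A sketch; only do this against a genuinely failed source array:)

    # Mark the failed-over volume as available so it can be re-attached
    cinder reset-state --state available <volume-id>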

All we now need to do is run the nova volume-attach command to reconnect the failed-over volume back to the Nova compute instance. This is, of course, assuming that the failed-over volume and array is in the same availability zone as the Nova instance.
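(Again with placeholder IDs:)

    nova volume-attach <server-id> <volume-id>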

Is there failback?

The next obvious step is to fail your volumes back to the source array, but remember that this implementation of replication has a use case of the source array being completely dead, so there is no mechanism for failing back.

Now, if you install a new array to replace your dead one, you can make that array a replication target of the failed-over array and then force a failover back to that new array. However, there are all sorts of Cinder database gyrations that need to be performed, as well as restarting your Cinder volume services, so that your failed-over array is understood to be a managed source array and your new array a replication target.

I’m not going to cover this here, but there was a presentation on this topic at the OpenStack Spring 2016 Summit in Austin, TX, and the video of that session is available now on the OpenStack website.