GDPR and the Data Encryption Dilemma

There is a general perception that compliance with the European Union’s General Data Protection Regulation (GDPR) can be achieved by encrypting all data at the application level. This strategy is expensive and has limitations, but Pure Storage FlashArray can help.



This blog is a follow-up to Alexandra Gartrell’s recent blog, “Are you GDPR Compliant?”

The “Sledge Hammer” Approach to Digital Compliance

The data processor community generally believes that encryption of personal data throughout its digital lifetime—acquisition, processing, and storage—will ultimately be a practical necessity for GDPR compliance. Many in the GDPR consulting and information technology vendor communities recommend[1] encrypting data as it enters the processing chain (at the application server), and limiting access to the unencrypted form to applications that process it.

Data encrypted before it leaves the server is protected “downstream”—in primary storage, when it is sent to other servers, and when it is backed up or archived. On the surface, this approach would seem to be pretty secure, but there are two limitations to consider:

  • Key management: Data sets seldom exist in a single incarnation. They are backed up and replicated, snapshots are taken, they are sent to other servers for analytics, development testing, and other purposes. When data is encrypted at its point of origin, encryption keys must be distributed along with it for every additional usage. Every additional use of a data set widens the circle of applications, users, and administrators privy to its encryption keys. Arguably, this makes data less secure in proportion to how often it is used in different forms.
  • Storage cost: Encrypting data at the point of origin carries a high cost. Increasingly, data processors acquire storage based not on their raw capacity requirements but on expectations of how well their data reduces. Data reduction (the elimination of redundancy before data is stored) has become a universal expectation for enterprise storage systems. Different data sets reduce differently, but typical ratios fall between 3:1 and 5:1, and can reach 10:1 when total data efficiency (including thin provisioning) is considered. Put another way, reduced data uses a fraction of the storage, rack space, and power that the same data would need unreduced.

Reduction removes redundancy from data by performing lossless compression to represent data in fewer bits than in its original form and by deduplicating sequences of incoming data blocks whose contents are already stored. Both techniques rely on being able to identify redundancy.
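The two techniques described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular array implements reduction: the fixed 4 KiB block size, the SHA-256 fingerprints, and the function names `reduce_blocks` and `reduction_ratio` are all assumptions made for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen only for illustration


def reduce_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Return the bytes stored after deduplicating and compressing blocks."""
    seen = set()
    stored = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint in seen:
            continue  # duplicate block: only a reference would be kept
        seen.add(fingerprint)
        stored += len(zlib.compress(block))  # lossless compression of new blocks
    return stored


def reduction_ratio(data: bytes) -> float:
    """Logical size divided by the size actually stored."""
    return len(data) / reduce_blocks(data)


# 100 identical 4 KiB blocks of repetitive records
sample = (b"customer-record;" * 256) * 100
print(f"reduction ratio: {reduction_ratio(sample):.0f}:1")
```

Both steps depend on spotting repetition: deduplication needs identical blocks, and compression needs repeated patterns inside a block.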

But encrypting data inherently turns it into random bit patterns. Encryption wouldn’t be worth much if it didn’t. Random bit patterns make it all but impossible to discover redundancy, let alone remove it. Thus, encrypting data at the source virtually eliminates the storage cost advantage of reduction that data processors have come to depend on. While there are ways to deduplicate encrypted data, they are difficult and slow.
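The effect is easy to observe. The sketch below compresses the same repetitive records before and after encryption. The function `toy_encrypt` is a hypothetical stand-in built from a SHA-256 keystream so that the example needs only the standard library; it is not a production cipher, and the key and record contents are invented for the illustration.

```python
import hashlib
import zlib


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Illustrative stand-in for a real cipher: XOR with a SHA-256 keystream.

    Not secure for real use; it only produces random-looking output.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = plaintext[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


plaintext = b"name=Jane Doe;country=DE;consent=yes\n" * 2000
ciphertext = toy_encrypt(b"hypothetical-key", plaintext)

plain_ratio = compression_ratio(plaintext)
cipher_ratio = compression_ratio(ciphertext)
print(f"plaintext compresses {plain_ratio:.1f}:1")
print(f"ciphertext compresses {cipher_ratio:.2f}:1")
```

The plaintext shrinks dramatically, while the ciphertext does not shrink at all, which is the behavior the paragraph above describes.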

The Cost of Encryption at the Point of Origin

Encrypting data prior to storing it is only the tip of the storage cost iceberg. When data encrypted at the source is copied (for analytics, development testing, backup, archiving, or disaster protection), two to four times as much storage and bandwidth are consumed compared with copying data encrypted by the storage system.
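A rough back-of-the-envelope calculation shows where a multiple of that size comes from. The figures below are assumptions chosen for illustration (a 10 TB data set, four copies, a 3:1 reduction ratio when the array encrypts, and effectively 1:1 when data arrives already encrypted), and the helper `physical_tb` is hypothetical. The sketch also ignores deduplication across copies, which would widen the gap further.

```python
def physical_tb(logical_tb: float, copies: int, reduction_ratio: float) -> float:
    """Physical capacity for the original plus its copies at a given ratio."""
    return logical_tb * (1 + copies) / reduction_ratio


# Assumed scenario: 10 TB data set with four copies
source_encrypted = physical_tb(10, copies=4, reduction_ratio=1.0)  # no reduction
array_encrypted = physical_tb(10, copies=4, reduction_ratio=3.0)   # reduced, then encrypted

print(f"encrypted at source: {source_encrypted:.1f} TB")
print(f"encrypted by array:  {array_encrypted:.1f} TB")
print(f"multiple:            {source_encrypted / array_encrypted:.1f}x")
```

Under these assumptions, the source-encrypted copies need three times the physical capacity, which sits inside the two-to-four-times range quoted above.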

Moreover, encryption keys are an inherent security weakness because they must be available to the systems that analyze copies of the data, use copies for development testing, restore backups, and so forth. Every use of a data set widens the circle of systems and individuals with access to its contents. Worse yet, if an original data set is re-encrypted, all users of copies must track keys and use the correct one for each instance they process.

Application-side encryption of personal data is thus a “sledge hammer” solution to digital GDPR compliance: it does the job in a very blunt manner, provided that key security can be managed. However, it carries a significant cost to the processor, and ultimately to individuals, because it sacrifices one of the most important storage technology advances of the past decade: data reduction. Pure Storage FlashArrays offer an alternative. They encrypt data as they store it, thus preserving the cost advantage of reduction and mitigating the key management problem.

Encrypting Data in Transit

Data processors typically protect data while it is in transit using readily available hardware and software network encryption tools.

Thus, while GDPR compliance may require some IT architecture redesign, with FlashArray always-on encryption of data at rest, data processors can provide robust end-to-end lifetime data protection while still reaping the cost benefits of data reduction.

Encrypting personal data in transit and in storage protects against “snooping”. For full compliance, such technical measures must be accompanied by policies with interlocking safeguards against misuse due to human error. These might include combinations of (a) controlled access to servers, networks, and storage systems, (b) narrow administrative roles, (c) auditing of administrative actions, and (d) scrupulous maintenance of software and firmware.

A Note on FlashArray Physical Security

Data processors frequently insist on centralized management of encryption keys, no matter what the keys are used for or where. FlashArrays can manage their data encryption keys internally, but for additional security, they can integrate with Data Security Management (DSM) servers that use the Key Management Interoperability Protocol (KMIP). With centralized key management, the DSM server assists the array in decrypting its key (which is stored encrypted on the array). Without a connection to the DSM server, an array cannot recover its key, and so can neither read nor store data.

For the few circumstances in which a FlashArray’s physical security may be at risk (for example, where an unauthorized individual or group could potentially physically breach the datacenter), removable smartcards can be installed in an array’s controllers. The smartcards contain tokens with which controllers reconstruct the data encryption key. If cards are removed, data can neither be read nor written after a power cycle.
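The general idea of a key that can only be rebuilt when every token is present can be illustrated with simple XOR secret splitting. This is a generic sketch and not a description of how FlashArray smartcards actually work, which the text above does not specify; the functions `split_key` and `reconstruct_key` and the two-token setup are assumptions for the example.

```python
import secrets


def split_key(key: bytes, parts: int) -> list[bytes]:
    """Split a key into `parts` tokens; all of them are needed to rebuild it."""
    tokens = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    last = key
    for token in tokens:
        last = bytes(a ^ b for a, b in zip(last, token))
    return tokens + [last]


def reconstruct_key(tokens: list[bytes]) -> bytes:
    """XOR all tokens together to recover the original key."""
    key = bytes(len(tokens[0]))  # all-zero starting value
    for token in tokens:
        key = bytes(a ^ b for a, b in zip(key, token))
    return key


data_key = secrets.token_bytes(32)         # hypothetical data encryption key
tokens = split_key(data_key, parts=2)      # e.g. one token per removable card

print(reconstruct_key(tokens) == data_key)      # all tokens present
print(reconstruct_key(tokens[:1]) == data_key)  # one token missing
```

With every token present the key is recovered exactly; with one removed, what remains is indistinguishable from random bytes, so the data stays unreadable.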

In Summary

Compliance with the digital provisions of GDPR is necessarily an integration of application, server, network, and storage data protection facilities, together with data processor policies for handling and protecting data while it is in digital form. In conjunction with network encryption facilities where needed, FlashArray “always-on” data encryption can aid compliance by keeping data secure while it is “at rest,” while still retaining the cost advantages of reduction and the security advantages of minimal key management and distribution.

 

[1] See European Union Agency for Network and Information Security (ENISA), Handbook on Security of Personal Data Processing, December 2017, https://www.enisa.europa.eu/publications/handbook-on-security-of-personal-data-processing/@@download/fullReport