PostgreSQL, also known as Postgres, is one of the most popular databases available today. According to DB-Engines, as of November 2021, it’s the fourth most popular database management system. The database is widely used because it’s open source, free to use, community driven, and highly extensible, allowing for flexible customization. It’s designed to handle a number of workloads, including transaction processing and data warehousing. This makes it a strong option for storing business-critical data.
According to IDC, in 2020, 64.2ZB of data was created or replicated, and between 2020 and 2025, data creation and replication will reach a compound annual growth rate of 23%. This projection covers both structured and unstructured data, but it’s still an enormous amount. With this massive data growth to consider and a world where threats like ransomware are becoming more pervasive, protecting data stored in PostgreSQL databases will be one of the foremost considerations for many organizations.
Data protection with a PostgreSQL database can be implemented using a number of methods and utilities. These tools and methods are explained below:
|pg_dump/pg_restore||This is a logical backup utility included with the standard PostgreSQL installer. Data is read from the database and exported to one or more files. In some scenarios, pg_restore will use data exported by pg_dump to restore data.|
|pg_basebackup||This is a physical backup utility included with the standard PostgreSQL installer. Data is copied from the data directory and any tablespaces to an intended target location in a filesystem.|
|Storage snapshots||Storage snapshots capture the state of a database’s storage at a point in time. The captured state can then be restored or copied to the same system or a different one. Storage snapshots are created either as a part of enterprise shared storage or logical volume management capabilities.|
|Continuous archiving||Continuous archiving copies write-ahead-log (WAL) segments to an intended location. This allows for point-in-time recovery capabilities as every change is archived to a third location.|
|pgBackRest||This is a physical backup utility that can be procured from Crunchy Data or the GitHub repository. This utility provides significant enterprise and performance features for data protection.|
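As a rough illustration of two of the built-in methods in the table, the sketch below assembles a pg_basebackup command line and the postgresql.conf settings that enable continuous archiving. The directory paths are hypothetical, and the archive_command follows the simple copy pattern from the PostgreSQL documentation rather than anything prescribed by this article.

```python
def basebackup_command(target_dir: str) -> list[str]:
    """Build a pg_basebackup command line for a physical backup."""
    return [
        "pg_basebackup",
        f"--pgdata={target_dir}",   # destination for the copied data directory
        "--wal-method=stream",      # stream the WAL generated during the copy
        "--checkpoint=fast",        # request an immediate checkpoint
        "--progress",
    ]


def archiving_settings(archive_dir: str) -> dict[str, str]:
    """Return postgresql.conf settings that enable continuous archiving."""
    return {
        "wal_level": "replica",
        "archive_mode": "on",
        # %p is the path of the WAL segment, %f is its file name.
        # The test guards against overwriting an already archived segment.
        "archive_command": f"test ! -f {archive_dir}/%f && cp %p {archive_dir}/%f",
    }


def render_settings(settings: dict[str, str]) -> str:
    """Format the settings as postgresql.conf lines."""
    return "\n".join(f"{name} = '{value}'" for name, value in settings.items())


if __name__ == "__main__":
    # Hypothetical target locations, for illustration only.
    print(" ".join(basebackup_command("/mnt/backup/base")))
    print(render_settings(archiving_settings("/mnt/backup/wal")))
```

A base backup combined with the archived WAL segments is what makes point-in-time recovery possible: the base backup provides the starting point and the archive provides every change made after it.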
Comprehensive Backup and Recovery for PostgreSQL with pgBackRest
While there are a number of tools and methods to protect PostgreSQL, not all are created equal, nor do they provide the same functionality. Just as important is the target storage that is used with the method or utility. It needs to be easy to use, provide rich data services that enhance data protection strategies, and deliver the best performance and capacity at scale. Pure Storage® FlashBlade® provides all of these important characteristics for PostgreSQL backup and recovery.
pgBackRest is a reliable and easy-to-use backup and recovery solution for PostgreSQL. It’s open source, and the source code is available on GitHub. Crunchy Data provides a public repository to assist with the installation, as well as comprehensive documentation on how to use the tool.
pgBackRest provides a number of features that can’t be found in the utilities provided with the installation. These include:
- Parallel backup and restore: Provides the ability to achieve the maximum throughput capabilities for backup and recovery operations when utilizing a server with multiple cores.
- Multiple repositories: Allows for more granular control over the locality of backups. With multiple repositories, different storage media can be used to separate backups with different retention times.
- Full, incremental, and differential backups: Shortens the time it takes to complete backup operations by copying only the files that have changed. A differential backup copies the files changed since the last full backup, while an incremental backup copies the files changed since the most recent backup of any type. Combining different backup types can lead to both resilient and efficient data protection.
- Backup integrity: Checks files during a restore to ensure that they remain consistent with the files that were backed up.
- Backup resume: Enables an interrupted backup operation to be resumed from the point where it was interrupted. There’s no need to start a backup from the beginning.
- Delta restore: Makes it possible to restore an individual database, or to shorten restore times by restoring only the files that differ from those in the backup. Parallel processing combined with delta restore can drastically reduce recovery times.
- Parallel, asynchronous WAL push and get: Accelerates the often time-consuming process of sending and retrieving data for the WAL archive. Typically, WAL segments are archived and retrieved one at a time during both backup and recovery. pgBackRest allows for asynchronous sending and retrieval of WAL archive segments. As a result, systems with a high write volume aren’t slowed by archiving, and repositories with higher latency don’t hold up the recovery process.
- Tablespace and link support: Fully supports clusters with tablespaces and allows them to be remapped to different locations on recovery.
- Support for multiple repository types: Allows repositories to be located in storage mediums such as block storage (with a filesystem), NFS, and object storage such as S3.
- Encryption: Encrypts repositories to ensure that sensitive data is protected wherever it’s stored.
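To make the backup types and delta restore more concrete, here is a minimal sketch that picks a backup type by weekday and builds the matching pgbackrest command lines. The stanza name `main` and the weekly schedule (full on Sunday, differential on Wednesday, incremental otherwise) are assumptions made for illustration, not a recommendation from the source.

```python
from datetime import date

VALID_TYPES = {"full", "diff", "incr"}


def backup_type_for(day: date) -> str:
    """Hypothetical schedule: full on Sunday, diff on Wednesday, incr otherwise."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 6:
        return "full"
    if weekday == 2:
        return "diff"
    return "incr"


def backup_command(stanza: str, backup_type: str) -> list[str]:
    """Build a pgbackrest backup command for the given backup type."""
    if backup_type not in VALID_TYPES:
        raise ValueError(f"unknown backup type: {backup_type}")
    return ["pgbackrest", f"--stanza={stanza}", f"--type={backup_type}", "backup"]


def restore_command(stanza: str, delta: bool = True) -> list[str]:
    """Build a pgbackrest restore command, optionally as a delta restore."""
    cmd = ["pgbackrest", f"--stanza={stanza}"]
    if delta:
        # Only files that differ from the backup are rewritten.
        cmd.append("--delta")
    cmd.append("restore")
    return cmd


if __name__ == "__main__":
    today = date.today()
    print(" ".join(backup_command("main", backup_type_for(today))))
    print(" ".join(restore_command("main")))
```

With a schedule like this, a restore needs at most the last full backup, the last differential, and the incrementals taken since then, which keeps daily backups small without making the restore chain long.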
Accelerate pgBackRest with FlashBlade
pgBackRest can create repositories on FlashBlade using either NFS or S3 object storage. FlashBlade provides its own hardware compression capabilities, but organizations can architect a backup and recovery strategy that best suits their performance and capacity requirements.
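One way to picture the two repository options is a pgbackrest.conf with a file repository on an NFS mount and an object repository reached over S3. The sketch below generates such a file; the mount point, endpoint, bucket, region, credentials, stanza name, and retention value are all placeholders, and the path-style URI setting is an assumption that is commonly needed for S3-compatible endpoints outside AWS.

```python
import configparser
import io


def build_pgbackrest_conf(
    nfs_repo_path: str,
    s3_endpoint: str,
    s3_bucket: str,
    pg_data_path: str,
    stanza: str = "main",
    compress_type: str = "none",
) -> str:
    """Render a pgbackrest.conf with one NFS-backed and one S3-backed repository."""
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {
        # Client-side compression; "none" leaves compression to the storage target.
        "compress-type": compress_type,
        # repo1: a directory on an NFS export mounted on the database host.
        "repo1-type": "posix",
        "repo1-path": nfs_repo_path,
        "repo1-retention-full": "2",
        # repo2: an S3-compatible bucket.
        "repo2-type": "s3",
        "repo2-path": "/pgbackrest",
        "repo2-s3-endpoint": s3_endpoint,
        "repo2-s3-bucket": s3_bucket,
        "repo2-s3-region": "us-east-1",        # placeholder region
        "repo2-s3-uri-style": "path",
        "repo2-s3-key": "<access-key>",        # placeholder credential
        "repo2-s3-key-secret": "<secret-key>",  # placeholder credential
        "repo2-retention-full": "2",
    }
    config[stanza] = {"pg1-path": pg_data_path}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_pgbackrest_conf(
        nfs_repo_path="/mnt/flashblade/pgbackrest",   # hypothetical mount point
        s3_endpoint="flashblade.example.com",         # hypothetical endpoint
        s3_bucket="pgbackrest-repo",                  # hypothetical bucket
        pg_data_path="/var/lib/pgsql/data",           # hypothetical data directory
    ))
```

Keeping compression as a parameter reflects the trade-off described above: it can be switched off to save host CPU or switched on to reduce the data sent to the repository.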
Combining FlashBlade with pgBackRest for backup and recovery capabilities provides the following benefits for PostgreSQL environments:
- Restore at hundreds of TB/hour: FlashBlade delivers Rapid Restore for production and test/dev workloads with up to 270TB/hr data recovery performance at scale.
- Multi-protocol support for file and object: FlashBlade unifies file and object storage, allowing for multiple pgBackRest repositories that each serve a different use case, such as a file-based repository that takes advantage of SafeMode™ snapshots.
- Ransomware remediation with SafeMode: Recover quickly from potential ransomware attacks with the use of SafeMode snapshots. Ransomware can’t delete, modify, or encrypt a SafeMode snapshot. In the event of a ransomware attack, an organization can quickly recover to a point in time when ransomware wasn’t present on a system using a protected and reliable file share storage snapshot.
- Greater resiliency with replication: File shares and objects can be replicated to other FlashBlade systems, and objects can also be replicated to an Amazon Web Services S3 bucket, to provide third-site disaster recovery capabilities.
Proven Performance Excellence with FlashBlade and PostgreSQL Backup Utilities
The technical white paper, “Protecting PostgreSQL with FlashBlade,” explores the different utilities and methods for performing backup and recovery operations.
Some of the performance results from testing at scale are highlighted here. To prove the performance capabilities of FlashBlade with the various utilities, eight distinct PostgreSQL clusters were created on separate hardware platforms and each populated with one terabyte of database data. The technical white paper explores performance when protecting a single cluster, along with different options for protecting multiple clusters at the same time to a single FlashBlade.
When analyzing the results, it’s important to understand that each utility may have different terminology used to specify how a single backup or recovery operation will scale. For example, pg_dump can increase performance of a single instance through the use of the jobs argument while pgBackRest uses processes to achieve the same outcome. Each test scenario shown here showcases the best performance achieved across a range of configuration options.
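The difference in terminology is easiest to see side by side. The sketch below builds a parallel pg_dump command and a parallel pgbackrest backup command with the same worker count; the database name, output directory, and stanza name are hypothetical.

```python
def pg_dump_parallel(dbname: str, output_dir: str, jobs: int) -> list[str]:
    """Build a pg_dump command that dumps tables with several parallel jobs."""
    return [
        "pg_dump",
        "--format=directory",   # parallel dumps require the directory format
        f"--jobs={jobs}",
        f"--file={output_dir}",
        dbname,
    ]


def pgbackrest_parallel(stanza: str, processes: int) -> list[str]:
    """Build a pgbackrest backup command that uses several worker processes."""
    return [
        "pgbackrest",
        f"--stanza={stanza}",
        f"--process-max={processes}",
        "backup",
    ]


if __name__ == "__main__":
    print(" ".join(pg_dump_parallel("sales", "/mnt/backup/sales_dump", jobs=8)))
    print(" ".join(pgbackrest_parallel("main", processes=8)))
```

Both settings raise the number of concurrent workers for a single backup operation, so each should be sized against the CPU cores available on the database host.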
Backup Performance at Scale
When comparing backup performance across the different utilities, pgBackRest was the fastest at 38.3TB/hour when backing up eight clusters in parallel with 128 processes and compression enabled. Reaching this rate with compression greatly increased CPU usage on each host. Without compression, pgBackRest still achieved 22.83TB/hour with significantly less CPU usage. Compression with pgBackRest is likely more useful during quiet hours when there are CPU cycles to spare.
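To put these rates in terms of elapsed time, the short calculation below estimates how long the 8TB test data set (eight clusters of one terabyte each) would take to back up at each rate. It assumes the quoted throughput is measured against the database size and is sustained for the whole run; the elapsed times are derived here and are not figures reported by the source.

```python
def minutes_to_copy(total_tb: float, rate_tb_per_hour: float) -> float:
    """Time in minutes to move total_tb at a sustained rate_tb_per_hour."""
    return total_tb / rate_tb_per_hour * 60


TOTAL_TB = 8 * 1.0  # eight clusters, one terabyte each

with_compression = minutes_to_copy(TOTAL_TB, 38.3)
without_compression = minutes_to_copy(TOTAL_TB, 22.83)

if __name__ == "__main__":
    print(f"with compression:    {with_compression:.1f} minutes")     # about 12.5
    print(f"without compression: {without_compression:.1f} minutes")  # about 21.0
```

Under these assumptions, the difference between the two settings is roughly eight and a half minutes for this data set, which is the saving being traded against the additional CPU load.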
Recovery Performance at Scale
Recovery performance is where the combination of pgBackRest and FlashBlade shows its true value. pgBackRest was significantly faster than pg_restore and, both with and without compression, recovered eight PostgreSQL clusters in less than 20 minutes.
It’s Not All Performance
The ability to back up and recover PostgreSQL clusters in a timely fashion is incredibly important, especially as data footprints in these databases become larger over time. But that’s not the only consideration for a data protection strategy. FlashBlade offers rich data services that can further protect organizations and their data from disaster scenarios. For ransomware remediation, SafeMode snapshots can be used for file shares, and disaster protection for object storage can be accomplished using object replication.