Better PostgreSQL Slave Backups



Written by Bryan Traywick
December 18, 2013

There are several options for backing up a PostgreSQL database. The standard pg_dump command ships with PostgreSQL and is very easy to use. Another option is to create a file system level backup by directly copying the files that PostgreSQL uses to store the data in the database. It is also possible to use PostgreSQL’s write ahead log (WAL) to continuously archive the database. However, each of these approaches has drawbacks that we will discuss in the next section.

What’s wrong with the existing PostgreSQL backup strategies?

Each of the backup strategies discussed in the PostgreSQL documentation has its own set of advantages and disadvantages. I will discuss those tradeoffs and why we decided against using each one.

SQL dump with pg_dump

Creating a SQL dump of the database with pg_dump is very simple for both the backup and the restore process. Unfortunately, it can take a long time to back up and restore a large database with pg_dump. Because backups take so long, if the backup is taken on the slave database there is a chance that the slave will fall too far behind the master database and the backup will be aborted. You can reduce the chance of this happening by running the backup in the middle of the night when the site is less busy. But if the database is large and the site is busy enough at all times of the day, it may never be possible to create a backup with pg_dump on the slave server.
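To give a sense of the workflow, a minimal backup and restore with pg_dump might look like this (the database name and file names are placeholders):

# create a compressed, custom-format dump of the production database
pg_dump -Fc -f myapp_production.dump myapp_production
# restore it into an empty database on another server
createdb myapp_production
pg_restore -d myapp_production myapp_production.dump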

File system level backup

Using a file system level backup to directly copy the files that PostgreSQL uses to store the data in the database is another option. You create a backup by using tar to archive and optionally compress the PostgreSQL data directory. For example:

tar czf backup.tar.gz /var/lib/postgresql/9.2/main

A file system level backup is fast, but it does have one major restriction: the database server must be shut down in order to get a usable backup. This restriction makes it a non-starter for anyone with a single database server. However, if you have a master/slave cluster it can be an attractive option. Just shut down the slave, back up its data directory, then restart the slave and let it catch back up to the master using streaming replication. This will work for most people, but if the master gets too far ahead of the slave during the backup process, replication will have to be re-established by copying a fresh base backup from the master to the slave. If this happens, rebuilding replication may leave you with just as much downtime, if not more, than taking down the master to perform the backup would have.
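On a master/slave cluster the whole procedure is short; a sketch, assuming a Debian-style PostgreSQL 9.2 layout like the one above, looks something like this:

# on the slave: stop PostgreSQL so the data files are consistent
sudo service postgresql stop
tar czf backup.tar.gz /var/lib/postgresql/9.2/main
# restart the slave and let streaming replication catch it back up
sudo service postgresql start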

Continuous Archiving and Point-in-Time Recovery (PITR)

Continuous Archiving is a backup approach that takes advantage of PostgreSQL’s write ahead log (WAL). The log records every change made to the database’s data files and forms the basis of PostgreSQL’s streaming replication. Using continuous archiving for backup purposes involves creating a base backup as the starting point and then just archiving every WAL file created as changes are made to the database.

There are several advantages to continuous archiving backups, discussed in PostgreSQL’s documentation on the subject. However, the main drawbacks are that setup is difficult and the backups can be large, both because the number of WAL files grows continuously and because the default WAL format is fairly bulky.
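For context, turning on continuous archiving boils down to taking a base backup and setting a few parameters in postgresql.conf; a minimal example, with /mnt/wal_archive as a placeholder archive location, is shown below (see the PostgreSQL documentation for the full procedure):

# postgresql.conf
wal_level = archive
archive_mode = on
# copy each completed WAL segment to the archive, refusing to overwrite existing files
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'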

A better approach with OmniPITR

OmniPITR is, in its authors’ words, “a set of scripts to ease setting up WAL replication, and making hot backups from both Master and Slave systems”. OmniPITR is able to manage all aspects of PostgreSQL streaming replication. It contains scripts for archiving and shipping WAL segment files from the master to the slave, restoring the WAL segments on the slave, and creating hot backups on both the master and slave database servers. It is the slave backup script, omnipitr-backup-slave, that we are most interested in today.

The omnipitr-backup-slave script is able to create a ‘hot backup’ on a PostgreSQL slave server. A hot backup is done by backing up the data directory, much like the file system level backup. However, unlike a normal file system level backup, OmniPITR takes special care to allow the backup to be taken without shutting down the database.

The omnipitr-backup-slave script first creates a trigger file that pauses the removal of WAL archives. The script then tells the master to create a backup label. This causes PostgreSQL to create a ‘snapshot’ of the data and ensures that all of the data at the time of the snapshot is consistent on disk. Next, the script creates a file system level backup of the PostgreSQL data directory on the slave server. This is the core of the backup and will contain all of the data at the time that the snapshot was taken in a consistent manner. The script then tells the master that we are finished making backups and compresses all of the WAL archives created during the backup process. Finally, the trigger file that was created in the first step to pause the removal of WAL archives is removed so that normal WAL archive removal will take place.
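To give a feel for how the script is run, an invocation might look roughly like the following; the paths are placeholders and the option names should be verified against omnipitr-backup-slave --help for your version of OmniPITR:

omnipitr-backup-slave \
  --data-dir /var/lib/postgresql/9.2/main \
  --source /var/lib/postgresql/walarchive \
  --dst-local /var/backups/postgresql \
  --log /var/log/omnipitr/backup-slave.log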

The two backup files created by omnipitr-backup-slave, one containing the data directory backup and the other containing the WAL segment files created during the backup process, can simply be extracted on top of each other on a fresh database server or on your local machine to restore the database. The backup is quick since it uses a file system level backup for the bulk of the data, and it isn’t susceptible to the problem of the slave falling behind, since the slave is still processing incoming WAL segment files while the backup is taking place.
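Restoring is then just a matter of unpacking both archives into the same directory. Assuming hypothetical file names for the two tarballs, something like:

mkdir restored
# extract the data directory backup first, then the WAL segments on top of it
tar xzf db-data-backup.tar.gz -C restored
tar xzf db-xlog-backup.tar.gz -C restored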

Easy PostgreSQL Slave Backups with Moonshine OmniPITR

OmniPITR is a very powerful tool, and as such it provides a lot of options. At Rails Machine we like to automate the installation and configuration of every tool we use. This was no different with OmniPITR, which is why we created moonshine_omnipitr.

Moonshine OmniPITR automates the installation and configuration of the omnipitr-backup-slave script – as we learn more about OmniPITR and its capabilities we may add additional functionality. Installation works just like any other moonshine plugin:

# Rails 2.x.x
script/plugin install git://github.com/railsmachine/moonshine_omnipitr.git
# Rails 3.x.x
script/rails plugin install git://github.com/railsmachine/moonshine_omnipitr.git

Once it is installed, you can include it in your database manifest:

# app/manifests/database_manifest.rb
class DatabaseManifest < Moonshine::Manifest::Rails

  include Moonshine::Omnipitr

  # other recipes and configuration omitted

  # tell DatabaseManifest to use the omnipitr recipe
  # omnipitr should only be setup on the slave database server
  if Facter.hostname =~ /^db2/
    recipe :omnipitr
  end
end

Moonshine OmniPITR automatically handles uploading the backups to your Amazon S3 bucket. For this to work it needs access to your Amazon S3 account, which you can configure in any of the following places:

# in global config/moonshine.yml
:omnipitr:
  :s3_bucket: 'your-bucket'
  :s3_prefix: 'backups/'
  :s3_key: 'YOUR_ACCESS_KEY'
  :s3_secret: 'YOUR_SECRET_KEY'

# in stage-specific moonshine.yml,
# config/moonshine/staging.yml and config/moonshine/production.yml
:omnipitr:
  :s3_bucket: 'your-bucket'
  :s3_prefix: 'backups/'
  :s3_key: 'YOUR_ACCESS_KEY'
  :s3_secret: 'YOUR_SECRET_KEY'

# `configure` call in app/manifests/database_manifest.rb
configure :omnipitr => {
  :s3_bucket => 'your-bucket',
  :s3_prefix => 'backups/',
  :s3_key    => 'YOUR_ACCESS_KEY',
  :s3_secret => 'YOUR_SECRET_KEY'
}

All of the configuration options are required except for :s3_prefix. The prefix is an optional setting that can be used to organize your backups into a subfolder within your bucket. You can then use S3’s Object Expiration to automatically delete older backups or move them to Amazon Glacier.
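For example, with a recent AWS CLI, a lifecycle rule for the backups/ prefix used above might look like the following sketch, which moves month-old backups to Glacier and deletes them after a year (adjust the ages to your own retention policy):

# apply a lifecycle rule to the backup prefix
aws s3api put-bucket-lifecycle-configuration --bucket your-bucket --lifecycle-configuration '{
  "Rules": [{
    "ID": "expire-old-postgres-backups",
    "Filter": {"Prefix": "backups/"},
    "Status": "Enabled",
    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
    "Expiration": {"Days": 365}
  }]
}'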