Our backups consist of 1) the Postgres database (metadata) and 2) files on S3 (original files, plus derivatives for convenience). That’s it!

Original files and derivatives

These are stored in S3, and are backed up within S3 by a process managed by AWS. The backups are then copied to long-term storage by SyncBackPro, which is Windows software running on Promethium managed by Chuck and Ponce (see https://www.2brightsparks.com/syncback/sbpro.html ). (None of this will change when we get rid of Ansible.)

See more at Backups and Recovery (TODO: We need to update that page for Heroku)

Heroku database backups

We have three backup mechanisms under Heroku:

1. Continuous protection

This isn’t technically a backup, but Heroku does offer a convenient way to roll back the database to the way it was before a problem occurred: see Rolling back the database to a prior state in Heroku.

2. Nightly .dump backups

We supplement the above with a scheduled nightly backup at 2am. These are stored by Heroku in Postgres custom .dump format, and restoring from them is convenient and takes well under a minute. Heroku retains up to 25 of these.

Check the backup schedule by running:

heroku pg:backups:schedules --app scihist-digicoll-production

List .dump backups by running heroku pg:backups.

Restoring from a nightly .dump backup

For .dump backups retained by Heroku (up to 25), a restore takes about a minute. To restore from the most recent backup:

heroku pg:backups:restore --app scihist-digicoll-production

or, to restore from a particular backup (by its ID):

heroku pg:backups:restore a188 --app scihist-digicoll-production

TODO: distinguish between manual and non-manual backups (see link above).

Downloading a .dump backup file:

heroku pg:backups:download a006 will produce a file like:

$ file latest.dump
latest.dump: PostgreSQL custom database dump - v1.14-0.

Note that a .dump file can be converted to a garden-variety “logical” .sql database file:

$ pg_restore -f logical_database_file.sql latest.dump

$ file logical_database_file.sql
logical_database_file.sql: UTF-8 Unicode text, with very long lines

Restoring from a local “.dump” file

If you downloaded a .dump file which is now stored on your local machine, and want to restore from that specific file, you will first need to upload it to S3, create a signed URL for the dump, and finally run:

heroku pg:backups:restore '<SIGNED_URL_IN_S3>' DATABASE_URL # note: DATABASE_URL is a literal, not a placeholder

More details on this process, including how to create a signed S3 URL: https://devcenter.heroku.com/articles/heroku-postgres-import-export#import
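The upload-and-restore steps above can be sketched as follows. This is a sketch, not a tested procedure: the bucket name is hypothetical, and it assumes the AWS CLI is installed and configured with credentials that can read and write the bucket.

```shell
# Hypothetical bucket name; substitute your own.
BUCKET=my-backup-bucket

# 1. Upload the local .dump file to S3.
aws s3 cp latest.dump "s3://${BUCKET}/latest.dump"

# 2. Generate a time-limited signed URL (here, valid for 1 hour).
SIGNED_URL=$(aws s3 presign "s3://${BUCKET}/latest.dump" --expires-in 3600)

# 3. Restore from the signed URL. DATABASE_URL is a literal, not a placeholder.
heroku pg:backups:restore "$SIGNED_URL" DATABASE_URL --app scihist-digicoll-production
```

Quoting the signed URL matters: presigned S3 URLs contain `&` characters that the shell would otherwise interpret.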

3. Preservation (logical) backups to S3

Finally, we maintain a rake task, rake scihist:copy_database_to_s3, which runs on a one-off Heroku dyno via the scheduler. This uploads a logical (plain vanilla SQL) database dump to S3, from which SyncBackPro syncs it to a local network storage mount (/media/SciHist_Digicoll_Backup), and from there to our tape backups. (SyncBackPro is managed by Chuck and Ponce.)

This workflow serves more for preservation than for disaster recovery: logical .sql files offer portability (they’re plain UTF-8 text) and are useful in a variety of situations; notably, they can be used to reconstruct the database, even on other machines and other architectures, using psql -f db.sql.

Given the size of the database in late 2020, the entire job (with the overhead of starting up the dyno and tearing it down) takes a bit under a minute. However, if our database grows much larger (20GB or more) we will probably have to get rid of these frequent logical backups.
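The effect of the rake task (not its actual implementation — see the task source for that) amounts to something like the following. The output file name is illustrative, and the pg_dump flags are assumptions:

```shell
# Dump the database as a plain-SQL logical backup.
# --no-owner and --no-acl make the dump easier to load elsewhere.
pg_dump --no-owner --no-acl "$DATABASE_URL" > scihist_digicoll_backup.sql

# Copy it to S3 (hypothetical bucket/prefix), where SyncBackPro picks it up.
aws s3 cp scihist_digicoll_backup.sql s3://my-backup-bucket/PGSql/scihist_digicoll_backup.sql
```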

Restoring from a logical (.sql) database dump

In the unlikely event you have to restore from a logical backup:
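Since a logical backup is just plain SQL, it can be loaded with standard Postgres tools. A minimal sketch for a local restore (database and file names are illustrative; this assumes a local Postgres install with createdb/psql on the PATH):

```shell
# 1. Create an empty database to restore into.
createdb digcoll_restore

# 2. Load the logical dump; psql executes the SQL statements in order.
psql -d digcoll_restore -f db.sql
```

To load into a Heroku database instead, the same file can be piped through heroku pg:psql --app scihist-digicoll-production.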

Historical notes

Prior to moving off our Ansible-managed servers, our backups were performed by cron jobs installed by Ansible. Backups and Recovery contains a summary of our pre-Heroku backup infrastructure.

A script on the production server, /home/ubuntu/bin/postgres-backup.sh, used to perform the following tasks nightly:

  • pg_dump the production database to /backups/pgsql-backup/.

  • aws s3 sync the contents of that directory to s3://chf-hydra-backup/PGSql.
