
Original files and derivatives

These are stored in S3, and are backed up within S3 by a process managed by AWS. The backups are then copied to long-term storage by SyncBackPro, a Windows application running on Promethium and managed by Chuck and Ponce (see https://www.2brightsparks.com/syncback/sbpro.html ). (None of this will change when we get rid of Ansible.)

Heroku database backups

We have three backup mechanisms under Heroku:

1. Continuous protection

This isn’t technically a backup, but Heroku does offer a convenient way to roll back the database to its state before a problem occurred: see Rolling back the database to a prior state in Heroku.

2. Nightly .dump backups

We supplement the above with a scheduled nightly backup at 2am. These are stored by Heroku in PostgreSQL custom .dump format, and restoring from one is convenient and takes well under a minute. Heroku retains up to 25 of these. To view the schedule:

heroku pg:backups:schedules --app scihist-digicoll-production

List .dump backups by running heroku pg:backups --app scihist-digicoll-production.

Restoring from a nightly .dump backup

For .dump backups retained by Heroku (we retain up to 25), a restore takes about a minute and works like this:

heroku pg:backups:restore --app scihist-digicoll-production

TODO: distinguish between manual and scheduled backups (see link above).
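As a sketch of a restore from one specific nightly backup (the backup ID b101 below is hypothetical; use a real ID from the backup listing):

```shell
# List backups to find the ID of the one you want (IDs look like b101 or a006):
heroku pg:backups --app scihist-digicoll-production

# Restore that specific backup into the app's database (b101 is a placeholder ID):
heroku pg:backups:restore b101 --app scihist-digicoll-production
```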

Downloading a .dump backup file:

heroku pg:backups:download a006 will produce a file like:

$ file latest.dump
latest.dump: PostgreSQL custom database dump - v1.14-0.

Note that a .dump file can be converted to a garden-variety “logical” .sql database file:

$ pg_restore -f logical_database_file.sql latest.dump

$ file logical_database_file.sql
logical_database_file.sql: UTF-8 Unicode text, with very long lines

Restoring from a local “.dump” file

If you downloaded a .dump file that is now stored on your local machine and want to restore from that specific file, you will first need to upload it to S3, create a signed URL for it, and finally run:

heroku pg:backups:restore '<SIGNED_URL_IN_S3>' DATABASE_URL --app scihist-digicoll-production # DATABASE_URL is a literal, not a placeholder

More details on this process, including how to create a signed s3 URL: https://devcenter.heroku.com/articles/heroku-postgres-import-export#import
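As a sketch of the signed-URL step, the AWS CLI can presign an object you have already uploaded (the bucket and key below are placeholders, not our actual backup location):

```shell
# Generate a time-limited signed URL for the uploaded dump.
# Bucket and key are placeholders; 3600 seconds = 1 hour of validity.
aws s3 presign s3://my-backup-bucket/latest.dump --expires-in 3600
```

The command prints the signed URL to stdout; quote it when passing it to heroku pg:backups:restore so the shell doesn't mangle the query string.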

3. Preservation (logical) backups to S3

Finally, we maintain a rake task, rake scihist:copy_database_to_s3, which runs on a one-off Heroku dyno via the scheduler. This uploads a logical (plain SQL) database dump to S3, where SyncBackPro then syncs it to a local network storage mount (/media/SciHist_Digicoll_Backup), and from there to our tape backups. (SyncBackPro is managed by Chuck and Ponce.)

This workflow serves preservation more than disaster recovery: logical .sql files are portable (plain UTF-8 text) and useful in a variety of situations; notably, they can be used to reconstruct the database, even on other machines and architectures, using psql -f db.sql.

Given the size of the database in late 2020, the entire job (with the overhead of starting up the dyno and tearing it down) takes a bit under a minute. However, if our database grows much larger (20GB or more) we will probably have to get rid of these frequent logical backups.

Restoring from a logical (.sql) database dump

In the unlikely event you have to restore from a logical backup:
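A minimal sketch, assuming the dump file is db.sql and you are restoring into a fresh database (the database name here is a placeholder):

```shell
# Create an empty database to restore into (name is a placeholder):
createdb scihist_restored

# Replay the logical dump into it; psql reports any errors as the SQL runs:
psql -d scihist_restored -f db.sql
```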

Historical notes

Prior to moving off our Ansible-managed servers, backups were performed by cron jobs installed by Ansible. Backups and Recovery contains a summary of our pre-Heroku backup infrastructure.

A script on the production server, /home/ubuntu/bin/postgres-backup.sh, used to perform the following tasks nightly:

  • pg_dump the production database to /backups/pgsql-backup/.

  • aws s3 sync the contents of that directory to s3://chf-hydra-backup/PGSql.
