Original files and derivatives
These are stored in S3, and are backed up within S3 by a process managed by AWS. The backups are then copied to long-term storage by SyncBackPro, which is Windows software running on Promethium managed by Chuck and Ponce (see https://www.2brightsparks.com/syncback/sbpro.html ). (None of this will change when we get rid of Ansible.)
Heroku database backups
We have three backup mechanisms under Heroku:
1. Continuous protection
This isn’t technically a backup, but Heroku does offer a convenient way to roll the database back to its state before a problem occurred; see Rolling back the database to a prior state in Heroku.
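A rollback works by provisioning a new database add-on restored to a point in time; a sketch of what that might look like, assuming our production app name (the plan name and timestamp here are illustrative, and DATABASE_URL refers to the current primary database):

```shell
# Provision a rollback database restored to a point in time.
# Plan and timestamp are examples only; adjust before running.
heroku addons:create heroku-postgresql:standard-0 \
  --rollback DATABASE_URL \
  --to '2024-01-15 09:30 UTC' \
  --app scihist-digicoll-production
```

Heroku then creates a separate database you can inspect and, if it looks right, promote.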
2. Nightly .dump backups
We supplement the above with a nightly scheduled database backup at 2am. Heroku stores these in PostgreSQL custom .dump format; restoring from one is convenient and takes well under a minute. Heroku retains up to 25 of them.
Check the backup schedule:
heroku pg:backups:schedules --app scihist-digicoll-production
List existing .dump backups:
heroku pg:backups --app scihist-digicoll-production
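If the schedule ever needs to be recreated, pg:backups:schedule does it; a sketch, assuming the 2am run is in US Eastern time (the timezone is an assumption, not confirmed by this page):

```shell
# (Re)create the nightly backup schedule; timezone assumed.
heroku pg:backups:schedule DATABASE_URL \
  --at '02:00 America/New_York' \
  --app scihist-digicoll-production
```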
Restoring from a nightly .dump backup
For .dump backups retained by Heroku (we retain up to 25), a restore takes about a minute and works like this:
heroku pg:backups:restore --app scihist-digicoll-production
TODO: distinguish between manual and non-manual backups (see link above).
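Run without arguments, the restore command uses the most recent backup. To restore a specific one, pass its ID; a sketch (the ID b101 is hypothetical — use one listed by heroku pg:backups):

```shell
# Restore a specific backup by ID (b101 is a placeholder).
heroku pg:backups:restore b101 DATABASE_URL --app scihist-digicoll-production
```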
Downloading a .dump backup file:
heroku pg:backups:download a006
will produce a file like:
$ file latest.dump
latest.dump: PostgreSQL custom database dump - v1.14-0
Note that a .dump file can be converted to a garden-variety “logical” .sql database file:
$ pg_restore -f logical_database_file.sql latest.dump
$ file logical_database_file.sql
logical_database_file.sql: UTF-8 Unicode text, with very long lines
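Before converting or restoring, you can also inspect a custom-format dump's table of contents with pg_restore's list mode:

```shell
# Print the archive's table of contents without restoring anything.
pg_restore --list latest.dump
```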
Restoring from a local “.dump” file
If you downloaded a .dump file that is now stored on your local machine, and want to restore from that specific file, you will first need to upload it to S3, create a signed URL for the dump, and finally run:
heroku pg:backups:restore '<SIGNED_URL_IN_S3>' DATABASE_URL
(Note: DATABASE_URL is a literal argument, not a placeholder.)
More details on this process, including how to create a signed s3 URL: https://devcenter.heroku.com/articles/heroku-postgres-import-export#import
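Taken together, the upload-sign-restore steps might look like the following sketch (the bucket and key names are hypothetical, not our actual bucket):

```shell
# 1. Upload the local dump to S3 (bucket/key are placeholders).
aws s3 cp latest.dump s3://my-backup-bucket/tmp/latest.dump

# 2. Create a signed URL valid for one hour.
SIGNED_URL=$(aws s3 presign s3://my-backup-bucket/tmp/latest.dump --expires-in 3600)

# 3. Restore from the signed URL.
heroku pg:backups:restore "$SIGNED_URL" DATABASE_URL --app scihist-digicoll-production
```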
3. Preservation (logical) backups to s3
Finally, we maintain a rake task, rake scihist:copy_database_to_s3, which runs on a one-off Heroku dyno via the scheduler. This uploads a logical (plain vanilla SQL) database dump to s3, where SyncBackPro then syncs it to a local network storage mount (/media/SciHist_Digicoll_Backup), and from there to our tape backups. (SyncBackPro is managed by Chuck and Ponce.)
This workflow serves more for preservation than for disaster recovery: logical .sql files offer portability (they’re UTF-8) and are useful in a variety of situations; notably, they can be used to reconstruct the database, even on other machines and other architectures, using psql -f db.sql.
Given the size of the database in late 2020, the entire job (with the overhead of starting up the dyno and tearing it down) takes a bit under a minute. However, if our database grows much larger (20GB or more) we will probably have to get rid of these frequent logical backups.
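Besides the scheduler, the task can presumably also be run by hand on a one-off dyno; a sketch, assuming standard heroku run usage:

```shell
# Run the preservation-backup rake task manually on a one-off dyno.
heroku run rake scihist:copy_database_to_s3 --app scihist-digicoll-production
```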
Restoring from a logical (.sql) database dump
In the unlikely event you have to restore from a logical backup:
Download the database file you want and uncompress it, then run:
heroku pg:psql --app scihist-digicoll-production < ~/Desktop/digcol_backup.sql
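If the file came down from S3 gzipped, the download-and-uncompress step might look like this (the bucket and key are hypothetical):

```shell
# Fetch and uncompress a logical backup (placeholder bucket/key).
aws s3 cp s3://my-backup-bucket/backups/digcol_backup.sql.gz ~/Desktop/
gunzip ~/Desktop/digcol_backup.sql.gz   # yields ~/Desktop/digcol_backup.sql
```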
Historical notes
Prior to moving off our Ansible-managed servers, backups were performed by cron jobs installed by Ansible. Backups and Recovery contains a summary of our pre-Heroku backup infrastructure.
A script on the production server, home/ubuntu/bin/postgres-backup.sh, used to perform the following tasks nightly:
1. pg_dump the production database to /backups/pgsql-backup/
2. aws s3 sync the contents of that directory to s3://chf-hydra-backup/PGSql