Original files and derivatives
These are stored in S3, and are backed up within S3 by a process managed by AWS. The backups are then copied to long-term storage by SyncBackPro, which is Windows software running on Promethium managed by Chuck and Ponce (see https://www.2brightsparks.com/syncback/sbpro.html ). This will not change when we get rid of Ansible.
Heroku database backups
We have three backup mechanisms under Heroku:
1. Continuous protection
Heroku offers us a behind-the-scenes Continuous Protection mechanism for database disaster recovery. This doesn’t involve any backup files that we can see or download, but it does ensure that, in the event of a disaster, we can roll the database back to a prior state (for instance, to just before a batch of files was mistakenly deleted) using:
heroku addons:create heroku-postgresql:standard-0 --rollback DATABASE_URL --to '2021-06-02 20:20+00' --app scihist-digicoll-production
A new rollback database will be created, and its name will be printed to stdout (postgresql-curly-82727 in this example).
Restore to it like this:
heroku pg:promote postgresql-curly-82727 --app scihist-digicoll-production
Make sure you have actually fixed the problem by checking, for example, that the mistakenly deleted files have been restored.
Once all is well, don’t forget to get rid of any extra database(s) you are no longer using; otherwise you’ll continue to be charged for them. This command will show it (or them):
heroku pg:info --app scihist-digicoll-production
Obviously, be really careful when removing a database:
heroku addons:destroy HEROKU_POSTGRESQL_YELLOW --app scihist-digicoll-production
Details are at https://devcenter.heroku.com/articles/heroku-postgres-rollback .
For our database, this performs the rollback in under an hour. (The site remains up and usable while the rollback is being prepared and executed.)
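The rollback steps above can be strung together in a small shell sketch. It assumes the Heroku CLI is installed and logged in, and that heroku-postgresql:standard-0 still matches our database plan. The function names and the DRY_RUN switch (which prints each command instead of running it) are hypothetical conveniences, not existing tooling. heroku pg:wait simply blocks until the newly created database is available, so the promote step doesn't run too early.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the Continuous Protection rollback steps.
# Assumes the Heroku CLI is installed and authenticated.
set -eu

APP="scihist-digicoll-production"

# Print the command instead of running it when DRY_RUN is set (hypothetical switch).
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Step 1: create a rollback copy of DATABASE_URL as of a UTC timestamp,
# e.g. '2021-06-02 20:20+00'. The CLI prints the new database's name.
rollback_create() {
  run heroku addons:create heroku-postgresql:standard-0 \
    --rollback DATABASE_URL --to "$1" --app "$APP"
}

# Step 2: wait until the new database is available, then promote it,
# e.g. rollback_promote postgresql-curly-82727
rollback_promote() {
  run heroku pg:wait --app "$APP"
  run heroku pg:promote "$1" --app "$APP"
}

# Step 3 (only after verifying the fix): show all databases, then remove the
# unused one, e.g. rollback_cleanup HEROKU_POSTGRESQL_YELLOW.
# addons:destroy still prompts for confirmation before deleting anything.
rollback_cleanup() {
  run heroku pg:info --app "$APP"
  run heroku addons:destroy "$1" --app "$APP"
}
```

Running DRY_RUN=1 rollback_create '2021-06-02 20:20+00' first is a cheap way to eyeball the exact command before touching production.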
2. Nightly physical backups
We supplement the above with a scheduled nightly physical backup of the database, taken at 2am. These backups are stored by Heroku, and restoring from them is very fast and convenient. The schedule can be checked with:
heroku pg:backups:schedules --app scihist-digicoll-production
(Physical backups are binary files that include dead tuples, bloat, indexes and all structural characteristics of the currently running database.)
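If the 2am schedule ever needs to be recreated, heroku pg:backups:schedule takes an --at value of the form "HH:MM Time/Zone". The sketch below is hypothetical: the America/New_York time zone is an assumption, not our recorded setting, so check the output of the schedules command for the zone actually in use. DRY_RUN prints commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting or (re)creating the nightly backup schedule.
# ASSUMPTION: America/New_York is a guess at the configured time zone.
set -eu

APP="scihist-digicoll-production"
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# List the existing backup schedules.
show_schedules() {
  run heroku pg:backups:schedules --app "$APP"
}

# Schedule a daily backup of DATABASE_URL; optional $1 overrides the --at value.
schedule_nightly_backup() {
  run heroku pg:backups:schedule DATABASE_URL --at "${1:-02:00 America/New_York}" --app "$APP"
}
```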
The backups are stored by Heroku. To list them, along with metadata such as each backup's ID, creation time, status and size, run:
heroku pg:backups --app scihist-digicoll-production
Restoring from a nightly physical backup
For physical backups retained by Heroku (we retain up to 25), a restore takes about a minute and works like this:
heroku pg:backups:restore --app scihist-digicoll-production
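Run with no backup ID, as above, the restore command uses the most recent backup and targets DATABASE_URL, asking for confirmation first. To go back to an older retained backup, pass its ID as shown by heroku pg:backups. A small hypothetical wrapper covering both cases (a006 is just an example ID; DRY_RUN prints instead of running):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: restore the latest backup, or a specific one by ID.
set -eu

APP="scihist-digicoll-production"
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

restore_backup() {  # optional $1 = backup ID, e.g. a006
  if [ "$#" -eq 0 ]; then
    # No ID: the CLI restores the most recent backup into DATABASE_URL.
    run heroku pg:backups:restore --app "$APP"
  else
    # Explicit ID and explicit target attachment (DATABASE_URL is literal).
    run heroku pg:backups:restore "$1" DATABASE_URL --app "$APP"
  fi
}
```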
Downloading a physical backup to a local “.dump” file
heroku pg:backups:download a006 --output physical.dump --app scihist-digicoll-production
will produce a file like:
$ file physical.dump
physical.dump: PostgreSQL custom database dump - v1.14-0
Note that a physical dump can be converted to a garden-variety “logical” .sql database file:
$ pg_restore -f logical_database_file.sql physical.dump
$ file logical_database_file.sql
logical_database_file.sql: UTF-8 Unicode text, with very long lines
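The download-and-convert steps can be wrapped as a pair of hypothetical shell functions. Beyond the commands above, this sketch adds pg_restore's --no-owner and --no-acl options (an optional choice, not part of the original command), which leave out Heroku's generated role names and grants so the resulting .sql is easier to load elsewhere. DRY_RUN prints commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: download a Heroku backup, then convert it to plain SQL.
set -eu

APP="scihist-digicoll-production"
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# $1 = backup ID (e.g. a006), $2 = local output file (e.g. physical.dump)
download_backup() {
  run heroku pg:backups:download "$1" --output "$2" --app "$APP"
}

# $1 = custom-format .dump file, $2 = .sql file to write.
# With --file and no database, pg_restore writes a SQL script instead of restoring.
dump_to_sql() {
  run pg_restore --no-owner --no-acl --file "$2" "$1"
}
```

Usage: download_backup a006 physical.dump && dump_to_sql physical.dump logical_database_file.sql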
Restoring from a physical backup stored as a local “.dump” file
If you have downloaded a physical backup to your local machine and want to restore from that specific file, first upload it to S3 and generate a signed URL for it, then run:
heroku pg:backups:restore '<SIGNED_URL_IN_S3>' DATABASE_URL --app scihist-digicoll-production
(Note that DATABASE_URL is a literal, not a placeholder.)
More details on this process, including how to create a signed s3 URL: https://devcenter.heroku.com/articles/heroku-postgres-import-export#import
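A minimal sketch of the upload, sign and restore sequence, assuming the AWS CLI is configured with write access to some bucket. The bucket and key (s3://example-bucket/...) and the function names are hypothetical; aws s3 presign prints a time-limited GET URL (one hour here). --only-show-errors keeps the upload quiet so that upload_and_presign's stdout is just the signed URL.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: restore Heroku Postgres from a local .dump via a presigned S3 URL.
# ASSUMPTIONS: AWS CLI configured for the (placeholder) bucket; Heroku CLI authenticated.
set -eu

APP="scihist-digicoll-production"
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# $1 = local .dump, $2 = s3://bucket/key destination. Prints the signed URL.
upload_and_presign() {
  run aws s3 cp --only-show-errors "$1" "$2"
  run aws s3 presign "$2" --expires-in 3600
}

# $1 = signed URL. Quoting matters: signed URLs contain '&' and '?'.
restore_from_url() {
  run heroku pg:backups:restore "$1" DATABASE_URL --app "$APP"
}
```

Usage: signed_url="$(upload_and_presign physical.dump s3://example-bucket/restores/physical.dump)" then restore_from_url "$signed_url". Deleting the uploaded object afterwards is advisable, since it is a full copy of the database.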
3. Preservation (logical) backups to s3
Finally, we maintain a rake task, rake scihist:copy_database_to_s3, which runs on a one-off Heroku dyno via the scheduler. It uploads a logical (plain vanilla SQL) dump of the database to S3, from where SyncBackPro syncs it to tape (this process, again, is managed by Chuck and Ponce).
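The rake task's source isn't reproduced on this page. As a rough, hypothetical shell equivalent (not the actual Ruby implementation), a plain-SQL dump written to a temporary file and then uploaded looks something like this. The destination URL is a placeholder, and DATABASE_URL is assumed to be set, as it is on a Heroku dyno.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of a logical (plain SQL) backup to S3; the real
# scihist:copy_database_to_s3 task may work differently.
set -eu

run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# $1 = s3://bucket/key destination (placeholder). Requires DATABASE_URL.
copy_database_to_s3() {
  local dest="$1" tmp status=0
  tmp="$(mktemp)"
  # Plain-format dump (UTF-8 SQL text), then upload; record the first failure.
  run pg_dump --format=plain --no-owner --no-acl --file "$tmp" "$DATABASE_URL" &&
    run aws s3 cp --only-show-errors "$tmp" "$dest" || status=$?
  # Remove the local copy whether or not the dump/upload succeeded.
  rm -f "$tmp"
  return "$status"
}
```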
This logical-backup workflow serves preservation more than disaster recovery: logical .sql files offer portability (they’re UTF-8 text) and are useful in a variety of situations; notably, they can be used to reconstruct the database, even on other machines and architectures, using psql -f db.sql.
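For example, a logical dump can be checked by loading it into a scratch database on a local PostgreSQL server. This sketch assumes createdb and psql are on the PATH and the current role may create databases; the default database name is hypothetical. ON_ERROR_STOP=1 makes psql stop at the first error and exit non-zero, rather than reporting errors and carrying on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load a plain-SQL dump into a fresh local database.
set -eu

run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# $1 = .sql file, optional $2 = local database name (hypothetical default).
load_logical_dump() {
  local sql="$1" db="${2:-scihist_restore_check}"
  run createdb "$db"
  run psql -v ON_ERROR_STOP=1 -d "$db" -f "$sql"
}
```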
Given the size of the database in late 2020, the entire job (including the overhead of starting up and tearing down the dyno) takes a bit under a minute. However, if the database grows much larger (20GB or more), we will probably have to drop these frequent logical backups.
SyncBackPro on Promethium (managed by Chuck and Ponce) then copies the S3 file to a local network storage mount (/media/SciHist_Digicoll_Backup), which is backed up to tape.
Restoring from a logical (.sql) database dump
In the unlikely event that you have to restore from a logical backup (because you can’t use the continuous protection rollback, and all the dumps managed by Heroku are somehow unavailable), download the logical backup you want to use from S3 and run:
heroku pg:psql --app scihist-digicoll-production < logical_backup.sql
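Because this pipes SQL straight into production, a slightly more defensive variant may be worth considering. The hypothetical sketch below refuses files that don't begin with pg_dump's standard "PostgreSQL database dump" header comment (assuming the backup is plain pg_dump output), and prepends the psql meta-command \set ON_ERROR_STOP on so the load aborts at the first SQL error instead of continuing past it. DRY_RUN only performs the file check.

```shell
#!/usr/bin/env bash
# Hypothetical guard around restoring a plain-SQL backup via heroku pg:psql.
set -eu

APP="scihist-digicoll-production"

restore_logical_backup() {  # $1 = plain-SQL dump downloaded from S3
  # Refuse files that don't start like pg_dump's plain-text output.
  if ! head -n 5 "$1" | grep -q 'PostgreSQL database dump'; then
    echo "not a plain-text pg_dump file: $1" >&2
    return 1
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo "would restore $1 into $APP"
    return 0
  fi
  # psql reads the meta-command from stdin first, then the dump itself.
  { printf '%s\n' '\set ON_ERROR_STOP on'; cat "$1"; } | heroku pg:psql --app "$APP"
}
```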
Historical notes
Prior to moving off our Ansible-managed servers, our backups were performed by cron jobs installed by Ansible. The Backups and Recovery page contains a summary of our pre-Heroku backup infrastructure.
A script on the production server, home/ubuntu/bin/postgres-backup.sh, used to perform the following tasks nightly:
pg_dump the production database to /backups/pgsql-backup/.
aws s3 sync the contents of that directory to s3://chf-hydra-backup/PGSql.