...

These are stored in S3, are backed up within S3 by a process managed by AWS, and are then copied to long-term storage [ask Chuck about details post-Dubnium retirement]. No Ansible cron jobs were ever used in this workflow, so there is no need to make any changes to our existing setup.

Database backups

A script on the production server, /home/ubuntu/bin/postgres-backup.sh, performed the following tasks nightly:

  • pg_dump the production database to /backups/pgsql-backup/.

  • aws s3 sync the contents of that directory to s3://chf-hydra-backup/PGSql.

The above script will need to be discarded.
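The two steps above can be sketched roughly as follows (the database name, dump format, and file naming here are assumptions; the real postgres-backup.sh may have differed):

```shell
#!/bin/bash
# Hypothetical reconstruction of the nightly backup script.
set -euo pipefail

BACKUP_DIR=/backups/pgsql-backup
mkdir -p "$BACKUP_DIR"

# 1. Dump the production database (name assumed) in custom format.
pg_dump --format=custom --file="$BACKUP_DIR/production-$(date +%F).dump" production_db

# 2. Mirror the backup directory to the S3 bucket.
aws s3 sync "$BACKUP_DIR" s3://chf-hydra-backup/PGSql
```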

A second mechanism [ask Chuck for details] copies the S3 file to a local network storage mount (/media/SciHist_Digicoll_Backup). This then gets backed up to tape by SyncBackPro, Windows software running on Promethium and managed by Chuck and Ponce (see https://www.2brightsparks.com/syncback/sbpro.html ).

Heroku database backups

We have three backup mechanisms under Heroku:

1. Continuous protection

Every professional-tier Heroku Postgres database comes with a behind-the-scenes Continuous Protection mechanism for database disaster recovery. This doesn’t involve any actual backup files that we can see or download. It does ensure that, in the event of a disaster, we can roll back the database to a prior state using a command like:

heroku addons:create heroku-postgresql:standard-0 --rollback DATABASE_URL --to '2021-06-02 20:20+00' --app scihist-digicoll-production

Details are at https://devcenter.heroku.com/articles/heroku-postgres-rollback .

For our database, this performs the rollback in under an hour.

2. Nightly physical backups

We supplement the above with a physical database backup scheduled nightly at 2am:

heroku pg:backups:schedules --app scihist-digicoll-production
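The command above lists the current schedule. A minimal sketch of (re)creating the schedule, assuming the attachment name DATABASE_URL and our local timezone (both assumptions), would be:

```shell
# List existing backup schedules for the app.
heroku pg:backups:schedules --app scihist-digicoll-production

# (Re)create the nightly 2am schedule. The attachment name and
# timezone here are assumptions; adjust to the actual values.
heroku pg:backups:schedule DATABASE_URL \
  --at '02:00 America/New_York' \
  --app scihist-digicoll-production
```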

Note that physical backups are binary files that include dead tuples, bloat, indexes, and all structural characteristics of the currently running database:

$ file physical.dump
physical.dump: PostgreSQL custom database dump - v1.14-0

You can check the metadata on the latest physical backups with heroku pg:backups. Running heroku pg:backups:download a006 (substituting a backup ID from that list) will produce a physical database dump.

Note that a physical dump can easily be converted to a “logical” .sql database file:

$ pg_restore -f logical_database_file.sql physical.dump

$ file logical_database_file.sql
logical_database_file.sql: UTF-8 Unicode text, with very long lines

Restoring from a nightly physical backup

For physical backups retained by Heroku (we retain 25), a restore works like this:

heroku pg:backups:restore --app scihist-digicoll-production
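Run bare, the command above restores the most recent backup; a specific backup can also be named. A sketch, where b005 is a hypothetical backup ID taken from the heroku pg:backups listing:

```shell
# Restore the most recent Heroku-retained backup.
heroku pg:backups:restore --app scihist-digicoll-production

# Restore a specific backup by ID (b005 is hypothetical; take the
# real ID from `heroku pg:backups`).
heroku pg:backups:restore b005 DATABASE_URL --app scihist-digicoll-production
```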

Restoring from a physical backup stored locally involves uploading it to S3, creating a signed URL for the dump, and then running:

...

If our database grows much larger (20 GB or more), we will probably have to get rid of these frequent logical backups.

SyncBackPro on Promethium (managed by Chuck and Ponce) finally copies the S3 file to a local network storage mount (/media/SciHist_Digicoll_Backup), and that gets backed up to tape.

Historical notes

Prior to moving off our Ansible-managed servers, our backups were performed by cron jobs installed by Ansible. Backups and Recovery contains a summary of our pre-Heroku backup infrastructure.
