Better known as: Backups without Ansible.

As we get rid of our Ansible-managed servers and move to Heroku, we are also replacing backup mechanisms that used to be performed by cron jobs installed by Ansible. Backups and Recovery contains a summary of our pre-Heroku backup infrastructure.

Original files and derivatives

These are stored in S3, are backed up within S3 by a process managed by AWS, and end up in long-term storage [ask Chuck about details post-Dubnium retirement]. No Ansible cron jobs were ever used in this workflow, so there is no need to make any changes to our existing setup.

Database backups

A script on the production server, /home/ubuntu/bin/postgres-backup.sh, performed the following tasks nightly:

...

The above script will need to be discarded.

A second mechanism [ask Chuck for details] copies the S3 file to a local network storage mount (/media/SciHist_Digicoll_Backup). This then gets backed up to tape.

Heroku database

...

It’s easy to set up a regular database backup in Heroku, as follows:

...
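As a sketch of what that setup typically looks like with the Heroku CLI (the schedule time and timezone here are illustrative assumptions, not necessarily our actual settings):

```shell
# Sketch: schedule a nightly physical backup.
# The time and timezone below are illustrative, not our actual settings.
heroku pg:backups:schedule DATABASE_URL --at '02:00 America/New_York' --app scihist-digicoll-production
```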

Backups

We have three backup mechanisms under Heroku:

1. Continuous protection

Every professional-tier Heroku Postgres database comes with a behind-the-scenes Continuous Protection mechanism for disaster recovery. This ensures that, in the event of a disaster, we can roll the database back to a prior state using a command like:

heroku addons:create heroku-postgresql:standard-0 --rollback DATABASE_URL --to '2021-06-02 20:00 America/New_York' --app scihist-digicoll-production

Details are at https://devcenter.heroku.com/articles/heroku-postgres-rollback .

2. Nightly physical backups

We also have a regular physical database backup scheduled:

heroku pg:backups:schedules --app scihist-digicoll-production

(Physical backups on Heroku Postgres are binaries that include dead tuples, bloat, indexes and all structural characteristics of the currently running database.)

You can check the metadata on the latest physical backups like this: heroku pg:backups

heroku pg:backups:download a006 will produce a physical database dump – a binary file.

Note that a physical dump can easily be converted to a “logical” .sql database file:

pg_restore -f logical_database_file.sql physical.dump

Heroku retains daily backups for 7 days, and weekly backups for 4 weeks. (more details re: retention schedule)

Restoring from a physical backup involves uploading it to s3, creating a signed URL for the dump, and then running:

heroku pg:backups:restore '<SIGNED_URL_IN_S3>' DATABASE_URL

(Note that DATABASE_URL here is a literal, not a placeholder.)

More details on this process: https://devcenter.heroku.com/articles/heroku-postgres-import-export#import
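The full restore flow can be sketched with the AWS CLI as follows. The bucket and file names here are hypothetical, not our actual configuration:

```shell
# Sketch: restore a physical dump via a signed s3 URL.
# Bucket and key names are hypothetical; substitute the real backup bucket.
aws s3 cp latest.dump s3://our-backup-bucket/tmp/latest.dump

# Generate a signed URL valid for one hour.
signed_url=$(aws s3 presign s3://our-backup-bucket/tmp/latest.dump --expires-in 3600)

# DATABASE_URL below is a literal, not a placeholder.
heroku pg:backups:restore "$signed_url" DATABASE_URL --app scihist-digicoll-production
```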

3. Logical backups to s3

We supplement the above with a rake task, rake scihist:copy_database_to_s3, which runs regularly on a one-off Heroku dyno, via the scheduler. This downloads the production database to a temp file, then uploads a logical (plain vanilla SQL) dump to s3, where it can wait to be harvested and put onto tape. This workflow serves preservation goals more than disaster recovery: logical .sql files offer portability (they’re UTF-8), and are useful in a variety of situations, unlike the physical backups.
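Conceptually, the rake task is equivalent to the following shell sketch. The key naming and bucket are assumptions for illustration; the actual task is Ruby code in the app:

```shell
# Sketch: dump the production database as plain SQL and copy it to s3.
# The key format and bucket name are illustrative, not the task's actual config.
backup_key="scihist-digicoll-production-$(date -u +%Y-%m-%d).sql"

# Only attempt the dump and upload when a database URL is actually present.
if [ -n "${DATABASE_URL:-}" ]; then
  pg_dump --no-owner --no-privileges "$DATABASE_URL" > "/tmp/$backup_key"
  aws s3 cp "/tmp/$backup_key" "s3://our-backup-bucket/$backup_key"
fi
```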

Given the size of the database in late 2020, the dump rake task takes 13 seconds, and the upload another 13. The entire job (with the overhead of starting up the dyno and tearing it down) takes a bit under a minute.

If our database grows much larger (20GB or more) we will probably have to get rid of these frequent logical backups.