Overview

Stay tuned: Eddie will be writing an overview for an external audience today.

Details

The details below are intended for an internal Science History Institute audience.

Our backups cover two things: 1) the Postgres database (metadata) and 2) files on S3 (original files, plus derivatives for convenience). That’s it!

Original files and derivatives

These are stored in S3, and are backed up within S3 by a process managed by AWS. The backups are then copied to long-term storage by SyncBackPro, Windows software running on Promethium and managed by Chuck and Ponce (see https://www.2brightsparks.com/syncback/sbpro.html). (None of this will change when we get rid of Ansible.)

See more at Digital Collections S3 Bucket Setup and Architecture, and at the “Backups and Recovery (Historical notes)” page: https://sciencehistory.atlassian.net/wiki/pages/createpage.action?spaceKey=HDCSD&title=Backups%20and%20Recovery%20%28Historical%20notes%29

Heroku database backups

We have three backup/restore mechanisms under Heroku:

1. Nightly .dump backups

We use heroku’s built-in postgres backup functionality to make regular backups that are stored in heroku’s system. This is the most convenient backup to restore from, when it is available and meets your needs.

These backups are stored in postgres’s custom -Fc (“dump”) format, which postgres recommends as a compact, fast, and flexible backup format. However, it is not human-readable, and it may be less portable between postgres versions.

To verify that we have scheduled backups, run heroku pg:backups:schedules --app scihist-digicoll-production and check that there is a 2AM backup every night.
List existing backups by running heroku pg:backups -a scihist-digicoll-production. Note that the first section of the output is “backups” (it may scroll off screen), and the first column is a backup ID, such as a189.
With a backup ID (e.g. a189), you can restore production to that past backup with heroku pg:backups:restore a189 -a scihist-digicoll-production
- Warning: this will overwrite current production data, with the restored backup!
- Warning: see note below re: --extensions.
Maybe instead you want to restore a production backup to staging, to just look at the data, without actually (yet?) restoring to and overwriting current production? You can do this too:
- heroku pg:backups:restore scihist-digicoll-production::a189 -a scihist-digicoll-staging
💡 Warning: since some changes in how Heroku handles extensions, the above command may fail if the database you are restoring from has extensions installed in the public schema. There is a workaround: the --extensions flag, used as in the example below, lets you restore from a database that has extensions in public (like the production DB before Sept 2022):

heroku pg:backups:restore scihist-digicoll-production::a661 DATABASE_URL \
	--extensions 'public.pg_stat_statements,public.pgcrypto' \
	--app scihist-digicoll-staging

💡 To find out which extensions are installed, and in which schemas, run \dx at the psql prompt (for example, from heroku pg:psql --app scihist-digicoll-production).
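The checks and the staging restore above can be wrapped in small shell helpers. This is a minimal POSIX sh sketch, not an existing script of ours: the function names and the DRY_RUN convention are hypothetical. By default each helper only prints the heroku command it would run, so you can review it before setting DRY_RUN=0. The extension list is the one from the example above; check \dx on the database you are actually restoring from.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the commands on this page. They only PRINT
# each command unless DRY_RUN=0 is set, so you can review before running.
set -eu

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Confirm the nightly schedule exists, then list backups and their IDs.
check_backups() {
  run heroku pg:backups:schedules --app scihist-digicoll-production
  run heroku pg:backups --app scihist-digicoll-production
}

# Show installed extensions and their schemas (same as \dx at a psql prompt).
list_extensions() {
  run heroku pg:psql --app "${1:?usage: list_extensions APP}" --command '\dx'
}

# Restore a production backup (e.g. a189) into staging, using the
# --extensions workaround for dumps that have extensions in public.
restore_prod_backup_to_staging() {
  backup_id="${1:?usage: restore_prod_backup_to_staging BACKUP_ID}"
  run heroku pg:backups:restore "scihist-digicoll-production::${backup_id}" DATABASE_URL \
    --extensions 'public.pg_stat_statements,public.pgcrypto' \
    --app scihist-digicoll-staging
}
```

For example, DRY_RUN=0 restore_prod_backup_to_staging a189 would actually perform the restore (heroku should still ask you to confirm). Printing by default is a deliberate choice, since a mistyped app name in these commands can overwrite a database.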

For our standard-0 heroku postgres plan, heroku will keep 7 daily backups, and four weeks of one-per-week backups.

You can also download heroku backups to store them in your own location, and later load those local copies back into heroku. See Heroku’s Dev Center article on importing and exporting Heroku Postgres databases for details.
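A sketch of downloading a backup and loading it into a local postgres database, assuming the heroku CLI, curl, and pg_restore are installed. heroku pg:backups:url prints a temporary signed URL for a backup. The function names are hypothetical, and the local database name (scihist_local in the usage example) is a placeholder for a database that must already exist. As elsewhere, commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical helpers: download a heroku backup, then pg_restore it locally.
# Commands are only printed unless DRY_RUN=0.
set -eu

APP="${APP:-scihist-digicoll-production}"

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Download backup BACKUP_ID (e.g. a189) to BACKUP_ID.dump via a signed URL.
download_backup() {
  backup_id="${1:?usage: download_backup BACKUP_ID}"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    url="$(heroku pg:backups:url "$backup_id" --app "$APP")"
  else
    url='<SIGNED_URL>'   # placeholder; the real URL is only fetched with DRY_RUN=0
  fi
  run curl --fail --output "${backup_id}.dump" "$url"
}

# Load a downloaded -Fc .dump into an existing local database.
# --clean drops existing objects first; --no-owner/--no-acl skip heroku's roles.
restore_dump_locally() {
  dump="${1:?usage: restore_dump_locally FILE.dump LOCAL_DB}"
  db="${2:?usage: restore_dump_locally FILE.dump LOCAL_DB}"
  run pg_restore --verbose --clean --no-acl --no-owner --dbname "$db" "$dump"
}
```

Usage: download_backup a189, then restore_dump_locally a189.dump scihist_local. For loading a local copy back into heroku, Heroku’s documentation describes making the dump reachable at a URL and passing that URL to heroku pg:backups:restore; follow that article rather than extending this sketch.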

2. Preservation (logical) backups to s3

We don’t want to rely solely on backups stored inside heroku’s system. We also would like a postgres backup in the more human-readable and transportable plain .sql format, instead of the postgres -Fc .dump format.

We have our own rake task, rake scihist:copy_database_to_s3, which we also run nightly via the heroku scheduler. This task connects to heroku postgres to make a human-readable .sql dump, then uploads it to our S3 bucket chf-hydra-backup. SyncBackPro then syncs the bucket to a local network storage mount (/media/SciHist_Digicoll_Backup), and from there it goes to our tape backups. (SyncBackPro is managed by Chuck and Ponce.)
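For illustration, here is roughly what the nightly logical backup amounts to, written as a shell sketch. It is not the scihist:copy_database_to_s3 rake task: the pg_dump flags and the S3 object key (inferred from the bucket path and file name in the restore steps later on this page) are assumptions. Commands are only printed unless DRY_RUN=0; with DRY_RUN=0 and the default S3_URI it would overwrite the same object the nightly task writes, so set S3_URI to a different key for ad hoc copies.

```shell
#!/bin/sh
# Rough manual equivalent of the nightly logical backup. This is NOT the
# scihist:copy_database_to_s3 rake task itself; the pg_dump flags and the
# S3 object key are assumptions. Prints commands unless DRY_RUN=0.
set -eu

APP="${APP:-scihist-digicoll-production}"
SQL_FILE="heroku-scihist-digicoll-backup.sql"
S3_URI="${S3_URI:-s3://chf-hydra-backup/PGSql/${SQL_FILE}.gz}"

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

manual_logical_backup() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    db_url="$(heroku config:get DATABASE_URL --app "$APP")"
  else
    db_url='<DATABASE_URL>'   # placeholder, so the real URL (a secret) is never printed
  fi
  # Plain-format (human-readable .sql) dump, written to a file first.
  run pg_dump --format=plain --no-owner --no-acl --file "$SQL_FILE" "$db_url"
  run gzip -f "$SQL_FILE"   # replaces the .sql with a .sql.gz
  run aws s3 cp "${SQL_FILE}.gz" "$S3_URI"
}
```

The dump is written to a file and then compressed, rather than piped straight into gzip, so that under set -e a pg_dump failure stops the script before anything is uploaded.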

You can log into the heroku scheduler add-on via the “Resources” tab of the Heroku dashboard to verify that the copy_database_to_s3 task is scheduled nightly.

Given the size of the database in late 2020, the entire job (including the overhead of starting up and tearing down the dyno) takes a bit under a minute. However, if our database grows an order of magnitude larger, and correspondingly slower to dump and transfer to S3, we may have to reconsider this approach.

This more portable .sql format, stored and backed up outside of heroku, is motivated primarily by preservation, but it can also serve as a last-ditch or alternative source for disaster recovery. It can be restored to heroku with the heroku pg:psql command, which runs arbitrary psql commands against the heroku postgres database.

Restoring from a logical (.sql) database dump

In the unlikely event you have to restore from a logical backup:

Go to https://s3.console.aws.amazon.com/s3/buckets/chf-hydra-backup?prefix=PGSql%2F&region=us-west-2
Download the database file you want (note the “versions” tab if you want a past version still on S3)
Uncompress it from the .gz format. On a Unix or macOS command line, that’s gzip -d heroku-scihist-digicoll-backup.sql.gz
Load it into the heroku database: heroku pg:psql --app scihist-digicoll-production < heroku-scihist-digicoll-backup.sql

Note: This will overwrite your database, and won’t warn/prompt you about that fact first! It will run in your terminal and take a bit of time.
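The logical-restore steps above can be sketched as two shell functions. The object key is an assumption based on the bucket prefix and file name shown above; aws s3api list-object-versions and aws s3api get-object are the AWS CLI counterparts of the console’s “versions” tab. The function names are hypothetical. Nothing is downloaded from S3 or loaded into heroku unless DRY_RUN=0, but load_logical_dump always decompresses locally, keeping the .gz (unlike gzip -d).

```shell
#!/bin/sh
# Hypothetical helpers for restoring from the logical (.sql.gz) backup.
# Commands that touch S3 or heroku are only printed unless DRY_RUN=0.
set -eu

BUCKET="chf-hydra-backup"
KEY="PGSql/heroku-scihist-digicoll-backup.sql.gz"   # assumed object key
OUT_FILE="heroku-scihist-digicoll-backup.sql.gz"

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Step 1: download the current backup, or a past S3 version if a version ID
# is given (find IDs with: aws s3api list-object-versions --bucket chf-hydra-backup --prefix PGSql/).
download_logical_dump() {
  if [ -n "${1:-}" ]; then
    run aws s3api get-object --bucket "$BUCKET" --key "$KEY" --version-id "$1" "$OUT_FILE"
  else
    run aws s3 cp "s3://$BUCKET/$KEY" "$OUT_FILE"
  fi
}

# Steps 2 and 3: decompress (keeping the .gz) and load into APP.
# With DRY_RUN=0 this overwrites APP's database without prompting.
load_logical_dump() {
  gz="${1:?usage: load_logical_dump FILE.sql.gz APP}"
  app="${2:?usage: load_logical_dump FILE.sql.gz APP}"
  case "$gz" in
    *.gz) ;;
    *) echo "expected a .gz file: $gz" >&2; return 1 ;;
  esac
  sql="${gz%.gz}"
  gunzip -c "$gz" > "$sql"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    heroku pg:psql --app "$app" < "$sql"
  else
    printf 'heroku pg:psql --app %s < %s\n' "$app" "$sql"
  fi
}
```

Consider loading into scihist-digicoll-staging first to check the dump before pointing it at production. The .gz suffix check guards against decompressing a file onto itself.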

3. Heroku postgres “rollback”

Heroku can roll back a postgres database to an arbitrary moment in time, based on postgres log files. For our current standard-0 postgres plan, four days of past logs are kept. See https://devcenter.heroku.com/articles/heroku-postgres-rollback, especially the section “Common Use Case: Recovery After Critical Data Loss”.

This is a somewhat more complicated process, and requires some more care to get right, but it is very powerful to be able to go back to any moment in time in the last 4 days!

This requires creating a new postgres “rollback” database, switching the app to use it, and then deleting the old database that is no longer in use. From a terminal with the heroku CLI:

heroku addons:create heroku-postgresql:standard-0 --rollback DATABASE_URL --to '2021-06-02 20:20 America/New_York' --app scihist-digicoll-production
The site remains up. The new database’s name will be printed to the terminal, and you can also see it in the Resources section of the Heroku admin. It might be something like postgresql-curly-07169.
It may take a few minutes or more for the newly restored database to be ready. To check progress, follow the instructions the command prints, such as running heroku pg:wait.
Once the rollback database, which has been restored to a past moment in time, is ready, switch the app to use it by referring to the new database’s name:
heroku pg:promote postgresql-curly-07169 --app scihist-digicoll-production
Make sure you have successfully fixed the problem.
Once all is well, don’t forget to get rid of the extra database(s) you are no longer using. Consider leaving this step for the next day; it will only cost a couple dollars over 24 hours.
1. How do you know which db is the “old” one? Run heroku addons to see all your heroku-postgresql databases; the one currently used by the app is marked as DATABASE. The other one is the old, no-longer-used database, listed with its own “as” name (such as HEROKU_POSTGRESQL_YELLOW).
2. To remove it, run e.g. heroku addons:destroy HEROKU_POSTGRESQL_YELLOW --app scihist-digicoll-production. Be careful that you are removing the correct one!
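The rollback steps can be sketched as shell functions, one per step. This is a hypothetical helper, not an existing script: the target time, app, and database names are the examples from this page, and the addons:create flags are copied from the command above, so confirm them with heroku help addons:create for your CLI version. Commands are only printed unless DRY_RUN=0; arguments containing spaces are printed single-quoted so the output can be pasted into a terminal.

```shell
#!/bin/sh
# Hypothetical sketch of the Heroku Postgres rollback workflow on this page.
# Prints each command unless DRY_RUN=0.
set -eu

APP="${APP:-scihist-digicoll-production}"

run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
    return
  fi
  _line=""
  for _arg in "$@"; do
    # Quote arguments with spaces, e.g. the --to timestamp.
    case "$_arg" in
      *" "*) _arg="'$_arg'" ;;
    esac
    _line="${_line:+$_line }$_arg"
  done
  printf '%s\n' "$_line"
}

# Step 1: create a new database rolled back to TARGET_TIME (site stays up).
create_rollback_db() {
  run heroku addons:create heroku-postgresql:standard-0 \
    --rollback DATABASE_URL --to "${1:?usage: create_rollback_db TARGET_TIME}" --app "$APP"
}

# Step 2: wait until the new database is ready.
wait_for_db() {
  run heroku pg:wait --app "$APP"
}

# Step 3: point the app at the restored database, e.g. postgresql-curly-07169.
promote_db() {
  run heroku pg:promote "${1:?usage: promote_db NEW_DB_NAME}" --app "$APP"
}

# Step 4 (later): list databases, then destroy the old one by its "as" name.
list_db_addons() {
  run heroku addons --app "$APP"
}

destroy_old_db() {
  run heroku addons:destroy "${1:?usage: destroy_old_db ATTACHMENT_NAME}" --app "$APP"
}
```

A typical sequence: create_rollback_db '2021-06-02 20:20 America/New_York', wait_for_db, promote_db postgresql-curly-07169, check the site, and only the next day list_db_addons and destroy_old_db with the old database’s attachment name.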

NOTE: Is it possible to roll back to a past production snapshot in the staging app first, to see what it looks like without touching production? We need to look into that; it could be a safer way to do it.

Historical notes

Before moving off our Ansible-managed servers, our backups were performed by cron jobs installed by Ansible. https://sciencehistory.atlassian.net/wiki/pages/createpage.action?spaceKey=HDCSD&title=Backups%20and%20Recovery%20%28Historical%20notes%29 contains a summary of our pre-Heroku backup infrastructure.

A script on the production server, home/ubuntu/bin/postgres-backup.sh, used to perform the following tasks nightly:

pg_dump the production database to /backups/pgsql-backup/.
aws s3 sync the contents of that directory to s3://chf-hydra-backup/PGSql.

Backup strategy for the Digital Collections