Overview

This document is intended mainly for an internal Science History Institute audience, but here’s a bit of context:

The Science History Institute's Digital Collections offer highlights from our library, archives, and museum collections. The purpose of our Digital Collections is to manage, preserve, and provide access to our digital assets in one location. Although the Digital Collections include only a small portion of the Science History Institute’s entire collection, new material is added every day. (See our About and FAQ pages for more details.)

...

a set of digital representations, intended for a Web audience, of physical objects ranging from museum objects of all descriptions to books, taped audio interviews, and VHS tapes. (Go to https://digital.sciencehistory.org/catalog and try limiting your search by genre, format, or medium to get a sense of how broad the range is.) In the description below, “original files” means these digital representations, which take the form of computer files. We store the original files in Amazon S3.
descriptions of the files above, which allow us to find them, keep them in order, search them, and describe them to the public. We store the descriptions in a PostgreSQL database hosted and managed by Heroku.

Backup summary

We store the backups of the original files in a separate S3 bucket that automatically mirrors the contents of the originals. Each night, these backups are copied to a local server; our IT staff is responsible for regularly copying them from there to local disk and for storing a tape copy offsite.

We store nightly backups of the database in a dedicated S3 bucket. Our IT staff also copies this S3 bucket to local disk, from which the database backups join the backups of the original files in offsite tape storage.

Backup details

The details below are intended for an internal, technical Science History Institute audience, and discuss how we back up the “original files” (hereinafter “originals”) and the PostgreSQL database (hereinafter “database”).

...

See more at the Digital Collections S3 Bucket Setup and Architecture page and at https://sciencehistory.atlassian.net/wiki/pages/createpage.action?spaceKey=HDCSD&title=Backups%20and%20Recovery%20%28Historical%20notes%29

Heroku database backups

We have three backup/restore mechanisms under Heroku:

1. Nightly .dump backups

We use Heroku’s built-in Postgres backup functionality to make regular backups that are stored in Heroku’s system. When such a backup is available and meets your needs, it is the most convenient one to restore from.
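The built-in backups are usually driven from the Heroku CLI. The following is a minimal Python sketch, not our documented procedure: it only assembles heroku pg:backups command lines and, by default, prints them instead of running them. The app name example-app and the backup ID b101 are hypothetical placeholders; confirm the commands against the Heroku Postgres backups documentation before running anything against production.

```python
import shlex
import subprocess
from typing import List, Optional

APP = "example-app"  # hypothetical app name; substitute the real one


def schedules_cmd(app: str = APP) -> List[str]:
    """Show the nightly backup schedule configured for the app."""
    return ["heroku", "pg:backups:schedules", "--app", app]


def list_cmd(app: str = APP) -> List[str]:
    """List backups currently retained by Heroku (IDs look like b101)."""
    return ["heroku", "pg:backups", "--app", app]


def restore_cmd(backup_id: Optional[str] = None, app: str = APP) -> List[str]:
    """Restore a Heroku-stored backup into DATABASE_URL.

    With no backup_id, the CLI uses the most recent backup.
    This OVERWRITES the target database.
    """
    cmd = ["heroku", "pg:backups:restore"]
    if backup_id:
        cmd.append(backup_id)
    cmd += ["DATABASE_URL", "--app", app]
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Print the command (dry run) or execute it; return it as a shell string."""
    text = shlex.join(cmd)
    if dry_run:
        print(text)
    else:
        subprocess.run(cmd, check=True)
    return text
```

For example, run(list_cmd()) prints heroku pg:backups --app example-app; only passing dry_run=False actually invokes the CLI, which keeps an accidental restore from happening while you are exploring.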

...

You can also download Heroku backups to store them in a location of your choosing, and later restore those local copies to Heroku. See the Heroku documentation for more information.
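Downloading a backup and later restoring your own copy can be sketched the same way. This minimal Python sketch assumes the Heroku CLI commands pg:backups:download (with its --output option) and pg:backups:restore; when restoring a copy you keep yourself, Heroku has to fetch the .dump file from a URL it can reach, such as a signed S3 URL. The app name example-app and any URLs are hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlparse

APP = "example-app"  # hypothetical app name


def download_cmd(backup_id: Optional[str] = None, output: str = "latest.dump",
                 app: str = APP) -> List[str]:
    """Download a Heroku-stored backup (the latest one if backup_id is None)."""
    cmd = ["heroku", "pg:backups:download"]
    if backup_id:
        cmd.append(backup_id)
    cmd += ["--output", output, "--app", app]
    return cmd


def restore_from_url_cmd(dump_url: str, app: str = APP) -> List[str]:
    """Restore a .dump that Heroku can fetch over HTTP(S) into DATABASE_URL.

    A local file path will not work: Heroku downloads the dump itself.
    This OVERWRITES the target database.
    """
    if urlparse(dump_url).scheme not in ("http", "https"):
        raise ValueError("Heroku must be able to fetch the dump over HTTP(S)")
    return ["heroku", "pg:backups:restore", dump_url, "DATABASE_URL", "--app", app]
```

The URL check is a guard added for this sketch, meant to catch the common mistake of passing a local path.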

2. Preservation (logical) backups to S3

We don’t want to rely solely on backups stored inside Heroku’s system. We would also like a Postgres backup in the more human-readable and portable plain .sql format, rather than in the custom-format .dump produced by pg_dump -Fc.
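The difference between the two formats comes down to pg_dump's --format option. The Python sketch below is an illustration only, not the mechanism we actually use to produce the preservation backup: it builds a pg_dump command for either format and tells the two kinds of file apart by their first bytes (custom-format archives begin with the magic bytes PGDMP, while plain dumps are ordinary SQL text). The --no-owner and --no-acl choices are assumptions made so that the dump can load under a different database role.

```python
from typing import List

# Custom-format (pg_dump -Fc) archives begin with these magic bytes;
# plain-format dumps are ordinary SQL text.
CUSTOM_MAGIC = b"PGDMP"


def pg_dump_cmd(database_url: str, out_path: str, plain: bool = True) -> List[str]:
    """Build a pg_dump command producing either plain .sql or a -Fc .dump."""
    fmt = "plain" if plain else "custom"
    return [
        "pg_dump",
        f"--format={fmt}",
        "--no-owner",  # assumption: skip ownership so the dump loads under any role
        "--no-acl",    # assumption: skip GRANT/REVOKE statements for the same reason
        f"--file={out_path}",
        database_url,  # pg_dump accepts a postgres:// connection URI as the database
    ]


def dump_kind_from_header(header: bytes) -> str:
    """Classify a dump by its leading bytes: 'custom' or 'plain'."""
    return "custom" if header.startswith(CUSTOM_MAGIC) else "plain"


def dump_kind(path: str) -> str:
    """Read just enough of a dump file to classify it."""
    with open(path, "rb") as f:
        return dump_kind_from_header(f.read(len(CUSTOM_MAGIC)))
```

A plain dump can be opened in any text editor or searched with ordinary text tools, which is much of what makes it attractive for preservation; a custom-format archive needs pg_restore to be read.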

...

Keeping a copy in the more portable .sql format, stored and backed up outside of Heroku, is motivated primarily by preservation, but it can also serve as a last-ditch or alternative disaster-recovery option. It can be restored to Heroku with the heroku pg:psql command, which runs arbitrary psql commands against the Heroku Postgres database.
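The idea of restoring through heroku pg:psql can be sketched as feeding the .sql file to that command on standard input. This is a minimal Python sketch under stated assumptions, not a substitute for the restore steps documented on this page: the app name example-app is hypothetical, heroku pg:psql requires a local psql client, and by default the function only prints the equivalent shell command.

```python
import shlex
import subprocess
from typing import List

APP = "example-app"  # hypothetical app name


def psql_cmd(app: str = APP) -> List[str]:
    """heroku pg:psql opens psql against the app's DATABASE_URL."""
    return ["heroku", "pg:psql", "--app", app]


def restore_sql(sql_path: str, app: str = APP, dry_run: bool = True) -> str:
    """Feed a plain .sql dump to heroku pg:psql on standard input.

    Equivalent to: heroku pg:psql --app APP < SQL_PATH
    Requires a local psql client. Modifies the target database.
    """
    cmd = psql_cmd(app)
    text = f"{shlex.join(cmd)} < {shlex.quote(sql_path)}"
    if dry_run:
        print(text)
    else:
        with open(sql_path, "rb") as sql_file:
            subprocess.run(cmd, stdin=sql_file, check=True)
    return text
```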

Restoring from a logical (.sql) database dump

In the unlikely event you have to restore from a logical backup:

...

Note: This will overwrite your database without warning or prompting you first! It runs in your terminal and takes a while to complete.

3. Heroku Postgres “rollback”

Heroku can roll back a Postgres database to an arbitrary moment in time, based on Postgres log files. On our current Postgres standard-0 plan, four days of logs are retained. See https://devcenter.heroku.com/articles/heroku-postgres-rollback, in particular the section “Common Use Case: Recovery After Critical Data Loss”.
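Before requesting a rollback, it helps to check that the target time falls inside the retention window. This minimal Python sketch assumes the four-day standard-0 window stated above and builds the add-on command in the form shown in Heroku's rollback article; flag placement may differ between Heroku CLI versions, so check the article before running it. The app name example-app is hypothetical. Note that a rollback provisions a new database rather than rewinding the existing one in place.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

APP = "example-app"                  # hypothetical app name
ROLLBACK_WINDOW = timedelta(days=4)  # standard-0 log retention, per this page


def rollback_target(to: datetime, now: Optional[datetime] = None) -> str:
    """Validate a rollback time and format it as UTC, e.g. '2023-02-16 07:30+00'."""
    now = now or datetime.now(timezone.utc)
    if to.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if to > now:
        raise ValueError("rollback target is in the future")
    if now - to > ROLLBACK_WINDOW:
        raise ValueError("rollback target is outside the retention window")
    return to.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M+00")


def rollback_cmd(to: datetime, now: Optional[datetime] = None,
                 app: str = APP) -> List[str]:
    """Build the command that provisions a NEW database rolled back to `to`."""
    return [
        "heroku", "addons:create", "heroku-postgresql:standard-0",
        "--rollback", "DATABASE_URL",
        "--to", rollback_target(to, now),
        "--app", app,
    ]
```

Because the result is a separate database, it can be inspected before anything replaces production; one option Heroku documents is then promoting it with heroku pg:promote.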

...

NOTE: Is it possible to roll back to a past production snapshot in the staging app first, to see what it looks like without touching production? We need to look into this; it could be a safer approach.

Historical notes

Before we moved off our Ansible-managed servers, our backups were performed by cron jobs installed by Ansible. https://sciencehistory.atlassian.net/wiki/pages/createpage.action?spaceKey=HDCSD&title=Backups%20and%20Recovery%20%28Historical%20notes%29 contains a summary of our pre-Heroku backup infrastructure.

...

Version	Old Version 38	New Version 39
Changes made by	Eddie Rubeiz	Eddie Rubeiz
Saved on	Feb 17, 2023	Feb 17, 2023
