Overview

...

This document is intended mainly for an internal Science History Institute audience, but here’s a bit of context:

The Science History Institute's Digital Collections offer highlights from our library, archives, and museum collections. The purpose of our Digital Collections is to manage, preserve, and provide access to our digital assets all in one location. Although the Digital Collections include only a small portion of the Science History Institute’s entire collection, new material is added every day. (See our About and FAQ pages for more details.)

The Digital Collections consist of:

  1. a set of digital representations, intended for a Web audience, of physical objects, which range from museum objects of all descriptions to books to taped audio interviews to VHS tapes. (Go to https://digital.sciencehistory.org/catalog and try limiting your search by genre, format, or medium, to get a sense of how broad the range is.) In the description below, when we talk about “original files” we are talking about these digital representations, which take the form of computer files. We store the original files in Amazon S3.

  2. descriptions of the files above, which allow us to find them, keep them in order, search them, and describe them to the public. We store the descriptions in a PostgreSQL database hosted and managed by Heroku.

Backup summary

We store the backups for the original files in a separate S3 bucket that automatically mirrors the contents of the originals. These backups are copied nightly to a local server. Our IT staff is responsible for making regular copies of them to a local disk and for storing a tape copy offsite.

We store nightly backups for the database in a dedicated S3 bucket. Our IT staff also makes copies of this S3 bucket to local disk. From there, the database backups join the backups of the original files in offsite tape storage.
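As an illustration of how the "nightly" promise above could be spot-checked, here is a minimal Python sketch that reports whether the newest object in a backup bucket is recent enough. It is not part of our actual tooling: the function names, the 26-hour threshold, and the bucket name in the usage note are all hypothetical, and the listing helper assumes the `boto3` package and suitable AWS credentials are available.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional, Tuple

BackupObject = Tuple[str, datetime]  # (S3 key, last-modified time)


def newest_backup(objects: Iterable[BackupObject]) -> Optional[BackupObject]:
    """Return the most recently modified object, or None if there are none."""
    return max(objects, key=lambda item: item[1], default=None)


def backup_is_fresh(
    objects: Iterable[BackupObject],
    now: datetime,
    max_age: timedelta = timedelta(hours=26),  # assumed threshold: nightly plus slack
) -> bool:
    """True if the newest object is no older than max_age."""
    latest = newest_backup(objects)
    return latest is not None and now - latest[1] <= max_age


def list_bucket(bucket_name: str) -> Iterator[BackupObject]:
    """Yield (key, last_modified) for every object in the bucket."""
    import boto3  # imported lazily so the functions above work without the AWS SDK

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["LastModified"]
```

A check against a real bucket might look like `backup_is_fresh(list_bucket("example-db-backups"), datetime.now(timezone.utc))`, where `example-db-backups` stands in for the real bucket name.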

Backup details

The details below are intended for an internal, technical Science History Institute audience.

Our backups consist of 1) Postgres database (metadata) and 2) files on S3 (original files, also derivatives for convenience). That’s it!

Original files and derivatives

The sections below discuss how we back up the “original files” (hereinafter “originals”) and the PostgreSQL database (hereinafter “database”).

Original files are stored in S3, and are backed up within S3 by a process managed by AWS. The backups are then copied to long-term storage by SyncBackPro, which is Windows software running on Promethium managed by Chuck and Ponce (see https://www.2brightsparks.com/syncback/sbpro.html). (None of this will change when we get rid of Ansible.)
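One way to sanity-check that the backup bucket really does contain everything in the originals bucket is to compare the two listings. The Python sketch below is only an illustration, not a description of how the AWS-managed process works: the function and class names are hypothetical, it compares object sizes only (a cheap signal, not a checksum), and the listing helper assumes `boto3` and read access to both buckets.

```python
from typing import Dict, List, NamedTuple


class MirrorReport(NamedTuple):
    missing: List[str]     # keys in the originals bucket but absent from the backup
    mismatched: List[str]  # keys present in both buckets but with different sizes


def compare_listings(originals: Dict[str, int], backup: Dict[str, int]) -> MirrorReport:
    """Compare {key: size} listings of the originals and backup buckets."""
    missing = sorted(key for key in originals if key not in backup)
    mismatched = sorted(
        key
        for key, size in originals.items()
        if key in backup and backup[key] != size
    )
    return MirrorReport(missing, mismatched)


def list_sizes(bucket_name: str) -> Dict[str, int]:
    """Return {key: size in bytes} for every object in the bucket."""
    import boto3  # imported lazily so compare_listings works without the AWS SDK

    sizes: Dict[str, int] = {}
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            sizes[obj["Key"]] = obj["Size"]
    return sizes
```

Extra keys in the backup bucket are deliberately ignored here, since a backup may legitimately retain objects that were later deleted from the originals.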

See more at S3 Bucket Setup and Architecture and https://sciencehistory.atlassian.net/wiki/pages/createpage.action?spaceKey=HDCSD&title=Backups%20and%20Recovery%20%28Historical%20notes%29

...

Prior to moving off our Ansible-managed servers, our backups were performed by cron jobs installed by Ansible. https://sciencehistory.atlassian.net/wiki/pages/createpage.action?spaceKey=HDCSD&title=Backups%20and%20Recovery%20%28Historical%20notes%29 contains a summary of our pre-Heroku backup infrastructure.

...