...

Finally, in case of issues affecting all online storage systems, another copy of the data is held on our in-house storage system, so that we can potentially recover data even after a full loss of AWS. This copy holds only the original files; all other aspects will need to be rebuilt, a process that can take days in addition to the time taken to upload the files again. Recovery from the local backup could therefore take a week or more. This storage is not paid for or managed by our team, as Institute IT handles these systems.

Technical Notes

Kithe Digital Collections currently (March 2019) has a small set of data to be handled for recovery.

...

The first two are the ones that are needed. If the derivatives were lost, they could easily be recreated from the first two, but doing so would take over a day. Thus, we keep a backup of them anyway because it's cheap and minimizes downtime.

What Backups Exist

  • The postgres database is to be backed up to S3, with a version history of the last 30 versions of the file, representing roughly a month of backups.
    • A primary copy on S3, saved by a nightly cronjob on the database server that exports postgres to a file.
    • A second copy on an on-premises Institute file server (backed up to tape), made by a nightly cronjob on the "dubnium" server that copies from S3 to a local server location.
  • The binary files are replicated via S3 replication rules to a second location in US-WEST rather than US-EAST in case of outages. (During development, as of summer 2019, these binaries are stored at https://s3.console.aws.amazon.com/s3/buckets/scihi-kithe-stage-originals .) The binary files are also versioned, and prior versions are held for 30 days before being cleared away to reduce storage costs. This gives a one-month window to revert a file if something is damaged. (TODO add exact bucket locations)
    • US-WEST primary backup location
    • A second copy on an on-premises Institute file server (backed up to tape), made by a nightly cronjob on the "dubnium" server that copies from S3 to a local server location. This local copy will be put in place when we switch over to production as part of launching the site.
  • The derivative files will also be replicated via S3 replication to a US-WEST location. They can also be regenerated by the application, though this takes days if all the files are lost. Replication requires versioning, so versioning is enabled on these buckets but is unlikely to be used.
    • TODO: S3 bucket names
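
As a rough illustration of the backups listed above, the following Python sketch shows one way the nightly database export and the 30-day retention of prior versions could be expressed. It is not the actual deployment configuration: the database name, bucket name, and local path are hypothetical placeholders, and the sketch assumes the pg_dump and aws command-line tools are available on the database server. The rule dictionary follows the shape of an S3 lifecycle configuration, but it should be checked against the real bucket setup before use.

```python
import json

# Hypothetical names for illustration only; the real bucket names are still TODO.
DB_NAME = "kithe_example"
BACKUP_BUCKET = "example-db-backup-bucket"


def dump_commands(db_name=DB_NAME, bucket=BACKUP_BUCKET):
    """Return the two commands a nightly job could run: export, then upload."""
    local_path = f"/tmp/{db_name}.dump"
    # Uploading to the same key every night means each upload becomes a new
    # version of one object on a versioned bucket.
    s3_url = f"s3://{bucket}/{db_name}.dump"
    export = ["pg_dump", "--format=custom", f"--file={local_path}", db_name]
    upload = ["aws", "s3", "cp", local_path, s3_url]
    return export, upload


def noncurrent_expiry_rule(days=30):
    """Build a lifecycle configuration that expires prior versions after `days`."""
    return {
        "Rules": [
            {
                "ID": f"expire-noncurrent-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: apply to every object
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


if __name__ == "__main__":
    # Print the commands and the rule instead of running anything against AWS.
    for command in dump_commands():
        print(" ".join(command))
    print(json.dumps(noncurrent_expiry_rule(), indent=2))
```

Running the script only prints the commands and the rule, so it can be reviewed safely. A real cronjob would execute the two commands in order, and the rule would be applied once to the versioned bucket.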

Recovery Overview

Minimal public recovery requires the following data:

...