...

Only the first two are strictly needed: if the derivatives were lost, they could be recreated from the first two, but that would take over a day. Thus we keep a backup of the derivatives anyway, because it's cheap and minimizes downtime.

For each of these, we have TWO levels of backups: 1. in an S3 bucket, 2. on local on-premises Institute storage, which is also backed up to tape. 
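
As a quick sanity check of the S3 level, the AWS CLI can confirm that versioning is enabled on a bucket. This is only an illustrative check, assuming you have read credentials configured; the bucket named below is one of those documented in the next section.

    Code Block (bash): Check bucket versioning
    # Should report "Status": "Enabled" for a versioned bucket
    aws s3api get-bucket-versioning --bucket scihist-digicoll-production-originals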

What Backups Exist

  • The postgres database
    • is backed up to S3, with a version history of the last 30 versions of the file, representing roughly a month of backups. (TODO: exact bucket locations)
    • A primary copy on S3, saved by a nightly cronjob on the database server that exports postgres to a file
    • A second copy on an on-premises Institute file server (backed up to tape), made by a nightly cronjob on the "dubnium" server, copying from S3 to a local server location.
  • The binary files are replicated via S3 replication rules to a second location in US-WEST (rather than US-EAST) in case of regional outages. (TODO: add exact bucket locations)
    • US-WEST primary backup location
    • A second copy on an on-premises Institute file server (backed up to tape), made by a nightly cronjob on the "dubnium" server, copying from S3 to a local server location.
  • The derivative files will also be replicated via S3 replication to a US-WEST location. They can also be regenerated by the application, though this takes days if all the files are lost. Replication requires versioning, so versioning is enabled but unlikely to be used.
    • TODO: S3 bucket names
  • The postgres database
    • is backed up by a cronjob scihist-digicoll-backup.sh running on the database server, which writes to S3 at: s3://chf-hydra-backup/PGSql/digcol_backup.sql (an illustrative sketch of these cronjobs appears after this list)
      • This S3 bucket is "versioned" and keeps 30 days worth of past versions.
    • A cronjob running on the "dubnium" backup server then copies that file nightly from that S3 location to a local network storage mount, so the network storage mount always holds the latest copy at a standard location, to be backed up to tape.
  • Original ingested assets (binary files)
    • use S3 replication rules to replicate to a bucket scihist-digicoll-production-originals-backup
      • This is intentionally in a different AWS region, US West (Oregon)
      • The replication rules do not replicate deletes, so deleted files should still exist in the backup bucket
      • Both the live production bucket and this backup bucket are versioned, and keep 30 days worth of past versions. We do not keep complete version history; the backups are intended mostly for handling corruption and disaster, not recovering from user or software error, although 30 days of version history allows some limited recovery from software or user error.
      • A cronjob scihist-digicoll-backup.sh running on the "dubnium" backup server then uses rsync-like functionality to sync to an on-premises network mount, which is backed up to tape.
      • In a disaster recovery scenario, the live app could temporarily be pointed to use the bucket scihist-digicoll-production-originals-backup as its source of files. However, cross-region data transfer will be more expensive than the usual same-region transfer. You may want to turn off ingest to prevent new data from being written to the backup bucket.
  • Derivatives
    • Both the "ordinary" derivatives bucket and the "dzi" bucket (used for "deep zoom" viewer) use S3 replication rules to replicate to backup buckets in US West (Oregon). 
    • In a disaster recovery scenario, the live app could be switched over to use these buckets as a source of data, but cross-region data transfer will be more expensive than same-region. You may want to turn off ingest to prevent new data from being written to the backup bucket. 
    • All of these buckets also keep 30 days of version history. 
    • scihist-digicoll-production-dzi => scihist-digicoll-production-dzi-backup
    • scihist-digicoll-production-originals => scihist-digicoll-production-originals-backup
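
The two postgres-related cronjobs described above are not reproduced here, but in outline they amount to something like the sketch below; the last command is a generic check that an S3 replication rule is in place on one of the production buckets. This is an illustrative approximation, not the actual scihist-digicoll-backup.sh: the database name, dump path, and local mount path are hypothetical, while the S3 destinations and bucket names are the ones documented above.

    Code Block (bash): Sketch of the nightly database export and on-premises copy
    # On the database server (approximation of scihist-digicoll-backup.sh):
    # export the postgres database to a file, then copy it to the versioned backup bucket.
    pg_dump digcol > /tmp/digcol_backup.sql    # database name is assumed
    aws s3 cp /tmp/digcol_backup.sql s3://chf-hydra-backup/PGSql/digcol_backup.sql

    # On the "dubnium" backup server: pull the latest dump down to the network
    # storage mount that is backed up to tape (mount path is hypothetical).
    aws s3 cp s3://chf-hydra-backup/PGSql/digcol_backup.sql /mnt/institute-backup/digcol_backup.sql

    # To confirm an S3 replication rule is in place on a production bucket:
    aws s3api get-bucket-replication --bucket scihist-digicoll-production-dzi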

Recovery Overview

Minimal public recovery requires the following data:

...

  1. Go to the Work with the damaged files
  2. Select the Members tab and click on the file name
  3. If the Fixity Check shows an error, it should have a link to the file in S3; if not, you will need to get the UUID. This can be found via the rails console or by stripping it out of the Download Original or Derivatives links.
  4. Log onto AWS.
  5. If the Fixity check shows an error, simply click on the link. In the S3 web console, select the Show option for Versions. If the fixity check does not show an error or the link does not work:
    1. Log into S3 and go to the bucket scihist-digicoll-production-originals. In the S3 web console, select the Show button for versions.
    2. Search for the UUID in the prefix
    3. Select the UUID "Folder"
  6. If the file has been changed, deleted, or corrupted within the last 30 days, you should see prior versions. If the fixity check shows a date and time for when the file changed, select all the newer versions by clicking the check box next to them, then use the Actions button to Delete them; the old version will become the current version. Run a fixity check to confirm the fix. If the file was deleted, you may see a "Delete Marker": delete it like a file and the old file becomes the current version. (These version operations can also be done with the AWS CLI; see the sketch after these steps.)
  7. If the file has been missing or in error for more than 30 days, or the version you need is otherwise not available from version history, it may be available in the backup bucket. This bucket is called scihist-digicoll-production-originals-backup and is in the US-WEST region. Deleted files should remain in the backup bucket indefinitely. However, changes are replicated to the backup bucket, so a version prior to a change made more than 30 days ago will generally not be available there either.
  8. Confirm the damaged file is in scihist-digicoll-production-originals-backup. You may either use the S3 web console, going to the bucket and searching for the UUID to confirm the file is in that "folder", or use the AWS CLI or some other tool to make a HEAD request (there is an example in the sketch after these steps).
  9. If the file is there, you may sync it to the scihist-digicoll-production-originals bucket. Make sure that scihist_digicoll already expects a file there: syncing a file that is not in the postgres database will not add it to the application. You may use any preferred sync method; here is an example via the AWS CLI.

    Code Block (bash): Sync
    aws s3 sync s3://scihist-digicoll-production-originals-backup/asset/UUIDHERE  s3://scihist-digicoll-production-originals/asset/UUIDHERE --source-region us-west-2 --region us-east-1

    This sample will copy the file from the backup bucket to the originals production bucket, inside the asset key (part of our application) and at the UUID location. If the UUID key is missing, the sync will create it. If you're unsure of a command, the --dryrun option allows for a safe test.

  10. Run the fixity check to confirm the file is fixed.
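
Several of the steps above can also be performed with the AWS CLI instead of the web console. The commands below are a sketch of steps 5 through 9: listing version history for an asset, removing a delete marker (or an unwanted newer version) so the prior version becomes current again, confirming a copy exists in the cross-region backup bucket, and dry-running the sync. The UUIDHERE, FILENAMEHERE, and VERSIONIDHERE placeholders are hypothetical and must be filled in from the application and from the version listing.

    Code Block (bash): Version history and backup checks via the AWS CLI
    # List all versions and delete markers under an asset's UUID prefix (steps 5-6)
    aws s3api list-object-versions \
      --bucket scihist-digicoll-production-originals \
      --prefix asset/UUIDHERE

    # Delete a delete marker or an unwanted newer version by its VersionId (step 6);
    # the previous version then becomes the current version again.
    aws s3api delete-object \
      --bucket scihist-digicoll-production-originals \
      --key asset/UUIDHERE/FILENAMEHERE \
      --version-id VERSIONIDHERE

    # Confirm the file still exists in the cross-region backup bucket (step 8)
    aws s3 ls s3://scihist-digicoll-production-originals-backup/asset/UUIDHERE/ --region us-west-2

    # Test the step 9 sync safely first with --dryrun
    aws s3 sync s3://scihist-digicoll-production-originals-backup/asset/UUIDHERE \
      s3://scihist-digicoll-production-originals/asset/UUIDHERE \
      --source-region us-west-2 --region us-east-1 --dryrun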

...