[Diagram: current backup strategy (image not included)]

Recovery Options

Fedora:

S3:

Currently we use the aws s3 sync tool (akin to rsync for S3) to pull key Fedora data into the chf-hydra-backup bucket (https://s3.console.aws.amazon.com/s3/buckets/chf-hydra-backup). The bucket name is now a slight misnomer, as it holds ArchivesSpace backups as well; Fedora data is pulled into the S3 key FedoraBackup (https://s3.console.aws.amazon.com/s3/buckets/chf-hydra-backup/FedoraBackup/?region=us-east-1&tab=overview), which contains all Fedora binary data.

PGSql (https://s3.console.aws.amazon.com/s3/buckets/chf-hydra-backup/PGSql/) contains the Fedora Postgres database dump (fcrepo_backup.sql).

Both the FedoraBackup data (https://s3.console.aws.amazon.com/s3/buckets/chf-hydra-backup/FedoraBackup/) and the Postgres dump (https://s3.console.aws.amazon.com/s3/object/chf-hydra-backup/PGSql/fcrepo_backup.sql) are needed to do a full restore.

Note: As a reminder, while S3's visual interface shows folders, those locations are actually just key prefixes on individually stored objects. Folders do not exist in S3.
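
For example, the "folders" shown in the console are just shared key prefixes, and the same objects can be listed directly by prefix. A minimal sketch, assuming the AWS CLI is configured with access to the bucket:

    # List objects under the FedoraBackup/ prefix; there is no folder object, only keys
    aws s3 ls s3://chf-hydra-backup/FedoraBackup/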

How to restore Fedora:

  1. Stop Tomcat
  2. Download the Postgres database fcrepo_backup.sql to an arbitrary location on the Fedora machine.
  3. Fedora might still have active connections to Postgres, so restart Postgres to kill them: sudo service postgresql restart
  4. Import the database: psql fcrepo < fcrepo_backup.sql
    1. If the database already exists, such as when you are running a sync, you will want to drop the existing database and then run the command.
  5. Check that the user trilby has permissions to access and use the newly made fcrepo database.
  6. Delete the existing folder(s) inside /opt/fedora-data (not always required, but it keeps the sync simple)
  7. Using screen or tmux, start an aws s3 sync to copy all the data in the FedoraBackup "folder" over to /opt/fedora-data: aws s3 sync s3://chf-hydra-backup/FedoraBackup /opt/fedora-data/
  8. Wait a while for all the data (>800 GB) to copy over.
  9. Run chown -R tomcat8:tomcat8 /opt/fedora-data to give the tomcat user ownership of the new files so Fedora can access them.
  10. Restart Tomcat: sudo service tomcat8 restart OR sudo systemctl restart tomcat8
  11. This completes the Fedora restore. Current cost estimates (2/18) are about $0.10 for this restore.
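
Collected into one place, a minimal sketch of the whole restore (run the sync step inside screen or tmux; this assumes fcrepo_backup.sql has already been downloaded to the working directory):

    sudo service tomcat8 stop                  # step 1
    sudo service postgresql restart            # step 3: kill active connections
    # dropdb fcrepo && createdb fcrepo         # step 4a: only if the database already exists
    psql fcrepo < fcrepo_backup.sql            # step 4
    sudo rm -rf /opt/fedora-data/*             # step 6
    aws s3 sync s3://chf-hydra-backup/FedoraBackup /opt/fedora-data/   # step 7
    sudo chown -R tomcat8:tomcat8 /opt/fedora-data                     # step 9
    sudo service tomcat8 restart               # step 10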

How to restore users:

  1. Go to S3 and download the postgres backup files to an arbitrary location on the app server.
  2. Stop Apache
  3. Restart the postgres service (see above). This kills the connections Sufia holds to its database while running, so the database can be replaced.
  4. In Postgres, delete the automatically generated chf_hydra database as follows:
    1. Log in via psql -U postgres
      1. The postgres account password is in ansible-vault (group_vars/all)
    2. Run: DROP DATABASE chf_hydra;
    3. Run: CREATE DATABASE chf_hydra;
  5. Then import the downloaded database
    1. Either:
      1. pg_restore -d chf_hydra -U postgres chf_hydra.dump (for a custom-format dump)
      2. psql chf_hydra < chf_hydra_dump.sql (for a plain SQL dump)
  6. Then set permissions
    1. psql -U postgres
    2. GRANT CREATE, CONNECT, TEMPORARY ON DATABASE chf_hydra TO chf_pg_hydra;
  7. You may now restart postgres and Apache: sudo systemctl restart apache2

Note: the minter is now stored in Postgres, so no extra steps are needed; restoring the chf_hydra database to the app server restores the minter as well.
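
A minimal consolidated sketch of the above (assumes the dump file is in the working directory and the postgres password from ansible-vault is at hand):

    sudo service apache2 stop
    sudo service postgresql restart            # drop Sufia's open connections
    psql -U postgres -c "DROP DATABASE chf_hydra;"
    psql -U postgres -c "CREATE DATABASE chf_hydra;"
    pg_restore -d chf_hydra -U postgres chf_hydra.dump    # or: psql chf_hydra < chf_hydra_dump.sql
    psql -U postgres -c "GRANT CREATE, CONNECT, TEMPORARY ON DATABASE chf_hydra TO chf_pg_hydra;"
    sudo service postgresql restart
    sudo service apache2 start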

How to restore redis:

Redis keeps an in-memory database that holds transaction records, such as the history of edits on a record. It does not contain the actual data, only the timeline of changes; losing it loses the history of object edits, but the objects themselves will be fine.

  1. Download redis-dump/dump.rdb to an arbitrary location on the app server.
  2. It must be changed to be owned by the redis user as follows:
    1. sudo chown redis:redis filename
  3. Stop the redis server as follows:
    1. sudo service redis-server stop
  4. Move the downloaded dump.rdb to /var/lib/redis/dump.rdb. This will overwrite the existing file there called dump.rdb.
  5. Restart redis
    1. sudo service redis-server start
  6. When starting, redis will read the .rdb dump file and copy that data back into the in-memory database.
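
A minimal sketch of the above (assumes dump.rdb was downloaded to the working directory):

    sudo service redis-server stop
    sudo cp dump.rdb /var/lib/redis/dump.rdb         # overwrites the existing dump.rdb
    sudo chown redis:redis /var/lib/redis/dump.rdb
    sudo service redis-server start                  # redis reloads the dump into memory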

Indexing:

The index is backed up to shorten recovery time for DR or migrations. If you cannot access the backup, a manual reindex can be done with the instructions in Application administration, but that process takes at least one business day, so rebuilding from the backup is preferred.

  1. From the chf-hydra-backup bucket, pull down the solr-backup.tar.gz file under the Solr prefix to the Solr server.
  2. Extract the archive
  3. Use the Solr restore commands in Application administration.
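
As a rough sketch of those steps (the exact S3 key, core name, and restore command are assumptions; the authoritative commands are in Application administration):

    aws s3 cp s3://chf-hydra-backup/Solr/solr-backup.tar.gz .   # exact key is an assumption
    tar -xzf solr-backup.tar.gz
    # Restore via Solr's replication handler; CORE_NAME and the backup name are placeholders
    curl "http://localhost:8983/solr/CORE_NAME/replication?command=restore&name=solr-backup"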

Costs

A quick cost analysis (as of 6/11/2018, with approximately 1 TB of data) puts restoration at $30-35. Approximately 66% of the cost was due to inter-region transfer fees (moving data from US-WEST to US-EAST); the rest is standard LIST, GET, and related request fees.

Scihist_digicoll Backup and Recovery

...

This has not been tested. The instructions here are general guides and do not have the step-by-step quality of our other instructions. This is to be run when we have lost all access to data in S3. Note this will likely take a long time, so expectations should be set with the project owner and library on how to notify people of the outage.

Access to the local backups must be done in the building or via Citrix (https://sciencehistory.org/citrix) using the PuTTY app.

The local backup server (Dubnium) can be found at 65.170.7.86.

Right now the account holders are Daniel, Jonathan, and Eddie, each of whom uses a password to access the server.

  • The original files and postgres database exist on-site on the shared drive, and also offline in tape backups made from that shared drive. Ideally we do not need the tape backups; if they are needed, work with IT (Chuck and Ponce) to follow their procedures for recovering and loading the tapes. This part of the shared drive is only accessible on-site, so recovery (or backup checks) cannot be done remotely.
  • An aws s3 sync will need to be run targeting the local backup directory: aws s3 sync /media/scihist_digicoll_backups/asset s3://scihist-digicoll-production-originals/asset --region us-east-1
    • This may take multiple days to run
  • While that runs, copy the postgres database from its location at /media/scihist_digicoll_backups to the database server.
  • Follow step 5e in the Full Recovery from S3 section to load the database.
  • Follow step 5f to reindex Solr
  • Once all of the original files are moved, you can regenerate the derivatives with the rake task (see the sketch below). Derivative files are not backed up locally, so syncing is not an option.
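
A rough sketch of the above, run from Dubnium (the database hostname, dump filename, and rake task name below are placeholders, not confirmed values):

    # Push the locally backed-up originals to the production bucket (may take days; run in screen/tmux)
    aws s3 sync /media/scihist_digicoll_backups/asset s3://scihist-digicoll-production-originals/asset --region us-east-1

    # Meanwhile, copy the postgres dump to the database server (hostname and filename are placeholders)
    scp /media/scihist_digicoll_backups/DUMP_FILE DB_HOST:/tmp/

    # Once the sync completes, regenerate derivatives (task name is a placeholder)
    bundle exec rake scihist:create_derivatives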

...