...
a) Rake task: Replace the Ansible-managed script postgres-backup.sh
with a rake task run regularly on a one-off Heroku dyno. This would obtain the URL of the latest database dump, download the dump, and push it up to S3, where it can wait to be harvested by the Dubnium script.
Pro:
minimal change from our existing workflow;
easy to check on by verifying the date on the latest file in the appropriate S3 bucket.
Con:
requires a part of our code to have S3 credentials that allow it to write to our backup directory;
requires the Heroku CLI to be accessible to the rake task (so it can obtain the URL of the latest dump).
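The option-a workflow could be sketched roughly as follows. This is a hypothetical illustration of the commands the rake task would shell out to; the app name and bucket path are placeholders, not our real values, and the task assumes both the Heroku CLI and the AWS CLI are available on the dyno.

```shell
# Sketch of the option-a backup step, wrapped in a function so the
# rake task can invoke it. App name and bucket are assumptions.
push_latest_dump_to_s3() {
  app="our-heroku-app"               # placeholder app name
  bucket="s3://our-backup-bucket/db" # placeholder bucket/prefix

  # Ask the Heroku CLI for a signed URL to the most recent backup.
  url="$(heroku pg:backups:url --app "$app")"

  # Download the dump, then push it to S3 for Dubnium to harvest.
  curl -sSf -o latest.dump "$url"
  aws s3 cp latest.dump "$bucket/latest.dump"
}
# Invoked from the rake task, e.g.: push_latest_dump_to_s3
```

The S3 write requires the credentials noted in the Con list above; scoping them to write-only access on the backup prefix would limit the blast radius.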
b) cron job on Dubnium: Dispense with the S3 portion of the workflow entirely, and set up the cron job on Dubnium to obtain its database backup directly from Heroku.
Pro:
simpler;
does not require the scihist_digicoll code to know anything about the backup S3 setup, and is thus safer.
Con:
assumes we trust the Heroku database backup workflow;
less transparent: it’s more legwork to log into Dubnium and check that the database backed up there is current (Dubnium is only accessible by logging into Citrix Workspace);
Dubnium is not managed by Ansible, and needs to be manually updated;
One less copy: instead of having copies on the database server, in S3, on Dubnium, and on tape, we would only have copies in Heroku, on Dubnium, and on tape;
Dubnium needs to have access to the Heroku CLI and the appropriate credentials.
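For comparison, the option-b workflow might look like the sketch below: a crontab entry plus the script it runs, shown here as a function for illustration. The schedule, paths, and app name are all assumptions; the script presumes the Heroku CLI is installed and authorized on Dubnium, per the last Con above.

```shell
# Hypothetical crontab entry on Dubnium (schedule and path are placeholders):
#   0 2 * * * /opt/backups/fetch-heroku-dump.sh

# Body of fetch-heroku-dump.sh, sketched as a function:
fetch_heroku_dump() {
  # Obtain a signed URL for the latest backup, then download the dump
  # straight onto Dubnium, skipping S3 entirely.
  url="$(heroku pg:backups:url --app our-heroku-app)"
  curl -sSf -o /var/backups/scihist/latest.dump "$url"
}
```

Because Dubnium is not managed by Ansible, both the crontab entry and the script would have to be installed and updated by hand.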