...
a) Rake task: Replace the Ansible-managed script postgres-backup.sh
with a rake task run regularly on a one-off Heroku dyno. This would obtain the URL of the latest database dump, download the dump, and push it up to S3, where it can wait to be harvested by the Dubnium script.
Pro:
minimal change from our existing workflow;
easy to check on by verifying the date on the latest file in the appropriate S3 bucket.
Con:
requires a part of our code to have S3 credentials that allow it to write to our backup directory;
requires the Heroku CLI to be accessible to the rake task (so it can obtain the URL of the latest dump).
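The option-a workflow could be sketched roughly as follows. This is a hypothetical illustration of the commands the rake task would shell out to; the app name and bucket path are placeholders, not our real values, and the task assumes both the Heroku CLI and the AWS CLI are available on the dyno.

```shell
# Sketch of the option-a backup step, wrapped in a function so the
# rake task can invoke it. App name and bucket are assumptions.
push_latest_dump_to_s3() {
  app="our-heroku-app"               # placeholder app name
  bucket="s3://our-backup-bucket/db" # placeholder bucket/prefix

  # Ask the Heroku CLI for a signed URL to the most recent backup.
  url="$(heroku pg:backups:url --app "$app")"

  # Download the dump, then push it to S3 for Dubnium to harvest.
  curl -sSf -o latest.dump "$url"
  aws s3 cp latest.dump "$bucket/latest.dump"
}
# Invoked from the rake task, e.g.: push_latest_dump_to_s3
```

The S3 write requires the credentials noted in the Con list above; scoping them to write-only access on the backup prefix would limit the blast radius.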
b) cron job on Dubnium: Dispense with the S3 portion of the workflow entirely, and set up the cron job on Dubnium to obtain its database backup directly from Heroku.
Pro:
simpler;
does not require the scihist_digicoll code to know anything about the backup S3 setup, and is thus safer.
Con:
assumes we trust the Heroku database backup workflow;
less transparent: it’s more legwork to log into Dubnium and check that the database backed up there is current (Dubnium is only accessible by logging into Citrix Workspace);
Dubnium is not managed by Ansible, and needs to be manually updated;
One less copy: instead of having copies on the database server, in S3, on Dubnium, and on tape, we would only have copies in Heroku, on Dubnium, and on tape;
Dubnium needs to have access to the Heroku CLI and the appropriate credentials.
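For comparison, the option-b workflow might look like the sketch below: a crontab entry plus the script it runs, shown here as a function for illustration. The schedule, paths, and app name are all assumptions; the script presumes the Heroku CLI is installed and authorized on Dubnium, per the last Con above.

```shell
# Hypothetical crontab entry on Dubnium (schedule and path are placeholders):
#   0 2 * * * /opt/backups/fetch-heroku-dump.sh

# Body of fetch-heroku-dump.sh, sketched as a function:
fetch_heroku_dump() {
  # Obtain a signed URL for the latest backup, then download the dump
  # straight onto Dubnium, skipping S3 entirely.
  url="$(heroku pg:backups:url --app our-heroku-app)"
  curl -sSf -o /var/backups/scihist/latest.dump "$url"
}
```

Because Dubnium is not managed by Ansible, both the crontab entry and the script would have to be installed and updated by hand.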