...

You can check the metadata on the latest backups as follows:

$ heroku pg:backups
=== Backups
ID Created at Status Size Database
──── ───────────────────────── ─────────────────────────────────── ─────── ────────
a006 2020-12-14 07:30:18 +0000 Completed 2020-12-14 07:30:56 +0000 64.37MB DATABASE
heroku pg:backups:download a006 will produce a “logical” database dump (a binary file) that can easily be converted to a plain, garden-variety SQL file as follows: pg_restore -f mydatabase.sql latest.dump.

More simply, you can run curl -o latest.dump "$(heroku pg:backups:url)" to get the latest logical dump.
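
Putting the two commands together, fetching the most recent dump and converting it to plain SQL looks like this:

$ curl -o latest.dump "$(heroku pg:backups:url)"    # download the most recent logical dump
$ pg_restore -f mydatabase.sql latest.dump          # convert the binary dump to a plain SQL file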

Heroku retains daily backups for 7 days and weekly backups for 4 weeks (more details on the retention schedule are in Heroku's documentation).

Options

...

a) Additional backup to S3

We intend to replace the Ansible-managed script postgres-backup.sh with a rake task, rake scihist:copy_database_to_s3, run regularly on a one-off Heroku dyno. This task would obtain the URL of the latest database backup, download it to a temp file, and then upload that file to S3, where it can wait to be harvested by the Dubnium script.
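
In shell terms, the steps the task would perform on the one-off dyno look roughly like this (the app name, temp path, and S3 bucket/key below are placeholders, not our real configuration):

$ url="$(heroku pg:backups:url --app scihist-digicoll-production)"          # hypothetical app name; needs the Heroku CLI and credentials on the dyno
$ curl -o /tmp/latest.dump "$url"                                           # download the latest logical dump to a temp file
$ aws s3 cp /tmp/latest.dump s3://example-backup-bucket/heroku/latest.dump  # hypothetical bucket and key

The actual rake task would presumably use the aws-sdk gem rather than shelling out to the AWS CLI, but the sequence of steps is the same.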

Pro:

  • minimal change from our existing workflow;

  • easy to check on by looking at the date of the most recent file in the appropriate S3 bucket.

Con:

  • requires a part of our code to have S3 credentials that allow it to write to our backup directory;

  • requires the Heroku CLI to be accessible to the rake task (so it can obtain the URL of the latest dump).

b) cron job on Dubnium: Dispense with the S3 portion of the workflow entirely, and set up the cron job on Dubnium to obtain its database backup directly from Heroku.
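
A minimal sketch of that, assuming the Heroku CLI is installed on Dubnium and authenticated via an API key (the schedule, paths, and app name below are assumptions, not a worked-out design):

# crontab entry on Dubnium: fetch the latest logical dump every night at 02:00
0 2 * * * /usr/local/bin/fetch_heroku_backup.sh

# /usr/local/bin/fetch_heroku_backup.sh
#!/bin/bash
set -euo pipefail
export HEROKU_API_KEY="..."            # API token for a machine user (assumption)
APP="scihist-digicoll-production"      # hypothetical app name
DEST="/backups/scihist"                # hypothetical local backup directory
curl -o "$DEST/scihist-$(date +%F).dump" "$(heroku pg:backups:url --app "$APP")"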

Pro:

  • simpler;

  • does not require the scihist_digicoll code to know anything about the backup S3 setup, and is thus safer.

Con:

...

  • assumes we trust the Heroku database backup workflow;

...

  • less transparent: it’s more legwork to log into Dubnium and check that the database backed up there is current (Dubnium is only accessible by logging into Citrix Workspace);

...

  • Dubnium is not managed by Ansible, and needs to be updated manually;

...

  • one fewer copy: instead of having copies on the database server, in S3, on Dubnium, and on tape, we would have copies only in Heroku, on Dubnium, and on tape;

...

Given the size of the database in late 2020, the dump takes 13 seconds, and the upload another 13. The entire job (with the overhead of starting up the dyno and tearing it down) takes a bit under a minute.
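
If we want to spot-check that timing ourselves, something along these lines (the app name is hypothetical) runs the task on a one-off dyno and reports the wall-clock time:

$ time heroku run rake scihist:copy_database_to_s3 --app scihist-digicoll-production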