Copy from LEGACY EC2 Postgres to Heroku

This is only needed for our transition phase; once we are fully on Heroku we won't need to do this anymore.

 

  1. ssh to the (production or staging) EC2 database server.

    1. You can find its URL by looking it up in the AWS console

    2. Eddie and Jonathan should have passwordless SSH key login to it
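
    3. For example (the hostname and login user here are hypothetical placeholders; substitute the real values from the AWS console): ssh ubuntu@ec2-XX-XX-XX-XX.compute-1.amazonaws.com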

  2. Export a database dump file with the following command (per the Heroku instructions, to get the right format for Heroku import):

    1. PGPASSWORD=$PASSWORD pg_dump -Fc --no-acl --no-owner -h localhost -U tophat digcol > scihist-staging-yyyymmdd.dump
    2. The password can be found in the ansible-vault file under the key hydra_db_pass

  3. Copy the file to an S3 location – not a public one, as it contains sensitive info! The chf-hydra-backup bucket is a good place.

    1. Copying directly from the EC2 db server will be a lot quicker than copying to your workstation (which isn't on the AWS network) first.

    2. You will need AWS credentials that have access to the chf-hydra-backup bucket. They can be found in our P drive location for sensitive data.

    3. AWS_ACCESS_KEY_ID=$ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY=$SECRET_KEY aws s3 cp ./scihist-staging-yyyymmdd.dump s3://chf-hydra-backup/scihist-pgsql-temp/scihist-staging-yyyymmdd.dump
    4. Of course, scihist-staging-yyyymmdd.dump is a stand-in for a filename with an actual yyyymmdd timestamp, which is recommended to avoid confusion.

  4. Get a signed S3 URL using the AWS CLI:

    AWS_ACCESS_KEY_ID=$ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY=$SECRET_KEY aws s3 presign s3://chf-hydra-backup/scihist-pgsql-temp/scihist-staging-yyyymmdd.dump
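    (Note: the presigned URL expires after an hour by default; you can pass --expires-in <seconds> to aws s3 presign if you need it to last longer.)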
  5. Restore that to Heroku with:
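
    Per the Heroku instructions, the restore command should look something like this (substitute the signed URL from the previous step):

    heroku pg:backups:restore '<SIGNED URL>' DATABASE_URL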

    1. Yes, DATABASE_URL is a literal; that's what you type. <SIGNED URL> is the signed URL from the step above.

    2. This is from the Heroku instructions.

    3. If you have more than one git remote (e.g. staging and production), you may need to use the --remote arg.

  6. Delete the temporary .dump file from the chf-hydra-backup S3 bucket; it's a big file (with sensitive info) and we don't need to leave it lying around.
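
    For example, using the same credentials and key as in step 3:

    AWS_ACCESS_KEY_ID=$ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY=$SECRET_KEY aws s3 rm s3://chf-hydra-backup/scihist-pgsql-temp/scihist-staging-yyyymmdd.dump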