Rebuilding production (Obsolete)

Our recipe for rebuilding all the production scihist_digicoll servers from scratch.

1. Build

There’s no need for downtime during this phase, as building new servers should in no way interfere with the existing ones. Why?

  • The create_kithe playbook does not affect existing servers or change them. (That is the job of the update_kithe playbook.)

  • As long as the new IP settings are not checked into Bitbucket, the automatic updates on Management will not update the existing production servers.

If there’s no rush, consider performing the build and switch phases on consecutive days.

Setup

A few shell variables to help avoid typos and make the commands easier to read.

PRIVATE_KEY="$HOME/.ssh/chf_prod.pem"
PASS_FILE="$HOME/.ansible_password.txt"
SECURITY_OPTS="--vault-password-file $PASS_FILE --private-key=$PRIVATE_KEY"
CREATE_CMD="ansible-playbook create_kithe.yml"
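
Before running any playbook, it can help to confirm that the key and vault password files are actually readable. The following is a minimal sketch, not part of the original recipe; check_readable is a hypothetical helper name. Note that the shell does not expand a tilde inside double quotes, so the check is only meaningful when the paths are written with $HOME or as absolute paths.

```shell
# Hypothetical helper (not part of the original recipe):
# fail early if any file the playbooks need is missing or unreadable.
check_readable() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
}

# Usage, with the variables defined above:
#   check_readable "$PRIVATE_KEY" "$PASS_FILE" && echo "ok to proceed"
```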

DB
$CREATE_CMD $SECURITY_OPTS --extra-vars "role=database tier=production"

Add the new DB server’s private IP to group_vars/kithe_production:

redis_ip: new_private_ip_of_scihist_digicoll-database1-production
postgres_ip: new_private_ip_of_scihist_digicoll-database1-production
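
One possible way to look up the new private IP is the AWS CLI rather than the console. This sketch assumes the AWS CLI is installed and configured, and that the instance's Name tag matches the server name shown above; print_db_ip_vars is a hypothetical helper. Because the old and new database servers may both be running during the build phase, it refuses to guess when more than one instance matches.

```shell
# Hypothetical helper: print the two group_vars lines for the DB server.
# Assumes the instance's Name tag equals the name passed as $1.
print_db_ip_vars() {
  name="$1"
  ips="$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$name" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text)" || return 1
  # Split on whitespace to count matches; old and new servers may share a name.
  set -- $ips
  if [ "$#" -ne 1 ]; then
    echo "expected exactly one running instance named $name, found $#" >&2
    return 1
  fi
  printf 'redis_ip: %s\npostgres_ip: %s\n' "$1" "$1"
}

# Usage:
#   print_db_ip_vars scihist_digicoll-database1-production
```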

Jobs
$CREATE_CMD $SECURITY_OPTS --extra-vars "role=jobs tier=production"

Web
$CREATE_CMD $SECURITY_OPTS --extra-vars "role=web tier=production" --extra-vars "@group_vars/kithe_web_production_override"
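
The three create commands above differ only in the role and, for the web server, one extra variables file. As a sketch, they could be wrapped in a small function that defaults to a dry run, so the command can be reviewed before anything is built. create_server and DRY_RUN are hypothetical names, and the dry-run output drops the shell quoting around the --extra-vars value.

```shell
# Defaults mirror the Setup variables; override them in the environment if needed.
: "${PRIVATE_KEY:=$HOME/.ssh/chf_prod.pem}"
: "${PASS_FILE:=$HOME/.ansible_password.txt}"

# Hypothetical wrapper: build (and optionally run) the create command for one role.
# With DRY_RUN=1, the default in this sketch, it only prints the command.
create_server() {
  role="$1"
  shift
  set -- ansible-playbook create_kithe.yml \
    --vault-password-file "$PASS_FILE" \
    --private-key="$PRIVATE_KEY" \
    --extra-vars "role=$role tier=production" "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Usage, in the same order as the steps above:
#   create_server database
#   (add the DB's private IP to group_vars/kithe_production)
#   create_server jobs
#   create_server web --extra-vars "@group_vars/kithe_web_production_override"
```

Set DRY_RUN=0 to actually run the playbook.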

2. Switch

  • Turn off automatic updates on the management server.

  • In your local /etc/hosts file, point the prod hostname to the new web server.

  • Spin up the downtime server in Amazon (Actions > Instance state > Start)

  • Point the elastic IP for the prod web server to the downtime server

    • Select Elastic IPs

    • Select Temp-Down

    • Actions > Associate address

    • Under "Resource type", choose "Instance".

    • Fill in the appropriate instance (the downtime server)

    • Check "Allow this Elastic IP address to be reassociated if already attached"

  • Spin down the existing prod servers (Actions > Instance state > Stop)

  • Deploy master branch to all the new servers.

    bundle exec cap production deploy --trace

    At this stage, if you point the prod hostname in your local /etc/hosts file to the new web server, you should see an empty production website.

  • Copy the old database over to the new DB server

    • TODO Fill this in

  • Index

    • bundle exec cap production invoke:rake TASK="scihist:solr:reindex"

    • At this stage, if you point the prod hostname in your local /etc/hosts file to the new web server, you should be able to search the new site and get full results.

  • Commit the changes to Ansible into both the staging and master branches on Bitbucket.

    • This is important: if you don’t do this, the automatic updates will reapply the old IP addresses to the new servers, and those addresses are inactive by now.

  • Point elastic IP to the new web server:

    • Select Elastic IPs

    • Select digicoll-production

    • Actions > Associate address

    • Under "Resource type", choose "Instance".

    • Fill in the appropriate instance (the new prod web server)

    • Check "Allow this Elastic IP address to be reassociated if already attached"

    • Click Associate.

  • Remove the /etc/hosts entry and check that traffic is indeed routed to the new server.

  • Turn on automatic updates on the management server.

  • Announce the switch is complete.
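
As an alternative to editing /etc/hosts for the checks above, curl can be told to resolve the prod hostname to a specific address for a single request. This is a sketch of a different technique from the one the steps describe: check_new_site is a hypothetical helper, the hostname and address are supplied by you, and it assumes the site is served over HTTPS on port 443.

```shell
# Hypothetical helper: request the site at a given address without touching /etc/hosts.
# $1 = prod hostname, $2 = address of the server to test. Prints the HTTP status code.
check_new_site() {
  host="$1"
  ip="$2"
  curl --silent --show-error --output /dev/null \
    --write-out '%{http_code}\n' \
    --resolve "$host:443:$ip" "https://$host/"
}

# Usage (both values are placeholders):
#   check_new_site prod.example.org 203.0.113.10
```

A browser-based check of search results still needs the /etc/hosts entry; this only confirms that the server answers for the prod hostname.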

3. Wrap up

  • Wait a couple of days

  • Turn off termination protection on the old servers

  • Delete the old servers in the AWS console

  • Delete the old disks left orphaned by the old servers
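
To find the orphaned disks, one option is to list EBS volumes that are not attached to any instance. This sketch assumes the AWS CLI is installed and configured for the right account and region; list_unattached_volumes is a hypothetical helper. It only lists volumes and deletes nothing, since an unattached volume is not necessarily one left behind by the old servers.

```shell
# Hypothetical helper: list volumes in the "available" state, i.e. not attached
# to any instance. Prints volume ID, size in GiB, and creation time.
list_unattached_volumes() {
  aws ec2 describe-volumes \
    --filters "Name=status,Values=available" \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' \
    --output text
}

# Usage:
#   list_unattached_volumes
```

Review the list against the old servers before deleting anything.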