Our recipe for rebuilding all the production scihist_digicoll servers from scratch.

1. Build

There’s no need for downtime during this phase, as building new servers should in no way interfere with the existing ones. Why?

  • The create_kithe playbook does not affect existing servers or change them. (That is the job of the update_kithe playbook.)

  • As long as the new IP settings are not checked into Bitbucket, the automatic updates on Management will not update the existing production servers.

(lightbulb) If there’s no rush, consider performing the build and switch phases on consecutive days.

...

Setup

Just a few shell variables to help avoid typos and make the commands below easier to read.

PRIVATE_KEY="~/.ssh/chf_prod.pem"
PASS_FILE="~/.ansible_password.txt"
SECURITY_OPTS="--vault-password-file $PASS_FILE --private-key=$PRIVATE_KEY"
CREATE_CMD="ansible-playbook create_kithe.yml"

DB
$CREATE_CMD $SECURITY_OPTS --extra-vars "role=database tier=production"

Add DB’s private IP to group_vars/kithe_production

redis_ip: new_private_ip_of_scihist_digicoll-database1-production
postgres_ip: new_private_ip_of_scihist_digicoll-database1-production
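
If it helps, the new DB server's private IP can also be looked up with the AWS CLI rather than the console. This is just a sketch, assuming the instance's Name tag matches the server name used in this runbook:

# Sketch: look up the new database server's private IP by Name tag.
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=scihist_digicoll-database1-production" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].PrivateIpAddress' \
  --output text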

Solr
$CREATE_CMD $SECURITY_OPTS --extra-vars "role=solr tier=production"

Add Solr's private IP to group_vars/kithe_production

solr_ip: new_private_ip_of_scihist_digicoll-solr1-production

...

Jobs
$CREATE_CMD $SECURITY_OPTS --extra-vars "role=jobs tier=production"

Web

$CREATE_CMD $SECURITY_OPTS --extra-vars "role=web tier=production" --extra-vars "@group_vars/kithe_web_production_override"

Deploy

Deploy master branch to all servers.
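
In practice this is the same Capistrano command used in the Switch section below, presumably run from your local scihist_digicoll checkout:

bundle exec cap production deploy --trace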

(lightbulb) At this stage, if you point the prod URL in your local /etc/hosts file to the new web server, you should see an empty production website.
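
The local override is a single /etc/hosts line. A sketch only — the IP and hostname below are placeholders, not the real values:

# Placeholders: use the new web server's public IP and the actual production hostname.
203.0.113.10    digital.sciencehistory.org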

...

2. Switch

  • Turn off automatic updates on the management server.

  • Point the prod URL in your local /etc/hosts file to the new web server.

  • Spin up the downtime server in Amazon (Actions -> Instance state -> Start)

  • Point the elastic IP for the prod web server to the downtime server

    • Select Elastic IPs

    • Select Temp-Down

    • Actions -> Associate address

    • Under "Resource type", choose "Instance".

    • Fill in the appropriate instance (the new prod web server)

    • Check "Allow this Elastic IP address to be reassociated if already attached"

  • Spin down the existing prod servers (Actions -> Instance state -> Stop)

  • Deploy master branch to all the new servers.

    bundle exec cap production deploy --trace

    (lightbulb) At this stage, if you point the prod URL in your local /etc/hosts file to the new web server, you should see an empty production website.

  • Copy the old database over to the new DB server (one possible approach is sketched after this list).

    • TODO Fill this in

  • Index

    • bundle exec cap production invoke:rake TASK="scihist:solr:reindex"

    • (lightbulb) At this stage, if you point the prod URL in your local /etc/hosts file to the new web server, you should be able to search the new site and get full results.

  • Commit the changes to Ansible into both the staging and master branches on Bitbucket.

    • This is important: if you don’t do this, the old IP addresses will be automatically reapplied to the new servers. And the old IP addresses are inactive by now.

  • Point the elastic IP to the new web server (a CLI equivalent is sketched after this list):

    • Select "Elastic IPs"

    • Select "digicoll-production"

    • Actions -> Associate address

    • Under "Resource type", choose "Instance".

    • Fill in the appropriate instance (the new prod web server)

    • Check "Allow this Elastic IP address to be reassociated if already attached"

    • Click "Associate".

  • Remove the /etc/hosts entry and check that traffic is indeed routed to the new server.

  • Turn on automatic updates on the management server.

  • Announce the switch is complete.
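
For either Elastic IP re-association above, the AWS CLI offers an equivalent to the console clicks. A sketch only; the allocation ID and instance ID are placeholders you would look up first under "Elastic IPs" and "Instances":

# Sketch: re-associate an Elastic IP with an instance (IDs are placeholders).
aws ec2 associate-address \
  --allocation-id eipalloc-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --allow-reassociation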
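
For the database copy step (still marked TODO above), one common approach is a pg_dump/pg_restore pass. This is only a sketch — hostnames, database name, and user are placeholders, and it is not the documented procedure:

# Placeholders throughout; assumes the target database already exists on the new server.
pg_dump -Fc -h OLD_DB_PRIVATE_IP -U app_user app_production -f app_production.dump
pg_restore -h NEW_DB_PRIVATE_IP -U app_user -d app_production --clean --no-owner app_production.dump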

3. Wrap up

  • Wait a couple days (question)

  • Turn off termination protection

  • Delete the old servers in AWS console

  • Delete the old EBS volumes (disks) left orphaned by the old servers; a CLI sketch for finding them is below.
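
To find the orphaned disks, the AWS CLI can list EBS volumes that are no longer attached to any instance. A sketch; verify each volume really belonged to the old servers before deleting it:

# Sketch: list unattached ("available") EBS volumes.
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[].[VolumeId,Size,CreateTime]' \
  --output table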