AWS / Ansible / capistrano - notes from conversations with Alicia

* These notes supplement DCE DOCS.pdf

AWS Web Console

Administration & security > IAM

  • issue and revoke access keys
  • the ansible_access policy is copied from a full-access policy
    • We need to disallow some things:
      • anything that allows termination of resources (destroying; note: stopping is okay)
      • "describe" commands are also not great
      • The user can't create users, manage policies, etc., so that's a good start.
  • DeleteSnapshot and DeleteVolume permissions are useful for backup management (note: Alicia says there is no cost associated with snapshots)
  • Permissions can also be made more granular: the 'Resource' element covers the entire account by default, but it can be scoped to individual machines or groups (see the sketch below)
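
As a concrete but entirely hypothetical sketch of that scoping (the account id, instance id, and user name are invented), a policy like this could allow stop/start on one instance while denying termination everywhere, pushed up with the AWS CLI:

    # ansible-scoped-policy.json (invented ids)
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["ec2:StartInstances", "ec2:StopInstances"],
          "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
        },
        {
          "Effect": "Deny",
          "Action": "ec2:TerminateInstances",
          "Resource": "*"
        }
      ]
    }

    $ aws iam put-user-policy --user-name ansible --policy-name ansible_access \
        --policy-document file://ansible-scoped-policy.json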

Instances: the key pair column. The key pair controls SSH access to the instance.

  • it's set once and cannot be changed.
  • you can add more keys on the machine itself.
  • You could also create an image from your instance and spin up another server with a new key pair from that image (e.g. if you lose the key) - see the sketch below.
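
If you do get locked out, a rough CLI sketch of that recovery path (all ids here are invented):

    # image the instance you can no longer reach
    $ aws ec2 create-image --instance-id i-0123456789abcdef0 --name "rescue-$(date +%F)"
    # make a fresh key pair, keeping the private half
    $ aws ec2 create-key-pair --key-name rescue-key \
        --query 'KeyMaterial' --output text > rescue-key.pem
    $ chmod 600 rescue-key.pem
    # launch a replacement with the new key (use the ImageId returned above)
    $ aws ec2 run-instances --image-id ami-0123456789abcdef0 \
        --instance-type t2.micro --key-name rescue-key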

Security group: the firewall configuration. Any machine can have multiple security groups; the most permissive rule on each port is used.

  • Alicia tends to name this after the key pair.
  • if on CentOS (which we are not), you must remember to make iptables match these rules
  • do we still need to close port 8080, or has that been done in the ec2 scripts? (Resolved: done via the console; a CLI sketch follows.)
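
The same change could be made from the CLI; a hedged sketch (the group id is invented):

    # close 8080 to the world, leave 22 open
    $ aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 8080 --cidr 0.0.0.0/0
    $ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 22 --cidr 0.0.0.0/0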

Users: access keys can be changed if the credentials are leaked.


AWS Architecture

Pricing: unless you make a specific arrangement, everything is pay-as-you-go.

Architecture design: one dev machine, one higher-end machine. Fedora data lives on a separate volume, backed up every 24 hours. Keep daily backups for 7 days, plus the first of every month (keeping 2 of those).

Instances

Root device (the disk at '/'): typically ~8 GB, just for the OS. EBS provides persistent storage.

A block device (e.g. /dev/sdb) is mounted at '/opt' to hold Solr and Fedora.

A third device holds the Fedora data.
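
First-time setup for one of these volumes might look like the following (the device name is an assumption - a volume attached as /dev/sdb often shows up as /dev/xvdb):

    $ sudo mkfs -t ext4 /dev/xvdb
    $ sudo mkdir -p /opt
    $ sudo mount /dev/xvdb /opt
    # persist the mount across reboots
    $ echo '/dev/xvdb /opt ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab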

Use DNS or Elastic IPs to swap machines.

Volumes

Shows the mounted disks; the State column indicates whether each volume is attached.

Misc

  • snapshots
  • Elastic IPs - external IPs you can move between machines to avoid messing with DNS (see the sketch below)
  • AMI - Amazon Machine Image: like a snapshot of a whole machine. Can be used, for example, to script as-needed load balancing.
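
Re-pointing an Elastic IP at a replacement machine is a one-liner; a sketch with invented ids:

    $ aws ec2 associate-address --instance-id i-0fedcba9876543210 \
        --allocation-id eipalloc-0123456789abcdef0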

Security

Keep the names of users secret, along with keys and the other obvious things.

The stack

The web server will be Apache, due to Alicia's greater experience with that server.

Installing Ruby from source

  • Advantages: a standard, stable, known version of Ruby (no problems with apt updates arriving at a bad time)
  • Disadvantage: security bugs - always stay a version behind (see the install sketch below)
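
The install itself is the usual configure-and-make dance; a sketch (this version is only an example, not a recommendation):

    $ wget https://cache.ruby-lang.org/pub/ruby/2.1/ruby-2.1.5.tar.gz
    $ tar xzf ruby-2.1.5.tar.gz && cd ruby-2.1.5
    $ ./configure --prefix=/usr/local
    $ make
    $ sudo make install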

Postgres uses peer authentication by default

  • the superuser is the postgres user, who can log in to the db without a password (it auto-authenticates when you are that OS user)
  • Alicia changes the default auth to md5 and makes a new user, with restricted permissions, for the Rails database
  • the client-side settings go in database.yml (server-side sketch below)
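
A sketch of the server side (the user and database names are hypothetical, and the pg_hba.conf path varies by Postgres version):

    # create a restricted user and a database it owns
    $ sudo -u postgres createuser --pwprompt rails_app
    $ sudo -u postgres createdb --owner rails_app app_production
    # switch the relevant pg_hba.conf lines from 'peer' to 'md5'
    # (e.g. /etc/postgresql/9.3/main/pg_hba.conf), then:
    $ sudo service postgresql reload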

Solr

Both app/config and Tomcat need to know the name of the Solr context. Alicia has called it hydra.xml, but she's making it a variable. It's not a security issue because we close the port (8080) to all but localhost. Not sure why I have a note that says $ curl localhost:8080/my_context/

Use an SSH tunnel if you want to access your Solr instance via the web without messing with the firewall!

  • $ ssh -L 2020:localhost:8080 target_machine
  • set up target_machine in your ssh config (example below), or remember that you have to specify the user, IP, cert, etc.
  • in the browser, go to localhost:2020/solr_context
  • (things might break if there are redirects with ports in them)
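
A hypothetical ~/.ssh/config entry that bakes all of that in, so a plain "ssh target_machine" opens the tunnel (the host name, IP, user, and key path are all invented):

    Host target_machine
        HostName 203.0.113.10
        User ubuntu
        IdentityFile ~/.ssh/chf-keypair.pem
        LocalForward 2020 localhost:8080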

Capistrano

shared files (linked files) are for two types of things:

1. things that are stored in the repo without their production configuration
2. things that, for other reasons, you don't want to change from one deployment to the next (this is mostly directories, like tmp/pids - you don't want old tmp/pids directories hanging around with processes in limbo)


there's room for various interpretations - for example, on sandbox deployments I like to keep the log directories per-deployment

that way if a deployment fails, I can easily isolate the relevant log messages, even after reverting to an earlier, functioning deployment so testing can continue

but for production environments, sharing the log directory and rotating the files makes more sense - especially if you test everything in a staging environment first so you only deploy successful code
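
In Capistrano 3 terms, a production-flavored sketch of config/deploy.rb might read like this (the exact lists are a per-app judgment call, not gospel):

    set :linked_files, %w{config/database.yml config/secrets.yml}
    set :linked_dirs,  %w{log tmp/pids tmp/sockets public/system}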

Ansible

FYI, the ansible.cfg file included in the repo turns off host key checking for ec2 instances. If you get rid of that, your playbook will fail after prompting you to add the RSA fingerprint to your known hosts list. The relevant setting is sketched below.
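
From memory, the setting in question is presumably something like this (reproduced as an assumption, not copied from the repo):

    [defaults]
    host_key_checking = False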

there are two ec2-related bits

  • the launch-ec2 one does the heavy lifting - it creates the instance, then creates and attaches a volume
  • then the ec2 role puts AWS-specific tools on the box for backups - for obvious reasons, that had to come later in the process
  • the backup script pulls the same variable used to create the backups, so it can vary by machine (i.e. the production machine could use "CHF-prod" and the staging machine "CHF-stage") and each machine will delete only its own backups over time (a sketch follows)
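
To make that concrete, a hypothetical shape for such a script - not the real one - assuming the AWS CLI, GNU date, and a NAME that varies per machine:

    #!/bin/bash
    # invented sketch: snapshot one volume and prune this machine's
    # snapshots older than KEEP_DAYS (monthly retention omitted)
    NAME="CHF-prod"                       # set per machine via ansible
    VOLUME_ID="vol-0123456789abcdef0"     # invented id
    KEEP_DAYS=7
    cutoff=$(date -d "-${KEEP_DAYS} days" +%F)

    aws ec2 create-snapshot --volume-id "$VOLUME_ID" \
        --description "${NAME}-$(date +%F)"

    aws ec2 describe-snapshots --owner-ids self \
        --filters "Name=description,Values=${NAME}-*" \
        --query 'Snapshots[].[SnapshotId,StartTime]' --output text |
    while read -r snap started; do
        # StartTime is ISO 8601, so a string compare on the date part works
        if [[ "${started%%T*}" < "$cutoff" ]]; then
            aws ec2 delete-snapshot --snapshot-id "$snap"
        fi
    done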

vault

The credentials in the vaulted files are all new. For backups, I generated a new IAM user, new credentials, and a policy that only has access to snapshots. For creating instances, I generated new credentials on the existing IAM user and turned off the old credentials (because the old ones will be on GitHub now, if you know how to find them).
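
For reference, the day-to-day vault commands (the file names here are just examples):

    $ ansible-vault create group_vars/all/vault.yml    # new encrypted file
    $ ansible-vault edit group_vars/all/vault.yml      # edit in place
    $ ansible-playbook site.yml --ask-vault-pass       # prompt for the password at run time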