AWS / Ansible / Capistrano - notes from conversations with Alicia
* These notes supplement DCE DOCS.pdf
AWS Web Console
Administration & security > IAM
- issue and revoke access keys
- the ansible_access policy started as a copy of a full-access policy.
- Need to disallow some things:
- anything that allows terminating instances (destroy); note that stopping is okay
- the "describe" commands are also not great
- The user can't create users, manage policies, etc., so that's a good start.
- delete snapshot and delete volume are useful for backups management, though (note: Alicia says there is no cost associated with snapshots)
- Policies can also be made more granular, e.g. via 'Resource' - by default that covers the entire account, but it can be scoped to individual machines or groups (see the sketch below)
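For reference, attaching such a policy from the CLI looks roughly like this (the policy name is from these notes; the user name and JSON file are hypothetical - the JSON would scope 'Resource' to specific instance ARNs and omit terminate actions):
  $ aws iam put-user-policy --user-name ansible_user --policy-name ansible_access --policy-document file://ansible_access.json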
Instance: the key pair column. The key pair governs SSH access to the instance.
- it's set once and cannot be changed.
- you can add more keys on the machine itself.
- You could also create an image from your instance and spin up another server with a new key pair from that image (e.g. if you lose the key) - see the sketch below
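Sketches of both workarounds (the key file and instance ID are placeholders):
  # add a second public key on the machine itself
  $ cat new_key.pub >> ~/.ssh/authorized_keys
  # or image the instance so it can be relaunched with a fresh key pair
  $ aws ec2 create-image --instance-id i-0123456789abcdef0 --name "rescue-image"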
Security group: the firewall configuration. Any machine can have multiple security groups; the most permissive rule on each port wins.
- Alicia tends to name this after the keypair.
- if on CentOS (which we are not) you must remember to make iptables match these rules
- do we still need to close port 8080, or has that been done in the ec2 scripts? - done via the console (CLI equivalent below)
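For reference, closing 8080 from the CLI instead of the console would look something like this (the group name is a placeholder):
  $ aws ec2 revoke-security-group-ingress --group-name my-sec-group --protocol tcp --port 8080 --cidr 0.0.0.0/0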
User - you can change the access keys if these credentials are leaked
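A sketch of the rotation (the user name and key ID are placeholders):
  # issue a replacement key, then deactivate the leaked one
  $ aws iam create-access-key --user-name ansible_user
  $ aws iam update-access-key --user-name ansible_user --access-key-id AKIAEXAMPLEOLDKEY --status Inactive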
AWS Architecture
Pricing - unless you make a specific decision otherwise, it's pay-as-you-go
Architecture design: one dev machine, one higher-end machine. Fedora data on a separate volume, backed up every 24 hours. Keep 7 days of backups plus the first of every month (keep 2 monthlies).
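One way to drive the 24-hour cycle, assuming a hypothetical backup script installed on the machine (crontab entry, 3am daily):
  0 3 * * * /usr/local/bin/ebs-backup.sh >> /var/log/ebs-backup.log 2>&1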
Instances
Root device (the disk at '/'): typically ~8 GB, just for the OS. EBS provides the persistent storage.
Block devices (e.g. /dev/sdb) are mounted at '/opt' to hold Solr and Fedora
A third device holds the Fedora data (mount sketch below)
Use DNS or elastic IPs to swap machines
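A minimal sketch of attaching and mounting one of the extra volumes (IDs and device names are placeholders; on recent kernels /dev/sdb shows up as /dev/xvdb):
  $ aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdb
  # then, on the instance itself:
  $ sudo mkfs -t ext4 /dev/xvdb    # first time only - this erases the volume
  $ sudo mkdir -p /opt && sudo mount /dev/xvdb /opt
  $ echo '/dev/xvdb /opt ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab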
Volumes
Shows the mounted disks; the State column indicates whether they're attached
Misc
- snapshots
- elastic IPs - external IPs you can move between machines to avoid messing with DNS (see the one-liner below)
- AMI (Amazon Machine Image) - like a snapshot of a whole machine. Can be used, for example, to script as-needed load balancing.
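Moving an elastic IP is a one-liner (IDs are placeholders; inside a VPC you'd use --allocation-id instead of --public-ip):
  $ aws ec2 associate-address --instance-id i-0123456789abcdef0 --public-ip 203.0.113.10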
Security
Keep the names of users secret, along with keys and the other obvious things
The stack
Web server will be Apache, due to Alicia's greater experience with that server
Installing Ruby from source
- Advantages: standard, stable, known version of ruby (no problems with apt updates coming at a bad time)
- Disadvantage: security bugs - since you update by hand, you'll always be a version behind
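The build itself is the usual dance - roughly this, with the version number purely illustrative:
  $ wget https://cache.ruby-lang.org/pub/ruby/2.1/ruby-2.1.2.tar.gz
  $ tar xzf ruby-2.1.2.tar.gz && cd ruby-2.1.2
  $ ./configure --prefix=/usr/local
  $ make && sudo make install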
Postgres uses peer authentication
- the superuser is the postgres user, who can log in to the db without a password (peer auth auto-authenticates when you are that OS user)
- Alicia changes the default auth to MD5 and makes a new user, with restricted permissions, for the Rails database
- the corresponding settings go in database.yml
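Roughly what that setup looks like (the user name and password are placeholders; the pg_hba.conf location varies by distro):
  $ sudo -u postgres createuser --no-superuser --no-createrole --no-createdb rails_user
  $ sudo -u postgres psql -c "ALTER USER rails_user WITH PASSWORD 'secret';"
  # in pg_hba.conf, change the relevant auth method from peer to md5, then:
  $ sudo service postgresql reload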
Solr
app/config and Tomcat both need to know the name of the Solr instance. Alicia has called it hydra.xml, but she's making it a variable. It's not a security issue because we close the port (8080) to all but localhost. Not sure why I have a note that says $ curl localhost:8080/my_context/
Use an ssh tunnel if you want to access your solr instance via the web without messing with the firewall!
- $ ssh -L 2020:localhost:8080 target_machine
- set up target_machine in your ssh config (see below), or remember that you have to specify the user, IP, cert, etc.
- in the browser, go to localhost:2020/solr_context
- (things might break if there are redirects with ports in them)
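A matching ~/.ssh/config entry keeps the tunnel command short (the host alias, user, address, and key path are all placeholders):
  $ cat >> ~/.ssh/config <<'EOF'
  Host target_machine
      HostName 203.0.113.10
      User ubuntu
      IdentityFile ~/.ssh/my-keypair.pem
  EOF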
Capistrano
shared files (linked files) are for two types of things
1. things that are stored in the repo without production configuration
2. things that for other reasons you don't want to change from one deployment to the next (this is mostly directories, like tmp/pids - you don't want old tmp/pids directories hanging around with processes in limbo)
there's room for various interpretations - for example, on sandbox deployments I like to keep the log directories per-deployment
that way if a deployment fails, I can easily isolate the relevant log messages, even after reverting to an earlier, functioning deployment so testing can continue
but for production environments, sharing the log directory and rotating the files makes more sense - especially if you test everything in a staging environment first so you only deploy successful code
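Under the hood, linked files and directories are just symlinks from each release into shared/ - roughly what Capistrano runs at deploy time (the app path and release timestamp are placeholders):
  $ ln -nfs /var/www/myapp/shared/config/database.yml /var/www/myapp/releases/20140601120000/config/database.yml
  $ ln -nfs /var/www/myapp/shared/tmp/pids /var/www/myapp/releases/20140601120000/tmp/pids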
Ansible
FYI, the ansible.cfg file included in the repo turns off host key checking for ec2 instances. If you get rid of that, your playbook will fail after prompting you to add the RSA fingerprint to your known hosts list.
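If you do remove it, the one-off escape hatch is an environment variable (the playbook file name here is assumed):
  $ ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook launch-ec2.yml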
there are two ec2-related bits
- the launch-ec2 one does the heavy lifting - creates the instance and then creates and attaches a volume
- then the ec2 role puts aws-specific tools on the box for backups - for obvious reasons, that had to come later in the process
- The backup script pulls the same variable used to create the backups, so it can vary by machine (e.g. the production machine could use "CHF-prod" and the staging machine "CHF-stage"), and each machine will delete only its own backups over time - see the sketch below.
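The pruning logic can be sketched with the CLI (the tag values come from these notes; the tag key, snapshot ID, and 7-day cutoff are assumptions):
  # list this machine's snapshots with their start times...
  $ aws ec2 describe-snapshots --owner-ids self --filters Name=tag:Name,Values=CHF-prod --query 'Snapshots[].[SnapshotId,StartTime]' --output text
  # ...then delete any that fall outside the retention window
  $ aws ec2 delete-snapshot --snapshot-id snap-0123456789abcdef0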
vault
The credentials in the vaulted files are all new. For backups, I generated a new IAM user, new credentials, and a policy that only has access to snapshots. For creating instances, I generated new credentials on the existing IAM user and turned off the old credentials (because they will be on GitHub now if you know how to find them).
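Day-to-day vault commands, for reference (the vault file and playbook names are assumptions):
  $ ansible-vault edit group_vars/all/vault.yml
  $ ansible-playbook launch-ec2.yml --ask-vault-pass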