See also the list on Basecamp; it is not yet integrated here.
The "Future Refinements" from the DCE report are also not yet fully integrated.
Sysadmin:
- Short-term needs:
  - Walk through an actual backup recovery. Document the steps and how we determine whether data has been lost.
- Best practice (note: these tasks result in related ongoing maintenance):
  - Set up service monitoring
  - Set up log analysis
  - Perform risk assessments and business impact analyses (BIA); keep these up to date
  - Help design and implement redundancies (e.g. a failover server) for needs identified by the BIA; execute redundancies as needed
- Ongoing:
  - OS-level updates and upgrades
  - Security patches; keep up with security advisories
  - Backup script maintenance
  - AWS expertise
  - Own and maintain deployment scripts
  - Help coordinate and perform large-scale upgrades (e.g. those that require spinning up new boxes and switching over drives or DNS entries)
  - Track storage use over time and coordinate projections of future needs
  - Create and manage SSL certs
  - Manage user (server) accounts
  - Firewall configuration

Grey area (responsibility shared, unclear, or variable):
- Best practice:
  - Database administration / tuning
  - Integrate Hydra user accounts with the CHF LDAP server
  - Monitor and benchmark the JVM; adjust heap size and garbage collection as needed

Ops:
- Best practice:
  - Set up a CI server or service
  - Set up security filters for incoming / outgoing code
  - Modify the new Ansible project to work with Vagrant to create a development environment
- Ongoing:
  - Configure the differences between the staging, prod, and test environments in Ansible and Capistrano
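For the backup-recovery walkthrough, one way to determine whether data has been lost is to compare checksums of the original files against the restored copy. The Python sketch below assumes both directory trees are reachable from one machine. `checksum_tree` and `compare_restore` are hypothetical helper names, not existing scripts.

```python
import hashlib
import sys
from pathlib import Path


def checksum_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files need not fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_restore(source_root, restored_root):
    """Report files missing from, changed in, or extra in the restored copy."""
    src = checksum_tree(source_root)
    dst = checksum_tree(restored_root)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
        "extra": sorted(dst.keys() - src.keys()),
    }


if __name__ == "__main__":
    report = compare_restore(sys.argv[1], sys.argv[2])
    for kind, paths in report.items():
        print(f"{kind}: {len(paths)}")
        for p in paths:
            print(f"  {p}")
```

Files that legitimately changed after the backup was taken will also be reported as "changed". Recording a checksum manifest at backup time and comparing the restore against that manifest is probably more reliable.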
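As a starting point for log analysis, a script can tally HTTP status codes and flag which paths return server errors. This sketch assumes the web server writes the Apache/nginx "combined" access-log format. The regex and the `summarize` function are illustrative and would need adjusting for other log formats.

```python
import re
import sys
from collections import Counter

# Matches the request line and status code of a "combined"-format entry,
# e.g. ... "GET /path HTTP/1.1" 200 1234 "referer" "user-agent"
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>[^"\s]+) [^"]*" (?P<status>\d{3}) ')


def summarize(lines):
    """Count responses per status code, and 5xx responses per request path."""
    statuses = Counter()
    error_paths = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that don't match the expected format
        status = m.group("status")
        statuses[status] += 1
        if status.startswith("5"):
            error_paths[m.group("path")] += 1
    return statuses, error_paths


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        statuses, error_paths = summarize(f)
    print("status counts:", dict(statuses))
    for path, count in error_paths.most_common(10):
        print(f"{count:6d}  {path}")
```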
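For storage projections, one simple approach is to record disk usage periodically and extrapolate a least-squares line to estimate when a volume will fill. This sketch assumes usage grows roughly linearly. Step changes, such as a large batch ingest, would make a straight-line projection misleading. `project_full_date` is a hypothetical name.

```python
from datetime import date, timedelta


def project_full_date(samples, capacity_gb):
    """Estimate when a volume fills, given a list of (date, used_gb) samples.

    Fits used_gb = intercept + slope * day by ordinary least squares and
    extrapolates linearly. Returns None if usage is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(d - t0).days for d, _ in samples]  # days since first sample
    ys = [used for _, used in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    # Adding a timedelta to a date keeps only whole days.
    return t0 + timedelta(days=day_full)
```

For example, suppose samples of 100, 110, and 120 GB are taken ten days apart starting on 2024-01-01, against a 200 GB volume. They give a growth rate of 1 GB/day and a projected full date of 2024-04-10.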
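Part of managing SSL certs is catching them before they expire. A minimal check using only Python's standard `ssl` and `socket` modules might look like the following. The 30-day warning threshold and the function names are assumptions for illustration.

```python
import socket
import ssl
import sys
import time

WARN_DAYS = 30  # assumed threshold; choose whatever lead time renewals need


def days_until_expiry(not_after, now=None):
    """Days remaining, given a cert's notAfter string as returned by getpeercert()."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def fetch_not_after(host, port=443, timeout=10):
    """Open a verified TLS connection and return the server cert's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    for host in sys.argv[1:]:
        days = days_until_expiry(fetch_not_after(host))
        flag = "RENEW SOON" if days < WARN_DAYS else "ok"
        print(f"{host}: {days:.0f} days left ({flag})")
```

The default context verifies the certificate. As a result, an already-expired cert raises `ssl.SSLCertVerificationError` instead of reporting a negative number. For a monitoring job, that is arguably the right signal.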
Conversation topics for sysadmin friends:
- Scope of duties
- Current projects
- Do you do any "ops"-y stuff? Would you want to?
- Project backlog
- Routine duties
- Do you also write code?
- How many boxes do you manage?
- Have you experienced or simulated data loss & recovery?
- Do you have/do the things in our "best practice" column?
- How do you define sysadmin vs. developer responsibilities?
- AWS: should volumes be deleted on instance termination, for attached as well as root volumes? My instinct is to turn this off and delete volumes manually (or schedule a job to delete all "available" volumes older than X days), but I wonder whether there is a scenario in which it makes sense to keep it on.
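If delete-on-termination is turned off, the scheduled cleanup job described above could look roughly like the sketch below. It assumes `boto3` is installed, with credentials and a default region already configured, and it defaults to a dry run. The "X days" stays open as the `max_age_days` parameter. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_available_volumes(volumes, max_age_days, now=None):
    """Return IDs of unattached ("available") volumes older than max_age_days.

    `volumes` has the shape of the "Volumes" list in EC2 DescribeVolumes
    output: dicts with "VolumeId", "State", and a timezone-aware "CreateTime".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and v["CreateTime"] < cutoff
    ]


def delete_stale_volumes(max_age_days, dry_run=True):
    """Find and (unless dry_run) delete stale volumes in the default region."""
    import boto3  # imported here so the filter above is usable without boto3

    ec2 = boto3.client("ec2")
    volumes = []
    paginator = ec2.get_paginator("describe_volumes")
    # Server-side filter: only volumes not attached to any instance.
    for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
        volumes.extend(page["Volumes"])
    ids = stale_available_volumes(volumes, max_age_days)
    for volume_id in ids:
        if dry_run:
            print(f"would delete {volume_id}")
        else:
            ec2.delete_volume(VolumeId=volume_id)
    return ids
```

One caveat: EC2 reports when a volume was created, not when it was detached. A long-lived volume detached yesterday would therefore already count as "older than X days". Aging volumes by detachment time would need something extra, such as tagging volumes when they are detached (a hypothetical convention here).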