Server / Maintenance tasks, Operations

See also the list on Basecamp, which is not yet integrated here.

"Future Refinements" from the DCE report are also not yet fully integrated.

 Short-term needs    Best Practice    Ongoing
Sysadmin
  • Walk through performing an actual backup recovery. Document the steps and how we determine whether data has been lost.
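
One possible way to determine whether data has been lost is to compare checksums of the restored files against checksums recorded from the source. The following is a minimal sketch under that assumption, not an established procedure here; the function names `checksum_manifest` and `compare_manifests` are hypothetical, and it reads each file fully into memory, which may not suit very large files.

```python
import hashlib
from pathlib import Path


def checksum_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(expected, restored):
    """Return (missing, changed) file lists.

    missing: files in the expected manifest that the restore lacks.
    changed: files present in both whose contents differ.
    """
    missing = sorted(set(expected) - set(restored))
    changed = sorted(
        name for name in expected
        if name in restored and expected[name] != restored[name]
    )
    return missing, changed
```

For this to be meaningful, the expected manifest would need to be recorded at backup time, since files legitimately modified after the backup would otherwise show up as changed.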

Note: these tasks result in related ongoing maintenance.

  • Set up service monitoring
  • Set up log analysis
  • Perform risk assessments and business impact analyses (BIA); keep these up-to-date
  • Help design and implement redundancies (e.g. failover server) for needs identified by BIA. Execute redundancies as needed
  • OS-level updates and upgrades
  • Security patches, and keeping watch on new vulnerabilities and advisories
  • Backup script maintenance
  • AWS expertise
  • Own and maintain deployment scripts
  • Help coordinate and perform large-scale upgrades (e.g. those that require spinning up new boxes and doing switch-overs of drives or DNS entries)
  • Keep tabs on storage use over time and coordinate projections thereof
  • Create and manage SSL certs
  • Manage user (server) accounts
  • Firewall configuration
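
As an illustration of the storage-projection item above, here is a minimal sketch that fits a straight line through past usage samples and estimates how many days remain before a volume fills. It assumes roughly linear growth, which may not hold in practice; the function name `days_until_full` and the sample format are hypothetical. Samples could, for example, be recorded daily from `shutil.disk_usage(path).used`.

```python
def days_until_full(samples, capacity_bytes):
    """Estimate days until capacity is reached.

    samples: list of (day_number, used_bytes) pairs.
    Fits a least-squares line through the samples and extrapolates.
    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    # Slope is the estimated growth in bytes per day.
    slope = sum((day - mean_x) * (used - mean_y) for day, used in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(day for day, _ in samples)
    full_day = (capacity_bytes - intercept) / slope
    # Clamp at zero if the fitted line says the volume is already full.
    return max(0.0, full_day - last_day)
```

For example, samples of 100, 200, and 300 bytes on days 0, 10, and 20 against a capacity of 500 bytes give an estimate of 20 days remaining.
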
Grey area: responsibility shared, unclear, or variable
  • Database administration / tuning
  • Integrate Hydra user accounts with CHF LDAP server
  • Monitor and benchmark the JVM; adjust heap size and garbage collection as needed
 
Ops 
  • Set up CI server or service
  • Set up security filters for incoming / outgoing code
  • Modify the new Ansible project to work with Vagrant to create a development environment
  • Configure the differences between staging, prod, and test environments in Ansible and Capistrano
  
Conversation topics - sysadmin friends:
  • Scope of duties
  • Current projects
  • Do you do any "ops"-y stuff? Would you want to?
  • Project backlog
  • Routine duties
  • Do you also do coding?
  • How many boxes do you manage?
  • Have you experienced or simulated data loss & recovery?
  • Do you have/do the things in our "best practice" column?
  • How do you define sysadmin vs. developer responsibilities?
  • AWS: should volumes be deleted on instance termination, for attached as well as root volumes? My instinct is to turn this off and delete volumes manually (or schedule a job to delete all "available" volumes older than X days), but I wonder whether there is a scenario in which it makes sense to keep it on.
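
The scheduled-cleanup idea in the last item could be sketched roughly as follows. This is only an illustration of the selection step, not a tested cleanup job: the function name `stale_available_volumes` is hypothetical, and it assumes volume records shaped like the entries in the `Volumes` list returned by boto3's EC2 `describe_volumes` (with `VolumeId`, `State`, and a timezone-aware `CreateTime`). Note that `CreateTime` is when the volume was created, not when it was detached, so it only approximates how long a volume has been sitting unused.

```python
from datetime import datetime, timedelta, timezone


def stale_available_volumes(volumes, max_age_days, now=None):
    """Return IDs of unattached ("available") volumes created before the cutoff.

    volumes: list of dicts with "VolumeId", "State", and "CreateTime" keys.
    max_age_days: the "X days" threshold from the note above.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        volume["VolumeId"]
        for volume in volumes
        if volume["State"] == "available" and volume["CreateTime"] < cutoff
    ]
```

Keeping the selection separate from the deletion makes it possible to review the list of candidate volumes (a dry run) before anything is actually removed.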