Cloud Hosting Options
Our initial needs
- In-house only -- We will concentrate on accepting objects to support an upcoming large-scale scanning project
- High tolerance for downtime, which will only affect staff
- Storage estimates TBD
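Until real figures are available, a rough storage estimate can be worked out from a few inputs. The sketch below is only an illustration: the function name, the parameters, and every number in the usage lines are placeholders (assumptions), not measurements from the scanning project.

```python
def estimate_storage_gb(page_count, mb_per_master, derivative_ratio=0.1, copies=2):
    """Rough storage need in GB (1 GB = 1024 MB).

    page_count       -- number of scanned images expected (assumption)
    mb_per_master    -- average size of one master file in MB (assumption)
    derivative_ratio -- extra space for derivatives, as a fraction of the master
    copies           -- how many full copies are kept (e.g. primary + backup)
    """
    per_page_mb = mb_per_master * (1 + derivative_ratio)
    return page_count * per_page_mb * copies / 1024


# Placeholder inputs only; replace with real counts once they are known.
needed = estimate_storage_gb(page_count=50_000, mb_per_master=60)
print(f"Estimated storage: {needed:.0f} GB")
```

The number of copies matters as much as the file size, so it is worth settling the backup questions before pricing storage.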
Our initial strategy
Production
- All components on one VM (with multiple volumes for Solr Index and Fedora datastore)
- Cloud hosted
- AWS or Google would leave us the most flexibility going forward for adding redundancy and scaling up
Development / Sandbox machine
- Hosted locally
- Deployed by Chuck and ponce
- Ubuntu
Local development
- Vagrant
- Code hosted on Bitbucket
Focused on web services
Amazon Web Services (AWS) - http://aws.amazon.com/ Run one or more Linux servers in the cloud and deploy applications; also offers S3 and Glacier options for data storage, backup, and redundancy; elastic storage with pricing that grows as your collection expands
Rackspace - http://www.rackspace.com/ Managed DevOps Cloud service for running Linux servers in the Cloud
Linode - Hosted Linux box
Digital Ocean - https://www.digitalocean.com/ Run Linux servers in the Cloud and deploy web applications. Pricing model based on Solid State Drives (SSD)
Engine Yard - Platform-as-a-service host. Support options.
Google Cloud - Similar to AWS, but reportedly easier to learn.
- https://cloud.google.com/developers/articles/compute-engine-disks-price-performance-and-persistence/
- https://cloud.google.com/compute/docs/containers
- http://docs.ansible.com/guide_gce.html
Heroku? - Easy to use; deploy by pushing to a git remote, with no server details to manage at all. It could only host the Rails piece of the application, so probably not a useful simplification.
Focused on data preservation and storage
DuraCloud - http://www.duracloud.org/ Managed by DuraSpace (Hydra partners!); provides managed Amazon S3 storage for archiving in the cloud, now with an Amazon Glacier option for long-term preservation
EVault from Seagate - http://www.evault.com/ Hybrid Cloud service with onsite and cloud storage options; recovery and archive options
Preservica Cloud Edition - http://preservica.com/edition/cloud-edition/ OAIS-compliant, Amazon S3 storage, out of the box, pay as you go, public access/discovery [NB: Preservica cannot run Fedora, so it is not an option for Hydra. But we could look into using it for born-digital materials after Hydra is stable if we want. Hagley is looking at a Preservica/Islandora combo. I did a site visit and talked with the Preservica people, so I can talk more about pros and cons later. -MD]
This thread on the Hydra Tech mailing list began with a discussion about possibly using Hydra with Amazon Glacier for long-term preservation and then developed into a larger discussion about preservation. It’s a bit old now, but I found it interesting: https://groups.google.com/forum/#!searchin/hydra-tech/preservation/hydra-tech/Ww2FXqWjUZc/-vWOGm7HZGMJ
Questions / Meeting agenda
- What are our needs profiles?
- Development / sandbox server
- In cloud - Linode?
- locally
- Serve a website (host in cloud)
- Store data for presentation purposes
- Store data for preservation purposes - Look at NDSA chart.
- Back up data
- Back up systems
- To code repository/ies
- Take snapshots to store locally??
- Can Fedora deposit data to a remote server?
- If yes, what are the pros and cons?
- Administration services
- Rackspace, for example, offers some administration services - do we want/need this?
- Storage services offer some administration services around file checking / "repair"
- What is the distribution of responsibilities?
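For the discussion of file checking: this kind of check is commonly done as a fixity check, where a checksum is recorded for each file at deposit time and recomputed later to detect missing or altered files. The following is a minimal sketch of that idea for files on a local volume. The function names and report layout are made up for illustration and do not describe how any of the services listed above implement it.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Compare the files under root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
    }
```

A check like this only detects problems. "Repair" additionally requires a second known-good copy to restore from, which ties back to the backup questions above.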
For reference: our components
- Blacklight / Rails app
- I assume there’s a database associated with this app
- Solr index - served by Jetty
- Fedora - also served by Jetty
- db?