Cloud Hosting Options

Our initial needs

  • In-house only -- We will concentrate on accepting objects to support an upcoming large-scale scanning project
  • High tolerance for downtime, which will only affect staff
  • Storage estimates TBD
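
Until real storage estimates are available, a rough calculation like the one below could help frame the discussion. This is only a sketch: the function name, its parameters, and every number in the usage line are hypothetical placeholders, not figures from the scanning project.

```python
def estimate_storage_gb(object_count, avg_mb_per_object,
                        derivative_overhead=0.0, copies=1):
    """Rough storage estimate in decimal gigabytes (1 GB = 1000 MB).

    object_count        -- number of scanned objects expected (placeholder)
    avg_mb_per_object   -- average size of one master file in MB (placeholder)
    derivative_overhead -- extra space for derivatives, as a fraction of the
                           master size (0.5 means 50% on top of the masters)
    copies              -- number of full copies kept (production, backup, ...)
    """
    total_mb = object_count * avg_mb_per_object * (1 + derivative_overhead) * copies
    return total_mb / 1000


# Hypothetical inputs only; replace with real figures once they are known.
print(estimate_storage_gb(1000, 50, derivative_overhead=0.5, copies=2))
```

The same function can be rerun with different copy counts to compare, for example, a single production copy against production plus one backup.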

Our initial strategy

Production

  • All components on one VM (with separate volumes for the Solr index and the Fedora datastore)
  • Cloud hosted
  • AWS or Google Cloud would leave us the most flexibility going forward for adding redundancy and scaling up easily

Development / Sandbox machine

  • Hosted locally
  • Deployed by Chuck and ponce
  • Ubuntu

Local development

  • Vagrant
  • Code hosted on Bitbucket

Focused on web services

Amazon Web Services (AWS) - http://aws.amazon.com/ Run one or more Linux servers in the cloud and deploy applications. Also offers S3 and Glacier options for data storage, backup and redundancy, with elastic storage and pricing that grows as the collection expands.

Rackspace - http://www.rackspace.com/ Managed DevOps cloud service for running Linux servers in the cloud.

Linode - Hosted Linux box.

DigitalOcean - https://www.digitalocean.com/ Run Linux servers in the cloud and deploy web applications. Pricing model based on solid-state drives (SSD).

Engine Yard - Platform-as-a-service (PaaS) host that also offers support options.

Google Cloud - Much like AWS, but reportedly easier to figure out.

Heroku? - Easy to use: you deploy by pushing to a git remote and do not have to manage the server details at all. It could only host the Rails piece of the application, so it is probably not a useful simplification.

Focused on data preservation and storage

DuraCloud - http://www.duracloud.org/ Managed by DuraSpace (Hydra partners!). Provides managed Amazon S3 storage for archiving in the cloud, now with an Amazon Glacier option for long-term preservation.

EVault from Seagate - http://www.evault.com/ Hybrid cloud service with onsite and cloud storage options, plus recovery and archive options.

Preservica Cloud Edition - http://preservica.com/edition/cloud-edition/ OAIS-compliant, Amazon S3 storage, out of the box, pay as you go, public access/discovery. [NB: Preservica cannot run Fedora, so it is not an option for Hydra. But we could look into using it for born-digital materials after Hydra is stable if we want. Hagley is looking at a Preservica/Islandora combo. I did a site visit and talked with the Preservica people, so I can talk more about pros and cons later. -MD]

This older thread on the Hydra Tech listserv began with a discussion about possibly using Hydra with Amazon Glacier for long-term preservation and then developed into a larger discussion about preservation. It’s a bit dated now, but I found it interesting: https://groups.google.com/forum/#!searchin/hydra-tech/preservation/hydra-tech/Ww2FXqWjUZc/-vWOGm7HZGMJ

Questions / Meeting agenda

  • What are our needs profiles?
    • Development / sandbox server 
      • In the cloud - Linode?
      • Locally
    • Serve a website (host in cloud)
    • Store data for presentation purposes
    • Store data for preservation purposes - Look at NDSA chart.
    • Back up data
    • Back up systems
      • To code repository/ies
      • Take snapshots to store locally?
  • Can Fedora deposit data to a remote server?
    • If yes, what are the pros and cons?
  • Administration services
    • Rackspace, for example, offers some administration services - do we want/need this?
    • Storage services offer some administration services around file checking / "repair"
  • What is the distribution of responsibilities?
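
To make the file checking / "repair" item above more concrete, here is a minimal sketch of what a checksum-based fixity check could look like if we ran one ourselves. It is an illustration only, not how any of the storage services listed here implement it; the function names and the manifest layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Compare files under root with a stored manifest.

    Returns (missing, changed): relative paths that are no longer present,
    and relative paths whose checksum differs from the recorded one.
    """
    root = Path(root)
    missing, changed = [], []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif sha256_of(target) != expected:
            changed.append(rel_path)
    return missing, changed
```

The manifest would be built once at deposit time and stored apart from the data. A "repair" step would then mean restoring any missing or changed file from a known-good copy, which is one of the services the hosted storage options advertise.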

For reference: our components

  • Blacklight / Rails app
    • I assume there’s a database associated with this app
  • Solr index - served by Jetty
  • Fedora - also served by Jetty
    • Database?