...

Our data falls into two categories. The first is data that is potentially irrecoverable: our original binary files (images, audio, other) and the metadata about them (a SQL database). The second is data that can be regenerated and is needed for normal site operation, but takes significant time to restore, such as the derived download files and viewer files. This second set may be worth backing up to shorten recovery times for public users when data is lost.

As an estimate, holding extra backups for our current scihistcoll staging environment costs less than $2 a month, out of a total of $70.07 spent on data storage and transfer. While a production environment will have slightly higher cost ratios, they should not be massively higher. Thus by spending an additional 2-3% on S3, we can mount a full public recovery in an afternoon after a massive failure of our entire infrastructure. We are not currently backing up our viewer tiles, but an examination of our old application shows that storing them in production averages around $5 a month. Adding a second copy of the viewer files should roughly double that cost, with a slight reduction for lighter use, adding about another $5. So for about $7-12 a month we can be widely covered against data inaccessibility or other failures of S3 in a specific region. While it is hard to get specific details, there have been multiple regional outages or issues lasting over an hour, and at least one major outage in the last two years lasting around 6 hours. Assuming about 8 hours of problems every two years, the upper figure of $12 a month ($288 over two years) works out to roughly $36 per hour of outage avoided. Shorter outages may not be worth the difficulty of switching over.
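The arithmetic behind these estimates can be checked with a short sketch. It uses only the figures quoted above; the helper name `cost_per_outage_hour` is made up for illustration, and the 8 hours per two years is the same rough assumption as in the text.

```python
# Figures come from the paragraph above; the helper name is hypothetical.
HOURS_OF_OUTAGE = 8       # assumed hours of regional problems per two-year window
MONTHS_IN_WINDOW = 24     # two years of monthly backup spending


def cost_per_outage_hour(monthly_backup_cost: float) -> float:
    """Dollars spent on backups per hour of outage avoided."""
    return monthly_backup_cost * MONTHS_IN_WINDOW / HOURS_OF_OUTAGE


# Upper bound on the backup share of staging storage spend ($2 of $70.07).
staging_share = 2 / 70.07
low = cost_per_outage_hour(7)     # lower end of the $7-12 monthly estimate
high = cost_per_outage_hour(12)   # upper end of the $7-12 monthly estimate

print(f"staging backup share: {staging_share:.1%}")
print(f"cost per outage hour: ${low:.0f} to ${high:.0f}")
```

With these inputs the range runs from $21 to $36 per outage hour, so the $36 figure corresponds to the upper end of the monthly estimate.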

In cases of small-scale data loss, such as corrupted files or user error, the application keeps working but a limited set of data has a problem. In these cases we can locate the problematic data and either use a backup copy to restore the damaged original files or use versioning to restore an earlier version of a file. Derived files can be regenerated or copied from backups as well. This is the most common expected use case, and it requires only that we keep versions of our original files and backups of files in different locations (another S3 bucket and an on-site copy).
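As a rough illustration of the versioning route, the sketch below copies the most recent earlier version of an object back over the current one. It assumes the `s3` argument is a boto3 S3 client (`boto3.client("s3")`) and that versioning is enabled on the bucket; the function name is hypothetical and this is not necessarily how our tooling does it.

```python
def restore_previous_version(s3, bucket: str, key: str) -> str:
    """Copy the newest non-current version of `key` over the current one.

    `s3` is assumed to be a boto3 S3 client; the bucket is assumed to have
    versioning enabled. Returns the VersionId that was restored.
    """
    # Only the first page of results is read (up to 1000 entries);
    # a key with a longer history would need pagination.
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can return other keys, so keep exact matches only.
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    previous = [v for v in versions if not v["IsLatest"]]
    if not previous:
        raise LookupError(f"no earlier version of {key!r} in {bucket!r}")
    # Pick the newest of the older versions by timestamp.
    target = max(previous, key=lambda v: v["LastModified"])
    # Copying a specific version onto the same key makes it the current version.
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": target["VersionId"]},
    )
    return target["VersionId"]
```

Because the restore is itself a copy, the damaged version stays in the version history and can still be inspected afterwards. A single `copy_object` call is limited to objects of up to 5 GB, so very large originals would need a multipart copy instead.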

In cases of broad data loss, most or all of the data is rendered unavailable, and we will suffer a loss of service until we can recover it. This can be thought of as two recoveries: the first is to get the digital collection site back for the public as soon as possible, and the second is to restore all functionality. Getting the site back for the public is our primary concern, so as noted above for outages, we have a few methods to speed up recovery at an additional cost to our backups. Both the derivative and original files are backed up to S3 in a region on the West Coast, with the same configuration details as our working files in US-EAST (the original originals and original derivatives). We can recover public access by using these backup files directly while we spend more time working on a full recovery of staff functionality. Due to current setups we would not want staff to add new works in this state, but it allows us to rapidly restore public-facing access to our site should the normal data sources be unavailable. A longer process then restores the data to its original locations while leaving public access up. Once the data is back in its original place, full staff functionality will likewise be restored.
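The switch to backup files can be pictured as choosing between two storage configurations. The sketch below is illustrative only: the bucket names are placeholders, the region identifiers `us-east-1` and `us-west-2` are assumptions (the text only says US-EAST and the West Coast), and `active_storage` is a hypothetical helper rather than part of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    region: str
    originals_bucket: str
    derivatives_bucket: str
    staff_ingest_enabled: bool  # whether staff may add new works


# Placeholder names and assumed regions, not our real configuration.
PRIMARY = StorageConfig(
    region="us-east-1",
    originals_bucket="example-originals",
    derivatives_bucket="example-derivatives",
    staff_ingest_enabled=True,
)
BACKUP = StorageConfig(
    region="us-west-2",
    originals_bucket="example-originals-backup",
    derivatives_bucket="example-derivatives-backup",
    staff_ingest_enabled=False,  # public read access only while failed over
)


def active_storage(primary_available: bool) -> StorageConfig:
    """Serve from the primary buckets when available, else from the backups."""
    return PRIMARY if primary_available else BACKUP


config = active_storage(primary_available=False)
print(config.region, config.originals_bucket, config.staff_ingest_enabled)
```

Keeping staff ingest tied to the configuration makes the failover state explicit: while the site reads from the backup buckets, nothing new is written that would later have to be reconciled with the restored originals.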

...