  • I am concerned that the “concurrent request limit” of “5” in “Standard small” might be a problem, since updates and reindexes count as concurrent requests just as searches do. I don’t entirely understand how this is metered, or what happens if the limit is exceeded.

  • Doing lots of bulk operations might make lots of requests per day, since our app is set up to do a solr update on every ‘save’.

  • Larger plans can get expensive quickly. “Standard Medium”, with 150K daily requests and 10 concurrent requests (which still seems small to me), is $189/month.

  • If we ended up with anything running on raw AWS instead of heroku, Solr would probably be the first thing. It’s what I’m most worried about being able to run affordably on heroku.

  • ElasticSearch is a competitor to Solr; it has many more heroku plugin offerings from multiple vendors at different price points, which are not typically metered by concurrent connections (the most troublesome meter in WebSolr plans). But getting our app to run on ElasticSearch instead of Solr would take significant development: we are not as familiar with it; it is unclear to what extent Blacklight supports it, or whether we’d need to develop new code on top of or instead of Blacklight; and we are definitely using some Blacklight plugins, like date_range_limit, that are Solr-only.

  • There might be other “Solr as a service” offerings that aren’t Heroku plugins but still don’t require us to run our own server at the OS level. That could be cheaper, but it’s unclear. There are definitely not as many as for ElasticSearch. The only one I can find is opensolr, which does seem cheaper… but they meter you for bandwidth, and I’m not sure how to predict how much we’ll be using. https://opensolr.com/pricing We could talk to a salesperson.
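To get a feel for how the daily and concurrent meters discussed above might apply to us, here is a rough sketch. The plan limits are the “Standard Medium” figures quoted above; every traffic number is a hypothetical placeholder (not a measurement from our app), and treating each save and each search as exactly one Solr request is an assumption. Average concurrency is estimated with Little's law (requests in flight = arrival rate × time per request), which says nothing about bursts, or about how WebSolr actually meters.

```python
# Rough WebSolr load estimate. Plan limits come from the plan listing above;
# all traffic figures are hypothetical placeholders, not measurements.

PLAN_DAILY_LIMIT = 150_000      # "Standard Medium" daily requests
PLAN_CONCURRENT_LIMIT = 10      # "Standard Medium" concurrent requests


def daily_requests(saves_per_day, searches_per_day):
    """Assumes one Solr update per save and one Solr query per search."""
    return saves_per_day + searches_per_day


def average_concurrency(requests_per_second, seconds_per_request):
    """Little's law: average requests in flight = rate * time per request."""
    return requests_per_second * seconds_per_request


# Hypothetical day: a bulk operation touching 20,000 records plus 40,000 searches.
total = daily_requests(saves_per_day=20_000, searches_per_day=40_000)

# Hypothetical busy moment: 20 requests/second, each taking 0.25 seconds.
peak = average_concurrency(requests_per_second=20, seconds_per_request=0.25)

print("daily requests:", total, "within plan:", total <= PLAN_DAILY_LIMIT)
print("avg concurrency:", peak, "within plan:", peak <= PLAN_CONCURRENT_LIMIT)
```

Plugging in real numbers from our logs (saves per day during a bulk ingest, search traffic, typical Solr response time) would tell us which meter we are likely to hit first.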

Scheduled Tasks

We have just a handful of nightly scheduled tasks. We need a means of scheduling them, and they will also take “dyno” resources to run.

There is a free heroku scheduler; there is also an Advanced Heroku Scheduler with more features. Haven’t totally figured out when/why you’d need the advanced one, but just to be safe let’s budget it.

Production                      Staging
Advanced Scheduler Premium 0    Advanced Scheduler Premium 0
$15/month                       $15/month

We also have to pay for dyno hours for actually executing scheduled tasks. We have a couple that take only a few minutes to run and probably aren’t even worth accounting for (create Google SiteMap, clear some blacklight tables of old data).

The main issue is the Fixity Check code. We run nightly Fixity Checks, which usually take 4.5 hours to complete. (They could be slower on heroku hardware).

  • It’s unclear if we can get away with a single 4.5 hour job on heroku. heroku does not like long-running jobs like this; it can restart your dynos at any time. But it could work.

  • This time will go up linearly as we ingest more into the collection.

  • We try to fixity check everything weekly. If we switch to (e.g.) monthly, we could spend 1/4th as much time fixity checking. (We may have to adjust S3 version retention too, to make sure we’re keeping things for more than a month.)

  • We could switch from a single batch job to a job-per-check on our BG Job Queue, so there are no long-running tasks for heroku restarts to interfere with. That could use one worker from the BG Worker infrastructure.
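The arithmetic behind the bullets above can be sketched as follows. It assumes the 4.5 nightly hours are one-seventh of a full weekly pass over the collection, treats “monthly” as a 28-day cycle (which is what makes the 1/4th figure exact), and uses a made-up 1.5× collection growth factor purely for illustration.

```python
# Fixity-check time under different schedules.
# 4.5 hours/night on a weekly cycle comes from the notes above; the 28-day
# "month" and the 1.5x growth factor are illustrative assumptions.

CURRENT_NIGHTLY_HOURS = 4.5
CURRENT_CYCLE_DAYS = 7

# Time to check every file once, at today's collection size.
FULL_PASS_HOURS = CURRENT_NIGHTLY_HOURS * CURRENT_CYCLE_DAYS


def nightly_hours(full_pass_hours, cycle_days):
    """Hours of checking per night if every file is checked once per cycle."""
    return full_pass_hours / cycle_days


def scaled_hours(hours, growth_factor):
    """Check time grows linearly with collection size."""
    return hours * growth_factor


weekly = nightly_hours(FULL_PASS_HOURS, 7)
monthly = nightly_hours(FULL_PASS_HOURS, 28)
weekly_after_growth = scaled_hours(weekly, 1.5)  # hypothetical 50% larger collection

print("weekly cycle:", weekly, "hours/night")
print("28-day cycle:", monthly, "hours/night")
print("weekly cycle after 1.5x growth:", weekly_after_growth, "hours/night")
```

A longer cycle shortens each night's run but does not remove the linear growth, so the job-per-check approach may still be worth it as the collection grows.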

One way or another, we’ll have to pay for resources: compute time, and possibly a larger redis to hold the queue. Let’s budget a full standard-1x running 24 hours a day, although this could be more than we need (we probably don’t need that much compute) or less than we need (if we have to upgrade redis to hold a larger queue, that could cost more than this). We may not really need to do this in staging either, but we always have before.

Production                             Staging
standard-1x dyno for fixity compute    standard-1x dyno for fixity compute
$15/month                              $15/month
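As a rough check on how conservative the full-dyno budget is, the sketch below computes what share of a 24-hour dyno the current 4.5-hour nightly run would occupy. It assumes dyno cost is prorated by running time and that the fixity job runs on a dyno that exists only while the job runs; both are assumptions to verify against heroku's billing docs. The monthly price is the budget figure from the table above.

```python
# Share of an always-on dyno that the nightly fixity run would actually use.
# Assumes cost is prorated by running time (an assumption to verify) and
# uses the monthly budget figure from the table above.

HOURS_PER_DAY = 24
BUDGETED_MONTHLY_PRICE = 15.0   # budget figure from the table
NIGHTLY_JOB_HOURS = 4.5         # current nightly fixity run


def busy_fraction(job_hours_per_night):
    """Fraction of each day the dyno is busy with the job."""
    return job_hours_per_night / HOURS_PER_DAY


def prorated_cost(monthly_price, job_hours_per_night):
    """Monthly cost if we only paid for the hours the job runs."""
    return monthly_price * busy_fraction(job_hours_per_night)


fraction = busy_fraction(NIGHTLY_JOB_HOURS)
cost = prorated_cost(BUDGETED_MONTHLY_PRICE, NIGHTLY_JOB_HOURS)

print(f"dyno busy {fraction:.2%} of the time")
print(f"prorated cost about ${cost:.2f}/month of ${BUDGETED_MONTHLY_PRICE:.2f} budgeted")
```

If those assumptions hold, the budget has plenty of headroom for the fixity time growing as the collection does, though not for a redis upgrade.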

Total Cost Estimate