...

  • Has 8,000 docs (max docs 10,000). Basically scales with the number of Works we have.

  • Listed as 70MB in size.

  • Trying to figure out how many solr queries we do per day or hour, since some solr heroku add-ons are metered by requests. I’m not sure we’re logging the right things, or interpreting the logs correctly, but I’m going to say somewhere under 10K requests per day. (One rough way to count is sketched below.)
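
One rough, unverified way to estimate: run a day’s production log through a tiny script that counts lines that look like solr queries. The pattern below is an assumption – we’d have to check what our logging actually emits (e.g. RSolr request lines) and adjust it.

    # rough_solr_count.rb -- rough sketch for estimating solr query volume.
    # Assumes each solr query shows up in the log on a line mentioning "solr"
    # (e.g. an RSolr request line); adjust the pattern to match our real logs.
    pattern = /solr/i

    count = 0
    ARGF.each_line do |line|
      count += 1 if line.match?(pattern)
    end

    puts "#{count} solr-ish log lines"
    # usage: ruby rough_solr_count.rb log/production.log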

Heroku Estimate

There is one Solr add-on available on the heroku marketplace, from the third-party vendor “WebSolr”. Its “Standard Small” plan seems sufficient for us.

...

We currently have a gap in monitoring for our app, and there are some heroku-specific monitoring concerns. We may want to choose a heroku monitoring add-on; there are some sophisticated ones, but they aren’t cheap (we could end up paying $50-$200/month).

...

  • While heroku gets us out of the systems administration business, developers will still need to spend time on “operations” – especially to plan and implement the migration, but also on an ongoing basis.

  • As our usage changes (in volume or functionality), heroku costs could increase at a steep slope that we might not be prepared for.

  • Heroku plug-ins and SaaS offerings can be usage-metered in ways that running our own services on raw EC2 is not – for instance, by number of connections or number of requests per time period. This kind of metering tries to charge the “right” price for your “size” of use, but our usage patterns could be a weird match for how a vendor prices things, leading to unaffordable pricing.

    • The SaaS Solr offerings in particular are fairly expensive, and potentially metered in ways that will be a problem for us. We might end up wanting to run Solr on our own EC2 after all, meaning we’d still need in-house or out-sourced systems administration competencies to some extent.

  • Might need to rewrite/redesign some parts of our app to work better/more affordably on heroku infrastructure – for instance, our fixity checking. This should be do-able, it will just take some time; we might find more such things in the migration process, and it might involve some additional heroku expenses (eg a bigger redis holding fixity checks in queue) – or we could find some things we aren’t yet predicting to be simply infeasible.

    • Our ingest process is very CPU-intensive (file analysis, derivative and DZI creation). This may not be a good fit for the shared infrastructure of heroku “standard” dynos. Is it possible heroku will get mad at us for using “too much CPU” for sustained periods? I don’t think so, but we may find it slower than we expect, or slower than our current setup.

    • We may simply run into problems we haven’t predicted. See more below.

  • We require some custom software for media analysis/conversion (imagemagick, vips, mediainfo, etc). It should be possible to get these installed on heroku dynos using custom “buildpacks”, but those maintained by third parties as open source may be less reliable, or may drag us back into the systems administration task of “getting packages compiled/installed” after all.

  • Need to make sure our heroku deploy will reliably remain in AWS us-east-1, because if heroku were to move it, it would deleteriously affect our S3 access costs and performance.

Existing functionality with specific known Heroku challenges

  • Our routine to create a combined PDF of all pages in a scanned book uses an excessive amount of RAM, which will be impossible (or infeasibly expensive) on heroku. We would need to find different tools/usage that can create this PDF in constant RAM regardless of the number of pages, or eliminate this functionality. (One candidate approach is sketched after this list.)

  • File downloads and uploads cannot go directly through the app on heroku, because of the 30-second max timeout on requests. Currently we do have direct-to-S3 uploads and downloads, but we’d be locked into that (or some other non-heroku mechanism, such as a CDN for downloads). See the presigned-URL sketch after this list.

    • This makes our access-control implementation options more limited. We were considering proxying file downloads through our app and/or nginx for access control, but we’d have to rule out that option (unless we hosted nginx directly on EC2 ourselves, which we probably don’t have the current in-house expertise to set up).

  • Our current fixity check routine is a long-running process, which doesn’t work well on heroku. We’d probably have to divide it into separate background ActiveJobs (sketched after this list), which might require a bigger redis and/or more workers and a more complex background worker setup – i.e. some additional heroku expenses.

  • There could be other things we aren’t even thinking of yet that we run into – in initial implementation, or even a year or two down the line as we add functionality – that end up being difficult or infeasibly expensive on a heroku-style deploy.
  • Some of our potential plans for dealing with access-controlled originals and derivatives for OH involved using nginx web server features, in a way that doesn’t really work on heroku (heroku responses are limited to 30 seconds max, so you can’t really deliver large files via the app).

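For the combined-PDF item above, one candidate approach (unverified for our page counts) is shelling out to a tool like img2pdf, which assembles already-compressed page images into a PDF without re-encoding them. A minimal sketch, assuming the page images have already been fetched from S3 to local tmp files:

    require "open3"

    # Sketch: combine per-page images into a single PDF by shelling out to
    # the img2pdf CLI. Whether its memory use stays acceptably flat at our
    # page counts is an assumption we'd need to verify before relying on it.
    def combine_pages_to_pdf(page_image_paths, output_path)
      cmd = ["img2pdf", *page_image_paths, "-o", output_path]
      _stdout, stderr, status = Open3.capture3(*cmd)
      raise "img2pdf failed: #{stderr}" unless status.success?
      output_path
    end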
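
For the direct-to-S3 download item, the usual pattern (which we already partly use) is: the Rails action checks authorization itself, then redirects to a short-lived presigned S3 URL, so the actual bytes never pass through a heroku dyno. A minimal sketch using the aws-sdk-s3 gem – the bucket name, Asset model, file_key attribute, and authorize! check are all hypothetical stand-ins for our real code:

    require "aws-sdk-s3"

    class DownloadsController < ApplicationController
      def show
        asset = Asset.find(params[:id])   # hypothetical model
        authorize! :read, asset           # hypothetical access-control check

        # Hand the byte-shipping off to S3 via a short-lived signed URL,
        # so no large/slow response has to fit in heroku's 30s window.
        obj = Aws::S3::Bucket.new("our-originals-bucket").object(asset.file_key)
        redirect_to obj.presigned_url(:get, expires_in: 300)
        # (On newer Rails versions you'd also pass allow_other_host: true.)
      end
    end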
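
For the fixity item, a sketch of dividing the long-running routine into many short background jobs: one scheduler job fans out one quick ActiveJob per asset. The Asset model and check_fixity! method are hypothetical; the fan-out is also what drives the “bigger redis” cost note above, since every queued job occupies queue space.

    # Scheduler: enqueue one small job per asset instead of checking
    # everything in a single long-running process.
    class ScheduleFixityChecksJob < ApplicationJob
      queue_as :low_priority

      def perform
        # find_each batches the query so we never load all assets at once
        Asset.find_each { |asset| SingleFixityCheckJob.perform_later(asset.id) }
      end
    end

    # Worker: each invocation is short, so dyno restarts lose at most one check.
    class SingleFixityCheckJob < ApplicationJob
      queue_as :low_priority

      def perform(asset_id)
        Asset.find(asset_id).check_fixity!  # hypothetical: compare stored vs fresh checksum
      end
    end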

Reasons that moving from sufia/hyrax to a local, more-standard Rails app made this more feasible

  • No fedora. If we still had fedora, it would need to be hosted directly on EC2; there’s no way to do that within the heroku ecosystem.

  • Our sufia app handled file uploads and downloads via the app itself, rather than direct-to-S3 – an approach that is not tenable on heroku.

  • Solr usage is much more limited in the new app – fewer objects and attributes in solr, and it’s really just used for searching – which should help keep SaaS Solr more affordable.

  • Generally more efficient use of CPU in the new app will help keep heroku relatively affordable.

Costs which would remain on AWS even with a Heroku deploy

...