...

Need to verify we have as much customization of schema.xml (and solrconfig.xml?) as we need.

Current AWS

  • Runs on AWS t3a.small, which has 2 GB RAM and 2 virtual CPUs

  • Has 8000 docs (maxdocs 10,000). Basically scales with number of Works we have.

  • Listed as 70MB size

  • Trying to figure out how many Solr queries we do per day or hour, since some Solr heroku add-ons are metered by requests. I’m not sure we’re logging the right stuff, and am not sure I’m interpreting the logs right, but I am going to say somewhere under 10K requests per day.

    • update: I don’t think that was right. Looking at production app logs, all requests for /catalog , /focus , or /collection pages (each of which triggers a Solr query) on Tuesday November 17th come to around 35K for the day. This is maybe a subset of all Solr requests, but it gives us an order of magnitude for now.

    • We’d basically have to spend a bunch of time figuring out/changing our logging to be able to really answer this question (this is why we want to get out of the sysadmin business!)
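As a rough sketch of the kind of log counting described above (the file name, log format, and field layout are all hypothetical; the real production log will differ):

```shell
# Create a tiny sample log so the command is runnable as-is
# (hypothetical format; adjust to the real log's layout and path).
cat > app.log <<'EOF'
1.2.3.4 - [17/Nov/2020:10:00:00] "GET /catalog HTTP/1.1" 200
1.2.3.4 - [17/Nov/2020:10:00:01] "GET /focus/art HTTP/1.1" 200
1.2.3.4 - [17/Nov/2020:10:00:02] "GET /about HTTP/1.1" 200
1.2.3.4 - [17/Nov/2020:10:00:03] "GET /collection/x HTTP/1.1" 200
EOF

# Count requests whose path begins with /catalog, /focus, or /collection;
# each of those pages triggers at least one Solr query.
grep -cE '"GET /(catalog|focus|collection)' app.log   # prints 3
```

Run against a full day of production logs, a count like this gives the same order-of-magnitude estimate as above, though it still undercounts any Solr queries not tied to those three routes.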

Heroku Estimate

There is one plugin for Solr available on heroku marketplace, from third-party vendor “WebSolr”. The “Standard Small” plan seems to be sufficient for us.

...

  • While heroku gets us out of the systems administrator business, developers will need to spend time on “operations”, especially to plan and implement the migration, but also on an ongoing basis.

  • As our usage changes (in volume or functionality), heroku costs could increase at a steep slope we might not be prepared for.

  • Heroku plug-ins and SaaS offerings can be usage-metered in ways that running our own services on raw EC2 is not, for instance in number of connections or number of requests per time period. This kind of metering tries to give you the “right” price for your “size” of use, but our usage patterns could be a weird match for how they price things, leading to unaffordable pricing.

    • The SaaS Solr offerings in particular are kind of expensive and potentially metered in ways that will be a problem for us. We might end up wanting to still run Solr on our own EC2, meaning we’d still need to have in-house or out-sourced systems administration competencies to some extent.

  • Might need to rewrite/redesign some parts of our app to work better/more affordably on heroku infrastructure – or could find some things we aren’t yet predicting to be simply infeasible.

    • Our ingest process is very CPU-intensive (file analysis, derivative and DZI creation). This may not be a good fit for the shared infrastructure of heroku “standard” dynos? Is it possible heroku will get mad at us for using “too much CPU” for sustained periods? I don’t think so, but we may find it slower than we expect, or slower than our current setup.

    • See more below

  • We require some custom software for media analysis/conversion (imagemagick, vips, mediainfo, etc.). It should be possible to get these installed on heroku dynos using custom “buildpacks”, but if those are maintained by third parties as open source they may be less reliable, or may require us to get into the systems administration task of “getting packages compiled/installed” after all.
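One likely route is the community-maintained apt buildpack, which installs Debian packages listed in an `Aptfile`; a minimal sketch, assuming that buildpack stays maintained and that these package names are available on the dyno’s Ubuntu base (package names here are illustrative and would need verification):

```shell
# Add the community apt buildpack ahead of the Ruby buildpack so system
# packages are installed before the app builds.
heroku buildpacks:add --index 1 heroku-community/apt

# An "Aptfile" committed at the repo root then lists the packages to install,
# e.g. (names illustrative, verify against the dyno stack's apt repos):
#   imagemagick
#   libvips-tools
#   mediainfo
```

This keeps us out of compiling from source, but it is exactly the kind of third-party dependency flagged above: if the buildpack or the packaged versions lag, we are back to doing some systems administration ourselves.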

  • Need to make sure our heroku deploy will reliably remain in AWS us-east-1, because if heroku were to move it, it would deleteriously affect our S3 access costs and performance.
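For what it’s worth, Heroku common-runtime apps are pinned to a region at creation time, and the “us” region runs in AWS us-east-1; a hedged sketch (app name hypothetical):

```shell
# Create the app explicitly in the "us" region (AWS us-east-1 for the
# common runtime), rather than relying on the default.
heroku create our-app --region us

# Later, confirm placement; output includes a "Region: us" line.
heroku apps:info --app our-app
```

We would still want to confirm with Heroku’s current documentation that “us” remains mapped to us-east-1 before depending on this for S3 locality.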

  • We have not really been able to find any samvera/library-archives peers using Heroku, so we wouldn’t be able to get advice and knowledge-sharing from them.

Existing functionality with specific known Heroku challenges

...