Heroku Consideration
Heroku Overview
The way we currently use AWS EC2 is sometimes called “Infrastructure as a Service” – we don’t deal with any hardware at all, but all the software from the OS up is completely our responsibility.
Heroku is at a “higher” level (further removed from the actual computing resources), sometimes called “Platform as a Service”.
When you deploy to Heroku, you don’t deal with OS installs, upgrades, security patches, or creating user accounts. You just tell Heroku “here is an app I want to deploy”, and ask for the quantity of resources you need (CPU, RAM, etc.), expressed in terms of a Heroku abstraction called a “dyno”, which is a kind of container available in different sizes.
The “standard” dynos run on shared infrastructure; other customers may be on the same hardware/VM as you, which means actual CPU power can be variable. You have to pay a serious premium for exclusive “performance” dynos, which we don’t plan to do.
In addition to sparing you OS-level management of your dynos, Heroku provides additional “plumbing”, for instance making it easy for different components to find each other within your Heroku deploy, or load-balancing a horizontally-scaled web app (you don’t have to, and essentially can’t, run your own load balancer as you would on EC2 directly).
What if you need custom software installed on a Heroku dyno? For us, that might mean things like imagemagick and vips. Heroku has come a long way in supporting this via a feature called “buildpacks”. There are buildpacks contributed by the community for many common packages, but this means they are maintained by the community rather than Heroku, so they may be less reliable to keep working. You can combine multiple buildpacks to get whatever software you need. There is even a buildpack for “apt” that should, in theory, let you install any apt package; we’re not sure how well it works.
You can deploy whatever you want to a Heroku dyno, but dynos do not have a persistent filesystem, so you can’t deploy something like a database (postgres, solr, redis). To deal with this, or just to get a “commodity” service at an even higher level of abstraction, you generally combine your Heroku dynos (containing your own application code) with various supporting services delivered as “Software as a Service”. (This Amazon page with simple descriptions of IaaS vs PaaS vs SaaS may be useful; it can also sometimes be a continuum rather than discrete categories. https://aws.amazon.com/types-of-cloud-computing/)
Many such SaaS offerings are available as Heroku “add-ons”, which give you unified billing and hook into the Heroku “plumbing” so everything integrates seamlessly. However, there’s nothing stopping you from a “hybrid” deploy, with some things in the Heroku universe integrated with other resources, either from third-party non-Heroku SaaS or your own installs on EC2 or elsewhere. Heroku itself actually runs on AWS region us-east-1, so it should have fast and cheap network access to other things running there. (We need to double-check that it’s safe to assume this will continue to be true, since we plan to leave our data in S3.) However, I think a lot of the benefit of Heroku is getting out of the “systems administration” business entirely; having even just one thing still running on EC2 requires keeping that skillset in-house to some extent, whereas the greatest value of Heroku is arguably in not needing that skillset in-house at all.
Heroku Benefits
You just have to do a LOT less; Heroku takes care of many “lower level” things. Not just hardware, as EC2 does, but a lot of plumbing and software infrastructure that isn’t really specific to our application.
With our “systems administrator” position eliminated, this is essentially a way of “outsourcing” that work to Heroku: not literally, but effectively, by moving to a platform where “systems administration” isn’t something we have to do. We’ll still have to pay attention to “operations”, but over the long term it should take fewer hours, and use tools within software developers’ competencies rather than requiring systems administrator competencies.
Heroku makes it super easy to scale resources up to handle temporary or permanent usage bumps.
Heroku is a mature, well-established, and well-regarded offering. The things we’d be doing ourselves on EC2 “IaaS”, Heroku will probably do better and more reliably than we could.
Heroku Challenges
Heroku is expensive. You definitely pay a lot more than you would for equivalent resources from EC2 “IaaS” – you are paying for Heroku to manage a lot for you, and for their track record in doing so well.
This applies not just to initial expense, but to scaling as you need more resources because of more usage or functionality. Expense will climb at a much steeper slope on Heroku as resource needs increase. (This does not apply to scaling the bytes of assets stored, though, since we plan to leave those on S3.)
Heroku can be more limiting. Since it’s at a higher level of abstraction, there is somewhat less flexibility. You sometimes have to do things in different ways to work well, affordably, or at all in the Heroku environment, or may run into things that are simply challenging.
Web workers (dynos)
These run what we think of as “our app” proper. The more workers, the more traffic we can handle. (Much of our traffic might be bots such as googlebot, but we like bots like googlebot.)
Current AWS:
Run on one t2.medium EC2 instance, which has 4GB of RAM and 2 virtual cores.
Running 10 web workers (passenger). Looking at passenger-status, each worker’s RAM usage can range from 115MB to 250MB.
Considerations
Would probably switch to use puma instead of passenger. It’s somewhat more standard and also somewhat more flexible. One reason we weren’t using it is that it has slightly more management overhead we hadn’t figured out, in a way that isn’t an issue on Heroku. Puma lets you run multiple processes, each of which has multiple threads, which can give you more total workers in less RAM. Although there are limitations, this may allow us to get by with LESS RAM, but for now we’ll estimate similar RAM needs.
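For reference, a minimal sketch of the kind of `config/puma.rb` we’d likely end up with; the `WEB_CONCURRENCY` / `RAILS_MAX_THREADS` defaults here are assumptions to be tuned per dyno size, not measured values.

```ruby
# config/puma.rb -- a sketch, not our actual config.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 3))          # puma processes per dyno

max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads                          # min, max threads per process

preload_app!                                              # copy-on-write memory sharing between workers
port        ENV.fetch("PORT", 3000)                       # Heroku assigns the port via PORT
environment ENV.fetch("RACK_ENV", "production")
```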
Estimated Heroku
To get to 4GB RAM would take 4 “standard-2X” dynos, each of which has 1GB RAM and (maybe?) 4 vCPUs. We could probably run 3-4 non-threaded workers in each, for 12-16 workers total, actually more than we have now. This may be over-provisioned, but we could possibly use more capacity than we have on current AWS, so we’ll estimate this for now.
Production | Staging |
---|---|
4 standard-2X @ $50/month | 1 standard-1X @ $25/month |
1GB RAM * 4 == 4GB RAM | 512MB RAM (1-2 workers) |
4(?) cores each * 4 == 16(?) cores | 4(?) cores |
$200/month | $25/month |
Background Job Workers (dynos)
Run any slower “background” work, currently mainly ingests and on-demand expensive derivative creation.
Current AWS
One t2.large EC2 instance, which has 8GB of RAM and 2 virtual cores. Running 12 separate job workers, some reserved for specialty queues.
Considerations
This is actually a pretty huge amount of RAM, which is expensive on Heroku.
One reason we needed this much RAM is that our “package all pages into a PDF” routine takes RAM that scales with the number of pages. Since there’s not really an affordable way to get 8GB of RAM on one Heroku dyno anyway, we may just have to do development work on the “all pages as PDF” feature regardless: either figuring out how to make it use constant RAM, or eliminating the feature.
We would possibly also (eventually?) switch from resque to sidekiq, which, using multi-threaded concurrency, can in some cases run more workers in less RAM (very similar to the passenger-to-puma switch; a rough sketch follows these considerations).
But for now, as a rough estimate, we’ll assume we really do need 8GB of RAM, split between various dynos.
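For reference, a rough sketch of what the resque-to-sidekiq switch would touch, assuming an ActiveJob setup; file paths and env var names are the conventional ones, not necessarily ours.

```ruby
# config/application.rb (inside the Application class): tell ActiveJob to use sidekiq.
#   config.active_job.queue_adapter = :sidekiq

# config/initializers/sidekiq.rb -- point sidekiq at the Heroku Redis add-on's URL.
require "sidekiq"

Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") }
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") }
end

# Threads per worker process ("concurrency") are set in config/sidekiq.yml or with
# `sidekiq -c N`; running more threads per dyno is what saves RAM relative to resque.
```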
Estimated Heroku
With 8 standard-2X dynos, each with 1GB of RAM, we could probably run at least 16 workers, if not 24 or 32. We might still not be able to run our current all-page-PDF routine without figuring out how to make it use constant RAM. If we go tiny on staging, it will be very slow to ingest; is that OK? For now we’ll say yes, but still use at least a standard-2X.
Production | Staging |
---|---|
8 standard-2x @ $50/month | 1 standard-2X @ $50/month |
1GB RAM * 8 == 8GB RAM | 1GB of RAM (2-4 workers; ingests will be slow) |
$400/month | $50/month |
Postgres
(standard relational database, holds all our metadata)
Current AWS
Runs on a server that also runs redis: a t3a.small EC2 instance with 2GB RAM and 2 virtual CPUs.
SELECT pg_size_pretty( pg_database_size('digcol') ) => 635 MB
Seems to be configured for 100 maximum connections. When I try to figure out how many connections it currently has, it seems to be only 4. (I’d expect at least one for every web worker and job worker, so 22, which is curious. I am not an experienced postgres admin though.)
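As a sanity check, one way to count current connections from a Rails console (assuming our DB user can read pg_stat_activity, which it normally can for its own database):

```ruby
# Count connections currently open against our database; run from `rails console`.
ActiveRecord::Base.connection.select_value(
  "SELECT count(*) FROM pg_stat_activity WHERE datname = current_database()"
)
# Worth comparing against the pool size in config/database.yml: Rails only opens
# connections lazily, which may explain seeing far fewer than one per worker.
```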
Estimated Heroku
https://elements.heroku.com/addons/heroku-postgresql
Heroku postgres “Standard 0” seems just fine. 4GB RAM (at least 2x current). 64GB storage capacity (roughly 100x our current ~635MB). 120 connection limit (slightly more than our current postgres; should be plenty for our current use). No row limit. Can retain 25 backups and roll back to previous states. DB-level encryption at rest.
The tier smaller than Standard 0 is labelled “Hobby” and has limitations making it not great for production; I think it makes sense to spring for Standard 0 when it’s only $50/month.
However, beware that if we do need to go beyond Standard 0 (because we have more data and/or more traffic or more bg workers), the next price point is $200/month. It would probably take a major change in our usage patterns (beyond the OH transcript storage), or a significant increase in ingest rate, to run out of capacity within the next 5 years, but we eventually could.
For this one, it probably makes sense to run the same on staging and production, especially if it’s Standard 0.
Production | Staging |
---|---|
Postgres Standard 0 | Postgres Standard 0 |
$50/month | $50/month |
Redis
A faster store than postgres, generally used for temporary/transitory, less-structured data. Currently we mainly (only?) use it for storing our background job queues, and we make very little use of it.
Another common use would be for caches, including caching rendered HTML, or any other expensive to calculate values that make sense to cache for a certain amount of time. We might want to do that in the future, increasing our redis usage.
Current AWS
Runs on the same server as the DB, but is currently using so few resources it’s barely worth mentioning. Current used_memory_human == 1.15M
Considerations
If we start putting a lot more things into the bg job queue (perhaps to replace current long-running processes with a lot of smaller jobs in the queue, see fixity check), it could significantly increase our redis needs.
For Heroku redis plans priced in terms of number of connections, our need is probably somewhere around our anticipated web workers plus bg workers (22-30?). Although, depending on what happens when the connection limit is exceeded, and whether Rails apps and bg workers hold persistent connections or only take out a connection for a quick operation, perhaps we could get by with fewer. Unsure.
If we decide to use redis as a cache (not just for bg job queues), we might actually need a SECOND redis instance, and the ability to set the maxmemory-policy (auto-evict when memory capacity is exceeded, which you want for a cache but not for queue persistence).
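If we did add a cache, the app side is simple. A sketch, assuming a hypothetical second Redis add-on whose URL ends up in a `CACHE_REDIS_URL` env var (the name is illustrative):

```ruby
# config/environments/production.rb -- use a separate Redis instance for the Rails
# cache so evictions there never touch the job-queue Redis.
config.cache_store = :redis_cache_store, {
  url: ENV.fetch("CACHE_REDIS_URL"),   # hypothetical env var for a second Redis add-on
  error_handler: ->(method:, returning:, exception:) {
    # Treat cache errors as cache misses rather than taking the page down.
    Rails.logger.warn("Redis cache error: #{exception.message}")
  }
}
```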
Heroku Estimate
There are a variety of different vendors offering redis as heroku plugins; not sure what the differentiating factors are. We’ll start out looking at heroku’s own in-house redis offering, rather than the third parties. It’s possible other third-party heroku marketplace vendors could give us better price for what we need, but heroku’s seems fine for now.
Heroku redis has a maxmemory-policy default of “noeviction”, but is configurable.
There is a free level, “Hobby Dev”, with 25MB of RAM, which is enough for our current tiny 1.15M usage. However, the 20 connection limit is tight, and it does not offer persistence, which makes it inappropriate for our bg queue usage.
The Premium 0 pricing level seems appropriate, with 50MB of RAM, on-disk persistence, and 40 connections. Only $15/month.
Production | Staging |
---|---|
Heroku redis Premium 0 | Heroku redis Premium 0 |
$15/month | $15/month |
Solr
The “search engine” that gives us a “google-like” search and powers the search functionality.
Need to verify we have as much customization of schema.xml (and solrconfig.xml?) as we need.
Current AWS
Runs on an AWS t3a.small, which has 2GB RAM and 2 virtual CPUs.
Has 8000 docs (maxdocs 10,000). Basically scales with the number of Works we have.
Index listed as 70MB in size.
Trying to figure out how many solr queries we do per day or hour, since some solr heroku add-ons meter requests… I’m not sure we’re logging the right stuff or interpreting the logs right, but I am going to say somewhere under 10K requests per day.
Update: I don’t think that was right. Looking at production app logs at all requests for /catalog, /focus, or /collection pages (each of which triggers a solr query) on Tuesday November 17th, we get around 35K requests for the day. This is maybe a subset of all solr requests, but gives us an order of magnitude for now?
We’d basically have to spend a bunch of time figuring out/changing our logging to be able to really answer this question (this is why we want to get out of the sysadmin business!).
Heroku Estimate
There is one plugin for Solr available on the heroku marketplace, from third-party vendor “WebSolr”. The “Standard Small” plan seems to be sufficient for us.
1 million document limit => plenty
40K daily requests => should be enough, although without an order of magnitude to spare; about right
Storage capacity 1GB – we are only using 70MB currently (including transcript full-text indexing), no problem
“5 concurrent connection” limit could be a problem.
$59/month; the next step up is a big jump to $189/month (150K requests/day, 10 concurrent connections) or $299/month (250K requests/day, 15 concurrent connections)
Production | Staging |
---|---|
WebSolr “Standard Small” | WebSolr “Standard Small” |
$69/month | $69/month |
Considerations
I am concerned that the “concurrent request limit” of “5” in “Standard Small” might be a problem, considering updates/reindexes as well as searches count as concurrent requests. I don’t entirely understand how this is metered, or what happens if the limit is exceeded.
WebSolr seems to suggest custom logic to queue/throttle solr queries, which would be something we’d have to develop additionally in our app and could be a pain (a rough sketch follows this list). https://docs.websolr.com/article/178-http-429-too-many-requests
Doing lots of bulk operations might make lots of requests per day, since our app is set up to do a solr update on every ‘save’.
Larger plans can get expensive quickly. “Standard Medium” with 150K daily requests and 10 concurrent requests (which still seems small to me) is $189/month
If we ended up with anything running on raw AWS instead of heroku, Solr would probably be the first thing. It’s what I’m most worried about being able to run affordably on heroku.
ElasticSearch is a competitor to Solr; it has many more heroku plugin offerings from multiple vendors at different price points, which are not typically metered by concurrent connections (the most troublesome meter in WebSolr plans). But getting our app to run on ElasticSearch instead of Solr would take significant development (we are not as familiar with it; it is unclear to what extent Blacklight supports it, or whether we’d need to develop new stuff on top of or instead of Blacklight; we are definitely using some Blacklight plugins like date_range_limit that are Solr-only).
There might be other “Solr as a service” offerings that aren’t Heroku plugins but still don’t require us to run our own server at the OS level. They could be cheaper, but it’s unclear. There are definitely not as many as for ElasticSearch. The only one I can find is opensolr, which does seem cheaper… but they meter bandwidth, and I’m not sure how to predict how much we’d use. https://opensolr.com/pricing We could talk to a salesperson.
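Regarding the queue/throttle suggestion above: a very rough sketch of the simplest version, retry-with-backoff on WebSolr’s HTTP 429 responses, assuming the rsolr gem. Method names and the env var are illustrative, not existing code of ours; a real implementation might queue requests instead of sleeping.

```ruby
require "rsolr"

# Retry a solr request a few times when the service answers 429 Too Many Requests,
# backing off between attempts.
def solr_get_with_backoff(solr, handler, params, max_tries: 3)
  tries = 0
  begin
    solr.get(handler, params: params)
  rescue RSolr::Error::Http => e
    tries += 1
    raise unless e.response[:status].to_i == 429 && tries < max_tries
    sleep(2**tries) # crude exponential backoff: 2s, 4s, ...
    retry
  end
end

# Example use (WEBSOLR_URL is assumed to be set by the add-on):
# solr = RSolr.connect(url: ENV.fetch("WEBSOLR_URL"))
# solr_get_with_backoff(solr, "select", { q: "oral histories", rows: 10 })
```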
Scheduled Tasks
We have just a handful of nightly scheduled tasks. They need a means of scheduling, and will also take “dyno” resources to run.
There is a free Heroku scheduler; there is also an Advanced Heroku Scheduler with more features. We haven’t totally figured out when/why you’d need the advanced one, but to be safe let’s budget for it.
Production | Staging |
---|---|
Advanced Scheduler Premium 0 | Advanced Scheduler Premium 0 |
$15/month | $15/month |
We also have to pay for dyno hours for actually executing scheduled tasks. We have a couple that take only a few minutes to run and probably aren’t even worth accounting for (create Google SiteMap, clear some blacklight tables of old data).
The main issue is the Fixity Check code. We run nightly Fixity Checks, which usually take 4.5 hours to complete. (They could be slower on heroku hardware).
It’s unclear if we can get away with a single 4.5-hour job on Heroku; Heroku does not like long-running jobs like this, and it can restart your dynos at any time. But it could work.
This time will go up linearly as we ingest more into the collection.
We try to fixity check everything weekly; if we switched to (e.g.) monthly, it could be about 1/4 as much time spent fixity checking. (We may have to adjust S3 version retention too, to make sure we’re keeping things for more than a month.)
We could switch from a single batch job to using our BG Job Queue with one job per check, so there are no long-running tasks for Heroku restarts to interfere with, and it could use one worker from the BG Worker infrastructure (a rough sketch follows the table below).
One way or another, we’ll have to pay for resources: compute time, and possibly a larger redis to hold the queue. Let’s budget a full standard-1X running 24 hours/day, although this could be more than we need (we probably don’t need that much compute) or less than we need (if we have to upgrade redis to hold a larger queue, that could cost more than this). We may not really need to do this on staging, but we always have before.
Production | Staging |
---|---|
standard-1x dyno for fixity compute | standard-1x dyno for fixity compute |
$15/month | $15/month |
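For reference, a rough sketch of the job-per-check approach mentioned above. Class, model, and method names here (`Asset`, `fixity_check!`) are illustrative, not our actual code.

```ruby
# Nightly scheduler job: enqueue one small job per asset, so no single job runs
# long enough for a dyno restart to interrupt it.
class ScheduleFixityChecksJob < ApplicationJob
  queue_as :fixity

  def perform
    Asset.find_each do |asset|             # `Asset` stands in for whatever model holds our files
      SingleFixityCheckJob.perform_later(asset.id)
    end
  end
end

class SingleFixityCheckJob < ApplicationJob
  queue_as :fixity

  def perform(asset_id)
    Asset.find(asset_id).fixity_check!     # hypothetical method that checksums the file against S3
  end
end
```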
Additional Heroku services?
We have sometimes been constrained by only having one “staging” server. Heroku would make it very easy to bring up transitory per-feature-being-demo’d servers when we have multiple independent things we want to demo. But we’d have to pay for the dynos.
Especially if we did that, we might want to use something like the heroku “auto-idle” add-on to automatically shut down staging/demo dynos when not in use, so we wouldn’t need to remember to do it manually to save money.
When we ramp up email-delivery to end users for Oral Histories, we may want to choose to use a Heroku plug-in for email delivery.
We currently have a gap in monitoring of our app. There are some heroku-specific monitoring concerns. We may want to choose a heroku monitoring plugin; there are some sophisticated ones, but they aren’t cheap. (could end up paying $50-$200/month).
As the app continues to be developed, we may realize we need other kinds of heroku add-ons or other SaaS offerings to deal with new functionality.
We are not currently including any of these possible “Additional Heroku services” in our cost estimate.
Total Heroku Cost Estimate
Monthly
Category | Production | Staging |
---|---|---|
Web Workers | $200 | $25 |
Background Job Workers | $400 | $50 |
Postgres | $50 | $50 |
Redis | $15 | $15 |
Solr | $69 | $69 |
Scheduled Tasks (rough guess) | $15 | $15 |
Total | $749 | $224 |
For investigating feasibility and beginning to develop out the infrastructure, the staging infrastructure alone could keep us going for a few months, at which point we could re-evaluate, so estimated $224/month for that period.
Total staging + production, estimated at $973/month.
Heroku costs are all pay-as-you-go, with (I think?) no pre-pay discounts. Some resources can be turned on and off at any time and are billed by the minute; they can even be turned off at night to save money, although that takes some extra work. (This definitely applies to dynos, does not realistically apply to postgres, and I’m not sure about solr or redis.)
Some Heroku Challenges/Risks/Concerns
While Heroku gets us out of the systems administrator business, developers will need to spend time on “operations”, especially to plan and implement the migration, but also on an ongoing basis.
As our usage changes (volume or functionality), Heroku costs could increase at a steep slope we might not be prepared for.
Heroku plug-ins and SaaS offerings can be usage-metered in ways that running our own services on raw EC2 is not, for instance in number of connections or number of requests per time period. This kind of metering tries to charge you the “right” price for your “size” of use, but our usage patterns could be a poor match for how they price things, leading to unaffordable pricing.
The SaaS Solr offerings in particular are kind of expensive and potentially metered in ways that will be a problem for us. We might end up wanting to still run Solr on our own EC2, meaning we’d still need to have in-house or out-sourced systems administration competencies to some extent.
We might need to rewrite/redesign some parts of our app to work better or more affordably on Heroku infrastructure – or we could find some things we aren’t yet predicting to be simply infeasible.
Our ingest process is very CPU-intensive (file analysis, derivative and DZI creation). This may not be a good fit for the shared infrastructure of Heroku “standard” dynos? Is it possible Heroku will get mad at us for using “too much CPU” for sustained periods? I don’t think so, but we may find it slower than we expect, or slower than our current setup.
See more below
We require some custom software for media analysis/conversion (imagemagick, vips, mediainfo, etc). It should be possible to get these installed on Heroku dynos using custom “buildpacks”, but if they are maintained by third parties as open source they may be less reliable, or may require us to get into the systems administration task of “getting packages compiled/installed” after all.
We need to make sure our Heroku deploy will reliably remain on AWS us-east-1, because if Heroku were to move it, it would deleteriously affect our S3 access costs and performance.
We have not really been able to find any samvera/library-archives peers using Heroku, so we wouldn’t be able to get advice and knowledge-sharing from them.
Existing functionality with specific known Heroku challenges
Our routine to create a combined PDF of all pages in a scanned book uses an amount of RAM that would be impossible (or infeasibly expensive) to provide on Heroku. We would need to find different tools/usage that can create this PDF in constant RAM regardless of number of pages, or eliminate this functionality.
File downloads and uploads cannot go directly through the app on Heroku, because of the 30-second max timeout on requests. Currently we do have direct-to-S3 uploads and downloads (see the presigned-URL sketch after this list), but we’d be locked into that approach (or some other non-Heroku mechanism, such as a CDN for downloads).
This makes access-control implementation options more limited. We were considering proxying file downloads through our app and/or nginx for access control, but would have to rule out that option (unless we hosted an nginx directly on EC2 ourselves, which we probably don’t have the current in-house expertise to set up).
Our current fixity check routine is a long-running process, which doesn’t work well on Heroku. We’d probably have to divide it into separate bg ActiveJobs (as sketched above under Scheduled Tasks), which might require a bigger redis and/or a more complex background worker setup – this might involve some additional Heroku expenses.
There could be other things we aren’t even thinking of yet that we run into, in initial implementation, or even a year or two down the line as we add functionality.
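For context on the direct-to-S3 point above, this is roughly all a presigned download URL takes with the aws-sdk-s3 gem; the bucket name and key below are placeholders, not our actual storage layout.

```ruby
require "aws-sdk-s3"

# Generate a time-limited URL that lets the browser fetch the file straight from
# S3, so the bytes never pass through a dyno (and never hit the 30-second timeout).
signer = Aws::S3::Presigner.new
url = signer.presigned_url(
  :get_object,
  bucket: "example-derivatives-bucket",   # placeholder bucket name
  key: "path/to/file.jpg",                # placeholder object key
  expires_in: 3600                        # URL valid for one hour
)
# The app just redirects (or links) the user to `url`.
```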
Reasons that moving from sufia/hyrax to a local, more-standard Rails app made this more feasible
No fedora. If we still had fedora, it would need to be hosted directly on EC2; there’s no way to do it in the Heroku ecosystem.
Our sufia app handled file uploads and downloads via the app itself, rather than direct-to-S3, which is not tenable on Heroku.
Solr usage is much more limited in new app, fewer objects and attributes in solr, really just used for searching, which should help keep Solr SaaS more affordable.
Generally more efficient use of CPU in new app will help keep heroku more affordable, relatively.
Costs which would remain on AWS even with Heroku deploy
We plan to leave all our file storage on S3, so existing S3 costs will be unchanged. (Staging and production).
I don’t think we would have any remaining EC2 use for digital collections after a Heroku migration, unless there’s something I’m not thinking of, or unless we decided (e.g.) we were unhappy with the Solr SaaS offerings and wanted to leave Solr running on our own EC2 in a hybrid infrastructure.
We might still choose to use AWS SES for email delivery services.
Any non-Digital Collections use of AWS would of course be unchanged by a migration of Digital Collections to Heroku. (ArchiveSpace, etc).
I am not certain how to interpret our current AWS billing well enough to estimate which Digital Collections AWS costs would remain. (Recall that we get some free AWS credit annually, and pay for some AWS resources via a one-year contract in advance.)