
Heroku Overview

https://devcenter.heroku.com/articles/dyno-types

Web workers

Runs what we think of as the actual app itself. The more workers, the more traffic we can handle. (Much of our traffic might be bots, such as googlebot, but we like bots like googlebot.)

Current AWS

  • Run on one t2.medium EC2, which has 4GB of RAM and 2 virtual cores

  • Running 10 web workers (passenger). Looking at passenger-status, each worker’s RAM usage can range from 115MB to 250MB

Considerations

We would probably switch to puma instead of passenger: it’s somewhat more standard and also somewhat more flexible. One reason we weren’t using it is that it has slightly more management overhead we hadn’t figured out, in a way that isn’t an issue on heroku. Puma lets you run multiple processes, each of which has multiple threads, which can give you more total workers in less RAM. Although there are limitations, this may allow us to get by with LESS RAM, but for now we’ll estimate similar RAM needs.
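For example, a minimal config/puma.rb sketch of that processes-times-threads model (the worker and thread counts here are illustrative assumptions, not values tuned for our app):

# config/puma.rb (sketch)
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))            # forked worker processes
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count                        # min, max threads per process

preload_app!   # load the app before forking, so copy-on-write shares RAM across workers

port        ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "development")

With 2 processes of 5 threads each, that’s 10 request slots for roughly the RAM of a couple of single-threaded passenger workers, which is where the potential savings come from.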

Estimated Heroku

To get to 4GB RAM would be 4 “standard-2X” dynos, each of which has 1GB RAM and (maybe?) 4 vcpus. We could probably run 3-4 non-threaded workers in each, for 12-16 workers, actually more than we have now. This may be over-provisioned, but we could possibly use more capacity than we have on current AWS, and we’ll estimate this for now.

Production

  • 4 standard-2X @ $50/month
  • 1GB RAM * 4 == 4GB RAM
  • 4(?) cores each * 4 == 16(?) cores
  • $200/month

Staging

  • 1 standard-1X @ $25/month
  • 512MB RAM (1-2 workers)
  • 4(?) cores
  • $25/month

Background Job Workers

Run any slower “background” work, currently mainly ingests and on-demand expensive derivative creation.

Current AWS

One t2.large EC2, which has 8GB of RAM and 2 virtual cores. Running 12 separate job workers, some reserved for specialty queues.

Considerations

  • This is actually a pretty huge amount of RAM, which is expensive on heroku

  • One reason we needed this much RAM is that our “package all pages into a PDF” routine takes a lot of RAM, scaling with the number of pages. Since there’s not really an affordable way to get 8GB of RAM on one dyno on heroku anyway, we may have to do development work on the “all pages as PDF” feature regardless: either figuring out how to make it use constant RAM, or eliminating the feature.

  • We would possibly also (eventually?) switch from resque to sidekiq, which, using multi-threaded concurrency, can in some cases run more workers in less RAM (very similar to the passenger-to-puma switch; see the sketch after this list)

  • But for now, just for a rough estimate, we’ll assume we really do need 8GB of RAM, split between various dynos
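If we did make the resque-to-sidekiq switch, the wiring is roughly this (a sketch; the “ingest” queue name below is a hypothetical stand-in for our specialty queues):

# config/initializers/sidekiq.rb (sketch)
# REDIS_URL is the env var the heroku redis add-on sets.
Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("REDIS_URL", "redis://localhost:6379/0") }
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("REDIS_URL", "redis://localhost:6379/0") }
end

The thread count per process is set at startup, e.g. bundle exec sidekiq -c 10 -q default -q ingest; one such process with 10 threads stands in for 10 single-threaded resque workers, in much less RAM.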

Estimated Heroku

With 8 standard-2X dynos, each with 1GB of RAM, we could probably run at least 16 workers, if not 24 or 32. We might still not be able to run our current all-page-PDF routine without figuring out how to make it use constant RAM. If we go tiny on staging, it will be very slow to ingest; is that OK? For now we’ll say yes, but still do at least a standard-2X.

Production

  • 8 standard-2X @ $50/month
  • 1GB RAM * 8 == 8GB RAM
  • $400/month

Staging

  • 1 standard-2X @ $50/month
  • 1GB of RAM (2-4 workers; ingests will be slow)
  • $50/month

Postgres

(standard relational database, holds all our metadata)

Current AWS

Runs on a server that also runs redis: a t3a.small EC2 with 2GB RAM and 2 virtual CPUs

SELECT pg_size_pretty( pg_database_size('digcol') ) => 635 MB

Seems to be configured for 100 maximum connections. When I try to figure out how many connections it currently has, it seems to be only 4. (I’d expect at least one for every web worker and jobs worker, so 22, which makes me curious. I am not an experienced postgres admin, though.)
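One way to sanity-check that count from the Rails console (a sketch; pg_stat_activity is standard postgres, and digcol is the same database as in the size query above):

# Count connections currently open against our database.
ActiveRecord::Base.connection.select_value(
  "SELECT count(*) FROM pg_stat_activity WHERE datname = 'digcol'"
)

Rails generally opens connections lazily, one per process on first use, which might explain a count lower than the total worker count.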

Estimated Heroku

https://elements.heroku.com/addons/heroku-postgresql

Heroku postgres “Standard 0” seems just fine: 4GB RAM (at least 2x current), 64GB storage capacity (10x our current use), 120-connection limit (slightly more than our current postgres; should be plenty for our current use), and no row limit. It can retain 25 backups and roll back to previous states for you. DB-level encryption at rest.

Anything smaller than Standard 0 is labelled “Hobby” and has limitations making it not great for production; I think it makes sense to spring for Standard 0 when it’s only $50/month.

However, beware that if we do need to go beyond Standard 0 (because of more data, more traffic, or more bg workers), the next price point is $200/month. It would probably take a major change in our usage patterns (beyond the OH transcript storage), or a significant increase in ingest rate, to run out of capacity within the next 5 years, but we eventually could.

For this one, it probably makes sense to run the same on staging and production, especially if it’s Standard 0.

Production

  • Postgres Standard 0
  • $50/month

Staging

  • Postgres Standard 0
  • $50/month

Redis

A faster store than postgres, generally used for temporary/transitory, less-structured data. Currently we mainly (only?) use it for storing our background job queues, and we make very little use of it.

Another common use would be for caches, including caching rendered HTML, or any other expensive to calculate values that make sense to cache for a certain amount of time. We might want to do that in the future, increasing our redis usage.

Current AWS

Runs on the same server as the DB, but is using so few resources that it’s currently barely worth mentioning. Current used_memory_human == 1.15M

Considerations

If we start putting a lot more things into the bg job queue (perhaps to replace current long-running processes with a lot of smaller queued jobs; see fixity check), it could significantly increase our redis needs.

For heroku redis plans priced in terms of number of connections, somewhere around our anticipated web workers plus bg workers (22-30?) is probably what we need. Although, depending on what happens when the connection limit is exceeded, and on whether Rails apps and bg workers hold persistent connections or only take out a connection for a quick operation, perhaps we could get by with fewer connections. Unsure.
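A sketch of one way to bound that, using the connection_pool gem (which sidekiq already depends on); the pool size of 5 is an illustrative assumption:

require "connection_pool"
require "redis"

# Total redis connections ~= pool size x number of processes.
REDIS_POOL = ConnectionPool.new(size: 5, timeout: 5) do
  Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379/0"))
end

# A connection is only checked out for the duration of an operation:
REDIS_POOL.with { |redis| redis.llen("queue:default") }   # e.g. peek at a queue length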

If we decide to use redis as a cache (not just for bg job queues), we might actually need a SECOND redis instance, plus the ability to set the maxmemory-policy (auto-evict when memory capacity is exceeded, which you want for a cache but not for queue persistence).
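If we went that route, the cache side would be wired up roughly like this (a sketch; CACHE_REDIS_URL is a hypothetical env var pointing at the second, evicting instance):

# config/environments/production.rb (sketch)
Rails.application.configure do
  # Points at the second redis (eviction enabled, e.g. allkeys-lru),
  # NOT the queue redis, which should keep noeviction for persistence.
  config.cache_store = :redis_cache_store, {
    url: ENV["CACHE_REDIS_URL"],
    expires_in: 1.hour
  }
end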

Estimated Heroku

There are a variety of vendors offering redis as heroku add-ons; not sure what the differentiating factors are. We’ll start out looking at heroku’s own in-house redis offering rather than the third parties. It’s possible other heroku marketplace vendors could give us a better price for what we need, but heroku’s seems fine for now.

Heroku redis has a maxmemory-policy default of “noeviction”, but is configurable.

There is a free level, “Hobby Dev”, with 25MB of RAM, which is enough for our current tiny 1.15M usage. However, the 20-connection limit is tight, and it does not offer persistence, which is inappropriate for our bg queue usage.

The Premium 0 pricing level seems appropriate, with 50MB of RAM, on-disk persistence, and 40 connections. Only $15/month.

Production

  • Heroku redis Premium 0
  • $15/month

Staging

  • Heroku redis Premium 0
  • $15/month

Solr
