Heroku Proposal

January 2020.

Heroku is a “Platform as a Service” (PaaS) product on which we could deploy our Digital Collections application, replacing most of our current use of manually managed AWS services.

Currently, we deploy to “self-managed” AWS (Amazon Web Services) resources. The current approach is not sustainable in the long term now that the Systems Administrator position, which managed the AWS resources, has been terminated.

Heroku is one way to pay for a more expensive platform that will do some of the things the Systems Administrator position used to do for us. It is a significant cost: estimated at around $980/month for our needs, but possibly anywhere between $660 and $1240/month. Heroku will replace our existing AWS EC2 costs, eliminating approximately $150/month of AWS charges.

(More background is in the previous Heroku Consideration document, which led us to spend more time examining Heroku as a solution.)

The main trade-off of Heroku is:

Elimination of operations and maintenance tasks formerly handled by the Systems Administrator position, in exchange for
Significant budget, and in some cases increased challenges for the remaining developers

After some months spent examining and experimenting with Heroku, the tech team thinks that, although it is not perfect, it makes sense to try Heroku, if we can afford it, as a solution to the unsustainable absence of the terminated Systems Administrator position.

The rest of this document contains a list of high-level pros and cons of Heroku; itemized details of how we arrived at the cost estimate; and a list of other possible solutions to explore as alternatives to Heroku.

Responses to this proposal could include: budget and a directive to go forward with migration to Heroku; a directive to instead devote additional time to exploring non-Heroku alternatives; or holding off for now, sticking with the current infrastructure and returning to app development instead of focusing on infrastructure (not sustainable long-term, but possible for a time-limited period, although it involves some risk).

Pro

Eliminate need for local skillset and time to manage AWS resources directly. This is the main motivation for Heroku. We would no longer need to maintain the Ansible playbook at all. We would no longer need the “management” server that currently tries to orchestrate our self-managed AWS resources. We would still manage our S3 buckets directly in AWS, and maybe some other minor simple services, like SES for email delivery, which are much simpler AWS products.
Heroku is a popular and high-quality product. Its feature set, developer UI, and documentation tend to be very good, and we found this true in our testing. It has been around for a while, has many customers (often startup/ecommerce), specifically including other Rails apps, and is generally considered a top-of-the-line service.
Easier scaling. With Heroku we can, if willing to pay for the resources, easily scale up infrastructure to handle traffic spikes or high ingest loads. This is something we never had the capacity to build reliably in our current infrastructure, and an example of what Heroku gives us.
No-contract, pro-rated billing. Heroku has no contract and all billing is pro-rated to the minute. If we ever choose to shift our infrastructure yet again, we can switch at any time. Some additional components we will need as part of a move to PaaS-managed infrastructure are available as Heroku add-ons, meaning convenient single-invoice billing through Heroku for third-party services, with the same no-contract pro-rated billing.

Con

Cost. This is the main trade-off. Heroku is known to be a high-quality solution but one of the most expensive ways to deploy. We estimate the monthly cost for our application will be in the range of $660-$1240/month, probably near the middle at around $980. But if some of our guesses or assumptions prove wrong, it could move towards either end of that range. This is mostly on top of our existing infrastructure costs: it will replace only our existing Digital Collections EC2 costs, approximately $150/month. However, this is still significantly less expensive than an FTE systems administrator.
Some loss of flexibility/customization. While Heroku is a very flexible platform, it is still a platform with constraints. Some customization may take more developer work to accomplish, or be impossible.
- Example: We need to install the vips software package, and it turned out we needed the most recent version to avoid a bug affecting us. We had more trouble getting this done on Heroku, although we ultimately succeeded. There could be other packages we want in the future that we have trouble installing or are unable to install. Another way to look at this is that Heroku doesn’t entirely free us from sysadmin skills and tasks when we need custom software installed.
- Example: Heroku has less flexibility in how much RAM you have available, with fewer configurations. Some of our existing functionality used too much RAM to fit in certain Heroku dyno sizes; we were ultimately able to refactor it to fit, but that is not guaranteed to always be possible, and it shows that added development challenges are possible on Heroku (the trade-off for reduced operations challenges).
Some developer tasks take longer to execute.
- Opening up a “console” on Heroku infrastructure, a common developer task, can take 50-60 seconds of staring at the screen waiting for boot (compared to ~5 seconds on current infrastructure). This interrupts developer “flow” and is just kind of annoying.
- The same applies to executing maintenance tasks like a Solr reindex: the developer spends ~50 seconds staring at the screen waiting for it to start.
- Deploying new code takes longer. Getting a code change out after pushing the button to do so can take 3-10 minutes on Heroku, versus only 1-2 minutes on current infrastructure. Heroku can also involve either more downtime on deploy, or increased developer work to mitigate it.
Loss of infrastructure-as-code. Currently, our entire infrastructure can theoretically be re-created from zero from source code that is checked into our git repository. This gives transparency, change history, and reproducibility of our architecture. (“Theoretically”: I bet we’d have trouble going from 0 to 100, but still most of it is in code.)
- With the basic, simple, easy way to use Heroku, we end up losing this. Much of the configuration is done manually via the web UI on Heroku (and add-ons), and isn’t captured in code in source control.
- So recreating it, figuring out what’s there, or copying it is a manual operation that involves looking through various screens in the Heroku and add-on UIs to see what’s configured, assuming you still have access to it!
- While there are potentially ways to use Heroku that mitigate this, upon analysis they mostly look like they would significantly increase the energy we’d need to put into Heroku, without even getting us to 100% infrastructure-as-code. While we might explore some later, for now we think it makes sense to consider this just a trade-off, one of the things we lose with Heroku. Heroku is simple enough that we think it will be OK, with some mitigation from a small amount of manually maintained documentation. See more at https://github.com/sciencehistory/scihist_digicoll/issues/875
Some sense that Heroku is stagnating a bit. There haven’t been significant new Heroku features in a while, nor changes to some parameters (such as dyno sizes) that seem outdated to us. We have a sense that Heroku’s current owner (Salesforce) may be resting on the service it has, and that we should not expect any new features, fixes, or changes. However, the current system is pretty reliable, stable, and relatively feature-complete, so we can handle that, at least for now.
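To illustrate the “manually maintained documentation” mitigation mentioned under loss of infrastructure-as-code, here is a minimal Python sketch of how a documented list of settings could be compared against what is actually configured. Everything in it is hypothetical: the flat name-to-value format, the setting names, and the diff_config function are made up for illustration, and the “actual” values are typed in by hand rather than fetched from Heroku. Only the plan names (standard-2x, standard-0, premium-0) come from this document.

```python
# Hypothetical snapshot format: a flat dict of setting name -> value.
# Neither dict comes from a real Heroku API call; both are illustrative.


def diff_config(documented, actual):
    """Compare documented settings against what is actually configured.

    Returns three sorted lists: settings documented but not present,
    settings present but not documented, and settings whose values differ.
    """
    missing = sorted(documented.keys() - actual.keys())
    undocumented = sorted(actual.keys() - documented.keys())
    changed = sorted(
        key
        for key in documented.keys() & actual.keys()
        if documented[key] != actual[key]
    )
    return missing, undocumented, changed


# What our (hypothetical) checked-in documentation says we have.
documented = {
    "web.dyno_size": "standard-2x",
    "web.dyno_count": 2,
    "addon.postgres": "standard-0",
    "addon.redis": "premium-0",
}

# What someone copied down by hand from the web UI (also hypothetical).
actual = {
    "web.dyno_size": "standard-2x",
    "web.dyno_count": 3,              # scaled up by hand in the web UI
    "addon.postgres": "standard-0",
    "addon.papertrail": "some-plan",  # added in the UI, never written down
}

missing, undocumented, changed = diff_config(documented, actual)
print("documented but not present:", missing)
print("present but not documented:", undocumented)
print("values differ:", changed)
```

This does not restore infrastructure-as-code; it only shows that a small documented manifest can at least make drift between the documentation and the live setup visible.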

Cost Estimate Detail

Component	Low-end/month	High-end/month
Web workers. Production: at the low end, 2 standard-2x dynos with auto-scaling up, so we add a bit for that (not recommended; call it $120/month). Mid-range, a single performance-m dyno with auto-scaling to 2 (maybe $250/month). At the high end, a performance-l ($500/month). Staging: 1 standard-2x dyno, at $50. This means deciding that staging will be slower than production; we can always temporarily increase it if we need to for testing things, and pay per minute.	$170	$550
Background workers. Production: 2 standard-2x ($100) with auto-scaling up. Actual auto-scaling charges are hard to predict; call it $150 total. Staging: 1 standard-2x ($50), maybe with auto-scaling, so call it $80 on the high end.	$200	$230
Database. Heroku Postgres standard-0 ($50), 2x for production and staging.	$100	$100
Redis. One Heroku Redis `premium-0` ($15), 2x for production and staging. Future: we might want a second redis premium-0 to use for caching.	$30	$60
Solr. SearchStax, either an NDN1 ($20/month with annual contract) or NDN2 ($40/month with annual contract), 2x for production and staging.	$40	$80
Autoscaling. hirefire.io, $15/month, or $25/month with “overclocking”. Does not include minutes of scaled dynos, just the autoscaler itself. We don’t plan to use autoscaling on staging.	$15	$25
Logging. Papertrail. It is not entirely clear what tier we need for our traffic; we’re guessing the $16 or $30 plan.	$16	$30
Sub-total	$571	$1075
Add 15% margin. Covers hard-to-estimate costs, including: one-off dynos charged per minute for scheduled tasks, developer console sessions, and rake tasks; minutes of auto-scaled dynos in use beyond the minimums; and anything else we didn’t account for, as wiggle room.	$85	$161

TOTAL:	$656	$1236
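As a check on the arithmetic in the table above, here is a minimal Python sketch that recomputes the sub-totals, the 15% margin, and the totals from the per-component figures. The dollar amounts are copied from the table; the names (COMPONENTS, estimate) are made up for illustration, and truncating the margin to whole dollars is an assumption, chosen because it matches the $85 and $161 in the table.

```python
# Monthly cost estimate in whole US dollars, copied from the table above.
# Each entry is (component, low_end, high_end).
COMPONENTS = [
    ("Web workers", 170, 550),
    ("Background workers", 200, 230),
    ("Database", 100, 100),
    ("Redis", 30, 60),
    ("Solr", 40, 80),
    ("Autoscaling", 15, 25),
    ("Logging", 16, 30),
]

MARGIN_PERCENT = 15  # wiggle room from the table
EC2_REPLACED = 150   # approximate current EC2 cost that Heroku would replace


def estimate(column):
    """Return (subtotal, margin, total) for column 1 (low) or 2 (high)."""
    subtotal = sum(row[column] for row in COMPONENTS)
    # Assumption: the margin is truncated to whole dollars, as in the table.
    margin = subtotal * MARGIN_PERCENT // 100
    return subtotal, margin, subtotal + margin


low = estimate(1)
high = estimate(2)
print("low  (subtotal, margin, total):", low)
print("high (subtotal, margin, total):", high)
print(
    "net increase over current EC2:",
    low[2] - EC2_REPLACED,
    "to",
    high[2] - EC2_REPLACED,
)
```

Run as written, this gives $571 + $85 = $656 at the low end and $1075 + $161 = $1236 at the high end. Subtracting the roughly $150/month of EC2 charges that Heroku would replace, the net increase is about $506 to $1086 per month.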

Alternatives to Heroku

We do not think the current situation, our infrastructure as we have it now but with no systems administrator position, is sustainable.

However, Heroku isn’t the only alternative that could require less systems administration/operations work and so be sustainable without a systems administrator/operations position. There are other possible routes we could take; here are some of potential interest. Most of them would be less expensive (sometimes much less) but would require more hands-on management than Heroku (sometimes much more).

hatchbox.io. Basically a Rails-focused service that sets up your AWS infrastructure for you, including postgres database, redis, and Rails web and background workers (with horizontal scaling and load balancing). It also has a built-in system for distributing your config/env vars to all machines, like Heroku.
- You still have access to the underlying AWS, it just sets it up for you.
- We’d still have to do Solr separately, probably still with SearchStax.
- It is unclear how mature/complete it is. There are some things we’d need to do that it’s not clear to me how it would support. There would definitely be more manual hands-on work than with Heroku, and it would probably require some custom solutions.
- But it is much less expensive than Heroku: $50-$100/month on top of the AWS resources you pay for yourself.
Another new Heroku-esque alternative I haven’t explored much, but which looks pretty good: render.com
- And another I’ve heard of, http://fly.io, though I’m not sure it looks as foolproof as render.com
Stay on AWS, but with higher-level “managed” services.
- We were using AWS in a very “basic” way: raw EC2 instances we installed things on, for instance installing postgres ourselves on an EC2 instance as if it were our own machine. Instead, Amazon has a bunch of managed services, for instance RDS Postgres or ElastiCache Redis, where you just get a working postgres or redis.
- Instead of raw EC2 for compute (web and background workers), we could try some higher-level Amazon offerings, such as Elastic Beanstalk, to give us built-in scaling and load-balancing.
- Or better yet, probably a “serverless” option like Lambda, for which there are now solutions to run fairly standard Rails apps. (I think this does require us to get into Docker!)
- Or maybe involve AWS Fargate?
- Price would probably be only a bit more than present. It would be much more hands-on than Heroku: someone would still need to devote significant time to operations, but it would be at a higher level, with many fewer details to be responsible for than currently.
Stay on current infrastructure BUT with more support from part-time/contract help
- Maybe look in our library tech communities for a person or vendor that could contract with us on a retainer or as-needed basis to help us with operations tasks as they come up
- Might start with a larger project to help us re-design/re-architect our current setup to be based on more contemporary best-practices and be easier to manage
Some complicated docker/kubernetes based solution, using a hosted kubernetes platform.
- This is very trendy right now, but I don’t know much about it. Ostensibly it would take care of many operational concerns, but I suspect it would end up requiring more time investment than utopian portrayals suggest, beginning with the investment to understand what it really is and how to set up our app for it.