
Journal of heroku investigations; most recent entries on top. See also Heroku Consideration

Tuesday Oct 6

For future: Asset delivery

Needs to be investigated: heroku recommends putting static assets behind a CDN, which we hadn’t accounted for in cost or complexity of setup. https://github.com/sciencehistory/scihist_digicoll/issues/874

For future: Production vs Staging

While heroku has ways of creating separate production and staging environments, we aren’t going to worry about that for now; we’re just working on getting a demo app up with a limited staging-like environment, following the piece-by-piece plan from Monday.

For future: backups

Heroku postgres has its own built-in backups, including the ability to easily roll back to a previous point in time. Do we still want to do our own backups of postgres? Probably! But we should wrap our heads around the heroku backups and how they relate to ours, and update our documentation. https://github.com/sciencehistory/scihist_digicoll/issues/876
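For when we dig into this: heroku’s built-in backups are driven from the CLI. A sketch, assuming our app name (verify flags against heroku’s pg:backups docs before relying on it):

```shell
# one-off backup of the add-on database
heroku pg:backups:capture -a scihist-digicoll

# list backups, then download the most recent one as latest.dump
heroku pg:backups -a scihist-digicoll
heroku pg:backups:download -a scihist-digicoll

# schedule automatic daily backups
heroku pg:backups:schedule DATABASE_URL --at "02:00 America/New_York" -a scihist-digicoll
```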

Software/configuration steps done

By “heroku dashboard” I mean the web GUI.

  1. Install heroku CLI on my Mac

  2. Run heroku login to auth heroku CLI on my local machine

  3. Create scihist-digicoll app in heroku dashboard

  4. Provision heroku postgres add-on. For now we’re going to do hobby-basic at $9/month; this won’t be enough for production, where we plan a standard-0 at $50/month eventually. https://elements.heroku.com/addons/heroku-postgresql

  5. Import database from our staging instance to our heroku db (https://devcenter.heroku.com/articles/heroku-postgres-import-export)

    1. Do a new export on staging, since heroku asks for a particular dump format (a pg_dump custom-format archive)

      1. Tricky because pg_dump doesn’t live on the staging jobs server! Needed to figure out how to ssh to the database server… ok, found it in the EC2 console, and ssh’d there as jrochkind. Then needed to figure out how to connect to the database: couldn’t find the database backup cronjob on the database server (what user does it run under? not in ansible…), but managed to pg_dump using credentials from local_env.yml on staging.

      2. Per heroku instructions, we need to put it on a private S3 bucket. We’ll use chf-hydra-backup, file digcol-for-heroku.dump. (Pretty slow to upload from my local network; figuring out how to put it in the private bucket from the database server itself is beyond me right now, though.)

        1. Having trouble getting a properly signed URL to that location! Hackily reverse-engineered one from the S3 console; not the right way, but it got me there.

      3. Successfully imported! heroku pg:psql -a scihist-digicoll drops me into a psql console where I can see tables and data to confirm. Deleted the extra backup from our S3 bucket.

  6. Try to deploy app to heroku?

    1. add heroku remote to my local git checkout: in the local git directory, heroku git:remote -a scihist-digicoll, then verify what it did with git remote -v

    2. git push heroku

      1. Asset compilation failed: “TypeError: No value was provided for app_url_base”. We apparently need that local_env config value even for asset compilation (based on the stack trace, because in order to boot the app it looks the value up to set config.action_mailer.default_url_options; we could make that okay when the value isn’t present…). Anyway, we can add the config var in the heroku dashboard, pointing at our current non-custom heroku URL, APP_URL_BASE=https://scihist-digicoll.herokuapp.com/, and try again.

      2. Failed again because it needs a local_env solr_url value. I can see this is going to be a slow process of discovering additional ones, since it takes a couple minutes to fail each time. But we’ll try adding a heroku config var SOLR_URL=http://localhost/dummy/nonexisting

      3. “Lockbox master key is missing in production.” – there’s a lot of ENV we need just to get assets to compile! Try LOCKBOX_MASTER_KEY=000000000000000000000000000000000000000000000000000000000000000

      4. OK, now it’s complaining about missing bucket names. We should just go copy ALL config vars from staging local_env.yml over, replacing anything sensitive with dummy values. Basic pattern: e.g. s3_bucket_originals in local_env.yml becomes S3_BUCKET_ORIGINALS in heroku config, to be picked up as ENV by our Env class.

      5. Got it deployed, but with an error! Have to figure out how to access logs to see what the error was… the console doesn’t show enough lines!

        1. `heroku logs -n 1000`

        2. Looks like the Rails app can’t connect to postgres. The error looked like: /app/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.3.3/lib/active_record/connection_adapters/postgresql_adapter.rb:49:in `include?': no implicit conversion of nil into String (TypeError)

        3. We may not be supplying config properly to pick up heroku postgres; need to look into it more. Yep: https://github.com/sciencehistory/scihist_digicoll/pull/880

      6. DEPLOYED!!! No bg jobs, no solr, so much not working, but a basic app! https://scihist-digicoll.herokuapp.com/
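The local_env.yml-to-heroku naming convention from step 6.4 can be sketched in shell (the helper name to_heroku_var is my own invention for illustration, not anything in our codebase):

```shell
# Hypothetical helper: upper-case a local_env.yml key to get the
# heroku config var name our Env class will read from ENV.
to_heroku_var() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# e.g. build a config:set argument for one key
echo "$(to_heroku_var s3_bucket_originals)=example-bucket-name"
```

You’d then hand the result to something like heroku config:set S3_BUCKET_ORIGINALS=… -a scihist-digicoll (or paste it into the dashboard).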

Mon Oct 5

Heroku has a LOT of docs, usually well-written, and they google well. Some heroku overview and getting started docs:

Interesting heroku add-on I noticed: rails-autoscale. Instead of needing to provision as many dynos as we might need to handle maximum traffic or ingest, the add-on can scale up automatically with use. Works for both web dynos (with traffic) and background job dynos (when we do a big ingest, it can scale up more workers!). Does cost money; price is based on how high you want it to be able to scale, I think.

I think I will try to get our app on heroku piece by piece…

  1. Get app deployed to heroku with postgres and a small web dyno only – no bg jobs yet, no solr yet. (Solr functions won’t work!)

  2. Add in bg jobs – including heroku buildpacks with all the software they need (vips, imagemagick, ffmpeg, etc).

  3. Add in solr – not sure whether to start by trying to have it connect to our existing staging solr (which would require a heroku add-on for a static outgoing IP via SOCKS so we could let it through our solr firewall, and/or other solr config changes), OR move right away to a SaaS solr – which would cost money, and we’d have to identify which one we need.

  4. App substantially working at this point, but still lots of little pieces to get in place, such as nightly jobs, and various problem cases (out of memory for PDF generation etc).
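For step 2 above, a hedged sketch of how the extra system software might get installed: the heroku-community/apt buildpack reads an Aptfile from the repo root (the package names below are guesses to verify, not tested):

```shell
# add the apt buildpack ahead of the Ruby buildpack
heroku buildpacks:add --index 1 heroku-community/apt -a scihist-digicoll

# an Aptfile committed at the repo root might then list:
#   libvips
#   imagemagick
#   ffmpeg
```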

For future: Infrastructure as code?

Deploying to Heroku involves configuring some things on the platform. What I know about so far is mainly the list of config variables (such as what we have in our local_env.yml), plus which add-ons are selected and their configuration.

You can do this in the heroku console, but I’m nervous about that living only inside heroku’s system. How do we get it into source code, “infrastructure as code”, as we always tried to do with ansible – having our infrastructure re-runnable from files on disk, not just living in a live system? This isn’t something that needs to be solved now, but it’s something I want to attend to as part of this process, and ask around for what others are doing.

Looks like one solution might be using terraform with heroku, documented by heroku. To look into more later.
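A minimal sketch of what that could look like with the Heroku Terraform provider – untested, resource fields from memory and they vary by provider version, so verify against the provider docs:

```hcl
# sketch only: app plus postgres add-on managed from source control
resource "heroku_app" "digicoll" {
  name   = "scihist-digicoll"
  region = "us"

  # non-sensitive config vars; secrets would go in sensitive_config_vars
  config_vars = {
    APP_URL_BASE = "https://scihist-digicoll.herokuapp.com/"
  }
}

resource "heroku_addon" "database" {
  app_id = heroku_app.digicoll.id
  plan   = "heroku-postgresql:hobby-basic"
}
```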

https://github.com/sciencehistory/scihist_digicoll/issues/875
