Journal of heroku investigations. Most recent entries on top. See also Heroku Consideration
Tuesday Oct 6
For future: Asset delivery
Needs to be investigated: heroku recommends a CDN for asset delivery, which we hadn't accounted for in cost or complexity of setup. https://github.com/sciencehistory/scihist_digicoll/issues/874
For future: Production vs Staging
While heroku has ways of creating production and staging environments, we aren't going to worry about that for now; just working on getting a demo app up with a limited staging-like environment, following the piece-by-piece plan from Monday.
For future: backups
Heroku postgres has its own built-in backups, including the ability to easily roll back to a previous point in time. Do we still want to do our own backups of postgres? Probably! But we should wrap our heads around the heroku backups and how they relate to ours, and update our documentation. https://github.com/sciencehistory/scihist_digicoll/issues/876
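Heroku's built-in backups are driven by the pg:backups commands in the heroku CLI, so our documentation will probably end up in these terms; a quick sketch from heroku's docs (untested here):
heroku pg:backups:capture -a scihist-digicoll    # take a manual backup now
heroku pg:backups -a scihist-digicoll            # list existing backups
heroku pg:backups:download -a scihist-digicoll   # fetch the latest as a local .dump file
heroku pg:backups:schedule DATABASE_URL --at '02:00 America/New_York' -a scihist-digicoll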
Software/configuration steps done
By “heroku dashboard” I mean the web GUI.
Install heroku CLI on my Mac
Run heroku login to auth the heroku CLI on my local machine
Create scihist-digicoll app in heroku dashboard
Provision heroku postgres add-on. For now we're going to do a hobby-basic at $9/month; although this won't be enough for production, we plan a standard-0 at $50/month eventually. https://elements.heroku.com/addons/heroku-postgresql
Import database from our staging instance to our heroku db (https://devcenter.heroku.com/articles/heroku-postgres-import-export)
Do a new export on staging, since heroku asks for a certain format
Tricky cause pg_dump doesn't live on the staging jobs server! Need to figure out how to ssh to the database server maybe… ok, can find it in the EC2 console, and ssh there as jrochkind. Now need to figure out how to connect to the database… can't find the database backup cronjob on the database server; what user does it run under? Not in ansible… but managed to pg_dump using credentials from local_env.yml on staging.
Per heroku instructions, we need to put the dump on a private S3 bucket. We'll use chf-hydra-backup, file digcol-for-heroku.dump. (Pretty slow to upload from my local network; figuring out how to put it in the private bucket from the database server itself is beyond me right now though.)
Having trouble getting a properly signed URL to that location! Hackily reverse-engineered one from the S3 console – not the right way, but getting me there.
Successfully imported! heroku pg:psql -a scihist-digicoll drops me into a psql console where I can see tables and data to confirm. Deleted the extra backup from our S3 bucket.
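For next time, the export/upload/sign/restore sequence could probably be scripted; a sketch assuming the aws CLI is installed with credentials for the bucket (DB_HOST, DB_USER, DB_NAME are placeholders):
pg_dump -Fc --no-acl --no-owner -h DB_HOST -U DB_USER DB_NAME > digcol-for-heroku.dump   # heroku wants pg_dump custom format
aws s3 cp digcol-for-heroku.dump s3://chf-hydra-backup/digcol-for-heroku.dump
aws s3 presign s3://chf-hydra-backup/digcol-for-heroku.dump --expires-in 3600            # the non-hacky way to get a signed URL
heroku pg:backups:restore '<signed url from presign>' DATABASE_URL -a scihist-digicoll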
Try to deploy app to heroku?
Add heroku remote to my local git; in the local git directory: heroku git:remote -a scihist-digicoll, and verify what it did with git remote -v
git push heroku
Asset compilation failed: “TypeError: No value was provided for app_url_base“. We need that local_env config value for asset compilation, apparently? (Based on the stack trace, it's because in order to boot the app, it tries to look the value up to set config.action_mailer.default_url_options. We could make that be okay if the value isn't present…) Anyway, we can add the config var in the heroku dashboard, set to our current non-custom heroku url, APP_URL_BASE=https://scihist-digicoll.herokuapp.com/, and try again.
Failed again cause it needs a local_env solr_url value. I can see this is going to be a slow process of discovering additional ones, as it takes a couple of minutes to fail each time. But we'll try adding a heroku config SOLR_URL=http://localhost/dummy/nonexisting.
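(These config vars can also be set from the CLI, which might speed up this discover-and-retry loop; a sketch with the values above:)
heroku config:set APP_URL_BASE=https://scihist-digicoll.herokuapp.com/ SOLR_URL=http://localhost/dummy/nonexisting -a scihist-digicoll
heroku config -a scihist-digicoll   # verify current config vars
git push heroku                     # retry the deploy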
Mon Oct 5
Heroku has a LOT of docs, usually well-written, and googling heroku questions generally turns up good answers. Some heroku overview and getting started docs:
Interesting heroku add-on I noticed, rails-autoscale – instead of needing to build out as many dynos as we might need to handle maximum traffic or ingest, we can have the add-on scale up automatically with use. Works for both web dynos (with traffic) and background job dynos (when we do a big ingest, it can scale up more workers!). It does cost money; the price is based, I think, on how high you want it to be able to scale.
I think I will try to get our app on heroku piece by piece…
Get app deployed to heroku with postgres and a small web dyno only – no bg jobs yet, no solr yet. (Solr functions won't work!)
Add in bg jobs – including heroku buildpacks with all the software they need (vips, imagemagick, ffmpeg, etc.); see the buildpack sketch after this list.
Add in solr – not sure whether to start by trying to have it connect to the existing staging solr (which would require a heroku add-on for a static outgoing IP via SOCKS, so we could let it through our solr firewall, and/or other solr config changes), OR move right away to a SaaS solr – which would cost money; we'd have to identify which one we need.
App substantially working at this point, but still lots of little pieces to get in place, such as nightly jobs, and various problem cases (out of memory for PDF generation etc).
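For the bg jobs step above, one possible sketch for the OS-level software is the heroku-community/apt buildpack plus an Aptfile; the buildpack choice and package names here are my guesses, not yet verified:
heroku buildpacks:add --index 1 heroku-community/apt -a scihist-digicoll
# then commit an Aptfile at the repo root listing the packages, e.g.:
cat > Aptfile <<'EOF'
libvips-tools
imagemagick
ffmpeg
EOF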
For future: Infrastructure as code?
Deploying to Heroku involves configuring some things on the platform itself. What I know about so far is mainly a list of config variables (such as what we have in our local_env.yml), plus which add-ons are selected and their configuration.
You can do this in the heroku console, but I'm nervous about that living only inside heroku's system. How do we get it into source code, "infrastructure as code", as we always tried to do with ansible, keeping our infrastructure re-runnable from files on disk, not just living in the live system? This isn't something that needs to be solved now, but it's something I want to attend to as part of this process, and to ask around for what others are doing.
Looks like one solution might be using terraform with heroku, documented by heroku. To look into more later.
https://devcenter.heroku.com/articles/using-terraform-with-heroku
https://medium.com/rackbrains/manage-heroku-infrastructure-with-terraform-4a167b850300
https://github.com/sciencehistory/scihist_digicoll/issues/875
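If we do go the terraform route, the shape would be checked-in .tf files declaring the heroku app, config vars, and add-ons, then the standard terraform loop (a sketch):
terraform init    # install the heroku provider declared in the .tf files
terraform plan    # diff the files on disk against live heroku resources
terraform apply   # make the live system match the files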