We don’t currently have “infrastructure-as-code” for our heroku setup: it was configured by hand on the heroku system (and third-party systems) via GUIs and/or CLIs, and there is no script to recreate our heroku setup from nothing.
...
Delete all failed jobs in the Resque admin pages.
Make a rake task to enqueue all the jobs to the `special_jobs` queue. The task should be smart enough to skip items that have already been processed. That way, you can interrupt the task at any time, fix any problems, and run it again later without having to worry.
Make sure you have an easy way to run the task on individual items manually from the admin pages or the console.
The job that the task calls should print the IDs of any entities it’s working on to the Heroku logs.
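The skip-already-processed pass can be sketched like this (a minimal sketch; the item shape and the enqueuer are hypothetical stand-ins, not our actual models or job classes):

```ruby
# Idempotent enqueue pass: skip finished items, log IDs as we go.
# The item hashes and the enqueuer lambda are illustrative stand-ins.
def enqueue_unprocessed(items, enqueuer)
  enqueued = []
  items.each do |item|
    next if item[:processed] # already done in an earlier, interrupted run

    puts "enqueueing #{item[:id]}" # shows up in the Heroku logs
    enqueuer.call(item[:id])       # e.g. Resque.enqueue_to(:special_jobs, SpecialJob, item[:id])
    enqueued << item[:id]
  end
  enqueued
end
```

Because already-processed items are skipped up front, re-running the task after an interruption only enqueues the remaining work.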
It’s very helpful to be able to enqueue a limited number of items and run them first, before embarking on the full run. For instance, you could add an extra boolean argument `only_do_10` (defaulting to `false`) and add a variation on:

```ruby
scope = scope[1..10] if only_do_10
```
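As a plain method, the trial-run limiting might look like this (a sketch with hypothetical names; note that `scope[1..10]` skips the first element, since ruby ranges here are index-based, while `scope.first(10)` takes exactly the first ten):

```ruby
# Sketch of the trial-run flag. Names are hypothetical.
def items_for_run(scope, only_do_10: false)
  only_do_10 ? scope.first(10) : scope
end
```

Having it as a method also makes it easy to call from the console when processing individual items manually.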
Test the rake task in staging with `only_do_10` set to true.
Run the rake task in production, but with `only_do_10`, for a trial run.
Spin up a single `special_jobs` dyno and watch it process 10 items.
Run the rake task in production.
The jobs are now in the `special_jobs` queue, but no work will actually start until you spin up dedicated dynos.

Two workers per `special_jobs` dyno is our default, which works nicely with `standard-2x` dynos, but if you want, try setting the `SPECIAL_JOB_WORKER_COUNT` env variable to 3.

The max number of `special_jobs` dynos you can run at once is limited by the smaller of max postgres connections and max redis connections, including connections in use by web workers. Currently we have 500 max redis connections and 120 max postgres connections. You may want to monitor the redis statistics during the job.

Manually spin up a set of `special_worker` dynos of whatever type you want at Heroku's "resources" page for the application. Heroku will alert you to the cost. (10 `standard-2x` dynos cost roughly $1 per hour, for instance; with the worker count set to two, you'll see up to 20 items being processed simultaneously.)

Monitor the progress of the resulting workers. Work goes much faster than you are used to, so pay careful attention to:
the Papertrail logs
the redis statistics for the app in Heroku (go to the resource page, then click “Heroku data for redis”)
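The connection arithmetic behind the dyno cap can be sketched roughly like this (all per-worker numbers below are illustrative assumptions, not measured values):

```ruby
# Back-of-envelope: how many special_jobs dynos the connection limits allow.
# Per-worker connection counts are illustrative assumptions.
def max_special_dynos(max_db_conns:, conns_used_by_web:, workers_per_dyno:, conns_per_worker: 1)
  available = max_db_conns - conns_used_by_web
  available / (workers_per_dyno * conns_per_worker)
end

# With 120 max postgres connections (our current bottleneck), an assumed 20
# already used by web dynos, and 2 workers per dyno:
max_special_dynos(max_db_conns: 120, conns_used_by_web: 20, workers_per_dyno: 2)
# => 50
```

Plugging in the redis limit (500) instead shows why postgres, not redis, is the bottleneck at our current plans.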
If there are errors in any of the jobs, you can retry the jobs in the Resque pages, or rerun them from the console.
Monitor the number of jobs still pending in the `special_jobs` queue. When that number goes to zero, it means the work will complete soon and you should start getting ready to turn off the dynos. It does NOT mean the work is complete, however!

When all the workers in the `special_jobs` queue complete their jobs and are idle:

`rake scihist:resque:prune_expired_workers` will get rid of any expired workers, if needed.
Set the number of `special_worker` dynos back to zero.
Remove the `special_jobs` queue from the Resque pages.
...
Heroku has a list of key/values that are provided to the app, called “config vars”. They can be seen and set in the web GUI under the Settings tab, or via the heroku command line: `heroku config`, `heroku config:set`, `heroku config:get`, etc.
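Inside the app, config vars arrive as ordinary environment variables, read via plain `ENV` (the default for `SPECIAL_JOB_WORKER_COUNT` below is just the one mentioned elsewhere on this page):

```ruby
# Config vars show up in the dyno as plain environment variables.
db_url = ENV["DATABASE_URL"] # set by the postgres add-on; nil when unset locally

# Reading an optional var with a default; env values are always strings,
# so numeric vars need explicit conversion.
worker_count = Integer(ENV.fetch("SPECIAL_JOB_WORKER_COUNT", "2"))
```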
Note: We need to disable the heroku nodejs buildpack from “pruning development dependencies”, because our rails setup needs our dev dependencies (such as vite) at `assets:precompile` time, when they would otherwise be gone. See the vite-ruby docs and heroku docs. To do this we set:

`heroku config:set YARN_PRODUCTION=false`
Note: Some config variables are set by heroku itself or heroku add-ons, such as `DATABASE_URL` (set by the postgres add-on) and `REDIS_URL` (set by the Redis add-on). They should not be edited manually. Unfortunately, there is no completely clear documentation of which is which.

Some config variables include sensitive information such as passwords. If you do a `heroku config` to list them all, you should be careful where you put/store them, if anywhere.
...
The Heroku node.js buildpack, heroku-buildpack-nodejs. See this ticket for context.
https://github.com/brandoncc/heroku-buildpack-vips to install the libvips image processing library/command-line
An `apt` buildpack at https://buildpack-registry.s3.amazonaws.com/buildpacks/heroku-community/apt.tgz, which provides for installation via `apt-get` of multiple packages specified in an Aptfile in our repo. We install several other image/media tools this way. (We could not get vips successfully installed that way, which is why we used a separate buildpack for it.)
For tesseract (OCR) too, see Installing tesseract
We need `ffmpeg`, and had a lot of trouble getting it built on heroku! It didn't work via apt, and we didn't find a buildpack that worked and gave us a recent ffmpeg version. Then we discovered that, since `ffmpeg` is a requirement of Rails `activestorage`'s `preview` functionality, this heroku-maintained buildpack gives us ffmpeg: https://github.com/heroku/heroku-buildpack-activestorage-preview

That buildpack is mentioned, along with the fact that it installs ffmpeg, at: https://devcenter.heroku.com/articles/active-storage-on-heroku

We don't actually use activestorage or its preview feature; we just use this buildpack to get ffmpeg installed.

If looking for an alternative in the future, you could try https://github.com/jonathanong/heroku-buildpack-ffmpeg-latest (we haven't tried that yet).
A buildpack to get the `exiftool` CLI installed: velizarn/heroku-buildpack-exiftool. (We previously used https://github.com/fnando/heroku-buildpack-exiftool, which installs the most recent exiftool available on every build unless configured for a specific version.) The velizarn buildpack requires the heroku config var `EXIFTOOL_URL_CUSTOM` to be set to a URL with a .tar.gz of linux exiftool source, such as https://exiftool.org/Image-ExifTool-12.76.tar.gz

The exiftool source url can easily be found from https://exiftool.org/ ; it may make sense to update it now and then.
We previously tried using a buildpack that tried to find most recent exiftool source release automatically from exiftool RSS feed, but it was fragile.
The standard heroku python buildpack, so we can install python dependencies from `requirements.txt` (initially `img2pdf`). It is listed first, so the ruby one will be “primary”. https://www.codementor.io/@inanc/how-to-run-python-and-ruby-on-heroku-with-multiple-buildpacks-kgy6g3b1e
...
Heroku postgres (an rdbms). The `standard-0` size plan is enough for our needs.

Note: Does our postgres plan offer enough connections for our web and worker dynos? See this handy tool to calculate.
Heroku Stackhero redis (redis is a key/value store used for our bg job queue).

We are currently using the `premium-1` plan of StackHero redis through the heroku marketplace. Our redis needs are modest, but on the `premium-0` plan we seemed to run out of redis connections on hirefire autoscale-up of workers, and we want enough connections to be able to have lots of temporary bg workers. At 500 connections, this plan means postgres is the connection bottleneck, not redis.

Note that a “not enough connections” error in redis can actually show up as `OpenSSL::SSL::SSLError`, we are pretty sure. https://github.com/redis/redis-rb/issues/980

The numbers don't quite add up for this; resque_pool may be temporarily using too many connections or something. But for now we just pay for `premium-1` ($30/month).
Memcached via the Memcached Cloud add-on
Used for Rails.cache in general – the main thing we are using Rails.cache for initially is for rack-attack to track rate limits. Now that we have a cache store, we may use Rails.cache for other things.
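The main thing a cache store gives us is the `Rails.cache.fetch` pattern: compute a value on a miss, reuse it on a hit. A stand-in sketch of that pattern (plain ruby hash here, rather than the real memcached-backed store):

```ruby
# Hash-backed stand-in for the Rails.cache.fetch pattern; in production the
# real store is memcached via the Memcached Cloud add-on.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value for key, computing and storing it on a miss.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end
```

This is the same shape rack-attack relies on when tracking request counts against rate limits.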
In staging, we currently have a free memcached add-on; we could also just NOT have it in staging if the free one becomes unavailable.

In production we still have a pretty small memcached cloud plan; if we're only using it for rack-attack, we hardly need anything.
Heroku scheduler (used to schedule nightly jobs; free, although you pay for job minutes).
Papertrail – used for keeping heroku unified log history with a good UX (otherwise from heroku you only get the most recent 1500 log lines, and not a very good UX for viewing them!). We aren't sure what size papertrail plan we'll end up needing for our actual log volume.
Heroku’s own “deployhooks” plugin used to notify honeybadger to track deploys. https://docs.honeybadger.io/lib/ruby/getting-started/tracking-deployments.html#heroku-deployment-tracking and https://github.com/sciencehistory/scihist_digicoll/issues/878
...