Manually running Heroku Redis maintenance

You might get an email from Heroku that begins:

Your database redis-word-number premium-1 (REDIS on scihist-digicoll-production) must undergo maintenance….

If you do nothing, the maintenance will happen automatically, usually about a week after you receive the email.

We use Redis mainly to hold the queue for our background jobs. When the maintenance happens, there will be a very brief Redis outage, during which the app can’t enqueue background jobs and the workers can’t connect to Redis to fetch jobs. The workers will report an error when this happens, but can (we think) recover and reconnect to Redis once it’s back.

This isn’t a disaster: our app seems to recover from the Redis outage fine. But if you’d like more control, you can run the maintenance manually. That should minimize the chance of anything going wrong with ingest-related background jobs, and keep those errors out of our logs and error monitoring.

(See Heroku developer setup for instructions on setting up the Heroku command line, including the configuration that makes -r production work.)

1. Disable staff logins, so staff can’t trigger an ingest whose background jobs might fail to enqueue: heroku config:set LOGINS_DISABLED=true -r production
2. Temporarily disable our HireFire auto-scaling manager, so it will allow the worker count to stay at zero. Just switch off the “enabled” toggle for worker at https://manager.hirefire.io/
3. Scale workers down to zero: heroku ps:scale worker=0 -r production
4. Run the maintenance now, per the Heroku instructions, on production, e.g.: heroku redis:maintenance --run REDIS -r production
5. Enable staff logins again: heroku config:set LOGINS_DISABLED=false -r production
6. Scale workers back up to their default, probably 2 (if you get it wrong, HireFire will fix it): heroku ps:scale worker=2 -r production
7. Turn the HireFire manager back on at https://manager.hirefire.io/
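
The command-line steps above could be wrapped in a small script, sketched below. This is only an illustration, not something we have in the repo: the DRY_RUN switch, the run helper, and the redis_maintenance_steps function are hypothetical names, and the default of 2 workers is the same guess as in the steps. The HireFire toggle has no command here and still has to be switched by hand before and after. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the manual steps as one script. Assumes the heroku CLI is set up
# with a "production" git remote. HireFire must still be toggled by hand.
set -euo pipefail

REMOTE="${REMOTE:-production}"   # git remote passed to heroku -r
WORKERS="${WORKERS:-2}"          # assumed default worker count
DRY_RUN="${DRY_RUN:-1}"          # hypothetical switch: 1 = print only, 0 = really run

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

redis_maintenance_steps() {
  run heroku config:set LOGINS_DISABLED=true -r "$REMOTE"
  run heroku ps:scale worker=0 -r "$REMOTE"
  # See `heroku help redis:maintenance` for any extra flags your setup needs.
  run heroku redis:maintenance --run REDIS -r "$REMOTE"
  run heroku config:set LOGINS_DISABLED=false -r "$REMOTE"
  run heroku ps:scale "worker=$WORKERS" -r "$REMOTE"
}

redis_maintenance_steps
```

Running it as-is lists the five heroku commands in order; setting DRY_RUN=0 would actually execute them, stopping at the first failure because of set -e.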

This process should prevent them, but if any Redis-related errors end up in Honeybadger anyway, go “resolve” them.

We turn off staff logins to avoid ingest background jobs being enqueued during maintenance, but some public user actions can still enqueue background jobs, such as requesting an “on-demand derivative”. If someone does this at just the wrong moment, there could be an error. This probably won’t happen at our current level of traffic. If we wanted to rule it out entirely, we’d have to disable the public-facing app too, which can be done with heroku maintenance:on -r production before the maintenance and heroku maintenance:off -r production afterwards.
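
If we ever did want to take the public app down for the duration, one way to make sure maintenance mode always gets switched off again is a small wrapper like the sketch below. As before, this is an assumption-laden illustration: with_maintenance_mode, run, and DRY_RUN are hypothetical names, and by default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: run one command with the public app in Heroku maintenance mode,
# and turn maintenance mode off again even if that command fails.
set -euo pipefail

REMOTE="${REMOTE:-production}"   # git remote passed to heroku -r
DRY_RUN="${DRY_RUN:-1}"          # hypothetical switch: 1 = print only, 0 = really run

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

with_maintenance_mode() {
  run heroku maintenance:on -r "$REMOTE"
  local status=0
  "$@" || status=$?              # remember a failure instead of exiting early
  run heroku maintenance:off -r "$REMOTE"
  return "$status"
}

with_maintenance_mode run heroku redis:maintenance --run REDIS -r "$REMOTE"
```

The wrapped command's exit status is passed through, so a failed maintenance run is still visible to the caller after the app has been brought back up.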