Manually executing heroku redis maintenance
Caution: redis:maintenance will be deprecated soon, so the instructions on this page will need to be updated once better documentation is available.
Data Maintenance CLI Plugin Commands | Heroku Dev Center states: “These [data maintenance] commands include and expand on the functionality of the older pg:maintenance and redis:maintenance commands. Heroku plans to deprecate those commands in favor of the commands in this plugin.”
The notes at Upgrading a Heroku Key-Value Store Version | Heroku Dev Center may be helpful until Heroku updates its documentation.
You might get an email from Heroku that begins:
Your database redis-word-number premium-1 (REDIS on scihist-digicoll-production) must undergo maintenance….
If you do nothing, the maintenance will happen automatically, usually about a week after you receive the email.
We use redis mainly for holding the queue for our background jobs. When the maintenance happens, there will be a very brief redis outage, during which the app can’t queue background jobs and the workers can’t connect to redis to fetch jobs. The workers will report an error when this happens, but can (we think) recover and reconnect to redis when it’s back.
This isn’t a disaster; our app seems to recover from the redis outage fine. But if you’d like to take more control of the situation and run the maintenance manually, doing so might minimize the chances of anything going wrong with ingest-related background jobs, and keep those errors out of our logs and error monitoring.
(See Heroku developer setup for instructions on setting up the heroku command line, including the -r production configuration.)
Temporarily disable our hirefire auto-scaling manager, so it will allow our worker count to be scaled down to zero. Just flip the “enabled” toggle for worker at https://manager.hirefire.io/
Scale down workers to zero:
heroku ps:scale worker=0 -r production
Wait until the workers are actually scaled down; you can check with
heroku ps -r production
Disable staff access, so staff can’t trigger ingest with bg jobs that might not be able to be enqueued:
heroku config:set LOGINS_DISABLED=true -r production
Because we use Heroku preboot, this can take 2-3 minutes to take effect. Check to make sure you are really locked out of the staff UI.
Run the maintenance now per the Heroku instructions, on production, e.g.:
heroku redis:maintenance --run REDIS --force -r production
When it’s finished, you should get an email. You can also check on status with
heroku redis:info -r production
Enable staff logins again:
heroku config:set LOGINS_DISABLED=false -r production
Scale workers back up to their default, probably 2 (if you get it wrong hirefire will fix it):
heroku ps:scale worker=2 -r production
Turn on hirefire manager again at https://manager.hirefire.io/
Because of heroku preboot, it could take 2-3 minutes for staff logins to be enabled again. Don’t leave until you confirm they are!
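The command-line steps above can be collected into a small helper script. This is only a sketch, not an official tool: the function names, the REMOTE, WORKERS and DRY_RUN variables, and the before/run/after arguments are all hypothetical, and the only heroku commands used are the ones already listed on this page. It prints each command and executes it only when DRY_RUN=0. The hirefire toggle and the staff UI checks remain manual, so the script just prints reminders for them.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the heroku commands from the steps above.
# Function and variable names are hypothetical, not part of any Heroku tooling.
set -euo pipefail

REMOTE="${REMOTE:-production}"   # git remote passed to -r, as in the steps above
WORKERS="${WORKERS:-2}"          # assumed default worker count (hirefire corrects it if wrong)
DRY_RUN="${DRY_RUN:-1}"          # hypothetical switch: 1 = print only, 0 = actually execute

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

before_maintenance() {
  echo "MANUAL: disable the hirefire manager for 'worker' at https://manager.hirefire.io/"
  run heroku ps:scale worker=0 -r "$REMOTE"
  run heroku ps -r "$REMOTE"     # re-check by hand until no worker dynos are listed
  run heroku config:set LOGINS_DISABLED=true -r "$REMOTE"
  echo "MANUAL: wait 2-3 minutes (preboot), then confirm you are locked out of the staff UI"
}

run_maintenance() {
  run heroku redis:maintenance --run REDIS --force -r "$REMOTE"
  run heroku redis:info -r "$REMOTE"   # check status; an email also arrives when finished
}

after_maintenance() {
  run heroku config:set LOGINS_DISABLED=false -r "$REMOTE"
  run heroku ps:scale "worker=$WORKERS" -r "$REMOTE"
  echo "MANUAL: re-enable the hirefire manager at https://manager.hirefire.io/"
  echo "MANUAL: wait 2-3 minutes (preboot), then confirm staff logins work again"
}

# Optional dispatch; with no argument the script only defines the functions.
case "${1:-}" in
  before) before_maintenance ;;
  run)    run_maintenance ;;
  after)  after_maintenance ;;
  "")     ;;
  *)      echo "usage: $0 {before|run|after}" >&2; exit 2 ;;
esac
```

Saved under a name of your choosing (say, redis-maintenance.sh), running it with the argument before prints the first group of commands without touching production; running it with DRY_RUN=0 set in the environment executes them. Splitting the work into three phases leaves room for the manual checks between them.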
This process should avoid them, but if you wound up with any redis-related errors in Honeybadger anyway, go “resolve” them.
While we turned off staff logins to avoid ingest background job enqueues during the maintenance, some user actions can still trigger background job enqueues, like asking for an “on-demand derivative”. If someone does this at just the wrong time, there could be an error. This probably won’t happen at our current level of traffic; if we wanted to rule it out entirely, we’d have to disable the public-facing app too, which can be done with heroku maintenance:on -r production before the maintenance and heroku maintenance:off -r production afterwards.
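If we ever did want to take the public app down too, one way to make sure maintenance mode is always switched off again is a small wrapper like the one below. It is a sketch under assumptions: with_maintenance_mode, run, REMOTE and DRY_RUN are hypothetical names, and the wrapped command is assumed not to return until the redis maintenance has really finished. We have not verified that heroku redis:maintenance --run waits for completion, so confirm with heroku redis:info before relying on this.

```shell
#!/usr/bin/env bash
# Sketch only: names below are hypothetical helpers, not Heroku features.
set -euo pipefail

REMOTE="${REMOTE:-production}"   # git remote passed to -r
DRY_RUN="${DRY_RUN:-1}"          # hypothetical switch: 1 = print only, 0 = actually execute

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Turn maintenance mode on, run the given command, then turn maintenance
# mode off again even if the command failed. Returns the command's status.
with_maintenance_mode() {
  run heroku maintenance:on -r "$REMOTE"
  local status=0
  "$@" || status=$?
  run heroku maintenance:off -r "$REMOTE"
  return "$status"
}
```

For example, with_maintenance_mode some_function, where some_function is a shell function of your own that runs the redis maintenance and waits until it is done, brackets that work with maintenance:on and maintenance:off.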