Responding to an error report because you are on-call, and need some ideas for how to get started with some quick actions? We got you.
Check on status of heroku dynos
Using heroku CLI, run:
heroku ps -a scihist-digicoll-production
(Note: if a restart/redeploy just happened and you have heroku preboot on, you may not be seeing the status of the dynos actually serving requests. See "disable preboot" below.)
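Because preboot changes how to read the dyno list, it can help to grab dyno status and the app's feature flags in one go. A minimal sketch, assuming the heroku CLI is installed and logged in; the function name `triage_snapshot` and the `APP` variable are made up for illustration and are not part of our codebase:

```shell
# Hypothetical helper: first-look snapshot of the app.
# APP defaults to the production app named on this page.
APP="${APP:-scihist-digicoll-production}"

triage_snapshot() {
  echo "== dynos ($APP) =="
  heroku ps -a "$APP"
  echo "== app features (look for preboot) =="
  heroku features -a "$APP"
}
```

After sourcing this into your shell, run `triage_snapshot`, or `APP=some-other-app triage_snapshot` to look at a different app.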
Look at our logs
Consolidated app logs are available on the heroku dashboard: on the "resources" tab, click on the "papertrail" add-on at the bottom to get a nice web GUI for our logs that also lets you search.
There are ways to set up command-line access to papertrail logs too. https://github.com/papertrail/papertrail-heroku-plugin https://github.com/papertrail/papertrail-cli
Even without papertrail you can use the heroku CLI:
heroku logs --tail -a scihist-digicoll-production
to look at a running trail of current heroku logs, but the display isn't nearly as nice as papertrail, and includes more noise that papertrail filters out.
In addition to general logs, we have errors specifically monitored by Honeybadger ( http://honeybadger.io ). Each person has their own individual login.
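If papertrail is not at hand, a quick way to cut through the noise is to fetch a batch of recent lines with the heroku CLI and filter them locally. A rough sketch, assuming the heroku CLI is installed and logged in; `recent_log_matches` and `APP` are hypothetical names used only for this example:

```shell
# Hypothetical helper: print recent log lines matching a pattern.
APP="${APP:-scihist-digicoll-production}"

recent_log_matches() {
  # $1 = grep pattern (matched case-insensitively)
  # $2 = how many recent log lines to fetch (defaults to 500)
  pattern="$1"
  lines="${2:-500}"
  heroku logs -n "$lines" -a "$APP" | grep -i -- "$pattern"
}
```

For example, `recent_log_matches error 1000` searches the last 1000 lines for "error" in any capitalization.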
Disable preboot
The heroku preboot feature makes it possible for us to do zero-downtime deploys, but it also really complicates visibility/introspection into heroku dynos, and makes quick-response changes like dyno restarts and redeploys slower to take effect.
If you are troubleshooting an already problematic/downtime situation, it might make sense to turn off preboot to make things more straightforward:
heroku features:disable preboot -a scihist-digicoll-production
You can enable it again with enable instead of disable. You can see if it's enabled with:
heroku features -a scihist-digicoll-production
Is heroku itself having problems? Or are other platforms we use?
heroku status: https://status.heroku.com/
hirefire status: https://status.hirefire.io/
searchstax status: https://status.searchstax.com/
AWS status (notoriously underreports problems though): https://status.aws.amazon.com/
Check heroku release activity
If heroku tried to do a release but it failed, you may be in a confusing situation where you aren't using the version of code/config you think you are. Heroku releases (which may fail) can be triggered not only by pushing new versions of code, but also by config variable changes, and in some cases add-on changes.
Look at release history with heroku CLI:
heroku releases
Failed releases will be in red. With the id from the left-most column, you can look at specific log output (mainly of our custom release phase) for the failed or successful release, eg:
heroku releases:output v323
You can also see some limited release status info in the Web GUI on the Activity tab.
Restart heroku dynos
From the heroku web GUI, you can restart all dynos: in the "More" menu in the top right navbar, choose "restart all dynos".
Using the heroku CLI, you can restart only web or only workers, or even a specific dyno.
heroku ps:restart worker -a scihist-digicoll-production
heroku ps:restart web -a scihist-digicoll-production
heroku ps:restart worker.2 -a scihist-digicoll-production
Note: It's not clear to me how often restarting heroku dynos will actually fix a problem, and in some cases it could lead to a less stable state, for instance if heroku itself is having problems.
Note: If heroku "preboot" is on, it can take 3+ minutes for a restart to actually take effect. See "disable preboot" above.
Restart solr on Searchstax
Log in to searchstax
Use the shared credentials stored in our credential spot
Click on the instance you want to restart (scihist_digicoll (production), or scihist-digicoll-staging)
At the bottom of the page there is a single node listed (our plan only has one node). You can click "stop solr", and then "Start solr"
Note: restarting solr will cause the app some downtime and errors while solr is restarting, if the app is up and accessible during the restart!
Disable autoscaling
We use http://hirefire.io for autoscaling our worker dynos (maybe in future web dynos). Has it gone crazy and you need to just disable it?
No worries, just log in to http://hirefire.io (we each have our own login), and you can click the "enable" toggle on or off next to each autoscale worker, right on the initial dashboard. (We may only have one worker.)
Note: If you turn off auto-scaling when workers are scaled up, they will probably stay scaled up! Look at the minimum scale value (2, as I write this). You may want to scale down to that manually after turning off auto-scaling:
# how many workers are there?
$ heroku ps worker -a scihist-digicoll-production
# set em back to two
$ heroku ps:scale worker=2 -a scihist-digicoll-production
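The two steps above can be wrapped in one small function so the worker count gets checked before it is changed. This is only a sketch, assuming the heroku CLI is installed and logged in; `reset_workers` and `APP` are hypothetical names, and the default of 2 simply mirrors the minimum scale value mentioned above, so check HireFire for the current value first:

```shell
# Hypothetical helper: show current workers, then scale to a fixed count.
APP="${APP:-scihist-digicoll-production}"

reset_workers() {
  # $1 = desired worker count (defaults to 2)
  count="${1:-2}"
  case "$count" in
    ''|*[!0-9]*) echo "usage: reset_workers [count]" >&2; return 2 ;;
  esac
  heroku ps worker -a "$APP"                 # what is running right now?
  heroku ps:scale "worker=$count" -a "$APP"  # scale to the requested count
}
```

Run `reset_workers` for the default, or `reset_workers 3` for a different count. A non-numeric argument is rejected before anything is sent to heroku.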
Put entire app into maintenance mode
This disables our app: it won't be accessible to anyone, but they'll get a nice maintenance message.
In the heroku web GUI, go to the "settings" tab, scroll down to the "Maintenance mode" section, and toggle the switch.
In the heroku CLI, run:
heroku maintenance:on -a scihist-digicoll-production
and, to turn it back off:
heroku maintenance:off -a scihist-digicoll-production
(See more on our custom maintenance page configuration at Heroku custom maintenance page.)
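If you want to run a single command while the app is in maintenance mode, a wrapper can help make sure maintenance mode is turned back off afterwards. A minimal sketch, assuming the heroku CLI is installed and logged in; `with_maintenance` and `APP` are hypothetical names invented for this example:

```shell
# Hypothetical helper: run one command with maintenance mode on.
APP="${APP:-scihist-digicoll-production}"

with_maintenance() {
  # Usage: with_maintenance some-command [args...]
  heroku maintenance:on -a "$APP" || return 1
  status=0
  "$@" || status=$?
  # Always try to turn maintenance mode back off, even if the command failed.
  heroku maintenance:off -a "$APP"
  return "$status"
}
```

The function returns the wrapped command's exit status. It does not handle being interrupted (for example with Ctrl-C), so if that happens, check the maintenance state by hand.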
Disable staff logins
We can effectively make the app "read-only" but still available to the public by disabling staff logins. So we don't have a public-facing outage, but if we're dealing with some kind of data corruption issue we're trying to diagnose, we might want to "freeze" staff out.
In heroku config vars on the heroku dashboard settings tab, just set LOGINS_DISABLED to true.
Set it to false or remove the config var entirely to restore staff logins.
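The same config var can also be changed from the heroku CLI with `heroku config:set` and `heroku config:unset`. A small sketch, assuming the heroku CLI is installed and logged in; `staff_logins` and `APP` are hypothetical names used only here:

```shell
# Hypothetical helper: toggle staff logins via the LOGINS_DISABLED config var.
APP="${APP:-scihist-digicoll-production}"

staff_logins() {
  case "$1" in
    off) heroku config:set LOGINS_DISABLED=true -a "$APP" ;;  # freeze staff out
    on)  heroku config:unset LOGINS_DISABLED -a "$APP" ;;     # restore staff logins
    *)   echo "usage: staff_logins on|off" >&2; return 2 ;;
  esac
}
```

Keep in mind that changing a config var creates a new heroku release, so the change takes effect only once the dynos have restarted.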
Reindex solr
If search is weird, our Solr index may have gotten out of sync. Fortunately, we can (re-)build a new Solr index in only a couple minutes. Using the heroku CLI to run our rake tasks:
heroku run rake scihist:solr:reindex scihist:solr:delete_orphans -a scihist-digicoll-production
If this results in an error that makes you think the searchstax solr is not properly set up, you could try:
heroku run rake scihist:solr_cloud:create_collection -a scihist-digicoll-production
(That should not do any harm in any case; it might just complain, telling you "collection already exists".)
heroku run rake scihist:solr_cloud:sync_configset -a scihist-digicoll-production
And see also restarting Searchstax Solr above.
Restore postgres database from backups
See separate page.