On call and responding to an error report? Need some ideas for quick first actions? We got you.

To run any of these against staging instead of production, replace -a scihist-digicoll-production with -a scihist-digicoll-staging.
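If you switch between environments a lot, one option is to look the app name up from a short environment name instead of editing each command by hand. This is only a sketch: the digicoll_app function name is hypothetical and not part of our tooling.

```shell
# Hypothetical helper: map a short environment name to the heroku app name.
# Defaults to production when no argument is given.
digicoll_app() {
  case "${1:-production}" in
    production) echo "scihist-digicoll-production" ;;
    staging)    echo "scihist-digicoll-staging" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Usage (commented out so nothing runs by accident):
# heroku ps -a "$(digicoll_app staging)"
```

Failing loudly on an unknown name is deliberate, so a typo does not silently fall through to production.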

Check on status of heroku dynos

...

heroku ps -a scihist-digicoll-production

(Note: if a restart or redeploy just happened and heroku preboot is on, the dynos listed may not be the ones actually serving requests. 😞 See “Disable preboot” below.)

Look at our logs

Consolidated app logs are available from the heroku dashboard: on the “Resources” tab, click the “papertrail” add-on at the bottom to get a nice web GUI for our logs that also lets you search.

In addition to general logs, errors are specifically monitored by http://honeybadger.io; each person has their own individual login.

Disable preboot

The heroku preboot feature makes it possible for us to do zero-downtime deploys, but it also really complicates visibility/introspection into heroku dynos, and it complicates quick responses such as dyno restarts and redeploys.

If you are troubleshooting an already problematic/downtime situation, it might make sense to turn off preboot to make things more straightforward:

heroku features:disable preboot -a scihist-digicoll-production

You can enable it again by using enable instead of disable. To see whether it’s currently enabled, run heroku features -a scihist-digicoll-production
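The three preboot commands above can be wrapped in one small function so that the on/off/status checks are harder to mistype during an incident. A minimal sketch: the preboot function name is hypothetical, and the status branch simply filters the heroku features listing for the preboot line rather than parsing it.

```shell
# Hypothetical wrapper around the preboot commands shown above.
# Usage: preboot on|off|status [app]   (app defaults to production)
preboot() {
  case "$1" in
    on)     heroku features:enable preboot -a "${2:-scihist-digicoll-production}" ;;
    off)    heroku features:disable preboot -a "${2:-scihist-digicoll-production}" ;;
    # Show only the preboot line from the full feature listing.
    status) heroku features -a "${2:-scihist-digicoll-production}" | grep -i preboot ;;
    *)      echo "usage: preboot on|off|status [app]" >&2; return 2 ;;
  esac
}

# Usage (commented out so nothing runs by accident):
# preboot off
# preboot status scihist-digicoll-staging
```

If you turn preboot off while troubleshooting, remember to turn it back on afterwards, or later deploys will no longer be zero-downtime.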

Is heroku itself having problems? Or are other platforms we use?

...

Look at release history with heroku CLI:

heroku releases -a scihist-digicoll-production

Failed releases will be in red. Using the version id from the left-most column, you can look at the log output (mainly from our custom release phase) for a specific failed or successful release, e.g.:

heroku releases:output v323 -a scihist-digicoll-production

You can also see some limited release status info in the Web GUI on the Activity tab.

...

Code Block
heroku ps:restart worker -a scihist-digicoll-production
heroku ps:restart web -a scihist-digicoll-production
heroku ps:restart worker.2 -a scihist-digicoll-production

Note: It’s not clear to me how often restarting heroku dynos will actually fix a problem, and in some cases it could lead to a less stable state, for instance if heroku itself is having problems.

Note: If heroku “preboot” is on, it can take 3+ minutes for a restart to actually take effect. See “Disable preboot” above.

Restart solr on Searchstax

  1. Log in to Searchstax

    1. Use the shared credentials stored in our credential spot on the P:\ drive

  2. Click on the instance you want to restart (scihist_digicoll (production), or scihist-digicoll-staging)

  3. At the bottom of the page there is a single node listed (our plan only has one node). Click “stop solr”, and then “Start solr”

...

No worries, just log in to http://hirefire.io (we each have our own login), and you can click the “enable” toggle on or off next to each autoscale worker, right on the initial dashboard. (We may only have one worker.)

Note: If you turn off auto-scaling while workers are scaled up, they will probably stay scaled up! Look at the minimum scale value (2, as I write this); you may want to scale down to that manually after turning off auto-scaling:

Code Block
# how many workers are there?
$ heroku ps worker -a scihist-digicoll-production

# set em back to two
$ heroku ps:scale worker=2 -a scihist-digicoll-production

Put entire app into maintenance mode

...

In the heroku CLI, run heroku maintenance:on -a scihist-digicoll-production to turn maintenance mode on, and heroku maintenance:off -a scihist-digicoll-production to turn it off.

(Note: Right now, this is just a generic heroku maintenance message. It is possible to customize/brand this page; we may get to that eventually: https://github.com/sciencehistory/scihist_digicoll/issues/1201 See more on our custom maintenance page configuration at Heroku custom maintenance page.)
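When maintenance mode is only needed for the duration of one command, it is easy to forget to turn it back off. One way to guard against that is a wrapper that turns maintenance on, runs the command, and always turns maintenance off again. This is a sketch under assumptions: the with_maintenance name is hypothetical, and for an open-ended incident you would still run maintenance:on and maintenance:off by hand.

```shell
# Hypothetical wrapper: run one command while the app is in maintenance mode.
# Usage: with_maintenance APP COMMAND [ARGS...]
with_maintenance() {
  maintenance_app="$1"
  shift
  # Bail out without running the command if maintenance mode cannot be enabled.
  heroku maintenance:on -a "$maintenance_app" || return 1
  rc=0
  "$@" || rc=$?
  # Always turn maintenance mode back off, even if the command failed.
  heroku maintenance:off -a "$maintenance_app"
  return "$rc"
}

# Usage (commented out so nothing runs by accident):
# with_maintenance scihist-digicoll-staging heroku ps:restart web -a scihist-digicoll-staging
```

The wrapper returns the exit status of the wrapped command, so a failure is still visible to whatever called it.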

Disable staff logins

We can effectively make the app “read-only” but still available to the public by disabling staff logins. That way we don’t have a public-facing outage, but if we’re dealing with some kind of data corruption issue we’re trying to diagnose, we can ‘freeze’ staff out.

In the config vars section of the heroku dashboard settings tab, just set LOGINS_DISABLED to true.
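The same config var can also be set from the heroku CLI, which may be quicker than the dashboard mid-incident. A minimal sketch: the staff_logins function name is hypothetical, and it assumes the app treats the variable the same way however it was set. How to re-enable logins (for example, which value the app expects) is not covered here, so check the app before changing it back.

```shell
# Hypothetical wrapper for the LOGINS_DISABLED config var.
# Usage: staff_logins disable|status [app]   (app defaults to production)
staff_logins() {
  case "$1" in
    # Setting a config var creates a new release and restarts the dynos.
    disable) heroku config:set LOGINS_DISABLED=true -a "${2:-scihist-digicoll-production}" ;;
    # Print the current value (empty output if the var is not set).
    status)  heroku config:get LOGINS_DISABLED -a "${2:-scihist-digicoll-production}" ;;
    *)       echo "usage: staff_logins disable|status [app]" >&2; return 2 ;;
  esac
}

# Usage (commented out so nothing runs by accident):
# staff_logins status
# staff_logins disable scihist-digicoll-staging
```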

...

Restore postgres database from backups

See separate page.