/
Operational troubleshooting notebook

Operational troubleshooting notebook

June 5 2024

3 H12 timeouts in last two days

During each, plenty of puma workers, we're not running out of puma workers

Each one has ordinary/fine dyne load (CPU), RAM usage, etc. 

Heroku metrics on response time show the 30s response time at each time period — that's the H12 error, heroku timing out after waiting 30 seconds for our app to return a response. 

So mystery is what these requests were and why they didn't return a response in under lasted 30 seconds, it's not too many requests/lack of puma workers, but particular long-running requests. Can we figure out what they were?

Ah, we have that info in the logged H12 errors, of course.

  • Jun 03 20:00:20 scihist-digicoll-production heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/works/i3uubdg/viewer_images_info" host=digital.sciencehistory.org request_id=bbc866f6-d55a-4915-a85b-9d956d4cb063 fwd="40.77.188.137" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=https

    • Currently returns in 140ms, and it’s not a very long response

    • I don’t think this request even involves Solr

  • Jun 03 21:36:36 scihist-digicoll-production heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/works/jd472x205/viewer_images_info" host=digital.sciencehistory.org request_id=07811788-12a6-4a74-a3a3-d7b8641cb0b8 fwd="35.150.180.134" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=https

    • Currently returns in 117ms

    • Same action as above, does not involve solr

    • Does not look in logs unusual in any way, ua claims to be mobile safari

  • Jun 04 00:03:43 scihist-digicoll-production heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/collections/wh246s128?f%5Bdepartment_facet%5D%5B%5D=Archives&f%5Blanguage_facet%5D%5B%5D=English&f%5Bplace_facet%5D%5B%5D=California--Fullerton&f%5Bsubject_facet%5D%5B%5D=Beckman+Instruments%2C+Inc.&sort=newest_date" host=digital.sciencehistory.org request_id=f8a42ab8-fcc2-48d5-914a-957449560365 fwd="43.159.41.42" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=https

    • Does involve Solr, and somewhat complex solr query

    • But currently returns in 390ms

    • Logs definitely looks like this is a crawler scraping facet results (and doesn’t have a logged user agent), but it didn’t seem to overwhelm our app

 

Currently still a mystery why these requests timed out at 30 seconds!

Related content

Quick Operational Troubleshooting Cookbook
Quick Operational Troubleshooting Cookbook
More like this
Database maintenance on Heroku
Database maintenance on Heroku
More like this
Manually executing heroku redis maintenance
Manually executing heroku redis maintenance
More like this
Heroku Operational Components Overview
Heroku Operational Components Overview
More like this
Viewer is broken
Viewer is broken
More like this
On-demand PDF derivatives
On-demand PDF derivatives
More like this