We don’t currently have true “infrastructure-as-a-service” with our Heroku setup: everything is configured directly on the Heroku system (and third-party systems) via GUI and/or CLIs, and there is no script that could recreate our Heroku setup from nothing.

...

Dynos can also be listed and configured on the Heroku command line, with e.g. heroku ps and heroku ps:scale, among others.
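For example, a few commands of that sort (the app name here is a placeholder, not our actual app name):

```shell
# List the dynos currently running for an app ("our-app" is a placeholder)
heroku ps --app our-app

# Scale to one web dyno and two worker dynos
heroku ps:scale web=1 worker=2 --app our-app

# ps:scale can also change a dyno's size at the same time as its count
heroku ps:scale web=1:performance-l --app our-app
```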

Web dynos


We formerly ran a single performance-m dyno, with auto-scaling available to scale up under load. Keep in mind that if you are running two (or more) performance-m dynos for any length of time, a performance-l dyno costs the same as two performance-m’s but is much more powerful! (Unfortunately, there is no way to autoscale between dyno types rather than just dyno count. 😞) By May 2024, we believed that the amount of traffic we were getting was regularly overloading this capacity, even if much of it was bots. We decided to upgrade to a single performance-l dyno (8 vCPUs), running 4 worker processes with 3 threads each.

(The 3 threads per process follows new Rails defaults, based on extensive investigation by the Rails maintainers. More info on dyno sizing can be found at https://mailchi.mp/railsspeed/how-many-ruby-processes-per-cpu-is-ideal?e=e9606cf04b and https://github.com/rails/rails/issues/50450. The Heroku docs also cover this, but we don’t think they necessarily reflect current best practices.)

Within a dyno, the number of puma workers/threads is configured by the Heroku config variables WEB_CONCURRENCY (number of worker processes) and RAILS_MAX_THREADS (number of threads per process). These variable names are conventional, and take effect because they are referenced in our heroku_puma.rb, which is itself referenced in the Procfile that Heroku uses to define what the different dynos do. (We may consolidate heroku_puma.rb into the standard config/puma.rb in the future.) Heroku docs recommend two puma processes with five threads each on a performance-m, but jrochkind doesn’t totally trust that; in production on our performance-m we ran three worker processes (WEB_CONCURRENCY=3) with three threads each (RAILS_MAX_THREADS=3), because we could afford the RAM and jrochkind felt this might be preferable. (We previously tried WEB_CONCURRENCY=5 and RAILS_MAX_THREADS=2, but wondered if that was not helping with CPU contention under spikes.)
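As a rough sketch, the relevant part of a puma config driven by those variables looks something like the following. (Our actual heroku_puma.rb may differ; the fallback defaults here are illustrative assumptions, not our committed values.)

```ruby
# Puma config fragment (e.g. heroku_puma.rb). Hypothetical sketch:
# the fallback values below are for illustration only.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))   # worker processes

max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 3))
threads max_threads, max_threads                   # min, max threads per worker

preload_app!   # load the app before forking, for copy-on-write RAM savings
```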

Prior to May 2024, we originally ran a single somewhat pricy “performance-m” size dyno (2.5 GB RAM, 2 vCPUs), not because we needed that much RAM, but because we discovered the cheaper “standard” dynos had terrible performance characteristics, leading to slow response times for our app even when not under load. By May 2024 we were running 2 worker processes with 4 threads each – that seemed to be the best performance profile that fit into RAM.

Worker dynos

The performance problems with standard-size Heroku dynos aren’t really an issue for our asynchronous background jobs, so worker dynos use the standard-2x size.

...

  • Delete all failed jobs in the Resque admin pages.

  • Make a rake task to enqueue all the jobs to the special_jobs queue.

    • (lightbulb) The task should be smart enough to skip items that have already been processed. That way, you can interrupt the task at any time, fix any problems, and run it again later without having to worry.

    • (lightbulb) Make sure you have an easy way to run the task on individual items manually from the admin pages or the console.

    • (lightbulb) The job that the task calls should print the IDs of any entities it’s working on to the Heroku logs.

    • (lightbulb) It’s very helpful to be able to enqueue a limited number of items and run them first, before embarking on the full run. For instance, you could add an extra boolean argument only_do_10 (defaulting to false) and add a variation on:

      Code Block
      # only enqueue the first 10 records, for a trial run
      scope = scope.limit(10) if only_do_10
  • Test the rake task in staging with only_do_10 set to true.

  • Run the rake task in production, with only_do_10 set to true, for a trial run.

  • Spin up a single special_jobs dyno and watch it process 10 items.

  • Run the rake task in production.

  • The jobs are now in the special_jobs queue, but no work will actually start until you spin up dedicated dynos.

  • 2 workers per special_jobs dyno is our default, which works nicely with standard-2x dynos; but if you want more throughput per dyno, try setting the SPECIAL_JOB_WORKER_COUNT env variable to 3.

  • Our redis setup is capped at 80 connections, so be careful running more than 10 special_jobs dynos at once. You may want to monitor the redis statistics during the job.

  • Manually spin up a set of special_worker dynos of whatever type you want at Heroku's "resources" page for the application. Heroku will alert you to the cost. (10 standard-2x dynos cost roughly $1 per hour, for instance; with the worker count set to two, you’ll see up to 20 items being processed simultaneously).

  • Monitor the progress of the resulting workers. Work goes much faster than you may be used to, so pay careful attention.

  • (lightbulb) If there are errors in any of the jobs, you can retry the jobs in the Resque pages, or rerun them from the console.

  • Monitor the number of jobs still pending in the special_jobs queue. When that number goes to zero, it means the work will complete soon and you should start getting ready to turn off the dynos. It does NOT mean the work is complete, however: jobs already claimed by workers may still be running!

  • When all the workers in the special_jobs queue complete their jobs and are idle:

    • (lightbulb) rake scihist:resque:prune_expired_workers will get rid of any expired workers, if needed

    • Set the number of special_worker dynos back to zero.

    • Remove the special_jobs queue from the Resque pages.
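The skip-already-processed and only_do_10 behavior in the steps above can be sketched framework-free like this. (ids_to_enqueue, item_ids, and processed_ids are hypothetical stand-ins for illustration, not our actual task code, where the scope would be an ActiveRecord relation.)

```ruby
# Hypothetical sketch of the enqueue filtering described in the steps above.
# `item_ids` stands in for a query scope, and `processed_ids` for whatever
# "already processed" check the real task uses.
def ids_to_enqueue(item_ids, processed_ids, only_do_10: false)
  scope = item_ids.reject { |id| processed_ids.include?(id) } # idempotent: skip finished work
  scope = scope.first(10) if only_do_10                       # small trial batch first
  scope
end
```

In the real rake task, each returned id would be handed to the job class to enqueue onto the special_jobs queue, printing the id to the logs as it goes; because already-processed items are skipped, the task can be interrupted and rerun safely.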

...