We don’t currently have true “infrastructure-as-a-service” for our Heroku setup: everything is configured by hand on Heroku (and on third-party systems) via GUIs and/or CLIs, and there is no script that can recreate our Heroku setup from nothing.
...
Delete all failed jobs in the Resque admin pages.
Make a rake task to enqueue all the jobs to the special_jobs queue. The task should be smart enough to skip items that have already been processed. That way, you can interrupt the task at any time, fix any problems, and run it again later without having to worry.
Make sure you have an easy way to run the task on individual items manually from the admin pages or the console.
The job that the task calls should print the IDs of any entities it’s working on to the Heroku logs.
It’s very helpful to be able to enqueue a limited number of items and run them first, before embarking on the full run. For instance, you could add an extra boolean argument only_do_10 (defaulting to false) and add a variation on:

```ruby
scope = scope[1..10] if only_do_10
```
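The enqueue/skip/log behavior described in the steps above can be sketched in plain Ruby. This is a minimal, self-contained sketch: the Item struct, QUEUE array, and method name are hypothetical stand-ins for the real ActiveRecord models, the Resque special_jobs queue, and the actual rake task.

```ruby
# Illustrative stand-ins for the real models and the Resque queue.
Item = Struct.new(:id, :processed)

QUEUE = [] # stand-in for the special_jobs queue

def enqueue_special_jobs(scope, only_do_10: false)
  scope = scope[1..10] if only_do_10 # trial run on a small slice
  scope.each do |item|
    next if item.processed # skip finished work, so the task is safe to rerun
    QUEUE << item.id
    puts "enqueued item #{item.id}" # item IDs end up in the Heroku logs
  end
end

# Pretend odd-numbered items were already processed on an earlier run:
items = (1..25).map { |i| Item.new(i, i.odd?) }
enqueue_special_jobs(items, only_do_10: true)
```

Because the skip test looks at each item’s own processed state, interrupting the task and rerunning it later just picks up where it left off.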
Test the rake task in staging with only_do_10 set to true.
Run the rake task in production with only_do_10 set to true, as a trial run.
Spin up a single special_jobs dyno and watch it process 10 items.
Run the rake task in production.
The jobs are now in the special_jobs queue, but no work will actually start until you spin up dedicated dynos.
Two workers per special_jobs dyno is our default, which works nicely with standard-2x dynos, but if you want, try setting the SPECIAL_JOB_WORKER_COUNT env variable to 3.
Our redis setup is capped at 80 connections, so be careful running more than 10 special_jobs dynos at once. You may want to monitor the redis statistics during the job.
Manually spin up a set of special_worker dynos of whatever type you want at Heroku's "resources" page for the application. Heroku will alert you to the cost. (10 standard-2x dynos cost roughly $1 per hour, for instance; with the worker count set to two, you’ll see up to 20 items being processed simultaneously.)
Monitor the progress of the resulting workers. Work goes much faster than you are used to, so pay careful attention to:
the Papertrail logs
the redis statistics for the app in Heroku (go to the resource page, then click “Heroku data for redis”)
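If you prefer the Heroku CLI to the web dashboard, the env-var and dyno-scaling steps above correspond to commands like the following. This is a sketch, not our exact invocation: the app name is hypothetical, and the numbers just match the defaults mentioned above.

```shell
# "our-app" is a placeholder for the real Heroku app name.
heroku config:set SPECIAL_JOB_WORKER_COUNT=3 -a our-app  # optional: 3 workers per dyno
heroku ps:scale special_worker=10 -a our-app             # spin up the worker dynos
heroku ps:scale special_worker=0 -a our-app              # ...and back down when done
```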
If there are errors in any of the jobs, you can retry them in the Resque pages, or rerun them from the console.
Monitor the number of jobs still pending in the special_jobs queue. When that number goes to zero, it means the work will complete soon and you should start getting ready to turn off the dynos. It does NOT mean the work is complete, however!
When all the workers in the special_jobs queue complete their jobs and are idle:
rake scihist:resque:prune_expired_workers will get rid of any expired workers, if needed.
Set the number of special_worker dynos back to zero.
Remove the special_jobs queue from the Resque pages.
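The warning above (an empty queue does not mean the work is finished) follows from how queues work: a job leaves the queue when a worker picks it up, not when it completes. A toy illustration in plain Ruby:

```ruby
# Toy model: jobs leave the queue at pickup time, so an empty queue can
# coexist with jobs still running on the workers.
queue     = [101, 102, 103] # pending job IDs
in_flight = []              # jobs currently held by workers

3.times { in_flight << queue.shift } # workers pick up every remaining job

puts queue.empty?    # true  -- nothing left to hand out...
puts in_flight.size  # 3     -- ...but three jobs are still running
```

So only shut the dynos down once the workers themselves are idle, not when the pending count hits zero.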
...
To separate logs into router and non-router files, resulting in smaller and more readable files:
```shell
mkdir router
mkdir nonrouter
ls *.tsv | gawk '{ print "grep -v heroku/router " $1 " > nonrouter/" $1 }' | bash
ls *.tsv | gawk '{ print "grep heroku/router " $1 " > router/" $1 }' | bash
```
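To sanity-check the split, you can run the same grep filters over a made-up sample file. The file name and log lines below are invented for the demo; a plain loop stands in for the gawk pipeline.

```shell
# Build one tiny .tsv with one router line and one non-router line.
mkdir -p router nonrouter
printf 'line1\theroku/router\tGET /\nline2\tapp/worker.1\tworking\n' > sample.tsv

# Same filtering idea as above: router lines in one copy, the rest in another.
for f in sample.tsv; do
  grep -v 'heroku/router' "$f" > "nonrouter/$f"
  grep 'heroku/router' "$f" > "router/$f"
done

wc -l router/sample.tsv nonrouter/sample.tsv
```

Each output file should end up with exactly one of the two sample lines.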
...