Journal of heroku investigations. Most recent entries on top. See also Heroku Consideration.

...

Wed Oct 14

...

For RAM comparison, on our current EC2 production, after being up for some time, passenger reports this memory use:

Code Block
------ Passenger processes ------
PID    VMSize     Private   Name
---------------------------------
18108  299.2 MB   2.0 MB    Passenger watchdog
18114  1082.6 MB  5.3 MB    Passenger core
18139  30.4 MB    0.4 MB    /usr/local/lib/ruby/gems/2.6.0/gems/passenger-5.3.7/buildout/support-binaries/PassengerAgent temp-dir-toucher /tmp/passenger-standalone.11jhb2e --cleanup --daemonize --pid-file /tmp/passenger-standalone.11jhb2e/temp_dir_toucher.pid --log-file /opt/scihist_digicoll/shared/passenger.log --user digcol --nginx-pid 18123
18187  958.4 MB   340.2 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18206  873.4 MB   281.1 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18225  738.5 MB   197.3 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18244  736.6 MB   160.8 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18261  736.7 MB   158.1 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18278  736.8 MB   169.9 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18295  736.9 MB   163.0 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18312  737.0 MB   169.8 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18329  737.1 MB   163.2 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)
18346  737.2 MB   162.4 MB  Passenger AppPreloader: /opt/scihist_digicoll/current (forking...)

So actually it’s true that the Private RSS was getting up to 340MB, although only after more use. One difference is that on heroku memory seems to balloon up more quickly. But I may have underestimated our RAM use – although it still isn’t the 400-500MB+ that we’re seeing on heroku.

An app with the work show page almost entirely disabled is at sample#memory_total=277.77MB sample#memory_rss=269.82MB 

We might be able to get under 300MB by making the work/show page avoid loading all children at once with an “infinite scroll” technique. This would also take care of our slowest pages. Pages we are trying that are NOT the large-member work show page (now on ‘standard’ rather than ‘hobby’ resources) seem to be loading in times similar to our current EC2, we think? Fixity report is 3s on heroku compared to 3.5s on EC2, so actually faster on heroku?

If we limit to only 50 children on a page, ramelli loads from heroku in about 2.6s (yeah, still slow), and takes RAM: sample#memory_total=447.76MB sample#memory_rss=399.51MB – gah, why is this still so much!! I guess the way we did it we still loaded all children into memory but just didn’t display them; let’s change that…. After a few loads, still up to sample#memory_total=466.50MB sample#memory_rss=398.28MB gahhhh.
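
(A minimal sketch of the distinction we suspect here – limiting what’s displayed vs. limiting the query itself – assuming an ActiveRecord-style Work with a members association; model and column names are illustrative, not our exact code.)

Code Block
# Hypothetical sketch: limiting display vs. limiting the query.

# Loads ALL child records into Ruby memory, then shows only 50:
members_to_show = work.members.to_a.first(50)

# Loads only 50 rows from postgres in the first place:
members_to_show = work.members.limit(50).to_a

# Selecting only the columns the thumbnails need trims per-object memory further:
members_to_show = work.members.select(:id, :friendlier_id, :position).limit(50).to_a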

If we limit to the 5 child work, we get a more reasonable sample#memory_total=321.20MB sample#memory_rss=252.28MB… to compare, let’s slice ramelli to actually 5 children… it’s still taking more than 2 seconds to return (what’s it doing?), but is sample#memory_total=395.11MB sample#memory_rss=326.82MB  , ok i guess?

Without actual member display code, and limited to 5…. sample#memory_total=391.79MB sample#memory_rss=323.62MB … about the same… aha, it’s partially our viewer_images_info taking up all the memory; that one still has the full list (but that doesn’t explain why the page load time is so slow). Just curl… no, still slow, still the same memory.

A moment to look at speed again

Yes, even with standard-2x and standard pg, ramelli is taking 4-6s on heroku, compared to 2-2.5s on our current EC2. 😞 The smaller 115-item work goes from 0.5-0.6s on EC2 to ~0.9-1.2s on heroku, what.

RAM: how many threads can we get away with?

Our puma config pays attention to the RAILS_MAX_THREADS heroku config var, making it easy to switch thread counts.
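
(A minimal sketch of the kind of config/puma.rb that does this – the default worker and thread counts here are assumptions, not necessarily our exact settings.)

Code Block
# config/puma.rb (sketch)
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 2))
min_threads = Integer(ENV.fetch("RAILS_MIN_THREADS", max_threads))
threads min_threads, max_threads

# One puma worker process per dyno unless WEB_CONCURRENCY says otherwise.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 1))
preload_app!

port ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "development")

With that in place, heroku config:set RAILS_MAX_THREADS=2 changes the thread count with no code change; setting a config var restarts the dynos, which picks it up.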

One worker 5 threads on a standard-1x (512MB) dyno – we exceeded memory capacity repeatedly requesting ramelli. 😞

Three threads – yep, still exceeded quota.

Two threads? Seems to be okay, but pushing it! We wouldn’t want our app to expand its waistline any further. sample#memory_total=497.73MB sample#memory_rss=473.06MB

Going back to one thread for a consistent baseline for exploring how changes affect memory.

Monday Oct 12/Tuesday Oct 13

  • Moving the database to standard-0 ($50/month) and the web dyno to standard-1x ($25/month), just to make sure we’re using production resources. I don’t expect it to make a difference (the hobby pg and dyno we were using ought to be just as fast), but this rules it out.

    • Ramelli is coming back in 3 to 4 seconds, with no apparent spikes to 6 or 7, so… better? Though still double the reliable 2 seconds of our EC2 setup.

    • RAM still super problematic, sample#memory_total=511.92MB sample#memory_rss=500.29MB

    • fixity report page 3-5 seconds, actually matching expected?

  • Blank Rails new app RAM usage?

    • It is using a reasonable sample#memory_total=128.11MB sample#memory_rss=95.13MB

      • OK, what is making our app twice as big even on home page? need to investigate.

    • Let’s try the same skeleton rails app, but with our scihist-digicoll Gemfile, so we’re loading all those gems….

      • Up to sample#memory_total=266.05MB sample#memory_rss=191.89MB

      • yeah, that’s a lot more. Although still like half of what we were seeing before! If we can keep it under 300MB, we can be okay. Hmm. We’re gonna have to do memory profiling of scihist-digicoll.

      • scihist-digicoll deploys as sample#memory_total=273.16MB sample#memory_rss=206.83MB, not TOO much more….

        • but just request of home page takes us to sample#memory_total=278.65MB sample#memory_rss=212.80MB… hmm, not THAT much more, refreshing home page gives us a few more. 😞

        • Five child work takes us to sample#memory_total=277.77MB sample#memory_rss=211.89MB , a few refreshes to sample#memory_total=274.44MB sample#memory_rss=208.56MB

        • We are doing way better memory-wise than last time we looked?? Maybe moving away from hobby dyno really did matter???

        • 115 child work at sample#memory_total=306.34MB sample#memory_rss=240.46MB, it is getting bigger hmm.

        • Several Ramelli loads up to sample#memory_total=398.45MB sample#memory_rss=332.57MB DOH. Although taking up to 6 seconds to come back sometimes.

        • Let’s actually try a branch which allocates very little per member, with the member view disabled.

      • A testing version of scihist_digicoll which only displays a friendlier_id for each thumb/lockup, how does ramelli do….

        • sample#memory_total=443.03MB sample#memory_rss=379.39MB no better???

        • Let’s try without iterating through the children at all….

          • sample#memory_total=421.28MB sample#memory_rss=353.05MB WHAT REALLY? What is taking this memory, we’ve made ramelli hypothetically not load any more objects than a page with one child.

        • Aha, well, decorator.representative_member is still doing a members load. Let’s stop it (this is also a point of optimization, we’re doing TWO member fetches here! See the sketch just after this list).

          • Down to sample#memory_total=393.84MB sample#memory_rss=334.83MB … a little bit better, but this is still REALLY WEIRD that it’s so much. We’re going to have to memory profile somehow.

        • Let’s try eliminating MOST of the show page, so it’s just a title! Ramelli is still sample#memory_total=313.77MB sample#memory_rss=245.55MB, still pretty big. WEIRD.
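
(A hypothetical sketch of the double-fetch fix mentioned above – class and association names are guesses for illustration, not verified scihist_digicoll code: load the members once, memoize, and derive the representative from that same list.)

Code Block
# Hypothetical sketch: avoid fetching work members twice per show page.
class WorkShowDecorator
  def initialize(work)
    @work = work
  end

  # Fetch ordered members exactly once and memoize the array.
  def members
    @members ||= @work.members.order(:position).to_a
  end

  # Derive the representative from the already-loaded list instead of
  # issuing a second members query.
  def representative_member
    members.first
  end
end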

Thurs Oct 8

The RAM and CPU resource issues are concerning.

  • Why does an instance seem to take even more RAM on heroku than on our EC2?

  • Why are slow actions so much slower still on heroku than on our EC2?

Things we might investigate:

  • Use the experimental heroku log-runtime-metrics add-on (https://devcenter.heroku.com/articles/log-runtime-metrics) to get more precise logging of our RAM use over time as we trigger actions. (Setup sketch after this list.)

  • Try passenger on heroku instead of puma, to compare apples to apples.

  • Try the heroku buildpack for jemalloc and compiling ruby with that, which some people say makes ruby use RAM better. (We didn’t do that on our EC2 though.) https://elements.heroku.com/buildpacks/gaffneyc/heroku-buildpack-jemalloc

  • Try a heroku standard-0 postgres and standard-1x dyno, so we’re testing on the actual resources we will be using, in case the ‘hobby’ ones we are using to test have different performance characteristics.

    • The standard-1x dyno can easily be turned on and off temporarily, but the db will probably stay there at $50/month.

  • Actually analyze and try to optimize our app’s RAM usage and performance:

    • Make the fixity report run on a cron job and show you stored results, instead of running when you click on it.

    • Make many-child pages use an “infinite scroll” technique to only load the first X and load more as you scroll down, instead of trying to load all at once.

    • More efficient production of each child page element on work pages (hard-code URLs, etc.)

    • Use the derailed_benchmarks gem to figure out what parts are using so much RAM and fix them: https://github.com/schneems/derailed_benchmarks (sketch after this list).

  • While we can probably optimize our app, the fact that we weren’t forced to on manual EC2 but will be on heroku worries us: are we raising the skill level and time needed to maintain a working app on heroku? (We actually HAVE already spent time optimizing the app, but apparently not well enough for heroku yet?)
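
(For reference, a sketch of how we’d set up the first and last items above; the app name is a placeholder, and the exact task invocations are as we understand the heroku and derailed_benchmarks docs.)

Code Block
# Enable the experimental runtime metrics logging on the heroku app:
heroku labs:enable log-runtime-metrics -a our-app-name
heroku restart -a our-app-name
# sample#memory_* lines then show up in `heroku logs --tail -a our-app-name`

# Memory profiling with derailed_benchmarks, run locally against the app
# (Gemfile: gem "derailed_benchmarks", group: :development):
bundle exec derailed bundle:mem               # RAM used just by requiring each gem
bundle exec derailed exec perf:objects        # object allocations for a request
bundle exec derailed exec perf:mem_over_time  # does memory keep growing over requests?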

RAM measure investigations

Using heroku log-runtime-metrics, confirm that our 1-worker-with-two-threads puma instance is starting at 316MB.

  • After just accessing home page, it’s up to 346.74MB

  • Accessing 115-child work ysnh5if, it’s up to 375MB, a few more times 386MB, then 392MB

  • Accessing ramelli it’s up to 444MB, a couple more times 493MB, then 511MB!!!

We may have a memory leak or bad memory behavior – but why isn’t it affecting us on passenger on our manual EC2s?

Wait, may be bad on passenger too! And yet it works on our EC2…

To measure on passenger, ssh to the staging web server as ubuntu, then:

  • run sudo passenger-memory-stats

  • run sudo PASSENGER_INSTANCE_REGISTRY_DIR=/opt/scihist_digicoll/shared passenger-status
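
(As a copy-pasteable sequence; the staging hostname is left as a placeholder.)

Code Block
ssh ubuntu@<staging-web-server>
sudo passenger-memory-stats
sudo PASSENGER_INSTANCE_REGISTRY_DIR=/opt/scihist_digicoll/shared passenger-status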

passenger-memory-stats on web is showing instance VMSize from 536MB to 738MB. Has something happened to raise our memory usage since the last time we looked? And why isn’t this machine swapping horribly? But it also says Total private dirty RSS: 463.93 MB; maybe the “Private” value matters more than the “VMSize” value… but not on heroku, which measures actual VMSize? (passenger-status shows only 200M and down; they show different things – neither may be what heroku measures, but they are working okay on our raw EC2….)

Heroku claims to be measuring “RSS” too; is that different from “private RSS”? sample#memory_total=509.52MB sample#memory_rss=469.41MB sample#memory_cache=40.11MB sample#memory_swap=0.00MB

Still way more than our passenger numbers! Let’s try with passenger…

Passenger on heroku

Having trouble getting passenger working on heroku for some reason…. Hmm, without me doing anything it seems to have settled down and is working.

  • Just home page query sample#memory_total=376.59MB sample#memory_rss=290.23MB

  • Accessing 115-child work ysnh5if, sample#memory_total=415.04MB sample#memory_rss=328.95MB, but then recovers to sample#memory_total=351.00MB sample#memory_rss=264.92MB

  • ramelli 4b29b614k, sample#memory_total=459.54MB sample#memory_rss=373.44MB

So not really that different. Maybe a bit better under passenger.
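
(For the record, the swap was roughly this kind of change – a sketch based on passenger’s standard heroku instructions; the pool-size value is an assumption.)

Code Block
# Gemfile: swap the app server gem
# gem "puma"
gem "passenger"

# Procfile: run passenger standalone instead of puma
web: bundle exec passenger start -p $PORT --max-pool-size 3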

Try jemalloc with puma

  • https://elements.heroku.com/buildpacks/gaffneyc/heroku-buildpack-jemalloc

  • just home page: sample#memory_total=315.88MB sample#memory_rss=241.65MB

  • 115-child ysnh5if: sample#memory_total=348.99MB sample#memory_rss=274.69MB

  • ramelli 4b29b614k: sample#memory_total=421.22MB sample#memory_rss=346.92MB

    • After a couple reloads: sample#memory_total=499.28MB sample#memory_rss=424.98MB

Maybe a bit better, but not so much really, about the same.
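
(Setup for that experiment was roughly this – a sketch; the jemalloc.sh wrapper and JEMALLOC_ENABLED config var are as we understand that buildpack’s README.)

Code Block
# Add the jemalloc buildpack in front of the ruby buildpack:
heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git -a our-app-name

# Then either wrap the web process in the Procfile…
#   web: jemalloc.sh bundle exec puma -C config/puma.rb
# …or enable it for everything via a config var, and redeploy:
heroku config:set JEMALLOC_ENABLED=true -a our-app-name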

Wed Oct 7

We have a semi-functional app deployed to heroku – no Solr (so no searching), no background jobs, lots of edge case issues. But something to look at.

...