...
We can see that while it is advertised as a “1GB” machine, the JVM only has a bit less than half of that.
This isn’t tunable by us; it’s SearchStax’s choice. It leaves space in RAM for the OS and maintenance tasks, and we can see that the system Physical Memory is pretty healthy too. See more on SearchStax memory choices in the SearchStax docs. (And also some SearchStax docs on tuning Solr memory use.)
The good news is that 460MB is still well over the 287MB we saw our in-use production server using.
It’s only using 109MB on boot, a bit less than our staging Solr for some reason, but in the ballpark.
Adding some load
We’ll want to do some reindexes, and also a bunch of queries.
I pointed our existing EC2 staging server at the SearchStax Solr.
Did a reindex, while refreshing the Solr admin dashboard a lot to watch memory: sometimes up to 140MB in use.
Did a “blank” search, which I know requires all facets to be calculated, and that can be RAM intensive. Temporarily up to 150MB in use, then down to 80MB again.
Went to the last page of pagination of the “blank” search. I believe Solr is RAM hungry when you do deep pagination like this. Temporarily up to 150MB, then back down to 75MB.
Okay…
Let’s do a load test where we ask for that deep-pagination page over and over… while also doing a reindex.
wrk -c 1 -t 1 -d 3m https://staging-digital.sciencehistory.org/catalog?page=333
=> Refreshing to watch RAM use, it never went over 170MB, and we didn’t get any OOM errors. After the test was done, RAM rested for a bit around 150MB then returned to 99MB.
Without restarting, now with some concurrency to stress it more?
wrk -c 4 -t 1 -d 3m https://staging-digital.sciencehistory.org/catalog?page=333
=> Observed up to 250MB, but no OOM and that’s still around 50% of capacity, plenty of room.
10 concurrency, far more than we’d ever see in reality, just to see what happens?
wrk -c 10 -t 1 -d 3m https://staging-digital.sciencehistory.org/catalog?page=333
=> Still didn’t observe more than 285MB. And at one point it dropped to 100MB even though the test was still going on?
Using the app manually it’s definitely kinda slow when this much load is being put on it – unsurprising! But no errors, it’s working!
OK, that’s all looking fine.
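Refreshing the admin dashboard by hand works, but the same heap numbers could probably be polled on a timer. Here is a minimal Ruby sketch, assuming the deployment exposes Solr's standard system info endpoint (`/admin/info/system`) and that its JSON reports heap figures in bytes under `jvm.memory.raw`. The `SOLR_URL`, `SOLR_USER` and `SOLR_PASS` environment variables are hypothetical names, and whether SearchStax needs basic auth here is also an assumption.

```ruby
require "json"
require "net/http"
require "uri"

# Pull heap numbers (rounded to MB) out of a Solr system-info JSON body.
# Assumes jvm.memory.raw.used and jvm.memory.raw.max are byte counts.
# Returns nil if the expected keys are missing.
def heap_usage_mb(json_body)
  raw = JSON.parse(json_body).dig("jvm", "memory", "raw")
  return nil unless raw && raw["used"] && raw["max"]

  bytes_per_mb = 1024.0 * 1024.0
  { used: (raw["used"] / bytes_per_mb).round, max: (raw["max"] / bytes_per_mb).round }
end

# Fetch one reading. solr_base_url is a hypothetical value such as
# "https://example.invalid/solr"; credentials are optional.
def fetch_heap_usage(solr_base_url, user: nil, password: nil)
  uri = URI("#{solr_base_url}/admin/info/system?wt=json")
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password) if user
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  heap_usage_mb(response.body)
end

# Only polls when run directly with SOLR_URL set; prints a reading every 5 seconds.
if __FILE__ == $PROGRAM_NAME && ENV["SOLR_URL"]
  loop do
    usage = fetch_heap_usage(ENV["SOLR_URL"], user: ENV["SOLR_USER"], password: ENV["SOLR_PASS"])
    stamp = Time.now.strftime("%H:%M:%S")
    puts(usage ? "#{stamp} heap #{usage[:used]}MB / #{usage[:max]}MB" : "#{stamp} no heap data")
    sleep 5
  end
end
```

Left running in a second terminal during a `wrk` run, this would give a timestamped record of the peaks instead of whatever happened to be on screen at each refresh.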
Load testing with queries taken from log
Let’s try load testing with a “realistic” set of URLs taken from actual app logs?
grep "/catalog?" production.log | shuf | head -50 > 50_random_queries.txt
That gave me 50 random catalog search queries from our actual production app. (OK, there were a few redirects and other things we didn’t want in there too.) Most of them look like bot traffic honestly, just following facet links kind of randomly. Or they’re our “chemistry” search, a “ping” done every minute by our Honeybadger uptime checker.
But let’s try these anyway. Using some Ruby regexp magic to actually extract the URLs, then we run:
URLS=./500_urls.txt wrk -c 10 -t 1 -d 1m -s load_test/multiplepaths.lua.txt https://staging-digital.sciencehistory.org/
And do a reindex at the same time…
RAM usage observed up to 275MB. Still well under our 490MB JVM max.
App was accessible the whole time manually, although slow. No Solr OOM or other errors reported.
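The URL-extraction step mentioned above isn't shown, so here is a minimal Ruby sketch of what it might look like. It assumes the log uses the standard Rails `Started GET "..."` request lines (possibly with a logger prefix), and that the Lua script wants one path per line; both are guesses, and the method name and file arguments are made up for illustration.

```ruby
# Matches the request path in a standard Rails request log line, e.g.:
#   Started GET "/catalog?q=chemistry" for 203.0.113.7 at 2021-02-01 12:00:00 +0000
# Only /catalog? paths are captured; redirect lines and other paths do not match.
CATALOG_GET = /Started GET "(\/catalog\?[^"]*)"/

# Returns the captured path for every matching line, in order.
# Duplicates are kept on purpose so repeated queries stay as frequent as in the log.
def extract_catalog_paths(lines)
  lines.filter_map { |line| line[CATALOG_GET, 1] }
end

# Usage (hypothetical file names): ruby extract_paths.rb sampled_log_lines.txt paths.txt
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  input_path, output_path = ARGV
  paths = extract_catalog_paths(File.foreach(input_path))
  File.write(output_path, paths.join("\n") + "\n")
end
```

Matching only `Started GET` lines is what drops the redirects and other stray lines that the plain `grep` let through.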
Conclusion?
I think we’re fine with NDN1.
If not, we can upgrade to NDN2 at any time, even if we’ve signed an annual contract for NDN1.