Page Comparison

...

The current implementation uses the existing download_large derivatives. I thought it was good to include high-res images suitable for high-quality printing, but this leads to pretty large PDFs and contributes to high RAM usage. Try download_medium and see if that alone lets us build very large PDFs without worrying about it? Probably good enough.

Tried it: it uses significantly less RAM, but our biggest works still use too much, so it doesn’t get us all the way there, unless we’re going to limit PDF generation to 500 pages max or something. (Smaller images may be better for users anyway, so we may do it anyway.)

50-image work, qf85nc451. Originally 99MB PDF using 440MB RAM. Smaller images, 20MB PDF using 284MB RAM.
100-image work, fx719n43f. Originally 171MB PDF using 549MB RAM. Smaller images, 35MB PDF using 328MB RAM.
325-image work, 1831ck38c. Originally 386MB PDF using 912MB RAM, with out-of-memory errors. Smaller images, 92MB PDF using 472MB RAM.
Ramelli, 694 items. Originally 1.8GB PDF(!); did not measure RAM, far too much for heroku. Smaller images, 325MB PDF using 987MB RAM, with heroku out-of-memory errors, so this is around the limit of what we can fit on heroku (and we may have a few larger works too).
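
To make the before/after comparison easier to read, here is a small Ruby sketch that turns the numbers above into percentage savings. The constant and method names are made up for illustration; the figures are copied from the measurements above, and Ramelli is left out because its original RAM usage was not measured.

```ruby
# Measurements copied from the notes above (all sizes in MB).
# Ramelli is omitted: its original RAM usage was never measured.
MEASUREMENTS = [
  { work: "qf85nc451", pdf_before: 99,  pdf_after: 20, ram_before: 440, ram_after: 284 },
  { work: "fx719n43f", pdf_before: 171, pdf_after: 35, ram_before: 549, ram_after: 328 },
  { work: "1831ck38c", pdf_before: 386, pdf_after: 92, ram_before: 912, ram_after: 472 },
].freeze

# Percentage saved going from `before` to `after`, rounded to a whole percent.
def percent_saved(before, after)
  ((before - after) * 100.0 / before).round
end

# One summary row per measured work.
def savings_report(rows = MEASUREMENTS)
  rows.map do |row|
    {
      work: row[:work],
      pdf_saved: percent_saved(row[:pdf_before], row[:pdf_after]),
      ram_saved: percent_saved(row[:ram_before], row[:ram_after]),
    }
  end
end

savings_report.each do |r|
  puts format("%-10s PDF -%d%%  RAM -%d%%", r[:work], r[:pdf_saved], r[:ram_saved])
end
```

On these three works the PDF size falls by about 76 to 80 percent, while RAM falls by only about 35 to 48 percent, which fits the observation that smaller images alone do not solve the memory problem.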

Ruby hexapdf instead of prawn

...

https://github.com/rrthomas/pdfjam

Progress? Merge PDFs?

One problem with those command-line tools is that they make it hard to show a progress bar like we’re doing now, if they require downloading all the thumbs in advance and then making the PDF in one command-line invocation (with no progress reported).

Is there a way to invoke them to “add one more image to the end of the PDF”, building it up one image at a time? Then we don’t need to have them all downloaded at once, and can report progress.

Or, should/could we use (any) tool to make a bunch of 1-page PDFs, then some other (command-line?) tool to “combine all these 1-page PDFs into one PDF”, which might be a fast and cheap operation?
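
A rough sketch of how the "many 1-page PDFs, then one merge" idea could keep progress reporting. Everything here is hypothetical: the method name and the `render_page`, `merge`, and `on_progress` callables are placeholders for whatever tools end up being chosen, and nothing below is tied to a real PDF library.

```ruby
# Hypothetical sketch; none of the callables is tied to a real tool.
#   render_page.call(image_id)          -> path of a 1-page PDF for that image
#   merge.call(page_paths, output_path) -> writes the combined PDF
#   on_progress.call(done, total)       -> called after each page is rendered
def build_pdf_from_single_pages(image_ids, output_path, render_page:, merge:, on_progress: nil)
  total = image_ids.length
  page_paths = []

  image_ids.each_with_index do |image_id, index|
    # Only one image needs to be downloaded and processed at a time.
    page_paths << render_page.call(image_id)
    on_progress&.call(index + 1, total)
  end

  # The merge happens once at the end, as a single step.
  merge.call(page_paths, output_path)
  output_path
end

# Usage with stand-in lambdas instead of real tools:
build_pdf_from_single_pages(
  %w[img1 img2 img3], "work.pdf",
  render_page: ->(id) { "#{id}.pdf" },
  merge: ->(paths, out) { puts "merge #{paths.join(', ')} into #{out}" },
  on_progress: ->(done, total) { puts "rendered #{done}/#{total}" }
)
```

Progress is reported per rendered page, so the bar would only stall during the final merge. Whether that merge is really fast and cheap in RAM for our biggest works would still need measuring.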

pdftk

https://www.pdflabs.com/tools/pdftk-server/

(Hmm, it can’t add an image to a PDF, I don’t think, although it can merge PDFs and edit metadata on a PDF.)
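
If pdftk were used for the merge step, the invocation would use its `cat` operation. A minimal sketch that only builds the command, without running it; the method name and file names are made up, and whether pdftk stays within our memory limits on very large works has not been tested.

```ruby
require "shellwords"

# Builds the argv for: pdftk page1.pdf page2.pdf ... cat output combined.pdf
def pdftk_merge_command(page_paths, output_path)
  raise ArgumentError, "need at least one input PDF" if page_paths.empty?

  ["pdftk", *page_paths, "cat", "output", output_path]
end

cmd = pdftk_merge_command(["page-0001.pdf", "page-0002.pdf"], "combined.pdf")
puts Shellwords.join(cmd)
# To actually run it (requires pdftk on the PATH):
#   system(*cmd, exception: true)
```

Passing the command as an array to `system` avoids shell quoting problems with unusual file names.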

combine_pdf

Yet another Ruby PDF library. It is one thing I found that lets us edit metadata (i.e. the Info Dictionary) on an existing PDF. It could maybe also do other useful stuff for us.

https://github.com/boazsegev/combine_pdf

Nope: just tried using it to edit metadata on a very large PDF, and it used a ton of RAM.

Uncaching on-demand derivatives

If you’re trying different PDF generation techniques, you want to get the app to create PDFs with the new one, but the built-in caching of already-created PDFs will interfere with this.

Here’s one way to force an uncache on heroku; replace the friendlier_id with that of the desired work:

Code Block
heroku run bundle exec rails runner "OnDemandDerivative.where(work_id: Work.find_by_friendlier_id('qf85nc451').id).destroy_all"
