Derivative generation on Heroku (obsolete)

Types of original images in the digital collections

  • TIFF

    • Black and white colorspace

    • RGB colorspace

  • PDF

Software needed to generate derivatives:

See Aptfile below.

Diagnostics for derivative generation software

These are now automated in a suite of tests you can run with ./bin/rspec system_env_spec. See https://github.com/sciencehistory/scihist_digicoll/blob/master/system_env_spec/README.md for details.

See the page’s history for how we did this in the past.
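
As a rough illustration of the kind of check such a suite performs, the sketch below tests whether the installed vips lists a PDF loader, mirroring the vips -l | grep -i pdf check mentioned under setup 3 in the table below. It is not taken from system_env_spec: the function names are hypothetical, and it assumes the vips executable is on the PATH.

```python
import shutil
import subprocess


def lists_pdf_loader(vips_list_output: str) -> bool:
    """Return True if any line of `vips -l` output mentions PDF (case-insensitive)."""
    return any("pdf" in line.lower() for line in vips_list_output.splitlines())


def vips_has_pdf_support() -> bool:
    """Run `vips -l` and report whether a PDF loader appears in the listing.

    Assumes the `vips` executable is on the PATH; returns False if it is not.
    """
    if shutil.which("vips") is None:
        return False
    result = subprocess.run(
        ["vips", "-l"], capture_output=True, text=True, check=False
    )
    return lists_pdf_loader(result.stdout)


if __name__ == "__main__":
    print("PDF support:", vips_has_pdf_support())
```

Keeping the text matching in its own function means the check can be exercised against saved output without vips installed.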


Software setups

#

Aptfile

Buildpack

Results

1

libvips-tools mediainfo imagemagick poppler-utils
heroku-community/apt heroku/ruby

vips-8.9.1-Sun Feb 23 08:51:26 UTC 2020

Color TIFFs work

B&W TIFFs do not work

PDFs work

2

libvips-tools mediainfo imagemagick poppler-utils

vips-8.9.1-Sun Feb 23 08:51:26 UTC 2020
PDFs work

All TIFFs work

Removing --eprofile srgb_profile_path from the arguments to vipsthumbnail in the code (docs) avoids the error described in issue 942.

3

vips-8.10.2-Mon Oct 12 16:43:59 UTC 2020

At some point in 2021, vips -l | grep -i pdf started returning nothing, meaning vips had no poppler support, so PDFs don’t work.

All TIFFs work

4

vips-8.10.6-Tue Mar 23 20:52:58 UTC 2021

PDFs work

All TIFFs work

Combined audio derivatives don’t work

5

 

vips-8.10.6-Tue Mar 23 20:52:58 UTC 2021

PDFs work

All TIFFs work

Combined audio derivatives work again (see issue 1448)

libpoppler-glib8 in the Aptfile may not be needed (see issue 1455)

6

mediainfo
imagemagick
libglib2.0-0
libglib2.0-dev
libpoppler-glib8
poppler-utils

https://github.com/heroku/heroku-buildpack-activestorage-preview

heroku-community/apt

https://github.com/brandoncc/heroku-buildpack-vips

heroku/ruby

On-demand PDFs stopped working in staging, so we added poppler-utils back into the Aptfile.

7

mediainfo

imagemagick

libglib2.0-0
libglib2.0-dev
libpoppler-glib8
qpdf
tesseract-ocr
tesseract-ocr-eng
tesseract-ocr-deu
tesseract-ocr-fra
tesseract-ocr-spa
tesseract-error-while-loading-shared-libraries-libarchive-so-13-python
libarchive13

heroku/python
https://github.com/heroku/heroku-buildpack-activestorage-preview
https://buildpack-registry.s3.amazonaws.com/buildpacks/heroku-community/apt.tgz
https://github.com/brandoncc/heroku-buildpack-vips
https://github.com/fnando/heroku-buildpack-exiftool
heroku/ruby

OCR.

Note: we are removing poppler-utils.

  • Note: Row 2 was a band-aid. It violated the rule implicit in the code that the derivatives of all TIFF originals, including B&W originals, should be encoded as sRGB. I interpret the documentation as meaning that the ICC profile of an original is reused in its derivatives, but further research is needed.

  • identify -verbose graphics_file.tiff | grep Colorspace shows what happens to the colorspace of each type of original after it is processed by vipsthumbnail.
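
The identify check above can be scripted to compare several files at once. The following is a minimal sketch, assuming ImageMagick's identify is on the PATH and that its verbose output contains lines of the form "Colorspace: <value>". The function names are hypothetical and not part of the codebase.

```python
import re
import subprocess
import sys
from typing import List

# Matches lines such as "  Colorspace: sRGB" in `identify -verbose` output
# (assumed format).
COLORSPACE_LINE = re.compile(r"^\s*Colorspace:\s*(\S+)", re.MULTILINE)


def parse_colorspaces(identify_output: str) -> List[str]:
    """Extract every Colorspace value found in `identify -verbose` output."""
    return COLORSPACE_LINE.findall(identify_output)


def colorspaces_of(path: str) -> List[str]:
    """Run `identify -verbose` on a file and return its Colorspace values."""
    result = subprocess.run(
        ["identify", "-verbose", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_colorspaces(result.stdout)


if __name__ == "__main__":
    # Usage: python colorspaces.py original.tiff derivative.jpg ...
    for path in sys.argv[1:]:
        print(path, colorspaces_of(path))
```

Running it on an original and on the corresponding vipsthumbnail output shows side by side whether the colorspace changed.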