Derivative generation on Heroku (obsolete)

Types of original images in the digital collections

  • TIFF

    • Black and white colorspace

    • RGB colorspace

  • PDF

Software needed to generate derivatives:

See the Aptfile entries under Software setups below.

Diagnostics for derivative generation software

These are now automated in a suite of tests that you can run with ./bin/rspec system_env_spec. See https://github.com/sciencehistory/scihist_digicoll/blob/master/system_env_spec/README.md for details.

See the page’s history for how we did this in the past.
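
As a rough illustration of the kind of check such a suite performs, the Python sketch below reports which packages appear to be missing (judged by one executable each) and whether a vips -l listing mentions a PDF operation, mirroring the vips -l | grep -i pdf check noted under Software setups. It is not the project's rspec suite. The package-to-executable mapping and all function names are assumptions made for illustration.

```python
import shutil
import subprocess

# Assumed mapping from apt package (as named in the Aptfiles on this page)
# to one executable that the package normally installs. Illustrative only.
EXPECTED_TOOLS = {
    "libvips-tools": "vipsthumbnail",
    "mediainfo": "mediainfo",
    "imagemagick": "identify",
    "poppler-utils": "pdftoppm",
}


def missing_tools(expected=EXPECTED_TOOLS, which=shutil.which):
    """Return the sorted package names whose executable is not on PATH."""
    return sorted(pkg for pkg, exe in expected.items() if which(exe) is None)


def has_pdf_loader(vips_list_output):
    """Mimic `vips -l | grep -i pdf`: True if any line mentions pdf."""
    return any("pdf" in line.lower() for line in vips_list_output.splitlines())


def vips_supports_pdf():
    """Run `vips -l` and report whether a PDF operation is listed."""
    result = subprocess.run(
        ["vips", "-l"], capture_output=True, text=True, check=False
    )
    return has_pdf_loader(result.stdout)
```

On a dyno or a local machine, missing_tools() returning an empty list and vips_supports_pdf() returning True would suggest the basic tooling is in place. Neither result proves that derivatives are generated correctly.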


Software setups

Each row below records the Aptfile, the buildpacks, and the results for one setup.

Row 1

Aptfile:
libvips-tools mediainfo imagemagick poppler-utils

Buildpack:
heroku-community/apt heroku/ruby

Results:

  • vips-8.9.1-Sun Feb 23 08:51:26 UTC 2020

  • Color TIFFs work

  • B&W TIFFs do not work

  • PDFs work

Row 2

Aptfile:
libvips-tools mediainfo imagemagick poppler-utils

Results:

  • vips-8.9.1-Sun Feb 23 08:51:26 UTC 2020

  • PDFs work

  • All TIFFs work

  • Removing --eprofile srgb_profile_path from the arguments to vipsthumbnail in the code (docs) avoids the error described in issue 942.

Row 3

Results:

  • vips-8.10.2-Mon Oct 12 16:43:59 UTC 2020

  • At some point in 2021, vips -l | grep -i pdf started returning no output, which means libvips had no poppler support, so PDFs don’t work.

  • All TIFFs work

Row 4

Results:

  • vips-8.10.6-Tue Mar 23 20:52:58 UTC 2021

  • PDFs work

  • All TIFFs work

  • Combined audio derivatives don’t work

Row 5

Results:

  • vips-8.10.6-Tue Mar 23 20:52:58 UTC 2021

  • PDFs work

  • All TIFFs work

  • Combined audio derivatives work again (see issue 1448)

  • libpoppler-glib8 in the Aptfile may not be needed (see issue 1455)

Row 6

Aptfile:
mediainfo
imagemagick
libglib2.0-0
libglib2.0-dev
libpoppler-glib8
poppler-utils

Buildpack:
https://github.com/heroku/heroku-buildpack-activestorage-preview
heroku-community/apt
https://github.com/brandoncc/heroku-buildpack-vips
heroku/ruby

Results:

  • On-demand PDFs stopped working in staging, so we added poppler-utils back into the Aptfile.

Row 7

Aptfile:
mediainfo
imagemagick
libglib2.0-0
libglib2.0-dev
libpoppler-glib8
qpdf
tesseract-ocr
tesseract-ocr-eng
tesseract-ocr-deu
tesseract-ocr-fra
tesseract-ocr-spa
tesseract-error-while-loading-shared-libraries-libarchive-so-13-python
libarchive13

Buildpack:
heroku/python
https://github.com/heroku/heroku-buildpack-activestorage-preview
https://buildpack-registry.s3.amazonaws.com/buildpacks/heroku-community/apt.tgz
https://github.com/brandoncc/heroku-buildpack-vips
https://github.com/fnando/heroku-buildpack-exiftool
heroku/ruby

Results:

  • OCR.

  • Note that we are removing poppler-utils.

  • Note: Row 2 was a band-aid; it violated the rule implicit in the code that all TIFF originals, including B&W originals, should have their derivatives encoded as sRGB. I interpret the documentation as meaning that the ICC profile of an original is reused in its derivatives, but further research is needed.

  • identify -verbose graphics_file.tiff | grep Colorspace can be used to check what happens to the colorspace of each type of original after it is processed by vipsthumbnail.
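
The identify check above can be scripted so that several files are compared at once. The Python sketch below pulls the Colorspace values out of identify -verbose output. The function names are hypothetical, and it assumes that ImageMagick's identify is on PATH. Unlike grep Colorspace, the pattern only matches lines that begin with the Colorspace label.

```python
import re
import subprocess

# Matches lines such as "  Colorspace: sRGB" in `identify -verbose` output.
COLORSPACE_RE = re.compile(r"^\s*Colorspace:\s*(\S+)", re.MULTILINE)


def colorspaces(identify_output):
    """Return every Colorspace value found in `identify -verbose` text."""
    return COLORSPACE_RE.findall(identify_output)


def file_colorspaces(path):
    """Run `identify -verbose PATH` and return the colorspaces it reports."""
    result = subprocess.run(
        ["identify", "-verbose", path],
        capture_output=True, text=True, check=True,
    )
    return colorspaces(result.stdout)
```

Calling file_colorspaces on an original and on its vipsthumbnail derivative would show whether, for example, a B&W original ends up with an sRGB derivative.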
