Types of original images in the digital collections

  • TIFF

    • Black and white colorspace

    • RGB colorspace

  • PDF

Software needed to generate derivatives:

  • mediainfo

  • convert

  • pdfunite

  • vips

  • ffmpeg
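
A quick way to confirm that all five programs are on the PATH before running the fuller diagnostics is a loop over command -v. This is a minimal sketch; the function name missing_tools is hypothetical and not part of the application.

```shell
# Hypothetical helper: report which of the named commands are missing from PATH.
# Prints one line per missing command and returns non-zero if any is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# The five programs listed above.
if missing_tools mediainfo convert pdfunite vips ffmpeg; then
  echo "all derivative tools found"
fi
```

This only checks presence; it says nothing about versions or about which delegates (for example PDF support in vips) were compiled in.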

Diagnostics for derivative generation software

Run the following commands in a Heroku dyno.

These checks are now automated in a test suite that you can run with ./bin/rspec system_env_spec. See https://github.com/sciencehistory/scihist_digicoll/blob/master/system_env_spec/README.md for details. The commands below are kept for historical purposes only.

Your results may vary slightly, but output that differs substantially from the samples below should be treated as a red flag.

heroku run bash

mediainfo --version
# Normal output:
# MediaInfo Command line,
# MediaInfoLib - v19.09

convert -version
# Normal output:
# Version: ImageMagick 6.9.10-23 Q16 x86_64 20190101 https://imagemagick.org
# Copyright: © 1999-2019 ImageMagick Studio LLC
# License: https://imagemagick.org/script/license.php
# Features: Cipher DPC Modules OpenMP
# Delegates (built-in): bzlib djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png tiff webp wmf x xml zlib

pdfunite -v
# Normal output:
# pdfunite version 0.86.1

vips --version
# Normal output:
# vips-8.10.6-Tue Mar 23 20:52:58 UTC 2021

ffmpeg -version
# Normal output:
# ffmpeg version 4.2.3 Copyright (c) 2000-2020 the FFmpeg developers
# built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
# configuration: --prefix=/home/work/sffmpeg/build --datadir=/home/work/sffmpeg/build/etc --disable-shared --enable-static --enable-pic --pkg-config-flags=--static --enable-gpl --enable-version3 --disable-doc --disable-debug --disable-ffplay --disable-outdevs --enable-runtime-cpudetect --extra-cflags='-I/home/work/sffmpeg/build/include -static' --extra-ldflags=-L/home/work/sffmpeg/build/lib --extra-ldexeflags=-static --extra-libs='-lstdc++ -lexpat -ldl -lm -lpthread' --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libaom --enable-libmp3lame --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libx265 --enable-libvpx --enable-libopus --enable-libfreetype --enable-libass --enable-mbedtls
# libavutil      56. 31.100 / 56. 31.100
# libavcodec     58. 54.100 / 58. 54.100
# libavformat    58. 29.100 / 58. 29.100
# libavdevice    58.  8.100 / 58.  8.100
# libavfilter     7. 57.100 /  7. 57.100
# libswscale      5.  5.100 /  5.  5.100
# libswresample   3.  5.100 /  3.  5.100
# libpostproc    55.  5.100 / 55.  5.100


vips -l | grep -o '[a-z_]*pdf[a-z_]*'
# Normal output:
# pdfload_base
# pdfload
# pdf
# pdfload_buffer
# pdfload_source

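# Work in the app's tmp directory.
cd tmp

# Path to the sRGB ICC profile vendored in the kithe gem.
PROFILE=$(ls ../vendor/bundle/ruby/*/gems/kithe-*/lib/vendor/icc/sRGB2014.icc)

# Download the sample originals.
wget https://digital.sciencehistory.org/downloads/m3zcuho -O b_w.tiff
wget https://digital.sciencehistory.org/downloads/1h16b9n -O color.tiff
wget https://digital.sciencehistory.org/downloads/519ucnx -O normal.pdf

# Generate thumbnails; each is written as tn_<name>.jpg.
vipsthumbnail color.tiff --eprofile "$PROFILE"
vipsthumbnail b_w.tiff --eprofile "$PROFILE"
vipsthumbnail normal.pdf

# All three thumbnails should be reported as sRGB.
identify *.jpg | grep sRGB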

# Normal output (ignore warnings):
# tn_b_w.jpg    JPEG 128x108 128x108+0+0 8-bit sRGB 23603B 0.000u 0:00.000
# tn_color.jpg  JPEG 128x96 128x96+0+0   8-bit sRGB 11880B 0.000u 0:00.000
# tn_normal.jpg JPEG 99x128 99x128+0+0   8-bit sRGB  1619B 0.010u 0:00.000
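
The two pass/fail conditions in the diagnostics above (vips lists a PDF loader, and every thumbnail is sRGB) can be expressed as small filters that read command output on stdin. This is a sketch only; the function names has_pdf_loader and not_srgb are hypothetical, and the sRGB check assumes the one-line-per-file identify format shown in the sample output.

```shell
# Hypothetical filter: succeeds if `vips -l` output (on stdin) lists a PDF loader.
has_pdf_loader() {
  grep -q 'pdfload'
}

# Hypothetical filter: reads `identify` output on stdin, prints the name of
# every file whose line does not report sRGB, and fails if there is any.
not_srgb() {
  awk '!/ sRGB / { print $1; bad = 1 } END { exit (bad ? 1 : 0) }'
}

# In the dyno these would be used as:
#   vips -l | has_pdf_loader || echo "no PDF support in vips"
#   identify *.jpg | not_srgb || echo "some thumbnails are not sRGB"

# Demonstration on one line of the sample output above.
echo 'tn_b_w.jpg JPEG 128x108 128x108+0+0 8-bit sRGB 23603B 0.000u 0:00.000' |
  not_srgb && echo "all thumbnails are sRGB"
```

Reading from stdin keeps the filters independent of vips and ImageMagick, so they can be tried against saved output as well as in a live dyno.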

Software setups

We’ve been through several of these since starting to investigate Heroku.

Row 1

  • Aptfile: libvips-tools, mediainfo, imagemagick, poppler-utils

  • Buildpack:

    • heroku-community/apt

    • heroku/ruby

  • Results:

    • vips-8.9.1-Sun Feb 23 08:51:26 UTC 2020

    • Color TIFFs work

    • B&W TIFFs do not work

    • PDFs work

Row 2

  • Aptfile: libvips-tools, mediainfo, imagemagick, poppler-utils

  • Buildpack:

    • heroku-community/apt

    • heroku/ruby

  • Results:

    • vips-8.9.1-Sun Feb 23 08:51:26 UTC 2020

    • PDFs work

    • All TIFFs work

    • Removing --eprofile srgb_profile_path from the arguments to vipsthumbnail in the code (docs) avoids the error described in issue 942.

Row 3

  • Aptfile: mediainfo, imagemagick, poppler-utils

  • Buildpack:

    • heroku-community/apt

    • https://github.com/machinio/heroku-buildpack-vips

    • heroku/ruby

  • Results:

    • vips-8.10.2-Mon Oct 12 16:43:59 UTC 2020

    • At some point in 2021, vips -l | grep -i pdf started returning blank - no poppler support, so PDFs don’t work.

    • All TIFFs work

Row 4

  • Aptfile: mediainfo, imagemagick, libglib2.0-0, libglib2.0-dev, libpoppler-glib8

  • Buildpack:

    • heroku-community/apt

    • https://github.com/brandoncc/heroku-buildpack-vips

    • heroku/ruby

  • Results:

    • vips-8.10.6-Tue Mar 23 20:52:58 UTC 2021

    • PDFs work

    • All TIFFs work

    • Combined audio derivatives don’t work

Row 5

  • Aptfile: mediainfo, imagemagick, libglib2.0-0, libglib2.0-dev, libpoppler-glib8

  • Buildpack:

    • https://github.com/heroku/heroku-buildpack-activestorage-preview

    • https://buildpack-registry.s3.amazonaws.com/buildpacks/heroku-community/apt.tgz

    • https://github.com/brandoncc/heroku-buildpack-vips

    • heroku/ruby

  • Results:

    • vips-8.10.6-Tue Mar 23 20:52:58 UTC 2021

    • PDFs work

    • All TIFFs work

    • Combined audio derivatives work again (see issue 1448)

    • libpoppler-glib8 in the Aptfile may not be needed (see issue 1455)

Row 6

  • Aptfile: mediainfo, imagemagick, libglib2.0-0, libglib2.0-dev, libpoppler-glib8, poppler-utils

  • Buildpack:

    • https://github.com/heroku/heroku-buildpack-activestorage-preview

    • heroku-community/apt

    • https://github.com/brandoncc/heroku-buildpack-vips

    • heroku/ruby

  • Results:

    • PDF on-demand stopped working in staging, so we added poppler-utils back into the Aptfile.

Row 7

  • Aptfile: mediainfo, imagemagick, libglib2.0-0, libglib2.0-dev, libpoppler-glib8, qpdf, tesseract-ocr, tesseract-ocr-eng, tesseract-ocr-deu, tesseract-ocr-fra, tesseract-ocr-spa, tesseract-error-while-loading-shared-libraries-libarchive-so-13-python, libarchive13

  • Buildpack:

    • heroku/python

    • https://github.com/heroku/heroku-buildpack-activestorage-preview

    • https://buildpack-registry.s3.amazonaws.com/buildpacks/heroku-community/apt.tgz

    • https://github.com/brandoncc/heroku-buildpack-vips

    • https://github.com/fnando/heroku-buildpack-exiftool

    • heroku/ruby

  • Results:

    • OCR.

    • Note we are removing poppler-utils

  • Note: Row 2 was a band-aid; it violated the rule implicit in the code that all derivatives of TIFF originals, including those of B&W originals, should be encoded as sRGB. I interpret the documentation as meaning that the ICC profile of an original is reused in its derivatives, but further research is needed.

  • identify -verbose graphics_file.tiff | grep Colorspace can be used to check what happens to each type of original after it has been processed by vipsthumbnail.

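
To compare an original with its thumbnail, the identify -verbose command from the note above can be wrapped in a small filter that extracts just the colorspace value. This is a sketch; colorspace_of is a hypothetical name, and the sample input in the demonstration is illustrative rather than captured output.

```shell
# Hypothetical filter: print the first "Colorspace:" value found in
# `identify -verbose` output read from stdin.
colorspace_of() {
  awk -F': *' '/^ *Colorspace:/ { print $2; exit }'
}

# In the dyno, after running vipsthumbnail, this would be used as:
#   identify -verbose b_w.tiff   | colorspace_of
#   identify -verbose tn_b_w.jpg | colorspace_of

# Demonstration on illustrative input.
printf '%s\n' 'Image: b_w.tiff' '  Format: TIFF' '  Colorspace: Gray' |
  colorspace_of
```

If the two values differ for a B&W original, the derivative was converted; if they match, the original's colorspace was carried over.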