Overview

The Internet Archive BookReader is a candidate to replace the digital collections' custom-built viewer. See https://sciencehistory.atlassian.net/wiki/spaces/HDC/pages/2206203905/OCR+planning+notes#Search-within--Work for a discussion in the context of other candidates.

A working example: the bird book. Note:

  • the ease of navigation within the book;

  • the right and left side navigation that allows you to interact with the book in much the same way as a codex;

  • the mature search-within-book interface;

  • the scalable, copyable text overlay.

A custom-built PHP script serves the images.

Important links and resources

Search within the book

In December 2023, we ran some experiments to see if we could integrate a modified version of the BookReader demo code with image and text metadata from the digital collections. We were able to get the BookReader to consume not only our images but also our hOCR content, which allowed us to demo a simple version of “search inside the book”.

  • The code for the demo is here.

  • The item from the digital collections used is hardcoded, as this was just a proof of concept.

  • The serializers that we use to serve our metadata to the BookReader are at this PR. (Don’t merge it!)

  • Note that this demo uses single JPG images, so it is not appropriate for delivering full-resolution pan-and-zoom. That needs a tiling solution (such as IIIF, see below), which we have not yet been able to make work.
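
To illustrate what “search inside the book” needs from hOCR, here is a minimal Python sketch, not the demo's actual code. It assumes the hOCR marks each word as a span with class `ocrx_word` and a `title` attribute carrying `bbox x0 y0 x1 y1`, which is the common hOCR convention. The names `HocrWordParser`, `search_words`, and `SAMPLE_HOCR` are hypothetical and exist only for this example.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 100 200 220 240; x_wconf 96".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs from hOCR ocrx_word spans.

    Assumes word spans do not contain nested spans.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(n) for n in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def search_words(hocr, query):
    """Return (word, bbox) pairs whose text contains the query, ignoring case."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    needle = query.casefold()
    return [(text, bbox) for text, bbox in parser.words if needle in text.casefold()]


# Hypothetical one-line page, for illustration only.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 1000 1500'>
  <span class='ocr_line' title='bbox 100 200 600 240'>
    <span class='ocrx_word' title='bbox 100 200 220 240; x_wconf 96'>Scarlet</span>
    <span class='ocrx_word' title='bbox 240 200 400 240; x_wconf 93'>Tanager</span>
  </span>
</div>
"""

hits = search_words(SAMPLE_HOCR, "tanager")
```

Each hit pairs the matched word with its pixel bounding box on the page image, which is the information a viewer needs to draw a highlight over a search result.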

IIIF

IIIF is a standard that could theoretically allow us to use the BookReader to consume our images and metadata. (The APIs that interest us are the Image API and the Content Search API.) This would notably allow us to offer pan-and-zoom functionality, among other useful features. For this to work:

  • we would need to serve our metadata according to some version of those APIs, presumably the latest, and

  • the BookReader would have to work against those APIs.
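
As a rough illustration of why the Image API enables pan-and-zoom, the Python sketch below builds tile request URLs using the Image API URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. It assumes Image API 3.0 parameter values (`max`, `default`) and full-resolution tiles only. The server address, the identifier, and the function names `iiif_image_url` and `tile_regions` are hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API request URL for one region of one image."""
    # The identifier must be percent-encoded, including any slashes it contains.
    encoded = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


def tile_regions(width, height, tile_size=512):
    """Yield 'x,y,w,h' region strings that cover the image at full resolution."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            w = min(tile_size, width - x)    # edge tiles are narrower
            h = min(tile_size, height - y)   # bottom tiles are shorter
            yield f"{x},{y},{w},{h}"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"
regions = list(tile_regions(1000, 600))
first_tile = iiif_image_url(BASE, "page1", region=regions[0], size="512,512")
```

A tiling viewer requests only the regions currently on screen, at a size suited to the zoom level, instead of downloading one full-page, full-resolution JPG per page.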

The Internet Archive and IIIF

This blog post describes the history of the Internet Archive and IIIF. The key sentence seems to be: “By making Internet Archive images and texts IIIF-compatible, they may be opened using any number of compatible IIIF viewer apps, each offering their own advantages and unique features”. Tellingly, the post makes no mention of the BookReader.

The Internet Archive does in fact maintain a IIIF server, but its front end is Mirador (which itself includes the OpenSeadragon viewer).

The BookReader and IIIF

  • Originally we wanted to look at the BookReader’s IIIF demo/plugin because we thought it was a proven, working path to integrating with the BookReader that would also give us tiled images at the appropriate zoom level (instead of fetching entire full-page, full-resolution images). We put some time into trying this path, but:

    • It appears that neither of those assumptions was true: the IIIF plugin was actually still fetching whole-page graphics, and the plugin/demo appears not to be working and to need a lot of work.

    • We aren’t actually wedded to IIIF (we don’t currently even use it), so this isn’t necessarily a disaster; it just means this was the wrong path to investigate.

    • Notes about the issues with the IIIF plugin follow.

Even if I had been able to fix the open issue, the work would have been of little help to us, since both the BookReader and the IIIF standard have evolved too much in the intervening five years of development.

Conclusion

I have to conclude that the BookReader is not worth pursuing as a component of the digital collections. While impressive in its current form, it depends on a complex and ill-documented set of interfaces with the Internet Archive’s image and metadata servers, and relies in particular on a home-grown PHP image server script that looks difficult to maintain.

I certainly understand the IA’s desire (which can be inferred from their blog post) to move to a more interoperable standard for serving images, and to get out of the business of maintaining an image viewer altogether.
