Transkribus (https://readcoop.eu/transkribus/) is a “comprehensive platform for the digitization, AI-powered text recognition, transcription and searching of historical documents.” I’ve heard it mentioned repeatedly at recent conferences, and I’m intrigued by the apparent possibilities. Could we make use of it, in the next few years, to make transcribing handwritten documents cheaper and more efficient?

I’m proposing to spend a bit of time tinkering with it in order to find out.

Scope and assumptions

  • High-quality translation of text files (as opposed to images of handwriting) is likely to become cheap, easy and convenient in the next 10 years or so, due to progress in machine learning and the economic incentives involved. This will be especially true for translations between the major modern languages.

  • I am focusing strictly on transcription here, not on translation.

  • An expert human transcriber and translator, given enough time, is always going to do a better job than an automatic transcription engine, because the human expert can infer the best reading of a handwritten text from deep familiarity with the linguistic, historical and cultural context within which a document was produced.

  • Experts’ time and effort are precious commodities; thus, we don’t want an expert to waste any time on mere data-entry.

  • It’s at least possible to imagine that a decent machine transcription of an image into a text file might be a real time-saver to an expert, even if the copy includes a certain number of errors.

...