Intro
Transkribus (https://readcoop.eu/transkribus/) is a “comprehensive platform for the digitization, AI-powered text recognition, transcription and searching of historical documents – from any place, any time, and in any language.” It’s come up repeatedly at conferences I’ve attended, and I’d like to get some practical knowledge of what it actually does. Could we make use of it, in the next few years, to make transcribing handwritten documents cheaper and more efficient? I’m proposing to spend a bit of time tinkering with it in order to find out.
Timeline
Roughly 3 weeks in mid-to-late January 2023 (subject to interruptions, of course; in the short term this is low priority).
General approach
Install Transkribus and create an account that I can use
Read the docs
Become familiar with the Transkribus interface
Learn the lingo (and there is a lot of lingo)
Get an intuitive sense of what it's really for, and how we might realistically use it
Explore the user community and any other helpful online resources
Research current use patterns: what are the common ways people use the tool to save time and effort?
Specific tasks
Test the tool using Jocelyn's transcriptions of Bredig handwritten letters as "ground truth"
Attempt to train the software to transcribe letters it hasn't seen yet
Evaluate results against expert transcriptions
Take a look at other collections of handwritten letters too (Pasteur? Booth?)
See if we can imagine ways to get more mileage out of human experts in the future by partially automating the transcription process.
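For the evaluation step, one common way to compare machine output against a ground-truth transcription is the character error rate (CER): the number of single-character edits needed to turn the machine output into the expert text, divided by the length of the expert text. The sketch below is only an illustration of that idea in plain Python, not a description of how Transkribus computes its own figures; the function names and the sample line are hypothetical and are not taken from the Bredig letters.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions between two strings."""
    # prev[j] = distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example line, invented for illustration only.
    expert = "Sehr geehrter Herr Kollege"
    machine = "Sehr geehiter Herr Kollege"
    print(f"CER: {character_error_rate(expert, machine):.3f}")
```

A lower CER means less correction work for the human expert, so tracking it across letters the model has not seen would give a rough sense of how much effort partial automation might save.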
Deliverables
I’d like to write up my results here in the wiki; I’m thinking I want to produce a short, non-technical 2-page explanation containing:
a description of what the software is
what it is not
a glossary of machine-learning technical terms that I’ve encountered, so we all speak the same lingo
a list of online resources I found helpful in teaching myself how to use the software
examples of what peer institutions have done with it
how it might come in handy in the future.