...

I’m proposing to spend a bit of time tinkering with it in order to find out.

Scope and assumptions

  • I am focusing strictly on transcription here, not on translation.

  • An expert human transcriber and translator, given enough time, is always going to do a better job than an automatic transcription engine, because the human can infer the best reading of a handwritten text from deep familiarity with the linguistic, historical and cultural context within which a document was produced.

  • Experts’ time and effort are precious commodities; thus, we want an expert to spend as little time as possible on simple data entry.

  • It’s at least possible to imagine that a decent machine transcription of an image into a text file might be a real time-saver to an expert, even if the copy includes a certain number of errors.

Timeline

Roughly 3 weeks in mid-to-late January 2023 (subject to interruptions, of course; in the short term, this ranks low among our priorities).

...

  • Install Transkribus and create an account that I can use

  • Read the docs

  • Become familiar with the Transkribus user interface

  • Learn the lingo (and there is a lot of lingo)

  • Explore the user community and other online resources

  • Research current use patterns: how are our peers commonly using the tool?

  • Gain an intuitive sense of how we might realistically use it

...

  • Test the tool using Jocelyn's transcriptions of Bredig handwritten letters as "ground truth"

  • Attempt to train the software to transcribe letters it hasn't seen yet

  • Evaluate automatic transcription results against expert transcriptions

  • Take a look at other collections of handwritten letters too (Pasteur? Booth?)

  • See if I can come up with practical recipes that could get us more mileage out of human experts in the future, by partially automating the transcription process

  • Write-up (see below)
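A minimal sketch of how the evaluation step in the list above might be scored, assuming character error rate (CER) as the metric: the edit distance between the expert transcription and the automatic one, divided by the length of the expert transcription. The metric choice, the function names and the sample strings are all illustrative assumptions; none of them come from Transkribus or from the Bredig letters.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference (expert) text.
    0.0 means the two transcriptions are identical."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Invented sample lines, not taken from any real letter.
    expert = "Sehr geehrter Herr Kollege"
    machine = "Sehr geehrtcr Herr Kollege"
    print(f"CER: {character_error_rate(expert, machine):.3f}")
```

A lower score means fewer corrections for the expert to make. A score like this says nothing about which errors matter most to a reader, so it would complement an expert's judgment of the output rather than replace it.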

Deliverables

I’d like to write up my results here in the wiki; I want to produce a short, non-technical report containing:

  • a description of what the software is, and what it is not

  • a glossary of machine-learning technical terms

  • a list of online resources (websites, listservs, videos, etc.)

  • examples of how peer institutions have used the software

  • some suggestions about how I think it might come in handy in the future

  • practical recipes that could get us more mileage out of human experts in the future, by partially automating the transcription process