Audio Files

Oral History’s Current Status:
Oral Histories uses three types of audio files. Two are legacy formats that won't be used going forward, and one is the current and future standard. We can support two of these formats (MP3 and WAV) but cannot currently accept the third (WMA).

Legacy Formats:

  1. WMA: A few files are in Windows Media Audio (WMA) format. Currently we will not be able to accept or use any of these files. If we run across any WMA-only OH records, we'll need to make a ticket to talk about how to handle them.
  2. MP3: A number of oral histories have been kept as MP3 files. These will be kept in their original format, and any further changes will be made from the MP3.

Current Format:

  1. WAV: The current standard format used for all Oral Histories is the WAV file. This is a lossless, uncompressed audio format. It is valuable from a preservation standpoint but takes up more space than other audio options, so we will not use it in our application.

Use in digital collections:

For simplicity’s sake we should focus on only a limited number of audio formats. One will be MP3, to support old oral histories and to provide broad compatibility for streaming/download use by most users. To support lossless storage of audio files (WAVs and others) we will use Free Lossless Audio Codec (FLAC) files. If WAV files exist along with MP3 files, the WAV files should take precedence, since they are lossless and uncompressed and are therefore of higher quality than the compressed, lossy MP3 format. Such WAV files will need to be converted to FLAC for ingestion into the Digital Collections.

Benefits:

  • The FLAC format reduces a WAV file to 50-60% of its original size. This cuts storage costs as well as data transfer costs and time.
  • FLAC is also a well-supported audio standard with an open source license.

...

  1. FLAC format, used for WAV files.
  2. MP3 format, when the original file is an MP3 file.

Workflow/Ingest:

When an oral history has been selected, the Digital Collections team will handle the ingest of the file(s) into the digital collection. Right now there will be two workflows: one if the file is a .mp3 and one if it is a .wav file.
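As a rough illustration of how the workflow choice could be made, the sketch below picks a workflow from the file extensions of one oral history, applying the rule stated above that WAV files take precedence over MP3 files. The function name `plan_ingest` and the workflow labels are hypothetical and are not part of any existing ingest code.

```python
from pathlib import Path


def plan_ingest(files):
    """Pick an ingest workflow for the audio files of one oral history.

    Returns a (workflow_label, source_files) tuple. The labels are
    illustrative names only, not identifiers from a real system.
    """
    paths = [Path(f) for f in files]
    wavs = sorted(p for p in paths if p.suffix.lower() == ".wav")
    mp3s = sorted(p for p in paths if p.suffix.lower() == ".mp3")

    if wavs:
        # WAV is lossless, so it takes precedence and is converted to FLAC.
        return "wav_to_flac", wavs
    if mp3s:
        # MP3-only histories are kept in their original format.
        return "mp3_as_is", mp3s
    # WMA (or anything else) is not accepted right now.
    raise ValueError("no supported audio files found; WMA needs a ticket")
```

For example, `plan_ingest(["part1.mp3", "part1.wav"])` selects the WAV file for FLAC conversion and ignores the MP3 copy.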

...

Response from Lee Berry, Curator of Oral Histories (via Nicole J. email 3/6/19): "I’m disinclined to merge the audio files, since they’re usually indicated in the transcript and it can be helpful to know which audio file you’re listening to as you navigate the PDF."  Based on her response, let's move ahead with a zipped folder containing all the sections within a single interview.
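A minimal sketch of the zipped-folder idea, assuming each interview's audio sections are already available as separate files. The function name `zip_interview` is hypothetical. Storing the files without additional compression (`ZIP_STORED`) is also an assumption, made because FLAC and MP3 are already compressed.

```python
import zipfile
from pathlib import Path


def zip_interview(audio_files, zip_path):
    """Bundle the separate audio sections of one interview into a zip.

    Files keep their own names inside the archive so that the file
    references in the transcript still match. Returns the archived names.
    """
    files = sorted((Path(f) for f in audio_files), key=lambda p: p.name)
    # ZIP_STORED: no recompression, since the audio is already compressed.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for audio in files:
            archive.write(audio, arcname=audio.name)
    return [audio.name for audio in files]
```

Because the sections are not merged, a researcher who downloads the zip can still open the file named in the transcript.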

FLAC assurance: To make sure that our FLAC file is a proper copy, even though we are not promising preservation, we need to be sure that the FLAC can be converted back to a WAV file with no changes in the data. This is done by taking a checksum of the original WAV, generating a new WAV from the FLAC once it has been made, and comparing the new file's checksum to the original's. The tools used are fre:ac (to make the FLAC) and the flac command line tool (to rebuild a new WAV). In all cases the flac command line tool, not ffmpeg, is considered the gold standard for conversion into or out of the FLAC format. Other tools (e.g., Audacity) were tested but did not produce matching checksums between the new and old WAV files, which would make any future programmatic comparison difficult.
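The round-trip check described above could be scripted roughly as follows. This is a sketch, not the team's actual script: the function names are hypothetical, SHA-256 is an assumed choice of checksum (the notes above do not name one), and it assumes the `flac` command line tool is installed and on the PATH.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Checksum a file in chunks so large WAVs are not read into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def decode_with_flac_cli(flac_path, wav_path):
    """Rebuild a WAV from a FLAC using the reference flac tool."""
    subprocess.run(
        ["flac", "--decode", "--silent", "--force",
         "-o", str(wav_path), str(flac_path)],
        check=True,
    )


def flac_round_trips(original_wav, flac_path, decode=decode_with_flac_cli):
    """Return True if decoding the FLAC reproduces the original WAV exactly."""
    with tempfile.TemporaryDirectory() as workdir:
        rebuilt = Path(workdir) / "rebuilt.wav"
        decode(flac_path, rebuilt)
        return file_checksum(original_wav) == file_checksum(rebuilt)
```

A call such as `flac_round_trips("interview_1.wav", "interview_1.flac")` would return `False` whenever the rebuilt file differs in any byte, including header differences, which is the situation described above for files produced with Audacity.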

User persona per 5/2 mtg. with Lee Berry

User (an Oral History researcher) reads an oral history transcript and identifies a particular passage of interest. User would like to listen to that passage via the audio player (user should be able to identify the audio file and corresponding timestamp for the passage based on notes within the transcript). User may also want to download that particular passage, and would prefer not to download the entire oral history recording.

Additional context for user persona and general notes: As described by Lee, researchers typically begin by consulting an oral history transcript (when available) and then may consult the audio file(s) to listen to passages of particular interest. It is rare that researchers would choose to listen to an entire oral history from start to finish when a transcript is available. In terms of streaming functionality, we decided to focus on meeting the needs and expectations of researchers, as described in the user persona above, while the ability to download audio file(s) will meet the needs of more casual users. Lee also shared that they rarely get requests for audio files from external users (though she's hoping greater visibility through the Digital Collections will encourage more use of the audio files); in contrast, internal users (podcast folks and the like) typically want access to specific segments rather than entire recordings.

Next steps per 5/2 mtg. with Lee Berry

Develop an audio player that retains individual files so the user can navigate to a particular file of interest. We showed Lee the multi-file player mock-up that Nicole had previously shared with the team, and Lee agreed this interface or something similar would be ideal. Placing the audio player above the PDF transcript is also ideal, so users don't have to scroll to see the audio files.

We will discuss download functionality after player development. Lee prefers that time be spent on player versus download functionalities (i.e., zipped folder download capability not as important as user being able to listen to or download particular file per transcript notation). Dan shared a hand-drawn mock-up showing download arrows next to audio filenames so users could download individual files.  Lee agreed this would be a feasible approach.

Consider asking internal users such as Mariel, Christy and/or Rebecca how they would like the player and download functionality to perform.  Per post-meeting discussion, let's wait until player is created to ask about user preference for download capabilities.

Content not covered:

  • We will not be working at all with WMA files yet due to issues in handling the audio codec and questions about the best option for conversion. As there are very few of these files this should not be a problem.
  • We will not transfer or preserve any tags on the audio files themselves. Currently (3/2019) it does not seem like any Oral Histories use tags on the audio file itself.
  • We are serving as a method of access for the Oral Histories, not as a preservation system. We will preserve the files we hold to the same standards as other Digital Collections files, but Oral Histories handles the preservation of its own files according to its own practices.

Video

Not yet considered.