Migration from the OH microsite to the digital collections

These notes cover the scope and assumptions for migrating the oral history (OH) microsite into the digital collections.

Common identifier

We'll match source and destination records on the "oral history transcript number". For the Rauscher OH, for instance, both the source and the destination record contain this ID, which is "0560": see https://digital.sciencehistory.org/works/gr4xnkk in the citation, and https://oh.sciencehistory.org/oral-histories/rauscher-iii-frank-j under "interview details".
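To illustrate the matching idea, here is a minimal sketch in Python. It is not the actual migration code; the field names (`transcript_number`, `slug`, `friendlier_id`) and the unmatched "9999" record are assumptions made up for the example.

```python
def pair_by_transcript_number(microsite_records, collection_records):
    """Pair records from both systems that share a transcript number.

    Returns (pairs, unmatched), where unmatched holds microsite records
    with no counterpart in the digital collections.
    """
    # Index destination records by the shared ID for quick lookup.
    by_id = {rec["transcript_number"]: rec for rec in collection_records}
    pairs, unmatched = [], []
    for source in microsite_records:
        dest = by_id.get(source["transcript_number"])
        if dest is None:
            unmatched.append(source)
        else:
            pairs.append((source, dest))
    return pairs, unmatched


# Hypothetical data: only the "0560" ID and the two URL fragments come from the notes above.
microsite = [
    {"transcript_number": "0560", "slug": "rauscher-iii-frank-j"},
    {"transcript_number": "9999", "slug": "made-up-unmatched-record"},
]
collection = [{"transcript_number": "0560", "friendlier_id": "gr4xnkk"}]

pairs, unmatched = pair_by_transcript_number(microsite, collection)
```

Any microsite record that lands in `unmatched` would need a manual look before the migration runs.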

Migration:
We plan to migrate the following metadata elements from the Oral History microsite into the digital collections:

  • Interview

    • Sponsor field (correction: this was actually entered manually and was not part of the migration)

  • Interviewee

    • Portrait

    • Birth and death

      • Date and place

    • Education

      • Date, institution, degree, subject

    • Career

      • Start and end dates, institution, role

    • Awards

      • Date, award

  • Interviewer

    • Name

    • Profile

    • Connection between interview and profile

Institutions -> FAST headings

Institution names under Education and Career will be converted, where possible, to the equivalent FAST term.

 

  1. Create a list of all unique institutions in the current microsite data by extracting them automatically from the database.

  2. Find the corresponding FAST heading for each institution and record the pairs in a table, perhaps in Google Docs. We expect this to be a manual process.
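The two steps above could look roughly like the following Python sketch. The record layout, the `fast_table` dictionary, and the placeholder names are all assumptions for illustration; the real extraction would run against the microsite database, and the real table would hold actual FAST headings.

```python
def unique_institutions(interviewees):
    """Collect every distinct institution name from education and career entries."""
    names = set()
    for person in interviewees:
        for entry in person.get("education", []) + person.get("career", []):
            name = entry.get("institution", "").strip()
            if name:
                names.add(name)
    return sorted(names)


def to_fast(name, fast_table):
    """Return the FAST heading for a name if the table has one, else the name unchanged."""
    return fast_table.get(name, name)


# Hypothetical sample data; none of these names are real records or real FAST headings.
sample = [
    {
        "education": [{"institution": "Example University"}],
        "career": [{"institution": "Example Chemical Co."}, {"institution": "Example University "}],
    },
    {"education": [], "career": [{"institution": "Example Chemical Co."}]},
]

# Stand-in for the manually built table (step 2).
fast_table = {"Example University": "Example University (FAST placeholder)"}

institutions = unique_institutions(sample)
converted = [to_fast(name, fast_table) for name in institutions]
```

Names with no entry in the table are passed through unchanged, which matches the "where possible" wording above.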

Overwriting data in production

The fields listed above are all currently blank in the digital collections. As we refine our migration code, each migration run will populate these fields with a fresh version of the data harvested from the microsite, replacing the version from the previous run.

Careful: this means that if you enter any data *manually* into any of the destination fields in the digital collections, that data will be replaced with fresh microsite data the next time we run a migration.
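A small Python sketch of why manual edits do not survive a run. The field list and the record shapes are hypothetical simplifications, not the real data model.

```python
# Illustrative subset of the destination fields that the migration owns.
MIGRATED_FIELDS = ("birth", "death", "education", "career", "awards")


def apply_migration(destination, microsite):
    """Return a copy of the destination record with every migrated field replaced.

    Assumption: a field missing from the microsite data becomes None (blank).
    Fields outside MIGRATED_FIELDS are left alone.
    """
    updated = dict(destination)
    for field in MIGRATED_FIELDS:
        updated[field] = microsite.get(field)
    return updated


# Someone typed a career entry by hand into the destination record.
destination = {"title": "Placeholder interview title", "career": "typed in by hand"}
microsite = {"career": "from microsite", "awards": "from microsite"}

result = apply_migration(destination, microsite)
# result["career"] now holds the microsite value; the hand-typed text is gone.
```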

Test plan

The idea here is to compare each sample interview on staging in the digital collections with its corresponding microsite record. If everything looks good, we’ll run the import in production during the first week of May.

Careful: don’t change any metadata in the digital collections (staging or production) in response to this test, as it will be overwritten anyway.

Careful 2: If the data is wrong in both the microsite and the digital collections staging site, feel free to fix the problem on the microsite, but we won’t consider that a problem with the migration code.

Here’s a spreadsheet for notes. Or maybe we can just take notes on this page.

Any problems we identify that are widespread can go into GitHub as issues for Eddie to work on.

Problems that demonstrably affect only one or two records are probably easiest to fix manually in the digital collections once the migration is complete (early May, hopefully). Make a note and come back to them.

Sample records:

Interviews with multiple interviewees: Cole/Verma, Aitchisons, …

Same person interviewed more than once: Lederberg 1, Lederberg 2, Hackerman 1, Hackerman 2, Hackerman 3, Hackerman 4.

Interviews with multiple interviewers: Hay, Ehrlich, …

Interviews with FAST headings changed: Yi, …

Other outliers: Schoemaker, …

Random sample:

Feel free to add others.

Round 2:

Metadata to check

  • Interviewee portrait

    • Alt text

    • Caption

  • Interviewee bios

    • Should all be there

    • Birth and death

      • Date and place

    • Education

      • Date, school (should be FAST), degree, subject

    • Career

      • Start and end dates, employer (should be FAST), role

    • Awards

      • Date, award

  • Interviewers

    • Should all be there

    • Biographies should be present if they exist in the microsite
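Part of the checklist above could be spot-checked with a script. This is a rough Python sketch under stated assumptions: the field names are invented, and institution names are left out of the comparison because they are expected to differ once converted to FAST headings.

```python
# Hypothetical field names for checklist items that should match exactly.
CHECKLIST_FIELDS = ("portrait_alt_text", "portrait_caption", "birth", "death", "awards")


def find_mismatches(microsite_record, staging_record, fields=CHECKLIST_FIELDS):
    """Return {field: (microsite_value, staging_value)} for every field that differs."""
    mismatches = {}
    for field in fields:
        expected = microsite_record.get(field)
        actual = staging_record.get(field)
        if expected != actual:
            mismatches[field] = (expected, actual)
    return mismatches


# Placeholder values only; these are not real interview data.
microsite_record = {
    "portrait_alt_text": "Portrait placeholder",
    "portrait_caption": "Caption placeholder",
    "birth": "date and place placeholder",
    "awards": ["award placeholder"],
}
staging_record = {
    "portrait_alt_text": "Portrait placeholder",
    "portrait_caption": None,  # caption did not come across
    "birth": "date and place placeholder",
    "awards": ["award placeholder"],
}

mismatches = find_mismatches(microsite_record, staging_record)
```

A script like this only flags differences; a person still has to decide whether each one is a migration bug or bad source data.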

Wrap up

We ran the migration successfully in production on May 5, 2021.

The migration code was removed from the codebase in pull request #1137 (https://github.com/sciencehistory/scihist_digicoll/issues/1137).

In case our testing above missed any files or metadata, a backup of the microsite code, files and database will remain in Glacier Deep Archive storage at this location: s3://chf-hydra-backup/oral_history_microsite_legacy_data/oh_final_backup.tar.gz