
Present: Lee Berry, Michelle DiMeo, Stephanie Lampkin, Cat Lu, Erin McLeary, Patrick Shea, Andrea Tomlinson, Jim Voelkel

Absent: David Caruso, Anna Headley, Hillary Kativa, Amanda Shields

INTRODUCTION

MD: Goal: Conversation across departments; make CHF-wide decisions; inform various staff in our departments

Future topics: standards and policies for metadata, imaging, rights and permissions

DEPT. OVERVIEWS

Photography - MD: Read Hillary's overview (Word doc). Hillary also recently created an Access database, which revealed that imaging has mostly been done at 300 dpi, with some at 400 dpi and above. Now cataloging in MARC and has moved from object-level to folder/collection-level cataloging.

Rare Books - JV: Asked what the standards are for funding digital projects (MD/CL: 400 dpi, FADGI guidelines; imaging standards discussion likely at next meeting). Images from the Neville digital project were made to illustrate the Neville collection: with little exception, every book has a title page image and maybe a few more, with more plates from interesting books (random). Technology has changed, and the most valuable books were imaged first and poorly; the images are good for reference but not good for a digital library. 5,000 Neville books, 3 images per book, 15,000 thumbnails. Images are recorded in an Access database; filenames are parsable and start with the bib record, so they are loosely linked to the catalog record with metadata. Website collections are separate, one-off legacy data/images (about 50), largely the same images, with a few instances taken for the magazine. JV has also taken images personally; these one-offs are given to Andrea to upload to the OPAC. Shoots at the highest resolution of the camera, 70 MB dropped to 8 MB. Outside of Neville there are about 1,500 books; working on a workflow where books are imaged every year when they come in. Ask Elsa for paperwork from the Neville project. Former employee Bob Hull also has a spreadsheet on what each image is; AT to send to MD and CL.

Museum collections - EM: Handout prepared by AS (not present). PastPerfect 5 also documents exhibitions; conservation and condition reports are stored somewhere else. An intern this summer is surveying high/low-res JPEGs to determine concrete details. There are a lot of assets in different places that the department doesn't have full control over; includes fine art. Objects on the website are also one-offs that were copied and pasted by hand. CL got data out of PastPerfect and began mapping, but it needs clean-up.

Archives - PS: Accessioning is in Excel files. PP5 was used for the stamp collection and advertisements, but hasn't been used in a while and there are no plans to use it in future. Accessioning is moving into the ArchivesSpace module. Archival digitizing is very rare outside of image collections; will do PDFs for access purposes to send offsite. Processing for media is kept with a/v formats. Finding aids historically used local rules; moving forward, marking up with EAD, with minimal DACS requirements. The accession spreadsheet will stay because of museum crossover.

Oral Histories - LB: About 700 oral histories completed, 200 in various stages. All have a digital component; each interview gets a standardized file structure on the P drive. Final files comprise PDFs, Word files, MP3s, WAVs, and digital photographs. Excel spreadsheets and Access databases were used to try to capture info about all of these files: first a spreadsheet, then an Access database, which stopped sometime in 2008. After that, relied on HighOrbit (workflow software) to keep that info and track OH progress. Data is extractable from the HighOrbit system into an Access database with Chuck's help, but it is very hodgepodge. Cataloging is in MARC as part of the OPAC catalog, with separate input into the website CMS.

Library OPAC - AT: Catalog has 67,000 individual titles, 135,000 ind. item records, 165 of about 270 archival records have finding aids, 38 image collection records with pdf finding aids, 4,303 records from Neville collection. With some title variation, use MARC, LCSH, original cataloging also shared on OCLC.

MD - imaging, rights, and data standards all over the place across departments, at least 3 different rights statements exist on website. Big project ahead.

HYDRA DEMO LINKS

MD - Hydra is a DAMS that also offers backend preservation; images will be checksummed for file integrity. Also has rights/access management and linked data capability. Not an out-of-the-box solution, but can span diverse collections. Will develop in house and move through project phases. All Hydra sites have faceted browsing, using metadata we input. Data can have geolocation, links back to the catalog, Creative Commons licenses, links to social media, page-turner mechanisms for books, zoom on images, and downloads of PDFs or high-res images.

Examples:

Digital Commonwealth (image collection), https://www.digitalcommonwealth.org/

UC San Diego http://library.ucsd.edu/dc

Institut del Teatre (museum, small institution) http://colleccions.cdmae.cat/

Johns Hopkins Levy Collection - levysheetmusic.mse.jhu.edu/
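
Note on checksums: the file-integrity checking MD mentioned for Hydra can be illustrated with a short sketch. This is not how Hydra itself implements it; it is only a minimal illustration of the idea, assuming SHA-256 as the algorithm and a digest recorded at ingest. The function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading in chunks so large images fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_digest, algorithm="sha256"):
    """Return True if the file's current digest matches the one recorded at ingest."""
    return file_checksum(path, algorithm) == expected_digest
```

If a stored file is later altered or corrupted, its digest changes and `verify_fixity` returns False, which is what flags the file for attention.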

MD - Short term plan for website launch in November with migrated data from current website as digital collections page. Later to be replaced by Hydra page as Collections on website.

METADATA INTRO

CL - Mapping data currently in website as intro to Dublin Core - flexible, has been used by curators to describe all different types of collections CHF has, can be broad

Dublin Core - object level cataloging that describes the digital file. So if it's an image from a book, it describes that image, not the whole book.

15 core elements. For example, the Publisher element can include localized fields and repeated fields; Creator is the person related to the creation.
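
To make the object-level idea concrete, here is a minimal sketch of a Dublin Core record for a single image. The 15 element names are the standard Dublin Core set; the helper `make_record` and the sample values are hypothetical and only for illustration. Every element is stored as a list because elements can repeat.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a record dict; each element maps to a list because elements are repeatable."""
    record = {}
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        record[name] = list(value) if isinstance(value, (list, tuple)) else [value]
    return record


# Hypothetical example: the record describes the image of a title page, not the whole book.
title_page = make_record(
    title="Title page (example image)",
    creator=["Example Author", "Example Engraver"],  # repeated element
    type="Image",
    format="image/tiff",
)
```

Departments that want to enter more detail can simply fill in more of the 15 elements; unused elements are left out of the record.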

LB: Can some departments enter more information than others? (MD - Yes, and we will meet with different departments individually after this meeting to agree guidelines)

JV: What will the workflow be? Seems like a lot of work. (MD - Combination of machine clean-up and human involvement, might be different for different departments, could explore two-step QA process and centralizing digitization to employ library interns)

CL: Encourages staff to identify harder-to-catalog items; can explore how these would map onto Dublin Core

CL clarifies coding on metadata spreadsheet: Blue items on spreadsheet - descriptive metadata, about the content of the collection item; Green is administrative metadata, New ID - digital object identifier URI issued by system

JV - Rare Books - most won't be local guidelines, where do we pull controlled vocabularies? (CL identifying examples; MD to set meeting with JV afterwards to discuss rare books)

PS - Archives isn't wedded to any system; advertisements and stamps at item level - folder level on OPAC

CL - Do people want to keep divisions between departments as on current website? No - Archives and Rare Books in Library, but do we want to identify Rare Books as collection within Library? Ask Erin - does she want to keep Museum distinctions?

MD- First project phase: AH building server infrastructure, starting testing with Sufia (basic file ingest example for Hydra), next will try ingesting objects with out-of-the-box Dublin Core template

Next step is identifying core collection of about 50 objects from each curator, must be single image for ingests, AT will work on batch upload scripts for mapping, but needs to know what dataset will look like first. Depts to discuss top 50 items, depth and quality of data they'll provide, and thinking about which Dublin Core elements they want/need
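
The kind of mapping a batch upload script would need to do can be sketched as follows. This is not AT's script; it is a minimal sketch assuming the department data arrives as CSV. The column names in `FIELD_MAP` are hypothetical placeholders until each department confirms what its dataset will look like.

```python
import csv
import io

# Hypothetical local column names mapped to Dublin Core elements.
FIELD_MAP = {
    "Object Name": "title",
    "Maker": "creator",
    "Date Made": "date",
    "Accession Number": "identifier",
}


def map_row(row, field_map=FIELD_MAP):
    """Map one source row to a Dublin Core record, skipping empty values."""
    record = {}
    for column, element in field_map.items():
        value = (row.get(column) or "").strip()
        if value:
            record.setdefault(element, []).append(value)
    return record


def map_csv(text, field_map=FIELD_MAP):
    """Map every row of CSV text to a Dublin Core record."""
    return [map_row(row, field_map) for row in csv.DictReader(io.StringIO(text))]
```

Keeping the mapping in one table per department means the script itself does not change when a department picks different fields; only `FIELD_MAP` does.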

JV: wants to see what's been done for rare books and Dublin Core at other institutions, ex. Measurements for rare books? (CL - Similar to various artifacts with different labels)

General discussion on taking messy data and images that are not according to standard. MD - do we move forward with things that don't meet guidelines? Having guidelines across the board will help with grants and zoom functions. Not all legacy items are unusable: photographs and some PastPerfect museum items have good imaging and workable data. Think through how to centralize workflow (digitization queue).

PS: Archives to start by choosing object record of archival collections that represent each Finding Aid, with a link to Finding Aid in OPAC.

JV: Rare Books could use the manuscripts that Penn digitized. Complex objects like fully digitized books will have to wait for page turning, but could do a representative image. Manuscripts still need to be cataloged in the OPAC.

LB: For oral histories, how to control access to audio and full transcript? Search everything, but not display everything? (MD - not positive yet, but we'll find a way. Oral histories unfortunately have to wait due to many unique requirements.)

MD: Re: workflow, will start with these datasets and come up with stats re: how much time it actually takes to develop, catalog, and ingest.

PS: Plans for current CMS? MD/CL - no current plans for CMS. Different project, will pass on info from Night Kitchen as available.

MD: Next step: Curators to go down list of fields and think what info they want to capture, identify around 50 items to begin cataloging for first batch ingest, probably first start with Hillary's image archives.

JV: Plans for search integration? MD - eventually could do a discovery layer like VuFind that integrates catalog and digital collections, but pros and cons to discuss; some benefit to them remaining separate catalogs. For website launch, might have two searches: one searches the website, the other searches digital collections. Could merge datasets for website, but needs more discussion. Early discussion with Night Kitchen revealed they have not worked much with Solr.
