Dev notes - Hydra camp
Questions and practicalities
practicalities
Environment var management?
envyable?
How to curate, manage, maintain a set of test files / data?
use rspec
Does Blacklight provide REST APIs?
You can get search results but maybe not facets?
What db are most people using? postgres? How to move from sqlite to postgres?
use what you know or is supported by your IT department
(sqlite is not for production)
What decisions do I need to make before I launch into production? what decisions can I change later on?
what id do you want to use?
object model in fedora - don't even worry about it for the first 50-100 thousand items
don't store anything exclusively in solr
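On the environment variable question above: one low-dependency option is to read settings straight from ENV with explicit defaults, so a missing required value fails loudly at boot. A minimal sketch in plain Ruby follows. The helper name `app_setting`, the variable names, and the URLs are assumptions for illustration only, not something Hydra prescribes.

```ruby
# Hypothetical helper: look up a setting in ENV, falling back to a default.
# Raises KeyError when the variable is unset and no default is given.
def app_setting(name, default = nil)
  return ENV.fetch(name) if default.nil?
  ENV.fetch(name, default)
end

# Example names and URLs only; use whatever your deployment defines.
ENV['SOLR_URL'] ||= 'http://localhost:8983/solr/development'

solr_url   = app_setting('SOLR_URL')
fedora_url = app_setting('FEDORA_URL', 'http://localhost:8983/fedora/rest')

puts "solr:   #{solr_url}"
puts "fedora: #{fedora_url}"
```

Gems in this space generally load a config file into ENV first; the lookup side can stay the same either way.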
Questions
What are the encoding defaults of metadata and databases in Hydra?
what data modeling needs to be done in addition to what fedora / hydra already provide (or will provide)? default URIs? Should these be changed? If so, to what?
DEFER (may not be relevant): what are the graph manipulation tools in fedora? how are these accessed from hydra? Or is it just taken care of when API requests are made?
Re: Ordering, what is a proxy? Also has a meaning outside the context of ordering.
Answers
is XML completely deprecated in fedora 4?
Doesn't sound like it. Get more info. Mark recommends examining your use cases – legacy XML from fedora 3? maybe model it that way for fed4. Ultimately he says it doesn't matter what format you store the data in, i.e. same from a user standpoint. (fair enough)
Reasons to use XML
You have very hierarchical data in nice, clean XML that someone is attached to
Reasons to use RDF
It's the current direction of the community
code is a little bit simpler
more specific / self-describing / natively self-authoritizing
You have data-sharing use cases
how / when is bagit used with fedora? hydra?
bagit is for storing collections of files / metadata in remote locations. we have no reason to worry about this at the moment.
MIRA: open source?
General Linked Data conversation
marmotta and fedora both implement LDP (linked data platform) which is a brand new w3c specification
Marmotta is currently being used in conjunction with fedora by people who want to cache external linked data sets for local use. (e.g., only the relevant portions of LCSH, and only on an as-needed basis)
Linked Data Fragments is super new and it's not clear yet how useful it will be
Fedora3 used "datastreams"; Fedora4 uses the concept of "attachments" (see LDP). Fedora uses an RDF triplestore (fedora3 had to do a lot of serializing, deserializing, and parsing). Fedora still has binary datastreams (used for attachments, non-RDF stuff).
You can theoretically use SPARQL directly into fedora's triplestore. but this would be a bad idea slash only for very expert people.
How does hydra/fedora handle preservation? E.g. checksums.
Fedora4 doesn't yet carry forward all the technical support for this that Fedora3 had. That's why there is current work around an audit service.
archivematica people believe that there are holes in hydra preservation.
Can you use fixity checks built in to your storage platform? If so, don't do it in fedora / hydra.
https://github.com/psu-stewardship/scholarsphere/wiki/Fixity-in-Sufia-with-Fedora-4 - confirm that this doc is still relevant and ask to talk to these people.
There is a Hydra preservation interest group (note for Michelle)
Other things to be aware of: provenance and history of edits - planned auditing service will take this on. Currently / formerly everything was recorded as a single user.
Consider thinking about fedora storage formats. What happens if you lose fedora?
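For reference, the basic idea behind a checksum-based fixity check can be sketched in plain Ruby. This is not Fedora's or Sufia's own fixity mechanism, just the underlying technique under simple assumptions: a checksum is recorded at ingest and recomputed later. The method names `file_checksum` and `fixity_ok?` are made up for this example.

```ruby
require 'digest'
require 'tempfile'

# Compute the SHA-256 checksum of a file on disk.
def file_checksum(path)
  Digest::SHA256.file(path).hexdigest
end

# A fixity check passes when the current checksum matches the recorded one.
def fixity_ok?(path, recorded_checksum)
  file_checksum(path) == recorded_checksum
end

# Demo with a throwaway file standing in for a stored binary.
Tempfile.create('fixity-demo') do |f|
  f.write('hello hydra')
  f.flush
  recorded = file_checksum(f.path)   # checksum taken at "ingest" time
  puts fixity_ok?(f.path, recorded)  # file unchanged since ingest

  f.write('!')                       # simulate a change to the stored bytes
  f.flush
  puts fixity_ok?(f.path, recorded)  # file no longer matches
end
```

If the storage platform already does this, the note above still applies: prefer the platform's check over a second one in the application.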
What does it look like to upgrade?
Have sufficient test coverage
Make sure all tests are passing
Increase versions of dependencies in your Gemfile
Run tests and observe failures.
Start turning tests green
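When increasing versions in the Gemfile, it helps to know what a pessimistic constraint such as `'~> 5.9.0'` will and will not let Bundler pick up. A small sketch using the RubyGems classes that ship with Ruby; the version numbers are examples only.

```ruby
require 'rubygems'

# '~> 5.9.0' allows patch releases (5.9.x) but not the next minor version.
requirement = Gem::Requirement.new('~> 5.9.0')

%w[5.9.0 5.9.4 5.10.0 6.0.0].each do |v|
  allowed = requirement.satisfied_by?(Gem::Version.new(v))
  puts "#{v}: #{allowed ? 'allowed' : 'blocked'}"
end
```

Loosening the constraint to `'~> 5.9'` would also admit 5.10 and later 5.x releases, which is usually the first step of a minor-version upgrade.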
Uncategorized notes
- Reindex everything into Solr from Fedora!
- ActiveFedora::Base.reindex_everything
- opaquenamespaces: a community registry / namespace for RDF properties. Probably best practice to try to put locally-required properties there or somewhere similar. This was started by Karen and Tom at U Oregon.
- Page numbers: can use this as literal sequence and put marked pagenumbers in page label property.
- Book.where(title:"hat") # note this returns an array, not a single book object (the way find does)
- re: IDs. consider using the default id as an internal identifier and another field as a local id for human use.
- one challenge of hosting on amazon is moving large files around. sufia uses ffmpeg and converts them to playable proxies. anything that can be handled by imagemagick, openoffice / libreoffice (creates a thumbnail from the first page of the doc)
- Currently recommending different heads for different types of collections. use a shared gem for the data models. Then create a separate admin head for managing them all in one place.
- sufia has a new wiki page on metadata modeling https://github.com/projecthydra/sufia/wiki/Customizing-Metadata
- DCE also recommends keeping number of servers as low as possible until metrics indicate you should make changes (and can measure those changes on desired vectors)
- Dspace, libguides, digital library publishing (drupal, node apps), archive-it – NYU harvests into hydra (ichabod) using Internet Archive API. metadata of record is in another system, but can be supplemented in the admin backend. Also have batch-loaded enrichment data.
- EAD 'kitchen sink' fixture! https://github.com/NYULibraries/findingaids/blob/development/spec/fixtures/examples/EAD_Tracer.xml
- the findingaids app they have is similar to stanford's arclight.
- http://summit2015.lodlam.net/about/
- fedora 4 implements w3c standard for access controls
- REST vs. CRUD
- GET, POST, PUT/PATCH, DELETE (ActiveFedora, HTTP)
- Read, Create, Update, Delete - ActiveRecord (RDBMS)
- ORM - object relational mapper - the code that interfaces with the actual database
- LDP is a way to use REST to talk about objects with containment relationships
- Gemfile.lock shows the expansion of the gemfile
- If developing on core, replace hydra gem with the gemspec contents of same if you want to mess with changing versions of those dependencies. (or if you want to manage / change these versions manually)
- look at other peoples' .gitignore files for rails and sufia projects
- What should our ID be? sufia uses NOIDs. note: fedora has a concept of different minters so this may be a factor here as well. NOID translates to fedora as pair paths, but fedora doesn't actually store it that way. so why did sufia do it this way?
- characterization object contains xml. you could take each of those values and store them as properties.
- different 2nd-level facets: see dl.tufts.edu
- one-off pages / static pages
- curationexperts/alexandria-v2 see welcome/about see also index, contact form. note empty controllers and views that go with them.
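The REST vs. CRUD note above can be written down as a simple lookup. This is only a restatement of the mapping in the notes as plain Ruby; the constant and method names are invented for the example and are not part of ActiveFedora or ActiveRecord.

```ruby
# HTTP verbs mapped to the CRUD operation each one corresponds to,
# in the order listed in the notes.
HTTP_TO_CRUD = {
  'GET'    => :read,
  'POST'   => :create,
  'PUT'    => :update,
  'PATCH'  => :update,
  'DELETE' => :delete
}.freeze

# Accepts strings or symbols in any case; raises for verbs not in the notes.
def crud_for(http_verb)
  HTTP_TO_CRUD.fetch(http_verb.to_s.upcase) do
    raise ArgumentError, "no CRUD equivalent noted for #{http_verb}"
  end
end

puts crud_for('get')
puts crud_for(:patch)
```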
Deployment
- travis-ci.org/curationexperts/alexandria-v2/builds
- https://travis-ci.org/projecthydra/sufia
capistrano: deployment manager. redundant with a ci workflow?
bambu - stanford's environment management solution.
- PRODUCTION setup
- tomcat, solr, fedora replace hydra-jetty / jetty wrapper
- postgres as opposed to sqlite
- WATCH RAILS VERSIONS between multithreading and databases
- staging
- may be as much like production as possible. may have less CPU power, less memory, smaller HD. May be an exact clone. Also take into account how much time / effort this may require.
- ditch testunit and install rspec. spec directory:
  - spec
    - fixtures
      - pbcore
        - artesia
          - joyce_chen
            - image_1.xml
          - mars (filemaker database)
          - audio_1.xml
          - image_1.xml
- look at fixtures vs. factories
- fixtures have to be maintained.
- factories behave the way you tell them to behave; sometimes you need to put in real data.
- Sandi Metz RailsConf 2013 presentation video
- github.com/afred/openvault - look at factories here.
- github.com/projecthydra-labs/hydradam
- github.com/WGBH/pbucore
- huge xslt stylesheet to convert XML into RDF-XML, which they will then use to load into fedora4.
- Amazon Ops products: elastic beanstalk, opsworks
Testing
- rspec only? selenium with rspec?
- or capybara with cucumber?
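The fixtures vs. factories distinction noted earlier can be sketched without any gems. A fixture is a fixed file you maintain; a factory is code that builds a valid object with defaults you can override per test. Below is a hand-rolled factory in plain Ruby. The `Asset` struct and `build_asset` helper are hypothetical stand-ins, not the API of any factory library.

```ruby
# Hypothetical stand-in for a repository model; a real app would use its
# own model class here.
Asset = Struct.new(:title, :creator, :year)

# A tiny hand-rolled factory: sensible defaults, overridable per test.
def build_asset(overrides = {})
  defaults = { title: 'Untitled', creator: 'Anonymous', year: 2015 }
  attrs = defaults.merge(overrides)
  Asset.new(attrs[:title], attrs[:creator], attrs[:year])
end

default_asset = build_asset
older_asset   = build_asset(title: 'Moby-Dick', year: 1851)

puts default_asset.title
puts "#{older_asset.title} (#{older_asset.year})"
```

Each test states only the attributes it cares about, which is the main maintenance advantage over editing fixture files.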
Example fedora instances:
- scholarsphere
- dl.tufts.edu - tufts digital library - put a hydra head on top of existing fedora repo. awesome transcription / TEI w/ embedded timecode / audio player
- levysheetmusic - changes / customizations to interface
- hullhistorycentre.org.uk - hull city archives - example of EADs. (nice search box page!)
- hydra.hull.ac.uk - has a backend with workflow stuff. would likely be happy to give a short demo. (also note interesting icons)
- alexandria digital research
- spotlight (stanford) - library.stanford.edu/projects/spotlight - for exhibit building. - note: blacklight gallery gem gives you different views of results lists.
- another gem: date slider
- digital.case.edu (built on worthwhile, rdf-driven) - open seadragon + iiif-compliant server for amazing image viewing. view metadata / different formats.
- dl.tufts.edu - MIRA (management of institutional repository assets): more workflow-type, controlled deposit.
- http://demo.curationexperts.com/
- WGBH - digitize on-demand. Metadata is published and there's a button.
- HydraDAM (replaced Artesia at WGBH)
- single EAD site (blacklight-only) http://bassiveratti.stanford.edu/en/catalog
Syllabus
March 9th-13th, 2015
Yale University Library
New Haven, CT
Course Goals
The goal of Hydra Camp is to introduce new developers to the skills and tools they will need to successfully build Hydra based digital repository solutions. There’s a lot of ground to cover and you won’t walk out at the end of the week a complete expert, but we hope we’ll have provided you enough of a scaffolding to jump-start your own work and keep learning like the rest of us. We hope that the topics covered at Hydra Camp provide enough breadcrumbs that you’ll have a good idea where to start looking once you get home and start digging into problems on your own!
Supplies
Laptop & Power Supply
Headphones/Earbuds
Water bottle/Travel Mug
Location
WALKING - Meet in The Study lobby at 8:45am Monday morning, walk as a group - https://goo.gl/maps/kH3Cu
DRIVING - Commuters should park in any of the visitor lots on campus: http://to.yale.edu/parking-map
WIFI SSID: available Monday
Prerequisites
If you have never used Ruby, visit http://tryruby.org for an interactive tutorial.
We’ll be providing a VirtualBox VM with everything you need for development. If you have time, please install VirtualBox before class, and Vagrant too if possible.
Alternatively, if you want to run a local development environment directly on your laptop - RailsBridge Installfest has good instructions for getting your system setup: http://installfest.railsbridge.org/installfest/ - you can skip the Heroku steps.
Local Development Environment Requirements
If you’re using a Mac, install/update XCode & homebrew.
Install Git (on Mac, we recommend using homebrew to install git)
Install RVM + Ruby 2.1.5 using RVM `rvm install ruby-2.1.5`
(if you follow the installfest exactly, 2.0.0 also works fine)
Install Java 7 runtime (if you already have 6 installed, that also works)
Install Rails 4.1.9: `gem install rails -v 4.1.9`
Install a Text Editor of your choice. KomodoEdit is a popular free option. Many people use TextMate (not free). VIM is hardcore but some of us do it.
Create a (free) Github Account if you don't already have one: https://github.com/signup/free
Create Github SSH Keys and set them up for your development machine
NOTE: We’ll have help available in class if you run into any troubles getting your system set up. We’ll have the VirtualBox image configured with Ubuntu and the necessary tools to complete class exercises or can help install all necessary software locally.
Syllabus
https://docs.google.com/document/d/10YeaUkYV-akfLQhqVd7Zffy_UxosdDUgzu6RxuIsxBA/edit#
Day 1: Monday, March 9th
MORNING - start 9:00am
Welcome - Housekeeping - Introductions
Your name, institution, something unique about yourself
Course goals!
Rails for Zombies - at your own pace - bring headphones https://www.codeschool.com/courses/rails-for-zombies-redux
IN PARALLEL - finish Virtual Box & Vagrant setup, distribute VMs
LUNCH ~12:30 - on your own
Setup VMs and/or confirm local dev environments
Config the local git with your name and e-mail
git config --global user.name "Your Name"
git config --global user.email you@example.com
Dive into Hydra: https://github.com/projecthydra/hydra/wiki/Dive-into-Hydra
NOTE: before step “Lesson: install hydra jetty” - copy master.zip from the home directory to the hydra-demo/tmp directory
cp ~/master.zip tmp
Individual help available for install questions
[Optional] RailsBridge - Intro to Rails
http://docs.railsbridge.org/intro-to-rails/ - you can skip Heroku sections
DINNER - on your own
Day 2: Tuesday, March 10th
MORNING - start 9:00am
Hydra Framework Technical Overview
https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
Hydras in the Wild - examples of live Hydra heads
See links to examples under Resources
LUNCH - on your own
[Optional] Start modelling your own metadata
[Optional] Create some additional content (books) to search
[Optional] http://vim-adventures.com
DINNER - group dinner @ TBA (?)
Day 3: Wednesday, March 11th
MORNING - start 9:00 am
Install Sufia (Development): https://github.com/projecthydra/sufia
see also http://demo.curationexperts.com
NOTE: git ssh may not work over the Yale Guest WIFI, use these lines in your Gemfile:
gem 'sufia', '6.0.0.rc4'
gem 'kaminari', git: 'https://github.com/harai/kaminari.git', branch: 'route_prefix_prototype'
gem 'blacklight', '~> 5.9.0'
Dependencies are pre-installed in your VM:
Redis: apt-get or brew install
FITS: https://github.com/curationexperts/hydradam/wiki/Installation%3A-fits
Imagemagick: apt-get or brew install
Collaborative development exercise using github
Also see:
LUNCH ~11:30 - Visit Sterling, Beinecke, and eat at Trumbull College.
Production Deployment
http://curationexperts.com/2013/10/07/the-hydra-production-stack/
https://github.com/curationexperts/hydradam/wiki/Production-Installation%3A-Overview
[OPTIONAL] Solrizer walkthrough - additional references:
https://github.com/projecthydra/solrizer (see Readme)
[OPTIONAL] Blacklight quickstart https://github.com/projectblacklight/blacklight/wiki/Quickstart https://github.com/projectblacklight/blacklight/wiki/Configuring-and-Customizing-Blacklight
[Optional] Solr Tutorial: https://lucene.apache.org/solr/4_7_1/tutorial.html
DINNER - on your own
Day 4: Thursday, March 12th
MORNING - start 9:00am
XML Metadata - https://github.com/projecthydra/hydra/wiki/Lesson:-Build-a-Book-Model-in-XML
Create an object in Sufia and Hydra-Demo repos with architecture walk-through
[OPTIONAL] Managing descriptive Metadata:
XML: https://github.com/projecthydra/om/wiki/Tame-your-XML-with-OM OR
RDF: https://github.com/projecthydra/active_fedora/wiki/Tame-your-RDF-Metadata-with-ActiveFedora
Group photo: https://www.flickr.com/photos/126513397@N05/15064527352/
The Hydra developer toolkit
Production Deployment
http://curationexperts.com/2013/10/07/the-hydra-production-stack/
https://github.com/curationexperts/hydradam/wiki/Production-Installation%3A-Overview
LUNCH - on your own
AFTERNOON - Students depart as necessary for transit
Q&A
Development process - feature branches, forks, pull requests
see: http://ndlib.github.io/practices/ruby-and-rails-developer-tools/
Hydra-tech ■ IRC ■ Bundler & gems ■ DRY
Debugger ■ Better Errors ■ Fedora Admin ■ IDEs
The Hydra Community
https://wiki.duraspace.org/display/hydra/Hydra+Community+Framework
Operations & Production Deployment
Avalon deployment - see especially Virtual Machine install #2: http://www.avalonmediasystem.org/download
HydraDam Install instructions
[OPTIONAL] Hydra Access Controls: https://github.com/projecthydra/hydra-head/wiki/Access-Controls-with-Hydra
Day 5 [BONUS]: Friday, March 13th
Blacklight workshop
RESOURCES
Get on the hydra-tech mailing list & IRC:
https://groups.google.com/forum/#!forum/hydra-tech
https://wiki.duraspace.org/pages/viewpage.action?pageId=43910187
Hydra Developers Page
https://wiki.duraspace.org/display/hydra/Developers
also at https://github.com/projecthydra/hydra/wiki/For-Developers
Contributing Code
Rails engines - see http://edgeguides.rubyonrails.org/engines.html
Hydra Hacking - OM, customized views, search and facet customization, testing, access restrictions, etc.
Legal https://wiki.duraspace.org/display/hydra/Hydra+Licensed+Contributors
Practical https://github.com/projecthydra/hydra/blob/master/CONTRIBUTING.md
Release notes and wiki for individual gems (varies)
Release notes: eg. https://github.com/projecthydra/active_fedora/releases
Semantic versioning: http://semver.org
General Rails Programming
CodeSchool: https://www.codeschool.com/paths/ruby#starting-rails
RailsBridge: http://docs.railsbridge.org/docs/
Rails Guides: http://guides.rubyonrails.org
Humble Little Ruby Book: http://www.humblelittlerubybook.com
RAILS APIs: http://api.rubyonrails.org/
Skilled Up: http://www.skilledup.com/learn-ruby-on-rails-guide/
Data Models
Dive into Hydra - the basic case
Sufia - Medium Sized RDF file-centric model
https://github.com/projecthydra/sufia/blob/master/sufia-models/app/models/concerns/sufia/generic_file/metadata.rb
https://github.com/projecthydra/sufia/blob/master/sufia-models/app/models/datastreams/generic_file_rdf_datastream.rb
etc.
UCSD DAMS - Comprehensive RDF modelling of ‘the world’
https://github.com/ucsdlib/dams
Example Hydra and Hydra related sites
IUCat (Blacklight Only) - http://www.iucat.iu.edu
Spotlight (Blacklight + Spotlight) -
Live Exhibit - Maps of Africa
Blog post with videos
Avalon Media System: http://www.avalonmediasystem.org/
Integrated Development Environments
RubyMine: http://www.jetbrains.com/ruby/
Aptana RadRails: http://www.aptana.com/products/radrails
Data Curation Experts
Website & contact info: http://curationexperts.com
Tell your Friends about Hydra Camp!
http://curationexperts.com/who-we-are/about/hydra-camp/
RDF book: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL http://workingontologist.org/
RDF Primer: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
ResourceSync: http://www.openarchives.org/rs/toc
Fixity in Sufia with Fedora 4: https://github.com/psu-stewardship/scholarsphere/wiki/Fixity-in-Sufia-with-Fedora-4
Sample code for exercises
https://gist.github.com/afred/660e8f43026ad08a992d
# Limit search results to Book and Manuscript objects by adding a Solr filter query
def show_my_stuff(solr_parameters, user_parameters)
  solr_parameters[:fq] ||= []
  solr_parameters[:fq] << "has_model_ssim:(\"#{Book.to_class_uri}\" OR \"#{Manuscript.to_class_uri}\")"
end
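To see what the filter method above actually adds to the Solr parameters, it can be run outside Rails with stub classes. In this sketch `Book` and `Manuscript` are stand-ins for the real models, and the URIs they return are made-up placeholders; in a real app `to_class_uri` comes from ActiveFedora and its output depends on the version in use.

```ruby
# Stub models standing in for ActiveFedora classes. The URIs returned by
# to_class_uri here are placeholders, not guaranteed real values.
class Book
  def self.to_class_uri
    'info:fedora/afmodel:Book'
  end
end

class Manuscript
  def self.to_class_uri
    'info:fedora/afmodel:Manuscript'
  end
end

# Same method as in the notes: append a filter query limiting results
# to the two model types.
def show_my_stuff(solr_parameters, user_parameters)
  solr_parameters[:fq] ||= []
  solr_parameters[:fq] << "has_model_ssim:(\"#{Book.to_class_uri}\" OR \"#{Manuscript.to_class_uri}\")"
end

params = { q: 'hat' }
show_my_stuff(params, {})
puts params[:fq].inspect
```

Existing filter queries are preserved because the method appends to `:fq` rather than replacing it.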
Blacklight slides (content only)
Customization Examples
Vanilla Blacklight
http://demo.projectblacklight.org
Project Showcase
http://projectblacklight.org
Levy Sheet Music Collection
http://levysheetmusic.mse.jhu.edu
Hull History Center
http://catalogue.hullhistorycentre.org.uk/catalogue/U-DAR
http://catalogue.hullhistorycentre.org.uk/catalogue/U-DAR-x2-4-63
Documentation & Getting Help
Github - projectblacklight
https://github.com/projectblacklight
.../blacklight/wiki + wiki/Quickstart
...wiki/Configuring-and-Customizing-Blacklight
Bootstrap
http://getbootstrap.com
...especially http://getbootstrap.com/css/
Look at the repos, use your web inspector
E-mail & IRC - .../blacklight/wiki#support
Setting up the Dev Environment
Get the VM running
Install VirtualBox & Vagrant
Copy tutorial directory
Open a terminal window & cd to tutorial
vagrant up
vagrant ssh
Set up a new blacklight project
mkdir /vagrant/projects
ln -s /vagrant/projects projects
cd projects
rails new search_app -m https://raw.github.com/projectblacklight/blacklight/master/template.demo.rb
cd search_app
nano Gemfile - uncomment #gem 'therubyracer'
bundle install
rake jetty:start
rake solr:marc:index_test_data
rails server
connect to http://localhost:3000
Customizing catalog_controller.rb
Open the file in your editor
Read through the comments, try changes
Change how fields display on index views (i.e. search results)
Turn off author search, turn on year
What else can you do via catalog controller configuration?
Internationalization
Bassi Veratti
Site: http://bassiveratti.stanford.edu/en/catalog
Code: https://github.com/sul-dlss/bassi_veratti/blob/master/app/views/catalog/_home_text.html.erb
Translations: https://github.com/sul-dlss/bassi_veratti/blob/master/config/locales/it_home.yml
Docs: http://guides.rubyonrails.org/i18n.html
EXERCISE: Create a custom English and Hindi welcome message
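As a starting point for the exercise, here is a plain-Ruby sketch of how locale lookup with an English fallback works. It imitates the shape of Rails locale files but does not use the Rails I18n API. The key name `home.welcome`, the message text, and the `translate` helper are all assumptions for illustration. In a Rails view you would normally call the `t` helper against files under config/locales instead.

```ruby
require 'yaml'

# Hypothetical locale data, in the same nested shape as a Rails
# config/locales/*.yml file. Keys and messages are examples only.
LOCALES = YAML.load(<<~YML)
  en:
    home:
      welcome: "Welcome to the catalog"
  hi:
    home:
      welcome: "आपका स्वागत है"
YML

# Look up a dotted key for a locale, falling back to English when the
# locale or key is missing.
def translate(key, locale: 'en')
  path = key.split('.')
  LOCALES.dig(locale.to_s, *path) || LOCALES.dig('en', *path)
end

puts translate('home.welcome')
puts translate('home.welcome', locale: :hi)
```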
Advanced Customization
Code Examples
https://github.com/jkeck/hc-blacklight-app
EXERCISE: Add flags (thumbnails) to listings to visualize language
maybe use something like: http://upload.wikimedia.org/wikipedia/en/thumb/4/4c/Flag_of_Sweden.svg/320px-Flag_of_Sweden.svg.png ?
Customization best practices
Make the smallest change possible to meet your requirements:
Add tests - if you make a change, will it still work the same way after an upgrade?
Ask if there’s an easier way
Other examples
Filtering - don’t show deleted items https://github.com/curationexperts/mira/pull/299/files