OHMS Media URLs

In the OHMS metadata editor, you supply a URL for your media file. The OHMS staff editor uses this URL to give you a player for sync tasks. The same URL is then included in the XML that powers the end-user viewer, so it is effectively “fixed” once that XML is loaded, at least if you are using the standard OHMS viewer. If we build our own custom viewer, it can ignore the URL in the OHMS XML and simply use application logic to create the audio player with the appropriate URL.

We will be using derivative media to power OHMS. This will likely be a “stitched-together” mp3, but even if the work has a single audio file, it will still be a derivative.

Our derivatives (as well as our originals) are stored on S3, and generally delivered directly to users from S3.

Our derivatives are currently all stored publicly accessible, even those for non-public assets. This is a known flaw in our infrastructure that hasn’t been prioritized, but we may fix it at some point.

1. S3 public URLs work fine with OHMS

Eg: https://scihist-digicoll-staging-derivatives.s3.amazonaws.com/bca805c3-c07b-4689-897f-4161f6ad9cc6/small_mp3/868eaf7f970cbeb8da3ec5a2a79d39b3.mp3

When added to OHMS metadata as the “Media URL”, this works just fine, including seeking behavior. It works in the editor and in the standard OHMS viewer “preview”.
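
As an illustration of what a staff interface might do to suggest the Media URL, here is a minimal Python sketch that builds the public, query-string-free S3 URL from a bucket name and object key. The function name `public_s3_url` is hypothetical, and the sketch assumes the virtual-hosted-style URL form shown in the example above.

```python
from urllib.parse import quote


def public_s3_url(bucket: str, key: str) -> str:
    """Build a public S3 URL with no query string (hypothetical helper)."""
    # Percent-encode the key, but keep "/" so the path structure is preserved.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


url = public_s3_url(
    "scihist-digicoll-staging-derivatives",
    "bca805c3-c07b-4689-897f-4161f6ad9cc6/small_mp3/868eaf7f970cbeb8da3ec5a2a79d39b3.mp3",
)
print(url)
```

This only works for objects that are publicly readable, since nothing in the URL carries credentials.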

 

2. Signed S3 URLs do NOT work in OHMS

You can create a “signed” S3 URL to: A) provide (time-limited) access to a non-public asset, or B) instruct S3 to set certain response headers.

That might look like this:

https://scihist-digicoll-staging-derivatives.s3.amazonaws.com/bca805c3-c07b-4689-897f-4161f6ad9cc6/small_mp3/868eaf7f970cbeb8da3ec5a2a79d39b3.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAU4GX5J7E4SLLVKXR%2F20200219%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200219T174024Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=d3db772cbc3e0470ca7df58e035e8c250d24e7851416a78add070aec0f5c5ad2

The OHMS editor refuses to use such a URL and shows an error message.

After more experimentation, the problem seems to be that the OHMS editor will refuse to use any URL with a ? (query string) in it. Just adding ?foo=bar to the end of a URL that otherwise works, where the ?foo=bar version still delivers the same working file, makes the OHMS editor refuse to use it.

This doesn’t make a lot of sense. I can’t think of any reason the OHMS editor should refuse a URL with a ? in it, since it can normally be fetched and used just like any other URL. It seems like a bug to me, but that doesn’t mean it will be a high priority for OHMS to fix (or they may not see it as a bug for some reason).

However, even if OHMS accepted signed URLs, they are time-limited (to a maximum of one week). While they might work for a week’s worth of editing, they would have to be constantly updated, and they would not be suitable for including in a static XML file to power the standard OHMS viewer.
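
To make the two problems with signed URLs concrete, here is a small Python sketch that detects a query string and reads the expiry time out of the `X-Amz-Date` and `X-Amz-Expires` parameters. The helper names are hypothetical, and the example URL is abbreviated to just those two parameters for illustration (a URL trimmed this way would not actually validate against S3).

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def has_query_string(url: str) -> bool:
    """True if the URL contains a non-empty query string, which OHMS appears to reject."""
    return bool(urlsplit(url).query)


def presigned_expiry(url: str):
    """Return the UTC expiry of a SigV4-style signed URL, or None if it is not signed."""
    params = parse_qs(urlsplit(url).query)
    if "X-Amz-Date" not in params or "X-Amz-Expires" not in params:
        return None
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


# Abbreviated form of the signed URL above (signature and credential omitted).
signed = (
    "https://scihist-digicoll-staging-derivatives.s3.amazonaws.com/"
    "bca805c3-c07b-4689-897f-4161f6ad9cc6/small_mp3/"
    "868eaf7f970cbeb8da3ec5a2a79d39b3.mp3"
    "?X-Amz-Date=20200219T174024Z&X-Amz-Expires=900"
)
print(has_query_string(signed))   # True
print(presigned_expiry(signed))   # 2020-02-19 17:55:24+00:00
```

The example URL above was signed for 900 seconds, so it expired 15 minutes after it was generated.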

3. OHMS won’t use a URL that redirects

While our “public” S3 URLs will work for world-public assets, we don’t really treat them like persistent URLs we commit to keeping valid forever – normally we redirect to them from a URL like https://staging-digital.sciencehistory.org/downloads/ckc096h/small_mp3 , which is meant to be persistent.

Can we give OHMS that original URL, that redirects to an S3 URL?

No, OHMS will refuse to use it, with the same error message as above in 2. This is true whether or not the redirect target has a ? in it. The redirect alone is enough for OHMS to reject the URL. It needs a URL that delivers the media directly, without a redirect.

Since we aren’t committing to these S3 URLs never changing (they would change if, for instance, we re-created our derivatives after discovering a parameter error), it could be a problem for OHMS if they do change.
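
The two observed rejection causes (a query string, or a redirect) could be checked before pasting a URL into OHMS. The following Python sketch is one way to do that, based only on the behavior described in these notes, not on any documented OHMS rule. The function names are hypothetical. `http.client` is used because it does not follow redirects on its own.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 10) -> int:
    """Send a single HEAD request and return the status code, without following redirects."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()


def ohms_url_problems(url: str, status: int) -> list:
    """List reasons OHMS would likely reject this URL, given the status it returned."""
    problems = []
    if urlsplit(url).query:
        problems.append("has a query string")
    if 300 <= status < 400:
        problems.append(f"redirects (HTTP {status})")
    elif status != 200:
        problems.append(f"unexpected status (HTTP {status})")
    return problems


# Usage (performs a network request):
#   url = "https://staging-digital.sciencehistory.org/downloads/ckc096h/small_mp3"
#   print(ohms_url_problems(url, head_status(url)))
```

An empty list means the URL passed both checks. It does not guarantee that OHMS will accept it.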

Conclusions

  1. We will use S3 public URLs with OHMS. At some point some part of the staff interface will suggest the OHMS Media URL to use.

  2. If we fix our derivatives to no longer be all-public, we’ll still need to make sure that derivatives for public Assets have public S3 ACLs so they can be used with OHMS. That means we’ll need a process for keeping S3 ACLs sync’d with app permissions, rather than simply using signed URLs.

  3. I will try to find out if there is any way to update metadata in OHMS in bulk or programmatically, for cases where our URLs change.

  4. As another option, we could provide an action in our app that effectively proxies bytes from S3. We could then provide a URL that looked like whatever we wanted (no ? to mess up OHMS) and that delivered bytes from S3, even if the S3 object does not have a public ACL. Our app URL could also remain the same even if the S3 URL changed (because our app knows the real URL, the same one it uses for redirecting from the stable URL).

    1. This is a bit of a pain to get right, because we need to handle HTTP Range headers. But Shrine provides some instructions.

    2. The main downside of this is it would keep a Web Worker busy longer, delivering very large files – this is why we try to deliver direct from S3 instead.

    3. We will not pursue this right away, but are noting it as a “big hammer” in case we run into problems it would solve. Apart from keeping a web worker busy with file delivery, it could theoretically resolve all the other issues. We might then want to look at a different deploy architecture that gives us a higher bound on web workers (multi-threaded request dispatch, etc.), or some fancier solution involving nginx proxying with web workers getting out of the way.
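
To give a sense of what handling HTTP Range headers in a proxy action involves, here is a minimal Python sketch of parsing a single-range `Range` header into an inclusive byte span. It is an illustration only, not Shrine’s implementation. The function names are hypothetical, and multi-range requests are deliberately treated as unsupported.

```python
import re

# Matches a single byte range such as "bytes=0-99", "bytes=500-", or "bytes=-200".
_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def parse_range(header: str, size: int):
    """Return (start, end) inclusive for a single-range header, else None.

    None means the header is malformed, multi-range, or unsatisfiable; the
    caller decides whether to answer 416 or fall back to the full body.
    """
    match = _RANGE_RE.fullmatch(header.strip())
    if not match:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form: the final N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        start, end = max(size - length, 0), size - 1
    else:
        start = int(first)
        # A missing or too-large end is clamped to the last byte.
        end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        return None
    return start, end


def content_range(start: int, end: int, size: int) -> str:
    """Build the Content-Range header value for a 206 response."""
    return f"bytes {start}-{end}/{size}"


print(parse_range("bytes=500-", 1000))   # (500, 999)
print(content_range(500, 999, 1000))     # bytes 500-999/1000
```

A proxy action would pass the same `Range` header on to S3, or request the computed span, and answer with status 206 plus this `Content-Range` value. Player seeking relies on these partial responses.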