HLS streaming video and AWS MediaConvert

We convert our videos to HLS, a format friendly to adaptive bitrate streaming. We use AWS MediaConvert to do the conversion, via the ActiveEncode gem’s MediaConvert adapter to interact with the MediaConvert service.

Original main GitHub issue: https://github.com/sciencehistory/scihist_digicoll/issues/1659 , which documents some design decisions and links to PRs for the initial code implementation.

AWS Resources used

Buckets

  • HLS is stored in the buckets scihist-digicoll-staging-derivatives-video and scihist-digicoll-production-derivatives-video

    • There is NO replication/backup bucket for either of these. For now, we’ve decided not to back up this large but re-creatable content.

    • The derivatives bucket for HLS is publicly readable! This is consistent with most of our other derivatives at present: public S3 even for works/assets that may not be marked published in the app. The URLs involve unguessable random strings.

      • If we needed to make this actually access-controlled at the bucket level, we’d have to add some extra infrastructure: you can’t simply serve a signed URL to the HLS .m3u8 playlist file, because that file includes references to other unsigned URLs. You could probably use CloudFront with a signed cookie for access control instead of signed URLs (which would require giving the S3 bucket a sciencehistory.org hostname so the app could set cookies for it). Or an entirely different architecture, like switching to an all-in-one video server like MediaPackage instead of using S3 at all.

CloudFront CDN

We have a CloudFront distribution in front of the -derivatives-video buckets holding the HLS. These are public buckets, so the distribution doesn’t need any CloudFront authentication, which we haven’t tackled yet.

The distribution is controlled by Terraform, and the Shrine S3 storage is set with a “host” config param pointing to it. (So URLs generated by Shrine will use the CloudFront hostname, not the direct S3 bucket.)
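
For illustration, one common way to wire this up in Shrine 3 is the url_options plugin. This is only a sketch: the storage key name and CloudFront hostname below are placeholders, not necessarily what our app actually uses.

  # config/initializers/shrine.rb (sketch; storage name and hostname are illustrative)
  require "shrine"
  require "shrine/storage/s3"

  Shrine.storages[:video_derivatives] = Shrine::Storage::S3.new(
    bucket: "scihist-digicoll-production-derivatives-video",
    public: true   # objects are publicly readable, so generated URLs are not signed
    # region/credentials come from standard AWS configuration
  )

  # Generate URLs using the CloudFront distribution's hostname
  # rather than the direct S3 bucket hostname.
  Shrine.plugin :url_options, video_derivatives: { host: "https://dxxxxxxxxxxxxxx.cloudfront.net" }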

AWS Presets

AWS presets are not controlled by Terraform (Terraform doesn’t support them), but are stored as JSON in our code repo: https://github.com/sciencehistory/scihist_digicoll/tree/master/infrastructure/aws-mediaconvert

The AWS preset names aws-mediaconvert-preset-high, -medium, and -low are hard-coded into our code.

Some discussion of preset choices at https://github.com/sciencehistory/scihist_digicoll/issues/1693
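
For a rough idea of how those preset names get used: the ActiveEncode MediaConvert adapter accepts an outputs array of preset/modifier pairs when creating a job. The sketch below is illustrative only; the input URL, output prefix, modifiers, and adapter setup are placeholders, not copied from our actual service class.

  # Sketch: launching an HLS encode via ActiveEncode's MediaConvert adapter.
  ActiveEncode::Base.engine_adapter = :media_convert
  ActiveEncode::Base.engine_adapter.role = "arn:aws:iam::335460257737:role/scihist-digicoll-production-MediaConvertRole"
  ActiveEncode::Base.engine_adapter.output_bucket = "scihist-digicoll-production-derivatives-video"

  ActiveEncode::Base.create(
    "s3://some-originals-bucket/path/to/video.mp4",        # input location, illustrative
    {
      use_original_url: true,                               # encode straight from the s3:// URL
      output_prefix: "hls/some-unguessable-token/video",    # illustrative output path
      outputs: [
        { preset: "aws-mediaconvert-preset-high",   modifier: "_high" },
        { preset: "aws-mediaconvert-preset-medium", modifier: "_medium" },
        { preset: "aws-mediaconvert-preset-low",    modifier: "_low" }
      ]
    }
  )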

IAM Role for MediaConvert jobs

The MediaConvert API requires you to supply an IAM role that the MediaConvert jobs will execute under.

We’ve created separate roles for staging and production, each with access only to the appropriate buckets, plus a dev one.

As of initial implementation these are not controlled by Terraform, because it was too challenging for us to make that happen. Instructions in this AWS tutorial are helpful, except that we restricted S3 access by using appropriate existing policies scoped to just the staging/production S3 buckets.
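
The trust relationship on these roles is the standard one that lets the MediaConvert service assume them; roughly:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": { "Service": "mediaconvert.amazonaws.com" },
        "Action": "sts:AssumeRole"
      }
    ]
  }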

Provided as config to the app by ScihistDigicoll::Env aws_mediaconvert_role_arn, they might be arn:aws:iam::335460257737:role/scihist-digicoll-DEV-MediaConvertRole, arn:aws:iam::335460257737:role/scihist-digicoll-staging-MediaConvertRole, or arn:aws:iam::335460257737:role/scihist-digicoll-production-MediaConvertRole.


MediaConvert policy for users

Then, we needed to create IAM policies mediaconvert_dev, mediaconvert_staging, and mediaconvert_production, which are assigned respectively to the dev users group, the s3_digicoll_staging user, and the s3_digicoll_production user.

These policies grant access to MediaConvert functions, and to iam:PassRole, allowing those users to pass the roles created above to MediaConvert!

E.g.:

{
    "Sid": "mediaconvertActions",
    "Effect": "Allow",
    "Action": "mediaconvert:*",
    "Resource": "*"
},
{
    "Sid": "iamPassRole",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::335460257737:role/scihist-digicoll-staging-MediaConvertRole"
},


Source Code Implementation Overview

Written in May 2022 at the time of implementation; note that links are to historical source code from that time.

  1. An Asset has an after_promotion hook that triggers launch of a MediaConvert job.

  2. The work of creating the MediaConvert job is done in CreateHlsMediaconvertJobService, which uses the ActiveEncode gem’s MediaConvert adapter to actually interface with AWS MediaConvert. (ActiveEncode is configured to use MediaConvert in ./config/initializers/active_encode.rb.)

  3. The MediaConvert job is an asynchronous process – a launched job is recorded in an ActiveEncodeStatus ActiveRecord model (database row).

  4. MediaConvert needs to be polled periodically to see when a job is done. ActiveEncodeStatus#refresh_from_aws does that polling (via ActiveEncode). At the moment, lacking an ActiveJob adapter that lets us future-schedule jobs, we run a Heroku scheduled task that updates all outstanding statuses by calling a rake task (which also deletes old, no-longer-needed status rows); see the sketch after this list.

  5. The current known status of any related MediaConvert jobs is shown on the admin page for a specific video asset.

  6. After we have discovered that a MediaConvert job is done, the location of the master playlist .m3u8 is stored in a Shrine attachment on Asset. This attachment configuration is defined by the local Shrine uploader class VideoHlsUploader. There are a few unusual things about this Shrine uploader:

    1. Instead of being stored in a database column of its own as usual, it’s stored in an attribute created with attr_json that is just a key in the json_attributes jsonb column. This pretty much just works!

    2. Instead of uploading files through the Shrine attachment, we have MediaConvert produce them on the S3 storage that the attachment already points to. Then we store the location of only the master playlist .m3u8 file in the attachment. Asset#hls_playlist_file_as_s3= has logic for taking an s3:// URI reference and pointing the Shrine attachment at it (see the sketch after this list).

    3. We override the deletion/destroy logic so when Shrine tries to delete the attachment, instead of only deleting the master .m3u8 playlist, it deletes the entire directory it was in, with all accompanying HLS files.
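
Regarding step 4, here is a hypothetical sketch of what the scheduled polling rake task looks like conceptually; the actual task name, scopes, state values, and cleanup rules in scihist_digicoll differ in detail:

  # Hypothetical sketch of the scheduled polling described in step 4.
  namespace :scihist do
    desc "Poll AWS MediaConvert for the status of outstanding HLS encode jobs"
    task refresh_active_encode_statuses: :environment do
      # Ask MediaConvert (via ActiveEncode) for the current state of every
      # job we haven't yet seen finish. Column and state names are illustrative.
      ActiveEncodeStatus.where.not(state: ["completed", "failed"]).find_each do |status|
        status.refresh_from_aws
      end

      # Delete old status rows that are no longer of use.
      ActiveEncodeStatus.where(state: "completed").where("updated_at < ?", 7.days.ago).delete_all
    end
  end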
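
And for step 6.2, a hypothetical sketch of pointing the Shrine attachment at an object MediaConvert already wrote to S3, without re-uploading anything; the actual Asset#hls_playlist_file_as_s3= implementation, storage key, and attacher handling may differ:

  # Hypothetical sketch of Asset#hls_playlist_file_as_s3= (step 6.2).
  def hls_playlist_file_as_s3=(s3_uri)
    uri = URI.parse(s3_uri)               # e.g. "s3://bucket-name/hls/some-token/video.m3u8"
    key = uri.path.delete_prefix("/")

    # Build a Shrine::UploadedFile that references the existing object in the
    # video derivatives storage (storage key is illustrative), then attach it
    # without uploading anything through Shrine.
    uploaded_file = Shrine.uploaded_file(storage: :video_derivatives, id: key)
    hls_playlist_file_attacher.change(uploaded_file)
  end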