This is a small Heroku project in charge of scheduled exports of our EAD files from the ArchivesSpace API to S3.
There is Ruby code (stored in GitHub) that exports EAD files from the ArchivesSpace API to an S3 bucket. This code runs as a Heroku app, scheduled to execute regularly (currently nightly). There is Terraform configuration (stored in the same GitHub repo) to create and manage the S3 bucket and other AWS resources involved.
More detailed documentation can also be found in the README of the GitHub repo. Here are some links and overview:
GitHub project
sciencehistory/export_archivesspace_xml
including Terraform management for the S3 bucket and the IAM user and policy
Basic commands:
cd export_archivesspace_xml/terraform
terraform init
terraform plan
The Terraform accurately describes the facts on the ground in AWS as of January 2022. Note that the shared state is stored in AWS as well, using the same technique as scihist-digicoll.
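The shared-state arrangement mentioned above can be sketched as a standard Terraform S3 backend block. This is illustrative only, assuming typical backend settings; the bucket, key, and region values below are hypothetical placeholders, and the real configuration lives in the repo (using the same technique as scihist-digicoll):

```terraform
# Hypothetical sketch of an S3 remote-state backend. The actual bucket,
# key, and region values are defined in the repo's terraform configuration.
terraform {
  backend "s3" {
    bucket = "example-terraform-state-bucket"              # placeholder
    key    = "export_archivesspace_xml/terraform.tfstate"  # placeholder
    region = "us-east-1"
  }
}
```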
Heroku project
export-archivesspace-xml
Note: we currently have 3 Heroku add-ons: Proximo for proxying ($5 / month); Papertrail for logging (free); and the Heroku Scheduler to actually spin up the task (free).
Note: Heroku configuration variables are needed to identify and provide access to the ArchivesSpace and S3 resources; see the GitHub repo README.
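As a sketch of the kind of startup check the app could perform on those configuration variables, assuming hypothetical variable names (the actual names are documented in the repo README):

```ruby
# Hypothetical config-variable names, for illustration only; the real
# names are listed in the export_archivesspace_xml README.
REQUIRED_CONFIG = %w[
  ARCHIVESSPACE_URL
  ARCHIVESSPACE_USER
  ARCHIVESSPACE_PASSWORD
  AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY
  S3_BUCKET
].freeze

# Returns the names of any required variables that are unset or blank,
# so the app can fail fast with a clear message instead of mid-export.
def missing_config(env = ENV)
  REQUIRED_CONFIG.reject { |name| env[name] && !env[name].strip.empty? }
end
```

For example, `missing_config({ "S3_BUCKET" => "ead.sciencehistory.org" })` would return the other five names.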
Configured for “static website hosting”
The “endpoint” is at http://ead.sciencehistory.org.s3-website-us-east-1.amazonaws.com, but the bucket name is just ead.sciencehistory.org.
Unfortunately, the site is currently accessible only over http. We would need to set up CloudFront for https access to an S3 bucket, which we have not currently done.
CNAME ead.sciencehistory.org, which points to the S3 bucket website endpoint.
Note that the CNAME needs to be identical to the bucket name in S3 for it to work.
Managed using ordinary sciencehistory DNS. Our external partners use this hostname, so it must be kept running.
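The relationship between the bucket name and the website endpoint above can be sketched as follows. This is an illustration, not code from the repo; it reflects why S3 website hosting requires the CNAME to match the bucket name (S3 uses the incoming Host header to select the bucket):

```ruby
# Builds the S3 static-website endpoint for a bucket in a given region.
# Because S3 website hosting routes requests by Host header, the CNAME
# must be identical to the bucket name for the site to resolve.
def s3_website_endpoint(bucket, region: "us-east-1")
  "#{bucket}.s3-website-#{region}.amazonaws.com"
end
```

For example, `s3_website_endpoint("ead.sciencehistory.org")` produces the endpoint hostname quoted above.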
IAM user that can access the bucket
IAM policy granting that user access to the bucket
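A policy of the kind described might look like the following sketch: a generic grant of list access on the bucket and object read/write/delete on its contents. This is an assumption-laden illustration; the actual policy is defined in the repo's Terraform and may grant a different set of actions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::ead.sciencehistory.org"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::ead.sciencehistory.org/*"
    }
  ]
}
```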