export_archivesspace_xml
This is a small Heroku project responsible for regularly exporting our EAD files from the ArchivesSpace API to S3.
There is Ruby code (stored in GitHub) that exports EAD files from the ArchivesSpace API to an S3 bucket. This code runs as a Heroku app, scheduled to execute regularly (currently nightly). There is Terraform configuration (stored in the same GitHub repo) to create and manage the S3 bucket and the other AWS resources involved.
More detailed documentation can also be found in the README of the GitHub repo. Here are some links and an overview:
GitHub project
sciencehistory/export_archivesspace_xml
including Terraform management for the S3 bucket and the IAM user and policy
Basic commands:
cd export_archivesspace_xml/terraform
terraform init
terraform plan
The Terraform configuration accurately describes the facts on the ground in AWS as of January 2022. Note that the shared state is stored in AWS as well, using the same technique as scihist-digicoll.
Heroku project
export-archivesspace-xml
Note: we currently have three Heroku add-ons: Proximo for proxying ($5/month); Papertrail for logging (free); and Heroku Scheduler to actually spin up the task (free).
Note that there are Heroku configuration variables needed to identify and provide access to the ArchivesSpace and S3 resources; see the GitHub repo README.
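These variables are managed with the standard Heroku CLI. A sketch of inspecting and setting them — the variable name below is a placeholder, not one of the real names (those are listed in the repo README):

```shell
# List the current config vars for the app
heroku config -a export-archivespace-xml

# Set or update a single variable (SOME_API_TOKEN is a placeholder name)
heroku config:set SOME_API_TOKEN=xxxx -a export-archivespace-xml
```

Changing a config var restarts the app's dynos, which is harmless here since the export only runs when Heroku Scheduler spins it up.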
Configured for “static website hosting”.
The “endpoint” is at http://ead.sciencehistory.org.s3-website-us-east-1.amazonaws.com, but the bucket name is just ead.sciencehistory.org.
Cloudfront distribution:
We serve the EADs over HTTPS. For this we use a Cloudfront distribution.
The distribution uses a “custom SSL certificate”, which is currently our wildcard cert.
This currently needs to be updated every Fall.
Certificates can be added at https://us-east-1.console.aws.amazon.com/acm/home?region=us-east-1#/certificates/list ; click the import button. Note that you do need to paste in the certificate chain.
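The import can also be done from the command line with the AWS CLI. A sketch, assuming the certificate, private key, and chain are PEM files on disk (the file names here are examples):

```shell
# Certificates used by CloudFront must be imported in us-east-1.
aws acm import-certificate \
  --region us-east-1 \
  --certificate fileb://wildcard-cert.pem \
  --private-key fileb://wildcard-privkey.pem \
  --certificate-chain fileb://chain.pem
```

As with the console import, the chain must be supplied, or clients that don't already have the intermediate certificates will fail verification.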
The results from running
openssl s_client -connect ead.sciencehistory.org:443 -showcerts
should contain “Verify return code: 0 (ok)”. Otherwise, UPenn’s client will complain, sending us a message reading in part “certificate verify failed (unable to get local issuer certificate)”, and refuse to run the import.
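A quick one-liner for this check (piping in echo so s_client exits instead of waiting on stdin):

```shell
# Expect output containing: Verify return code: 0 (ok)
echo | openssl s_client -connect ead.sciencehistory.org:443 -showcerts 2>/dev/null \
  | grep "Verify return code"
```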
CNAME ead.sciencehistory.org, which points to the S3 bucket endpoint.
Note that the CNAME needs to be identical to the bucket name in S3 for it to work.
Managed using ordinary sciencehistory DNS. Our external partners use this hostname, so it must be kept running.
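To confirm the DNS record is in place, something like the following can be used (the output will depend on the current DNS configuration):

```shell
# Show what ead.sciencehistory.org currently resolves to
dig +short ead.sciencehistory.org CNAME
```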
IAM user that can access the bucket
IAM policy granting that user access to the bucket