ArchivesSpace is a server whose main purpose is to host a software program also named… ArchivesSpace. The program is “an open source archives information management application for managing and providing web access to archives, manuscripts and digital objects”. The server also hosts a few auxiliary programs that take the output from ArchivesSpace and convert it into various other formats, which are then made available via an Apache web server on the same machine.
Background
We store digital descriptions of our archival collections in the following six places:
Location | Format | Number of collections described | Source | Example | Who can see it? |
---|---|---|---|---|---|
P drive | Word documents | Roughly 270, dating from 1997 to the present | This is the original collection description. | | Institute staff |
ArchivesSpace site | MySQL-backed website | Roughly 45 as of 2020 | Entered manually based on the P drive Word files. | https://archives.sciencehistory.org/resources/81#tree::resource_81 | Only logged-in ArchivesSpace users |
ArchivesSpace Apache front end | EAD (XML format) | Roughly 45 as of 2020 | Generated nightly from the ArchivesSpace database | https://archives.sciencehistory.org/ead/scihist-2012-021.xml | Public |
ArchivesSpace Apache front end | HTML | Roughly 45 as of 2020 | Generated nightly from the ArchivesSpace database | | Public |
OPAC | PDF | ? | Exported manually as PDF from the ArchivesSpace site, then attached to the OPAC record for the collection | https://othmerlib.sciencehistory.org/articles/1065801.15134/1.PDF | Public |
LibGuide | Web page | Most collections, categorized by subject | ? | https://guides.othmerlibrary.sciencehistory.org/friendly.php?s=CHFArchives | Technically public, but does not appear to be linked from anywhere. |
Workflow
Finding aids are stored as Word documents at Shared/P/Othmer Library/Archives/Collections Inventories/Archival Finding Aids and Box Lists. Kent enters them one by one into ArchivesSpace, revising them in the process. As of summer 2020, approximately 45 have been entered.
Once they are in ArchivesSpace:
They are automatically exported, via a nightly cron job described below, to EAD files at https://archives.sciencehistory.org/ead/ .
They are also converted to HTML, in which form they are made available to the public. Examples: Wotiz; Simon; Fenn; Carbogel; Brody. There is currently no web page that lists these HTML files, so you have to know the URL beforehand or be directed to them from e.g. Google or the OPAC.
Kent also exports each one to PDF and sends it to Victoria. The PDFs are then entered into the OPAC (see e.g. https://othmerlib.sciencehistory.org/articles/1065801.15134/1.PDF).
Note: the PDF has to be manually updated in the OPAC every time the metadata in ArchivesSpace changes.
In certain cases the OPAC record also points at the HTML file at https://archives.sciencehistory.org/, which, of course, is updated nightly.
Finally, the exported EAD files are also ingested by University of Pennsylvania Libraries Special Collections and the Center for the History of Science, Technology, and Medicine (CHSTM).
Penn, in turn, processes these EAD files on a nightly basis and adds them to the Philadelphia Area Archives Research Portal (PAARP).
Example: http://dla.library.upenn.edu/dla/pacscl/detail.html?id=PACSCL_SCIHIST_2012021USpaphchf
A conversation with Holly Mengel, the archivist responsible for the process, reassured us that the only requirement for this export to work is that valid EAD files be publicly accessible in the directory at https://archives.sciencehistory.org/ead/ . This URL could be changed as long as we give Holly plenty of notice and coordinate with her, which raises the possibility of posting the files to e.g. an S3 bucket.
Likewise, CHSTM ingests these EADs and makes them searchable at its search portal.
Example: https://www.chstm.org/collections/search?text=Carbogel
Attempts to contact our liaison at CHSTM, Richard Shrake, have failed. Eddie intends to follow up directly with Babak Ashrafi.
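Because Penn's only requirement is that valid EAD files be publicly reachable, a cheap check before publishing can catch a broken export before a harvester does. The sketch below is an illustration, not part of archivesspace_scripts, and the function names are hypothetical. It only confirms that each file is well-formed XML whose root is an `<ead>` element in the EAD 2002 namespace (`urn:isbn:1-931666-22-9`, assumed to be what the ArchivesSpace export emits); full schema validation would need the EAD XSD and a validating parser.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# EAD 2002 namespace; assumed to match what the ArchivesSpace EAD export produces.
EAD_NS = "urn:isbn:1-931666-22-9"


def looks_like_ead(xml_bytes: bytes) -> bool:
    """True if xml_bytes is well-formed XML with an EAD 2002 <ead> root element.

    This is a sanity check only; it does not validate against the EAD schema.
    """
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag == f"{{{EAD_NS}}}ead"


def find_invalid_eads(directory: Path) -> list[str]:
    """Sorted names of *.xml files in directory that fail looks_like_ead."""
    return sorted(
        path.name
        for path in directory.glob("*.xml")
        if not looks_like_ead(path.read_bytes())
    )
```

To use it, point `find_invalid_eads` at whichever local directory Apache serves as /ead/; that path is not recorded on this page.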
Technical details about the server
ArchivesSpace lives on an AWS EC2 server, ArchivesSpace-prod, at https://50.16.132.240/ (also reachable at https://archives.sciencehistory.org).
The current production version of ArchivesSpace is 2.7.1.
Terminal access: ssh -i /path/to/production/pem_file.pem ubuntu@50.16.132.240
The ubuntu user owns all the admin scripts.
The relevant Ansible role is /roles/archivesspace/ in the ansible-inventory codebase.
SSL is based on the following: http://www.rubydoc.info/github/archivesspace/archivesspace
The executables are at /opt/archivesspace/
The configuration file is /opt/archivesspace/config/config.rb
ArchivesSpace logs are at logs/archivesspace.out (relative to /opt/archivesspace/).
Apache logs are at /var/log/apache2/.
Configuration for the Apache site is at /etc/apache2/sites-available/000-default.conf. It would be a good idea to spend some time drastically simplifying this configuration.
Main users
Kent, who does the majority of the encoding
Hillary Kativa
Patrick Shea
Startup
To start ArchivesSpace:
sudo systemctl start archivesspace
You may need to run this several times; wait 30 seconds between attempts.
Alternatively, run /opt/archivesspace/archivesspace.sh start as user ubuntu.
You can troubleshoot startup by looking at the start script, /opt/archivesspace/archivesspace.sh, which the systemctl command above invokes.
There may be a short delay as the server re-indexes data.
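The "run it again after 30 seconds" advice can be scripted. This is a minimal sketch, not an existing script on the server: `start_with_retries`, `backend_responds` and `systemctl_start` are hypothetical names, the number of attempts is arbitrary, and the health check assumes the backend answers HTTP on port 8089, the stock ArchivesSpace default, which should be confirmed in /opt/archivesspace/config/config.rb.

```python
import subprocess
import time
import urllib.request
from typing import Callable


def systemctl_start() -> None:
    """Run the start command from above; a non-zero exit is tolerated because we retry."""
    subprocess.run(["sudo", "systemctl", "start", "archivesspace"], check=False)


def backend_responds(url: str = "http://localhost:8089/", timeout: float = 5.0) -> bool:
    """True if an HTTP GET to url returns 200.

    8089 is the stock ArchivesSpace backend port (an assumption; check config.rb).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


def start_with_retries(
    start: Callable[[], None],
    is_up: Callable[[], bool],
    attempts: int = 3,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call start(), wait, then check is_up(); repeat up to `attempts` times."""
    for _ in range(attempts):
        start()
        sleep(wait_seconds)
        if is_up():
            return True
    return False
```

Usage on the server would be `start_with_retries(systemctl_start, backend_responds)`. Passing `start`, `is_up` and `sleep` in as functions keeps the retry logic testable without touching systemd.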
Restarting the server to fix Tomcat memory leak
We restart the ArchivesSpace program (not the server) using a cron job that runs /opt/archivesspace/archivesspace.sh restart every night at 2 AM. This prevents a chronic memory leak from eating up all the CPU credits for the machine.
Export
The ArchivesSpace EADs are harvested by:
Institution | Liaison | Contact |
---|---|---|
Center for the History of Science, Technology, and Medicine (CHSTM) | Richard Shrake | |
University of Pennsylvania Libraries Special Collections | Holly Mengel | |
Both institutions harvest the EADs by automatically scraping https://archives.sciencehistory.org/ead/ . Once harvested, the EADs are added to their aggregated Philly-area EAD search interfaces.
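To see roughly what a harvester sees, one can parse the /ead/ directory listing and collect the links to .xml files. This sketch assumes Apache serves an auto-generated index page at that URL (this page does not say so) and does not reflect how Penn or CHSTM actually scrape; `XmlLinkCollector` and `ead_urls_from_listing` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points at a .xml file."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.lower().endswith(".xml"):
                self.hrefs.append(href)


def ead_urls_from_listing(listing_html: str, base_url: str) -> list[str]:
    """Absolute URLs of the .xml files linked from a directory listing page."""
    collector = XmlLinkCollector()
    collector.feed(listing_html)
    collector.close()
    return [urljoin(base_url, href) for href in collector.hrefs]
```

Against the live site, the listing HTML could be fetched with `urllib.request.urlopen("https://archives.sciencehistory.org/ead/").read().decode("utf-8")` and passed in with the same URL as `base_url`. Sorting and other navigation links in the index page are ignored because they do not end in .xml.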
The main export files are located at /home/ubuntu/archivesspace_scripts. They are checked into version control at https://github.com/sciencehistory/archivesspace_scripts .
Important files:
File | Purpose |
---|---|
? | Runs the nightly export (called by cron every night at 9 PM). This calls … |
? | Settings |
? | Extracts XML from ArchivesSpace and saves a series of EADs into … It exports EADs that contain links to the actual digital objects. |
? | Transforms the EADs in … It relies on files (stylesheets, transformations) in … |
? | Checks that the publicly accessible files in … |
Once processed by generate.sh, the XML files are publicly accessible at https://archives.sciencehistory.org/ead/ via an Apache web server.
Details about the as_export.py script:
This code was adapted from https://github.com/RockefellerArchiveCenter/as_export .
It is buggy, and calling it via complete_export.sh is the best way to run it reliably.
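as_export.py pulls its EADs from the ArchivesSpace backend API. The sketch below was written from the public ArchivesSpace API documentation, not from reading as_export.py, so treat it as an assumption to verify against the 2.7.1 API reference. It shows the two calls such an export generally rests on: `POST /users/{username}/login`, which returns a session token, and `GET /repositories/{repo_id}/resource_descriptions/{resource_id}.xml` with that token in the `X-ArchivesSpace-Session` header. `include_daos=true` requests the digital object links mentioned in the file table. The base URL, repository ID and credentials in the test are hypothetical; 81 is the resource number from the example URL in the Background table.

```python
import json
import urllib.parse
import urllib.request


def login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """POST /users/{username}/login; the JSON reply carries a "session" token."""
    url = f"{base_url}/users/{urllib.parse.quote(username)}/login"
    body = urllib.parse.urlencode({"password": password}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def ead_request(base_url: str, session: str, repo_id: int, resource_id: int) -> urllib.request.Request:
    """GET one resource's EAD; include_daos=true asks for digital object links."""
    query = urllib.parse.urlencode({"include_daos": "true"})
    url = f"{base_url}/repositories/{repo_id}/resource_descriptions/{resource_id}.xml?{query}"
    return urllib.request.Request(url, headers={"X-ArchivesSpace-Session": session})


def fetch_ead(base_url: str, username: str, password: str, repo_id: int, resource_id: int) -> bytes:
    """Log in, then download the EAD for one resource."""
    with urllib.request.urlopen(login_request(base_url, username, password)) as resp:
        session = json.load(resp)["session"]
    with urllib.request.urlopen(ead_request(base_url, session, repo_id, resource_id)) as resp:
        return resp.read()
```

Building the requests separately from sending them makes it easy to inspect the exact URLs when debugging a failed nightly export.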
Building the server
The server is not yet fully Ansible-ized.
What is missing from the ansible build:
It doesn’t copy the scripts over correctly.
More technical documentation
http://archivesspace.github.io/archivesspace/
Note: we are paying members of the ArchivesSpace consortium. We will want to set up an account for Eddie: https://archivesspace.atlassian.net/wiki/spaces/ADC/pages/917045261/ArchivesSpace+Help+Center
Backups
Backups consist of dumps of the MySQL database used by the ArchivesSpace program.
Task | Script | Description |
---|---|---|
Place the MySQL database in … | ? | Dumps the MySQL database to … |
Sync … | ? | Runs an … This script is run as a crontab by user … |
See Backups and Recovery for a discussion of how the chf-hydra-backup S3 bucket is then copied to Dubnium and in-house storage.
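The backup table has lost its script names and paths, so the following is only a sketch of the same two steps (dump, then copy to S3) under stated assumptions. The MySQL option file passed via `--defaults-extra-file` is hypothetical and would hold the credentials; `--single-transaction` takes a consistent dump without locking InnoDB tables; the S3 key is the one linked under "Restoring from backup". The real job may use `aws s3 sync` rather than `cp`, since the table only says "Sync".

```python
import subprocess
from pathlib import Path

# Object key taken from the S3 console link under "Restoring from backup".
S3_URI = "s3://chf-hydra-backup/Aspace/aspace-backup.sql"


def dump_command(database: str, dump_path: Path, defaults_file: Path) -> list[str]:
    """mysqldump call that reads credentials from an option file, not the command line."""
    return [
        "mysqldump",
        f"--defaults-extra-file={defaults_file}",  # must be the first option
        "--single-transaction",  # consistent snapshot of InnoDB tables without locking
        f"--result-file={dump_path}",
        database,
    ]


def upload_command(dump_path: Path, s3_uri: str) -> list[str]:
    """AWS CLI call that copies the dump into the backup bucket."""
    return ["aws", "s3", "cp", str(dump_path), s3_uri]


def run_backup(database: str, dump_path: Path, defaults_file: Path, s3_uri: str = S3_URI) -> None:
    """Dump, then upload; check=True stops before uploading if the dump fails."""
    subprocess.run(dump_command(database, dump_path, defaults_file), check=True)
    subprocess.run(upload_command(dump_path, s3_uri), check=True)
```

For example, `run_backup("archivesspace", Path("/home/ubuntu/aspace-backup.sql"), Path("/home/ubuntu/.aspace-backup.cnf"))`, where both paths are hypothetical.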
Restoring from backup
You can get a recent backup of the database at https://s3.console.aws.amazon.com/s3/object/chf-hydra-backup/Aspace/aspace-backup.sql
Note that the create_aspace.yml playbook creates a minimal, basically empty aspace database with no actual archival data in it.
To restore from such a backup onto a freshly created ArchivesSpace server:
Copy your backup database to an arbitrary location on the new server.
SSH into the new server.
Log into the empty archivesspace database: mysql archivesspace --password='the_archivesspace_database_password' --user=the_user
Once at the mysql command prompt, load the database:
mysql> \. /path/to/your/aspace-backup.sql
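The interactive steps above can also be scripted. This sketch assumes the AWS CLI on the new server can read the chf-hydra-backup bucket and that the database credentials sit in a MySQL option file (hypothetical) rather than on the command line. The s3:// form of the object is inferred from the console link in this section. Feeding the dump to `mysql` on stdin does the same work as the `\.` command at the prompt.

```python
import subprocess
from pathlib import Path


def download_command(s3_uri: str, local_path: Path) -> list[str]:
    """AWS CLI call that fetches the backup dump from S3."""
    return ["aws", "s3", "cp", s3_uri, str(local_path)]


def restore_command(database: str, defaults_file: Path) -> list[str]:
    """mysql client call; credentials come from an option file (must be the first option)."""
    return ["mysql", f"--defaults-extra-file={defaults_file}", database]


def restore(database: str, dump_path: Path, defaults_file: Path) -> None:
    """Load dump_path into database without an interactive session."""
    # Non-interactive counterpart of typing "\. /path/to/dump" at the mysql> prompt.
    with dump_path.open("rb") as dump:
        subprocess.run(restore_command(database, defaults_file), stdin=dump, check=True)
```

A full restore would run `download_command(...)` through `subprocess.run` first, then call `restore("archivesspace", local_path, defaults_file)`.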
Documentation
https://archivesspace.atlassian.net/wiki/home contains comprehensive documentation.
If you have a sciencehistory.org address, you can get access to it by filling out a form.