
ArchivesSpace is a server whose main purpose is to host a software program also named… ArchivesSpace. The program is “an open source archives information management application for managing and providing web access to archives, manuscripts and digital objects”. The server also hosts a few auxiliary programs that take the output from ArchivesSpace and convert it into various other formats, which are then made available via an Apache web server on the same machine.

Background

We store digital descriptions of our archival collections in the following six places:

| Location | Format | Number of collections described | Source | Example | Who can see it? |
| --- | --- | --- | --- | --- | --- |
| Shared/P/Othmer Library/Archives/Collections Inventories/Archival Finding Aids and Box Lists | Word documents | ? | This is the original collection description. | ? | Institute staff |
| ArchivesSpace site | MySQL-backed website | Roughly 45 as of 2020 | Entered manually based on the P drive Word files. | https://archives.sciencehistory.org/resources/81#tree::resource_81 | Only logged-in ArchivesSpace users |
| ArchivesSpace Apache front end | EAD (XML format) | Roughly 45 as of 2020 | Generated nightly from the ArchivesSpace database | https://archives.sciencehistory.org/ead/scihist-2012-021.xml | Public |
| ArchivesSpace Apache front end | HTML | Roughly 45 as of 2020 | Generated nightly from the ArchivesSpace database | https://archives.sciencehistory.org/2012-021.html | Public |
| OPAC | PDF | ? | Exported manually as PDF from the ArchivesSpace site, then attached to the OPAC record for the collection | https://othmerlib.sciencehistory.org/articles/1065801.15134/1.PDF | Public |
| https://guides.othmerlibrary.sciencehistory.org/friendly.php?s=CHFArchives | LibGuide | Most collections, categorized by subject. | ? | Subject: nuclear chemistry | Technically public, but does not appear to be linked to from anywhere. |

Workflow

Technical details about the server

ArchivesSpace lives on an AWS EC2 server, ArchivesSpace-prod, at https://50.16.132.240/ (also reachable at https://archives.sciencehistory.org).

The current production version of ArchivesSpace is 2.7.1.

Terminal access: ssh -i /path/to/production/pem_file.pem ubuntu@50.16.132.240

The ubuntu user owns all the admin scripts.

The relevant Ansible role is: /roles/archivesspace/ in the ansible-inventory codebase.

SSL is based on the instructions at http://www.rubydoc.info/github/archivesspace/archivesspace

The executables are at /opt/archivesspace/

The configuration file is /opt/archivesspace/config/config.rb.
Logs are at /opt/archivesspace/logs/archivesspace.out.

Apache logs are at /var/log/apache2/.

Configuration for the Apache site is at /etc/apache2/sites-available/000-default.conf. It would be a good idea to spend some time drastically simplifying this configuration.

Main users

  • Kent does a majority of the encoding

  • Hillary Kativa

  • Patrick Shea

Startup

  • To start ArchivesSpace: sudo systemctl start archivesspace. You may need to run this several times (wait about 30 seconds between attempts); see the retry sketch after this list.

    • Alternatively, run /opt/archivesspace/archivesspace.sh start as user ubuntu.

  • You can troubleshoot startup by reading the start script invoked by the above: /opt/archivesspace/archivesspace.sh

  • There may be a short delay as the server re-indexes data.
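
Since the start command often needs several attempts, a small retry loop can save some babysitting. This is a sketch, not an existing script on the server:

    # Hypothetical retry loop (not an existing script on the server).
    # Tries up to 5 times, waiting 30 seconds between attempts, and stops
    # as soon as systemd reports the service active.
    for attempt in 1 2 3 4 5; do
        sudo systemctl start archivesspace
        sleep 30
        systemctl is-active --quiet archivesspace && { echo "Started on attempt $attempt"; break; }
    done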

Restarting the server to fix Tomcat memory leak

Note: as of August 2020, the procedure below has been rendered obsolete. We now simply restart the server with /opt/archivesspace/archivesspace.sh restart every Sunday at 2 am, which appears to prevent the problem before it occurs.
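
The crontab line for that weekly restart is presumably along these lines (the exact entry is an assumption based on the schedule described above):

    # Assumed crontab entry (user ubuntu): restart every Sunday at 2:00 am
    0 2 * * 0 /opt/archivesspace/archivesspace.sh restart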

ArchivesSpace has a memory leak that causes it to use more CPU time than it should. This slowly drains all of the EC2 instance's burst credits, at which point the server slows down.

Another clue: in the AWS console for the server, under the Monitoring tab, the CPU Utilization graph shows anything over about 15%.

Procedure:

  • Contact all the main users listed above (especially Kent), and make sure they’re not actively working on the server.

  • Once given the go-ahead:

    • Log in to the server.

    • Throughout the process, keep in mind you can run sudo systemctl status archivesspace for the daemon status at any point. If it’s running, you’ll see a variation on: [...]
      Active: active (running) since Tue [...]

    • /opt/archivesspace/archivesspace.sh status should give you the status of the program (“ArchivesSpace is running as (PID: 7483)”).

    • Run top in a separate window to monitor the CPU usage. The goal is to see a dramatic reduction in usage after this process.

    • sudo systemctl stop archivesspace

    • sudo systemctl start archivesspace (you may have to run this two or three times; the start script is finicky)

    • /opt/archivesspace/archivesspace.sh restart

    • If all else fails, you can also go into the AWS console and reboot the EC2 instance.

    • Once everything is properly restarted:

      • the https://archives.sciencehistory.org/ front-end is available again

      • After a few minutes, you should see the CPU use go down dramatically in top.

      • The AWS monitoring graph for CPU Utilization should drop.

    • Once you’re done, notify all involved that the server is available again.

Export

The ArchivesSpace EADs are harvested by:

| Institution | Liaison | Contact |
| --- | --- | --- |
| Center for the History of Science, Technology, and Medicine (CHSTM) | Richard Shrake | shraker13@gmail.com |
| University of Penn Libraries Special Collections | Holly Mengel | hmengel@pobox.upenn.edu |

Both institutions harvest the EADs by automatically scraping https://archives.sciencehistory.org/ead/. Once harvested, the EADs are added to their aggregated Philadelphia-area EAD search interfaces.
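
For illustration, such a harvest can be done with a recursive wget; this is an assumption about the harvesters' approach, not a recipe confirmed by either institution:

    # Hypothetical harvest: recursively fetch only the .xml files under
    # /ead/, without ascending to the parent site.
    wget --recursive --no-parent --accept '*.xml' https://archives.sciencehistory.org/ead/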

The main export files are located in /home/ubuntu/archivesspace_scripts. They are checked into source control at https://github.com/sciencehistory/archivesspace_scripts.

Important files:

complete_export.sh

Runs the nightly export (called by cron every night at 9 PM). This calls as_export.py and generate.sh below.

local_settings.cfg

Settings used by the export scripts.

as_export.py

Extracts XML from ArchivesSpace and saves a series of EADs into /exports/data/ead/*/*.xml .

It exports EADs that contain links to the actual digital objects.

generate.sh

Transforms the EADs in /exports/data/ead into HTML and saves them into /var/www/html. See for instance https://archives.sciencehistory.org/beckman .

It relies on files (stylesheets, transformations) in the finding-aid-files and fa-files directories.

xml-validator.sh

Checks that the publicly accessible files in /var/www/html/ead/ are valid.

Once processed by generate.sh, the XML files are publicly accessible at https://archives.sciencehistory.org/ead/ via an Apache web server.
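
Putting the pieces together, the nightly run amounts to roughly the outline below. The authoritative logic is complete_export.sh in the repository above; this condensed sketch only shows the order of operations and assumes the scripts take no arguments:

    #!/bin/bash
    # Condensed sketch of the nightly export pipeline; the real logic is
    # complete_export.sh in the archivesspace_scripts repository.
    # Invoked nightly at 9 PM by cron.
    cd /home/ubuntu/archivesspace_scripts || exit 1

    python3 as_export.py    # 1. pull EADs out of ArchivesSpace into /exports/data/ead/
    bash generate.sh        # 2. transform the EADs into HTML under /var/www/html
    bash xml-validator.sh   # 3. check that the published XML in /var/www/html/ead/ is valid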

Details about the as_export.py script:
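
The script talks to the ArchivesSpace backend REST API: log in, capture a session token, then request each resource's EAD. The curl sketch below illustrates that flow; the host, port, credentials, repository id, and resource id are all assumptions for illustration (8089 is the backend port in a default ArchivesSpace install), and jq is used here only to extract the token:

    # Assumed login against the backend API (host, port, and credentials
    # are placeholders). The response JSON contains a session token.
    SESSION=$(curl -s -F password='the_password' \
        'http://localhost:8089/users/admin/login' | jq -r .session)

    # Export one resource as EAD; include_daos=true asks for links to the
    # digital objects, matching the behavior described above.
    # (Repository 2 / resource 81 are placeholder ids.)
    curl -s -H "X-ArchivesSpace-Session: $SESSION" \
        'http://localhost:8089/repositories/2/resource_descriptions/81.xml?include_daos=true' \
        -o example-ead.xml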

Building the server

The server is not yet fully Ansible-ized.

What is missing from the ansible build:

  • It doesn’t copy the scripts over correctly.

More technical documentation

http://archivesspace.github.io/archivesspace/

Note: we are paying members of the ArchivesSpace consortium. We will want to set up an account for eddie: https://archivesspace.atlassian.net/wiki/spaces/ADC/pages/917045261/ArchivesSpace+Help+Center

Backups

These consist of making backups of the MySQL database used by the ArchivesSpace program. There are two steps: dump the database to /backup, then sync /backup to an S3 bucket.

Dump the MySQL database to /backup

mysql-backup.sh

Dumps the MySQL database to /backup/aspace-backup.sql.
This script is run from user ubuntu's crontab: 30 17 * * 1-5 /home/ubuntu/archivesspace_scripts/mysql-backup.sh
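
The core of mysql-backup.sh is presumably a single mysqldump call along these lines (credentials and exact flags are assumptions; the real script is in the repository):

    # Assumed core of mysql-backup.sh: dump the ArchivesSpace database to
    # the file the S3 sync step expects.
    mysqldump --user=the_user --password='the_password' archivesspace \
        > /backup/aspace-backup.sql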

Sync /backup to an S3 bucket

s3-backup.sh

Runs an aws s3 sync command to place the contents of /backup at https://s3.console.aws.amazon.com/s3/object/chf-hydra-backup/Aspace/aspace-backup.sql?region=us-west-2&tab=overview.

This script is run from user ubuntu's crontab: 45 17 * * 1-5 /home/ubuntu/archivesspace_scripts/s3-backup.sh
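
The corresponding sync is an aws s3 sync into the bucket's Aspace/ prefix; something like the line below (the exact flags are an assumption):

    # Assumed core of s3-backup.sh: mirror /backup into the Aspace/ prefix
    # of the chf-hydra-backup bucket (region us-west-2, per the console URL).
    aws s3 sync /backup s3://chf-hydra-backup/Aspace --region us-west-2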

See Backups and Recovery for a discussion of how the chf-hydra-backup s3 bucket is then copied to Dubnium and in-house storage.

Restoring from backup

You can get a recent backup of the database at https://s3.console.aws.amazon.com/s3/object/chf-hydra-backup/Aspace/aspace-backup.sql

Note that the create_aspace.yml playbook creates a minimal, basically empty aspace database with no actual archival data in it.

To restore from such a backup onto a freshly-created ArchivesSpace server,

  • copy your backup database to an arbitrary location on the new server

  • ssh in to the new server

  • Log into the empty archivesspace database:

    • mysql archivesspace --password='the_archivesspace_database_password' --user=the_user

  • Once at the mysql command prompt, load the database:

    • mysql> \. /path/to/your/aspace-backup.sql
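
Equivalently, you can load the dump non-interactively in one shot:

    # Same restore as the interactive \. command above, run from the shell.
    mysql archivesspace --user=the_user --password='the_archivesspace_database_password' \
        < /path/to/your/aspace-backup.sql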
