
We store descriptions of our archival collections in the following places:

| Location | Format | Number | Source | Example | Who can see it? |
|---|---|---|---|---|---|
| Shared/P/Othmer Library/Archives/Collections Inventories/Archival Finding Aids and Box Lists | Word documents | ? | This is the original collection description. | ? | Institute staff |
| ArchivesSpace site | MySQL-backed website | Roughly 45 as of 2020 | Entered manually based on the P drive Word files. | https://archives.sciencehistory.org/resources/81#tree::resource_81 | Only logged-in ArchivesSpace users |
| ArchivesSpace Apache front end | EAD (XML format) | Roughly 45 as of 2020 | Generated nightly from the ArchivesSpace database | https://archives.sciencehistory.org/ead/scihist-2012-021.xml | Public |
| ArchivesSpace Apache front end | HTML | Roughly 45 as of 2020 | Generated nightly from the ArchivesSpace database | https://archives.sciencehistory.org/2012-021.html | Public |
| OPAC | PDF | ? | Exported manually as PDF from the ArchivesSpace site, then attached to the OPAC record for the collection | https://othmerlib.sciencehistory.org/articles/1065801.15134/1.PDF | Public |

Workflow

Technical details about the server

ArchivesSpace lives on an AWS EC2 server, ArchivesSpace-prod, at https://50.16.132.240/ (also reachable at https://archives.sciencehistory.org).

The current production version of ArchivesSpace is 2.7.1.

Terminal access: ssh -i /path/to/production/pem_file.pem ubuntu@50.16.132.240

The ubuntu user owns all the admin scripts.

The relevant Ansible role is: /roles/archivesspace/ in the ansible-inventory codebase.

The SSL setup follows the instructions at http://www.rubydoc.info/github/archivesspace/archivesspace

The executables are at /opt/archivesspace/

The configuration file is /opt/archivesspace/config/config.rb
Logs are at logs/archivesspace.out (relative to the ArchivesSpace install directory).

Apache logs are at /var/log/apache2/

Configuration for the Apache site is at /etc/apache2/sites-available/000-default.conf. It would be a good idea to spend some time drastically simplifying this configuration.
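When troubleshooting, it is usually enough to tail the application and Apache logs at the paths above. A quick sketch, assuming the logs/ path is relative to the install directory and the standard Ubuntu Apache error log:

    # Follow the ArchivesSpace application log
    tail -f /opt/archivesspace/logs/archivesspace.out

    # Follow the Apache error log for the public front end
    tail -f /var/log/apache2/error.log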

Main users

  • Kent does a majority of the encoding

  • Hillary Kativa

  • Patrick Shea

Startup

  • To start ArchivesSpace: sudo service archivesspace start. You may need to run this several times, waiting about 30 seconds between attempts (see the sketch after this list).

  • You can troubleshoot startup by looking at the start script (invoked by the above): /opt/archivesspace/archivesspace.sh start

  • There may be a short delay as the server re-indexes data.
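A minimal retry loop for the flaky start, assuming the 30-second wait and the service name above (a convenience sketch, not an existing script on the server):

    # Try to start ArchivesSpace up to three times, waiting 30 seconds between attempts.
    for attempt in 1 2 3; do
        sudo service archivesspace start
        sleep 30
        if sudo service archivesspace status | grep -q "active (running)"; then
            echo "ArchivesSpace is running (attempt $attempt)"
            break
        fi
    done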

Restarting the server to fix Tomcat memory leak

Note: as of August 2020, the procedure below is obsolete. We now simply restart the server every Sunday at 2 am, which appears to prevent the problem before it occurs.
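For reference, a weekly Sunday 2 am reboot can be scheduled with a root crontab entry along these lines (an illustration only; the exact mechanism in use on the server is not documented here):

    # Root crontab: reboot the server every Sunday at 2:00 am
    0 2 * * 0 /sbin/shutdown -r now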

ArchivesSpace has a memory leak that causes it to use more CPU time than it should. This will slowly drain all the burst credits, at which point the server slows down.

Another clue: in the AWS console for the server, under the Monitoring tab, the CPU Utilization graph shows anything over about 15%.
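You can also check the burst-credit balance from the command line with the AWS CLI; the instance ID below is a placeholder:

    # Average CPU credit balance for the ArchivesSpace EC2 instance over the last few hours.
    # Replace i-0123456789abcdef0 with the real instance ID.
    aws cloudwatch get-metric-statistics \
      --namespace AWS/EC2 \
      --metric-name CPUCreditBalance \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --start-time "$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
      --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      --period 3600 \
      --statistics Average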

Procedure:

  • Contact all the main users listed above (especially Kent), and make sure they’re not actively working on the server.

  • Once given the go-ahead:

    • Log in to the server.

    • Throughout the process, keep in mind you can run sudo service archivesspace status for the service status at any point. If it’s running, you’ll see a variation on: [...] Loaded: loaded (/etc/systemd/system/archivesspace.service; enabled; [...])
      Active: active (running) since Tue [...]

    • Run top in a separate window to monitor the CPU usage. The goal is to see a dramatic reduction in usage after this process.

    • sudo systemctl stop archivesspace

    • sudo systemctl start archivesspace (You may have to run this two or three times – the start script is finicky)

    • If all else fails, you can also go into the AWS console and reboot the EC2 instance.

    • Once everything is properly restarted:

      • the https://archives.sciencehistory.org/ front-end is available again

      • After a few minutes, you should see the CPU use go down dramatically in top.

      • The AWS monitoring graph for CPU Utilization should drop (see figure below).

    • Once you’re done, notify all involved that the server is available again.

Export

The ArchivesSpace EADs are harvested by:

| Institution | Liaison | Contact |
|---|---|---|
| Center for the History of Science, Technology, and Medicine (CHSTM) | Richard Shrake | shraker13@gmail.com |
| University of Penn Libraries Special Collections | Holly Mengel | hmengel@pobox.upenn.edu |

Both institutions harvest the EADs by automatically scraping https://archives.sciencehistory.org/ead/ . Once harvested, the EADs are added to their aggregated Philly-area EAD search interfaces.
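We do not control the harvesters, but scraping that directory listing amounts to something like the following (a rough illustration, not their actual code):

    # Mirror the EAD XML files from the public directory listing
    wget --recursive --no-parent --level=1 --accept '*.xml' \
         https://archives.sciencehistory.org/ead/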

The main export files are located at /home/ubuntu/archivesspace_scripts. They are checked into version control at https://github.com/sciencehistory/archivesspace_scripts.

Important files:

complete_export.sh

Runs the nightly export (called by cron every night at 9 PM). This calls as_export.py and generate.sh below.

local_settings.cfg

Settings for the export scripts.

as_export.py

Extracts XML from ArchivesSpace and saves a series of EADs into /exports/data/ead/*/*.xml.

The exported EADs contain links to the actual digital objects.

generate.sh

Transforms the EADs in /exports/data/ead into HTML and saves them into /var/www/html (see for instance https://archives.sciencehistory.org/beckman).

It relies on files (stylesheets, transformations) in finding-aid-files and fa-files.

xml-validator.sh

Checks that the publicly accessible files in /var/www/html/ead/ are valid XML.

Once processed by generate.sh, the XML files are publicly accessible at https://archives.sciencehistory.org/ead/ via an Apache web server.
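A simplified sketch of the nightly chain (the real complete_export.sh is in the repository above and may differ in its arguments and paths):

    # Cron entry (user ubuntu), nightly at 9 PM:
    #   0 21 * * * /home/ubuntu/archivesspace_scripts/complete_export.sh

    # 1. Pull the EADs out of ArchivesSpace
    python as_export.py          # writes /exports/data/ead/*/*.xml

    # 2. Transform the EADs into public HTML finding aids
    ./generate.sh                # writes /var/www/html/...

    # 3. Sanity-check the published XML
    ./xml-validator.sh           # validates /var/www/html/ead/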

Details about the as_export.py script:
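In broad strokes, as_export.py talks to the ArchivesSpace backend API: it logs in for a session token and then requests each resource's EAD. For orientation, a hand-rolled equivalent with curl looks roughly like this (the credentials, repository ID, and resource ID are placeholders; the script itself is the authority):

    # Log in to the ArchivesSpace backend API (default port 8089) and capture the session token.
    SESSION=$(curl -s -F password='xxxxx' \
        'http://localhost:8089/users/admin/login' | jq -r '.session')

    # Fetch the EAD for one resource (repository 2, resource 81 used here as an example).
    curl -s -H "X-ArchivesSpace-Session: $SESSION" \
        'http://localhost:8089/repositories/2/resource_descriptions/81.xml' \
        > /exports/data/ead/example.xml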

Building the server

The server is not yet fully Ansible-ized.

What is missing from the Ansible build:

  • It doesn’t copy the scripts over correctly.

More technical documentation

http://archivesspace.github.io/archivesspace/

Note: we are paying members of the ArchivesSpace consortium. We will want to set up an account for eddie: https://archivesspace.atlassian.net/wiki/spaces/ADC/pages/917045261/ArchivesSpace+Help+Center

Backups

Backups consist of dumps of the MySQL database used by ArchivesSpace.

Place the MySQL database dump in /backup:

mysql-backup.sh

Dumps the MySQL database to /backup/aspace-backup.sql.
This script is run from user ubuntu's crontab: 30 17 * * 1-5 /home/ubuntu/archivesspace_scripts/mysql-backup.sh

Sync /backup to an S3 bucket:

s3-backup.sh

Runs an aws s3 sync command to place the contents of /backup at https://s3.console.aws.amazon.com/s3/object/chf-hydra-backup/Aspace/aspace-backup.sql?region=us-west-2&tab=overview.

This script is run from user ubuntu's crontab: 45 17 * * 1-5 /home/ubuntu/archivesspace_scripts/s3-backup.sh
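The two scripts boil down to something like this (a sketch; the canonical versions are in the archivesspace_scripts repository, and the credentials below are placeholders):

    # mysql-backup.sh, in essence: dump the ArchivesSpace database to /backup.
    mysqldump --user=the_user --password='xxxxx' archivesspace \
        > /backup/aspace-backup.sql

    # s3-backup.sh, in essence: sync /backup into the chf-hydra-backup bucket.
    aws s3 sync /backup s3://chf-hydra-backup/Aspace/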

See Backups and Recovery for a discussion of how the chf-hydra-backup s3 bucket is then copied to Dubnium and in-house storage.

Restoring from backup

You can get a recent backup of the database at https://s3.console.aws.amazon.com/s3/object/chf-hydra-backup/Aspace/aspace-backup.sql

Note that the create_aspace.yml playbook creates a minimal, basically empty aspace database with no actual archival data in it.

To restore from such a backup onto a freshly created ArchivesSpace server (a consolidated sketch follows this list):

  • copy your backup database to an arbitrary location on the new server

  • ssh in to the new server

  • Log into the empty archivesspace database:

    • mysql archivesspace --password='the_archivesspace_database_password' --user=the_user

  • Once at the mysql command prompt, load the database:

    • mysql> \. /path/to/your/aspace-backup.sql
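Put together, a restore run looks roughly like this (credentials and the temporary path are placeholders):

    # Fetch a recent dump from the backup bucket.
    aws s3 cp s3://chf-hydra-backup/Aspace/aspace-backup.sql /tmp/aspace-backup.sql

    # Load it into the empty archivesspace database (equivalent to the \. command above).
    mysql archivesspace --user=the_user --password='xxxxx' < /tmp/aspace-backup.sql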
