
Create a new user

First, an administrator should log into the jobs server (production or staging, as needed), then run a rake task, e.g.:

cd /opt/sufia-project/current/
RAILS_ENV=production bundle exec rake chf:user:create['username@sciencehistory.org']

Then, ask the new user to do the following:

*After the new account has been created, file a Help Desk ticket to add the new user to the Hydra User Group email list.*

Lock out user

Run a rake task, e.g.:

...

Passenger (web worker) administration

Log into the web server as the digcol user. You can find current web server IPs with ./bin/cap production list_ec2_servers from an app checkout.
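For example (WEB_SERVER_IP is a placeholder for one of the addresses from the list_ec2_servers output):

$ ssh digcol@WEB_SERVER_IP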

Get good status info on passenger workers:

$ PASSENGER_INSTANCE_REGISTRY_DIR=/opt/scihist_digicoll/shared passenger-status

Restart application without restarting apache

This will reload config files.

$ PASSENGER_INSTANCE_REGISTRY_DIR=/opt/scihist_digicoll/shared passenger-config restart-app

passenger-config can do some other interesting things as well, such as `system-metrics`.
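For example:

$ passenger-config system-metrics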

Resque admin panel

If a file doesn't get characterized correctly, the first thing to do is check the Resque admin panel, where you can view failures and restart jobs. If you are logged in as an admin user, you can view the panel at `digital.chemheritage.org/admin/queues`.

What version of Fedora is running?

It's in the footer of the Fedora main page, for example http://localhost:8080/fedora/


What version of the app is deployed?

$ cat /opt/sufia-project/current/REVISION 

Restart passenger

...

$ passenger-config restart-app

Reindex all of Solr:

# ssh to server as deploy user
$ cd /opt/sufia-project/current
$ RAILS_ENV=production bundle exec rake chf:reindex

Or, try using cap remote rake:  `cap (production|staging) invoke:rake TASK=chf:reindex`

Note: if reindexing due to a server move, import the Postgres database of users prior to reindexing. Otherwise you will need to reindex again once the users have been moved over.
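A minimal sketch of such an import, assuming a custom-format dump file named users.dump (hypothetical) and the database and role names from the psql examples further down this page; adjust to your actual dump and environment:

pg_restore -U chf_pg_hydra -d chf_hydra users.dump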

Reindex just the works in Solr:

# ssh to jobs server (either jobs-prod or jobs-stage)
$ cd /opt/sufia-project/current
$ RAILS_ENV=production bundle exec rake chf:reindex_works

Or, try using cap remote rake:  `cap (production|staging) invoke:rake TASK=chf:reindex_works`

Note: the reindex_works task only works when you already have a complete Solr index, unlike the much slower full reindex, which can be run starting from an empty index.

Note: make sure to use either "screen" or "nohup", so that if you get disconnected from your terminal session on the jobs server, the task keeps running.
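For example, a minimal sketch using nohup (the log file location is an arbitrary choice):

$ cd /opt/sufia-project/current
$ RAILS_ENV=production nohup bundle exec rake chf:reindex_works > ~/reindex_works.log 2>&1 &
$ tail -f ~/reindex_works.log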

Delete all the data

(Don't do this on prod!)

Optional: stop Apache or use Capistrano's maintenance mode

Shut down Tomcat and Solr
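A sketch of the shutdown, assuming systemd units named apache2, tomcat, and solr (check the actual service names on the box):

sudo systemctl stop apache2   # optional, per the note above
sudo systemctl stop tomcat
sudo systemctl stop solr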

rm -rf /opt/fedora-data/*
rm -rf /opt/solr/collection1/data/* # solr 4
rm -rf /var/solr/data/collection1/data/* # solr 5

If using Sufia 7, also run:

psql -U trilby -d fcrepo -c 'DELETE FROM modeshape_repository'

The temporary testing password for trilby is porkpie2

Delete database stuff (notifications, mostly)

(You'll need the password; it's in the Ansible vault.)

psql -U chf_pg_hydra -d chf_hydra
delete from mailboxer_receipts where created_at < '2015-11-9';
delete from mailboxer_notifications where created_at < '2015-11-9';
delete from mailboxer_conversations where created_at < '2015-11-9';
delete from trophies where created_at < '2015-11-9';

Turn Tomcat back on (and Apache if needed)

Inspect stuff

Note: when using the Rails console to look at live production data, it's possible to change and delete things! Please be very careful before submitting commands if you are working with live data. Consider a dry run on the staging server before doing anything on the production box.

$ bundle exec rails c[onsole] production
# Or if you use my dev box, mess around on a development instance with just $ bundle exec rails c
# Get a count of users
> User.all.count
# List all users (you can also work with users directly in psql)
> User.find_each { |u| p u.email }
# Get a count of files
> GenericFile.all.count
# Inspect a file
> f = GenericFile.find('3b5918567')
> f.depositor

...

State File

The state file (formerly in /tmp) has been moved to /var/sufia. It is currently being backed up nightly. It must be included in any server migration, to avoid errors when uploading (a new state file may try to reuse a Fedora ID that has already been used).
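A minimal sketch of copying it during a migration, assuming the new jobs server is reachable as NEW_JOBS_HOST (a placeholder) and using the deploy user mentioned above:

$ rsync -av /var/sufia/ deploy@NEW_JOBS_HOST:/var/sufia/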

Rights statements

It's useful to periodically check that all publicly available works have rights statements. As of Summer 2018 this was in fact true, but if you want to quickly check for the IDs of any public works that still need rights statements, log onto jobs_stage or jobs_production, open a console (see "Inspect stuff" above), and paste the following directly into the console:

GenericWork.search_in_batches('read_access_group_ssim' => 'public') do |group|
  group.each do |gw|
    if gw["rights_tesim"].nil? || gw["rights_tesim"].count == 0
      puts gw["id"]
    end
  end
end

Adding and removing items from large collections

This is a known bug as of Summer 2018; see https://github.com/sciencehistory/chf-sufia/issues/1068 for more details. We seldom need this functionality, but if you do, here's how to do it in the Rails console.

Assuming the work you want to add or remove has ID work_id, and the collection you want to add it to or remove it from has ID collection_id:

Removing:

the_collection = Collection.find(collection_id)
the_collection.members.delete(GenericWork.find(work_id))
the_collection.save

Adding:

the_collection = Collection.find(collection_id)
the_collection.members.push(GenericWork.find(work_id))
the_collection.save

For large collections, expect these operations to take five to ten minutes and place considerable load on the server.

Regenerating derivatives on a fileset

Log into the jobs server (prod or staging, depending on the situation).

...

Check the README for scihist_digicoll.

Tips for rebuilding Solr with zero downtime

For scihist_digicoll, we can easily build and swap in a new Solr server, but this results in downtime until the index is remade. While reindexing takes only a minute or two, the server changes can take a while to apply to the jobs and web servers, so there may be many minutes when one of them is connected to the new Solr server and the other is not. During that time, we can't reindex.

To minimize downtime during Solr changes, the preferred method is to take a backup of the old Solr index (if it can be used with the new Solr version; test first) and restore that backup on the new Solr server, so that public users will always be able to run searches.

CORENAME is scihist_digicoll
The backup location, /backups/solr-backup, is built by Ansible
BACKUPNAME can be anything you like

On the old Solr machine, logged in as ubuntu, run:

curl 'http://localhost:8983/solr/CORENAME/replication?command=backup&name=BACKUPNAME&location=/backups/solr-backup'


To check the status of the backup, run:

curl "http://localhost:8983/solr/CORENAME/replication?command=details"

Then tar it up

tar czf ~/solr-backup.tar.gz /backups/solr-backup/snapshot.BACKUPNAME

Then move or copy the backup tarball to the new server via whatever method you prefer, such as scp.
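For example (NEW_SOLR_HOST is a placeholder for the new Solr server's address):

$ scp ~/solr-backup.tar.gz ubuntu@NEW_SOLR_HOST:~/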

If you are working with production, it's a good idea to go into Ansible, edit the group_vars/kithe_production file you are working with, and put in the new private IP address for Solr.

Commit the changes to the staging branch so you can easily merge a PR from staging to master to swap the IP address without waiting for staging to update.

On the new Solr machine, logged in as ubuntu:

Extract the tarball with `tar xzf solr-backup.tar.gz` into /backups/solr-backup (or anywhere, as long as the Solr user can access it).

You may need to move the snapshot directory out of the extracted tree: if you extract directly in /backups/solr-backup, the snapshot may end up at /backups/solr-backup/backups/solr-backup/snapshot.BACKUPNAME, and you'll want to move it to /backups/solr-backup/snapshot.BACKUPNAME.
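For example, a sketch of that cleanup (paths follow the example above):

$ sudo mv /backups/solr-backup/backups/solr-backup/snapshot.BACKUPNAME /backups/solr-backup/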

Make sure all files are owned by the unix account and group solr:

sudo chown -R solr:solr /backups/solr-backup

Make sure the scihist_digicoll application code is deployed to the new Solr server and the app is running correctly; the Solr config files live in our app repo and are delivered to the Solr server via Capistrano deploy.

Run:

curl 'http://localhost:8983/solr/CORENAME/replication?command=restore&name=BACKUPNAME&location=/PATH'

PATH should be just the directory the backup is in (/backups/solr-backup), not the full path of the snapshot folder.

To check the status of the restore, run:

curl "http://localhost:8983/solr/CORNAME/replication?command=restorestatus"

Now you should make the Solr server IP change: either by committing a change to staging if this was a staging swap, or by merging the change you already put in staging into master if this is a production swap.

Now the new machine has a recent backup, so when you update the server IP address users will still get search results. Staff who added or edited items after the backup was taken may notice those items look off until the reindex described below.

Once the servers are switched, run a reindex to catch any changes made during that time.

Clearing out the tmp directory (removes everything older than 8 days)

This is invoked by a cron job on app prod, but just in case...

find /tmp/* -mtime +8 -delete