SearchStax Solr

SearchStax is a managed/hosted Solr service, which we currently use to host our Solr (one production instance and one staging instance).

 

SearchStax plan/size

AWS us-east-1 (because that is where our Heroku dynos run). “Silver” plan at the smallest size: 1GB RAM, 8GB SSD. (Possibly called an “NDN1” plan/instance in some cases, or was in the past.)

No backups or “disaster recovery” – not worth paying for given our service level targets, since we can fairly easily re-create the Solr index.

This is $40/month for one instance (x2 for staging + production), but we can get a ~50% discount by pre-paying for a year. SearchStax bills pro-rated daily if you are creating temporary/short-term instances.

Login to SearchStax

The SearchStax console is at https://app.searchstax.com

The kind of SearchStax account we are using only allows two SearchStax console logins, so we are sharing a digital-tech@sciencehistory.org login. The password can be found in our shared team password management.

Beware of Sendio spam protection accidentally trapping communications from SearchStax meant for digital-tech@sciencehistory.org.

Solr Auth Protection

It is important that we protect Solr from being publicly accessible: by default the public would have access to Solr admin functions, and perhaps to information in the Solr index that is meant to be restricted to only some users.

SearchStax allows protection based on IP address or password (HTTP basic auth). We do not rely on IP addresses, which would not be feasible in a Heroku environment anyway (we have no persistent, unique-to-us IP addresses); instead we use only HTTP basic auth.

You can configure as many Solr auth accounts as you want – these are different from the SearchStax console accounts. You do so in the SearchStax console, after selecting a specific deployment (e.g. staging or production), under the Security / Auth menu. SearchStax offers three permission levels: read (without write); write (without read!); or “Read, Write, Admin”. Only the last is really useful to us generally; this is fine, as our app previously had complete access to Solr, and it still does.

We configure a single account, scihist_digicoll, whose password is also in team password management (different passwords should be used for production and staging). The username or password can be changed whenever you want, as long as you update the app config to match!

Creating auth for the first time on a new instance: in the SearchStax console, go to Security / Auth and enable auth. Then create a user, with username scihist_digicoll for production or scihist_digicoll_staging for staging (this helps us avoid confusing them). Either use the password that is already in team password management, or make sure to update team password management with a new secure one! Use the admin role.

To access the Solr admin pages (URL found via SearchStax console), you would also need a Solr auth account like this. You can re-use the account we use for the app, or just create an account for yourself in the SearchStax console.
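
To quickly check that a Solr auth account works, you can hit a Solr admin endpoint with the credentials from the command line. A minimal sketch (the hostname is a placeholder; get the real Solr URL for the deployment from the SearchStax console):

    # Should return HTTP 200 with JSON if the credentials are accepted, 401 if not
    curl -u scihist_digicoll:PASSWORD "https://ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr/admin/info/system?wt=json"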

SearchStax doc: https://www.searchstax.com/docs/security/

Configuring our Rails App

We configure the app to talk to Solr via a single configuration variable: SOLR_URL in env/Heroku config, or solr_url in local_env.yml.

This should include the HTTP basic auth info in the URL. It should also include the name of the app collection (normally scihist_digicoll) on the end of the URL. Such as:

https://scihist_digicoll:$password@ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr/scihist_digicoll

You can get the base Solr URL for a deployment (production, staging, etc.) from the SearchStax console. Then add the HTTP basic auth credentials (the password can be found at P:\Support\Computer Services\Digital Collections\searchstax_credentials.txt), and add the collection name /scihist_digicoll on the end.

This is the URL to give to the app as SOLR_URL/solr_url config.

The app will use this for reading and writing, as well as for updating Solr configuration.
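
To sanity-check the value, you can issue a zero-row query against it from a shell (a quick sketch, assuming SOLR_URL is exported in your shell with the full URL including auth and collection name):

    # Should return a JSON response; a 401 means bad auth,
    # a 404 usually means the collection name on the end is wrong
    curl "$SOLR_URL/select?q=*:*&rows=0"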

Updating Solr Configuration

With our previous Solr installation, we provided our Solr configuration files by putting them in a directory on disk that Solr was configured to use, and we updated the files in that directory on every Capistrano deploy.

Using SearchStax, we don’t have that kind of filesystem access, but we do have access to the Solr Cloud APIs for uploading a Solr config directory as a “config set”, and for configuring a Solr “collection” to use it.

We have written Ruby code to use those APIs, including some high-level rake tasks:

    # to create a NEW collection from the solr config in the project at ./solr/config
    # using the collection name included in your SOLR_URL.
    # Useful for bootstrapping on a brand new SearchStax or other Solr Cloud deployment
    ./bin/rake scihist:solr_cloud:create_collection

    # In an existing collection, sync the solr config in the project at ./solr/config
    # to the remote Solr Cloud instance configured in SOLR_URL.
    ./bin/rake scihist:solr_cloud:sync_configset

Both of these rely on SOLR_URL being set to the Solr location, including the HTTP basic auth credentials needed to use the Solr API. It will normally already be set in a deployment environment, or you can set it locally, for instance with SOLR_URL=whatever on the command line you use to run the rake task.
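
For reference, the standard Solr Cloud APIs these rake tasks are built on can also be exercised directly with curl. This is a rough sketch only (hostname, credentials, and exact parameters are illustrative, and not necessarily exactly what our Ruby code does):

    # Zip up the config directory (solrconfig.xml etc. must be at the top level of the zip)
    (cd solr/config && zip -r - .) > configset.zip

    # Upload it as a configset via the Solr Configset API
    # (the overwrite=true param requires a reasonably recent Solr version)
    curl -u scihist_digicoll:PASSWORD -X POST \
      -H "Content-Type: application/octet-stream" --data-binary @configset.zip \
      "https://ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr/admin/configs?action=UPLOAD&name=scihist_digicoll&overwrite=true"

    # Create a collection that uses that configset (Collections API) --
    # or RELOAD an existing collection to pick up new configset contents
    curl -u scihist_digicoll:PASSWORD \
      "https://ssNNNNNN-aaaaaaa-us-east-1-aws.searchstax.com/solr/admin/collections?action=CREATE&name=scihist_digicoll&collection.configName=scihist_digicoll&numShards=1"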

You can run rake tasks remotely on Heroku with heroku run rake scihist…, or via our ansible/EC2 setup with the Capistrano task for running rake tasks remotely, or by opening up a bash console to either, etc.

The sync_configset task is designed to be run on every deployment, much like db:migrate: it makes sure that any changes in the repo’s solr/config directory that are meant to go along with the deployed version of the code actually get deployed with the code. However, if a Solr config change requires a re-index, you will still need to take care of that “manually”, planning for downtime or figuring out a no-downtime way to do it, etc.
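
For example, on Heroku the sync (and, when needed, a reindex) can be run by hand like this (a sketch; the exact invocation depends on how and where your deploy runs):

    # Sync the repo's solr/config to the deployment configured in the app's SOLR_URL
    heroku run ./bin/rake scihist:solr_cloud:sync_configset

    # Only needed if the config change also requires reindexing
    heroku run ./bin/rake scihist:solr:reindex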

Note: We do not use the SearchStax proprietary API

SearchStax has a custom-to-SearchStax API for managing Solr configsets and collections. I am not sure why it exists, as it seems to duplicate what the standard Solr APIs do, while being somewhat harder to use. We do not use it. We use the standard Solr APIs, which should work on any Solr Cloud instance, not just SearchStax.

SearchStax docs/support say that the SearchStax API is “more secure”, but being protected by Solr basic auth seems sufficient (and the standard APIs are available to anyone with our Solr basic auth credentials whether we use them or not!). They also say the SearchStax API is logged, or logged differently; I don’t think we care.

Updating the Solr version without downtime

  • Update `.solr_wrapper.yml` in dev to the new version

  • Test the new version in dev:

    bin/rake solr:clean
    bin/rake solr:stop
    bin/rake solr:start
    bin/rake scihist:solr:reindex

    RAILS_ENV=test bin/rake solr:clean
    RAILS_ENV=test bin/rake solr:stop
    RAILS_ENV=test bin/rake solr:start
    RAILS_ENV=test bin/rake scihist:solr:reindex

    ./bin/rspec --fail-fast
  • Create a temporary deployment that’s as close as possible to the current one, but with the new version of Solr. This will cost under $2/day while it’s in use.

  • Create a basic auth username and password as detailed above.

  • Construct a new SOLR_URL: SOLR_URL='https://USERNAME:PASSWORD@DEPLOYMENT-INFO-us-east-1-aws.searchstax.com/solr/scihist_digicoll'

  • Create the temporary collection: heroku run "SOLR_URL=$SOLR_URL ./bin/rake scihist:solr_cloud:create_collection"

  • Index to it: heroku run "SOLR_URL=$SOLR_URL ./bin/rake scihist:solr:reindex"

  • Start using it: heroku config:set SOLR_URL=$SOLR_URL

  • Index to it again, as above, to pick up anything indexed since the first pass. Your app can now use the temporary deployment until the upgrade is ready.

  • Since the app is no longer using the regular deployment, its downtime during the upgrade won’t affect users. A Solr upgrade takes two hours and needs to be requested from SearchStax via a ticket.

  • When the regular deployment is at the version you want, you can switch back to it (see the command sketch after this list):

    • index to it

    • switch the app’s SOLR_URL to point to it

    • index to it again.

  • You can now delete the temporary deployment.
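
The switch back mirrors the earlier commands. A sketch, assuming ORIG_SOLR_URL holds the regular deployment’s full URL (including basic auth credentials and the /scihist_digicoll collection name on the end):

    # Reindex into the regular (now upgraded) deployment while the app still points at the temporary one
    heroku run "SOLR_URL=$ORIG_SOLR_URL ./bin/rake scihist:solr:reindex"

    # Point the app back at the regular deployment
    heroku config:set SOLR_URL=$ORIG_SOLR_URL

    # Index again to pick up anything indexed in the meantime
    heroku run ./bin/rake scihist:solr:reindex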