
We use the Cloudflare Turnstile product to try to limit automated bot traffic to our app.

At present, only search results pages are protected this way, because that is where we were having trouble: search pages (backed by Solr) are more resource-constrained, and bots were traversing every combination of facets, an effectively limitless set of paths.

After about 10 searches, a user may be redirected to a Turnstile challenge page, which in many cases will automatically redirect back to the search within a few seconds. Users on a given browser should see the challenge at most once per 24 hours. (All of this is configurable and subject to change.)

Cloudflare Turnstile account and credentials

We have a Cloudflare account under it@sciencehistory.org that contains our Turnstile configuration. Jonathan’s and Eddie’s personal @sciencehistory.org accounts should also be added to it as team members, so that they can change the configuration too.

We have separate Turnstile “widgets” configured for staging and production. Each needs its allowed hostnames configured; the staging widget also includes localhost so that you can test in development.

Each Turnstile widget has a “site key” and a “secret key”, which can be found in the “Settings” panel of the Turnstile dashboard. These need to be set in the app (e.g. as Heroku config vars) in the CF_TURNSTILE_SITEKEY and CF_TURNSTILE_SECRET_KEY ENV variables.

These keys can be changed or rotated in the Turnstile settings if needed; afterwards, update them in the app's configuration as well (e.g. the Heroku config vars).
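For context on what the secret key is for: in Turnstile's usual server-side flow, the token the browser receives from the widget (which is rendered with the site key) is posted, together with the secret key, to Cloudflare's siteverify endpoint, which replies with JSON containing a `success` flag. Below is a minimal Ruby sketch of that call using plain Net::HTTP. The method name `turnstile_token_valid?` and its injectable `poster` argument are hypothetical, and this is not necessarily how bot_detection_controller.rb performs the check.

```ruby
require "net/http"
require "json"
require "uri"

# Cloudflare's server-side validation endpoint for Turnstile tokens.
SITEVERIFY_URL = URI("https://challenges.cloudflare.com/turnstile/v0/siteverify")

# Hypothetical helper: returns true only if Cloudflare reports the token valid.
# `poster` is injectable (hypothetical design choice) so the logic can be run
# without network access; by default it POSTs the form with Net::HTTP.
def turnstile_token_valid?(token,
                           remote_ip: nil,
                           secret: ENV.fetch("CF_TURNSTILE_SECRET_KEY"),
                           poster: ->(uri, params) { Net::HTTP.post_form(uri, params).body })
  # No token at all means the challenge was not completed; skip the network call.
  return false if token.to_s.empty?

  params = { "secret" => secret, "response" => token }
  params["remoteip"] = remote_ip if remote_ip # optional extra signal for Cloudflare

  body = poster.call(SITEVERIFY_URL, params)
  JSON.parse(body)["success"] == true
rescue JSON::ParserError
  # An unparseable response is treated as a failed verification.
  false
end
```

The `poster` argument exists only so the logic can be exercised offline; network errors and timeouts are not handled in this sketch.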

To test in development

Rate tracking requires rack-attack to have a working cache, which we don’t normally have in development, so the rate threshold for issuing a challenge will never be reached! To test in development, you will want something like config.cache_store = :memory_store in your ./config/environments/development.rb.
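A minimal sketch of that change, assuming the standard Rails file layout and that Rack::Attack is relying on Rails.cache (its default in a Rails app unless `Rack::Attack.cache.store` is set explicitly):

```ruby
# config/environments/development.rb
Rails.application.configure do
  # ...existing development settings...

  # Give Rack::Attack a working cache so request counts persist between
  # requests and the challenge threshold can actually be reached.
  # If this file already sets config.cache_store earlier (e.g. to :null_store),
  # this later assignment takes precedence.
  config.cache_store = :memory_store
end
```

Note that :memory_store is per-process and is cleared on restart, so counts are not shared between server processes; for local testing with a single process this does not matter.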

Disabling

If the Turnstile check is causing a problem, it can be disabled by setting the ENV var CF_TURNSTILE_ENABLED to "false" (or by deleting it, since the default is false).
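As an illustration of the default-off behavior, here is a hypothetical helper (`turnstile_enabled?` is not a name from the app) that treats the check as enabled only when CF_TURNSTILE_ENABLED is "true" (ignoring case and surrounding whitespace), and disabled when the variable is unset or has any other value. The exact parsing in the real implementation may differ.

```ruby
# Hypothetical helper: the Turnstile check is off unless explicitly enabled.
# A missing variable falls back to "false", matching the documented default.
def turnstile_enabled?(env = ENV)
  env.fetch("CF_TURNSTILE_ENABLED", "false").strip.downcase == "true"
end
```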

Configuration and Implementation

Original implementation PR: https://github.com/sciencehistory/scihist_digicoll/pull/2838

Configuration currently lives at the bottom of `./config/initializers/rack_attack.rb`, in a to_prepare block.

To see everything you can configure, see the implementation in /app/controllers/bot_detection_controller.rb.

The protected paths are configured in that initializer; we intend to include all search results pages. If you add more search results pages at new URLs (alternate views of search-within-collection, featured topics, etc.), you will have to adjust this configuration to protect them!
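To make that warning concrete, here is a hypothetical sketch of path matching. The patterns are illustrative placeholders, not the app's real configuration, and `protected_path?` is an invented name:

```ruby
# Illustrative placeholder patterns only; the real list lives in
# config/initializers/rack_attack.rb.
PROTECTED_PATH_PATTERNS = [
  %r{\A/catalog/?\z},           # main search results (hypothetical)
  %r{\A/collections/[^/]+/?\z}, # search within a collection (hypothetical)
].freeze

# True if the request path matches any configured pattern.
def protected_path?(path)
  PROTECTED_PATH_PATTERNS.any? { |pattern| pattern.match?(path) }
end
```

A new search page at, say, /focus/some-topic matches neither pattern, so it would go unprotected until the list is extended.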

You can configure the period and request count before a challenge is triggered, and how long a ‘passed’ challenge remains valid before another challenge may be issued.
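A minimal in-memory sketch of how those three settings interact, assuming a fixed-window counter (similar in spirit to how Rack::Attack throttles count requests per period). `RateGateSketch` and its parameters are hypothetical names. In the real app, counts are kept in Rack::Attack's cache, and the ‘passed’ time is presumably tied to the browser (e.g. its session), which is why the sketch takes it as an argument rather than storing it.

```ruby
# Hypothetical, in-memory sketch of the rate gate; not the code in
# bot_detection_controller.rb. A real cache store would expire old
# windows; this Hash never prunes them.
class RateGateSketch
  def initialize(limit:, period:, pass_duration:, clock: -> { Time.now })
    @limit = limit                 # requests allowed per window
    @period = period               # window length, in whole seconds
    @pass_duration = pass_duration # seconds a passed challenge stays valid
    @clock = clock
    @counts = Hash.new(0)          # [rate_key, window_number] => request count
  end

  # Counts one protected request for `rate_key` and returns true if this
  # request should be sent to the challenge page. `passed_at` is when this
  # browser last passed a challenge, or nil if it never has.
  def challenge?(rate_key, passed_at: nil)
    now = @clock.call
    # A recent pass exempts the browser entirely (and is not counted).
    return false if passed_at && (now - passed_at) < @pass_duration

    window = now.to_i / @period # same number for every second of one window
    @counts[[rate_key, window]] += 1
    @counts[[rate_key, window]] > @limit
  end
end
```

With `limit: 10`, `period: 60`, and `pass_duration: 86_400`, for example, the 11th request from one key within a one-minute window would be challenged, and a browser that passes would not be challenged again for 24 hours.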

More ambitiously, we could change the buckets/keys for which rates are calculated: right now they are client IP subnets, but they could instead take account of HTTP headers, or of information looked up about the client IP. We want the check to stay fast, though, since it runs on every request.
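A sketch of subnet-based rate keys using Ruby's standard ipaddr library. The /24 (IPv4) and /64 (IPv6) prefix lengths are assumptions for illustration, not necessarily what the app uses, and `rate_key_for` is a hypothetical name:

```ruby
require "ipaddr"

# Assumed prefix lengths for illustration; the app's real values may differ.
IPV4_PREFIX = 24
IPV6_PREFIX = 64

# Maps a client IP to the subnet it belongs to, so all addresses in the
# same subnet share one rate-limit bucket.
def rate_key_for(ip_string)
  ip = IPAddr.new(ip_string)
  prefix = ip.ipv4? ? IPV4_PREFIX : IPV6_PREFIX
  "#{ip.mask(prefix)}/#{prefix}"
rescue IPAddr::InvalidAddressError
  # Unparseable input gets its own bucket rather than raising mid-request.
  ip_string
end
```

Masking is a cheap, purely local computation, which fits the goal of keeping the per-request check fast; a header- or lookup-based key would replace this function.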
