Cloudflare Turnstile bot detection
We use Cloudflare Turnstile to try to limit automated bot traffic to our app. The story is written up in a blog post: Using CloudFlare Turnstile to protect certain pages on a Rails app
At present, only search result pages are protected in this way, as these are where we were having trouble: search pages (backed by Solr) are more resource-constrained, and bots were traversing every combination of facets in a basically limitless path.
After ~10 searches, a user may be redirected to a Turnstile challenge page, which in many cases will automatically redirect back to search in a few seconds. Users on a given browser should only see the challenge once per 24 hours. (All of this is configurable and subject to change.)
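The gate described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the class name ChallengeGate, the in-memory Hash storage, the fixed-window counting, and the one-hour counting period are all assumptions made for the example. Only the ~10 request limit and the 24-hour pass lifetime come from the description above.

```ruby
# Hypothetical sketch of a "challenge after N requests" gate.
# Not the real implementation: names, storage and the period are assumptions.
class ChallengeGate
  def initialize(limit: 10, period: 3600, pass_ttl: 24 * 3600)
    @limit = limit          # requests allowed per window before a challenge
    @period = period        # window length in seconds (assumed value)
    @pass_ttl = pass_ttl    # how long a passed challenge stays valid, in seconds
    @counts = Hash.new(0)   # [client_key, window_index] => request count
    @passed_at = {}         # client_key => Time the challenge was last passed
  end

  # Record one search request; returns true if the client should be challenged.
  def challenge?(client_key, now: Time.now)
    return false if recently_passed?(client_key, now)

    window = now.to_i / @period
    @counts[[client_key, window]] += 1
    @counts[[client_key, window]] > @limit
  end

  # Call this once the client has passed the Turnstile challenge.
  def record_pass(client_key, now: Time.now)
    @passed_at[client_key] = now
  end

  private

  def recently_passed?(client_key, now)
    passed = @passed_at[client_key]
    !passed.nil? && (now - passed) < @pass_ttl
  end
end
```

In the sketch a single client_key stands in for both the rate bucket and the browser that passed the challenge; in a real app these would likely be tracked separately (for example, the pass in the browser session).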
Cloudflare Turnstile Account, and credentials
We have a Cloudflare account under it@sciencehistory.org that contains our Turnstile configuration. Jonathan’s and Eddie’s personal @sciencehistory.org accounts should also be added to this account as team members, so they can configure it too.
We have separate Turnstile “widgets” configured for staging and production. Each needs the allowed hostnames configured; the staging widget also includes localhost, in case you want to test on dev.
Each Turnstile widget has a “site key” and a “secret key”, which can be accessed in the “settings” panel on the Turnstile dashboard. These need to be set in the app (e.g. as heroku config vars) in the CF_TURNSTILE_SITEKEY and CF_TURNSTILE_SECRET_KEY ENV variables.
These can be changed/rotated in Turnstile settings if needed, and then reset on (eg) heroku.
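As an illustration of how the two variables might be read and checked at boot, here is a small sketch. The helper name turnstile_credentials and the TurnstileCredentials struct are hypothetical and not taken from the app; only the two ENV variable names come from this page.

```ruby
# Hypothetical helper, not the app's actual code: reads both Turnstile keys
# from an ENV-like object and fails loudly if either one is missing.
TurnstileCredentials = Struct.new(:sitekey, :secret_key)

def turnstile_credentials(env = ENV)
  sitekey = env["CF_TURNSTILE_SITEKEY"].to_s.strip
  secret  = env["CF_TURNSTILE_SECRET_KEY"].to_s.strip

  missing = []
  missing << "CF_TURNSTILE_SITEKEY" if sitekey.empty?
  missing << "CF_TURNSTILE_SECRET_KEY" if secret.empty?
  raise KeyError, "missing Turnstile config: #{missing.join(', ')}" unless missing.empty?

  TurnstileCredentials.new(sitekey, secret)
end
```

Failing at startup rather than at the first challenge makes a forgotten or half-rotated key easier to spot.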
To test in development
Rate tracking requires rack-attack to have a working cache, which we don’t normally have in development, so by default the rate gate that issues the challenge will never be met. The bot detection controls are also off by default.
Set env CF_TURNSTILE_ENABLED=true to use a memory cache (resets on app restart) and enable protection in dev.
The cache setting you want for testing in development is something like config.cache_store = :memory_store in your ./config/development.rb
Disabling
If the Turnstile check is causing a problem, it can be disabled by setting the ENV var CF_TURNSTILE_ENABLED to "false" (or deleting it, as the default is false).
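The "off unless explicitly enabled" behavior can be sketched like this. The method name turnstile_enabled? is hypothetical, and the exact parsing (whitespace and case tolerance) is an assumption; the app may read the variable differently.

```ruby
# Hypothetical sketch: protection is on only when the variable is set to "true".
# An unset variable or any other value counts as disabled (assumed parsing rules).
def turnstile_enabled?(env = ENV)
  env.fetch("CF_TURNSTILE_ENABLED", "false").strip.downcase == "true"
end
```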
Configuration and Implementation
Original implementation PR: https://github.com/sciencehistory/scihist_digicoll/pull/2838
Configuration currently lives at the bottom of ./config/initializers/rack_attack.rb (in a to_prepare block).
To see all possible things you can configure, see implementation at /app/controllers/bot_detection_controller.rb
Which paths are protected is configured here. We intend to include all search results. If you add more search results pages (alternate views of search-within collections, featured topics, etc.) at new URLs, you will have to adjust this configuration to protect them!
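To make the "new URLs need new configuration" point concrete, here is a sketch of prefix-based path protection. The prefixes listed and the method name protected_path? are purely hypothetical placeholders, not the app's real routes or API; check the actual configuration for the real list.

```ruby
# Hypothetical path prefixes, for illustration only. The real list lives in
# the app's configuration and may look quite different.
PROTECTED_PREFIXES = ["/catalog", "/collections", "/focus"].freeze

# True if the request path is a protected prefix or sits underneath one.
# Matching on "prefix/" avoids accidentally matching e.g. "/catalogue".
def protected_path?(path, prefixes = PROTECTED_PREFIXES)
  prefixes.any? { |prefix| path == prefix || path.start_with?("#{prefix}/") }
end
```

A search page added at a URL outside the configured prefixes would not match in this scheme, and so would stay unprotected until the list is updated.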
You can configure the period and count before a challenge is triggered, and how long a ‘passed’ challenge is good for before another challenge might be issued.
More ambitiously, we could change the buckets/keys for which rates are calculated. Right now they are subnets; they could instead take account of HTTP headers, or information looked up about the client IP. We want the check to be quick, though, since it happens on every request.
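For reference, bucketing by subnet can be sketched with Ruby's standard IPAddr class. The method name subnet_key and the prefix lengths (/24 for IPv4, /64 for IPv6) are assumptions for the example; this page does not say which subnet sizes the app actually uses.

```ruby
require "ipaddr"

# Hypothetical sketch: derive a rate-limit bucket key from a client IP by
# masking it down to its subnet. The prefix lengths are assumed values.
def subnet_key(ip_string, v4_prefix: 24, v6_prefix: 64)
  ip = IPAddr.new(ip_string)
  prefix = ip.ipv4? ? v4_prefix : v6_prefix
  "#{ip.mask(prefix)}/#{prefix}"
end

subnet_key("192.0.2.17")  # => "192.0.2.0/24"
```

This is pure in-memory computation with no lookups, which fits the need for a check that runs quickly on every request.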