Cloudflare Turnstile bot detection

We use Cloudflare's Turnstile product to limit automated bot traffic to our app. The story behind it is in a blog post: "Using CloudFlare Turnstile to protect certain pages on a Rails app".

At present, only search results pages are protected this way, because that is where we were having trouble. Search pages (backed by Solr) are more resource-constrained, and bots were crawling every combination of facets, which is a basically limitless set of paths.

After about 10 searches, a user may be redirected to a Turnstile challenge page. In many cases that page redirects back to the search automatically after a few seconds. A user on a given browser should see the challenge no more than once per 24 hours. (All of these values are configurable and subject to change.)

Cloudflare Turnstile account and credentials

Our Turnstile configuration lives in a Cloudflare account under it@sciencehistory.org. Jonathan's and Eddie's personal @sciencehistory.org accounts should also be added to it as team members, so they can change the configuration as well.

We have separate Turnstile "widgets" configured for staging and production. Each widget needs its allowed hostnames configured. The staging widget also includes localhost, so you can use it to test in development.

Each Turnstile widget has a "site key" and a "secret key", both found in the widget's "Settings" panel on the Turnstile dashboard. These must be set in the app's environment (e.g. as Heroku config vars) as the ENV variables CF_TURNSTILE_SITEKEY and CF_TURNSTILE_SECRET_KEY.

If needed, the keys can be rotated in the Turnstile settings. After rotating them, update the corresponding config vars (e.g. on Heroku).
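The secret key matters because Turnstile expects the server to confirm each challenge token. The widget submits this token, by default in a cf-turnstile-response form field. The server confirms it by POSTing the token, together with the secret key, to Cloudflare's siteverify endpoint. Below is a minimal plain-Ruby sketch of that call. The helper names are hypothetical, and the actual verification logic lives in bot_detection_controller.rb, which may be structured differently.

```ruby
require "json"
require "net/http"
require "uri"

# Cloudflare's documented server-side validation endpoint for Turnstile tokens.
SITEVERIFY_URL = URI("https://challenges.cloudflare.com/turnstile/v0/siteverify")

# Hypothetical helper: builds the form fields siteverify expects.
# Raises KeyError if no secret is passed and CF_TURNSTILE_SECRET_KEY is unset.
# remoteip is optional and is only sent when we know the client IP.
def siteverify_params(token, secret: ENV.fetch("CF_TURNSTILE_SECRET_KEY"), remote_ip: nil)
  params = { "secret" => secret, "response" => token }
  params["remoteip"] = remote_ip if remote_ip
  params
end

# Hypothetical helper: anything other than a JSON object with success: true counts as failure.
def siteverify_passed?(body)
  data = JSON.parse(body)
  data.is_a?(Hash) && data["success"] == true
rescue JSON::ParserError
  false
end

# Performs the actual network call to Cloudflare.
def verify_turnstile_token(token, remote_ip: nil)
  response = Net::HTTP.post_form(SITEVERIFY_URL, siteverify_params(token, remote_ip: remote_ip))
  response.is_a?(Net::HTTPSuccess) && siteverify_passed?(response.body)
end
```

Keeping request building and response parsing separate from the HTTP call means both can be checked without network access.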

To test in development

Rate tracking requires rack-attack to have a working cache, which we don't normally have in development. The bot detection controls are also off by default, so they need to be enabled.

Set the ENV variable CF_TURNSTILE_ENABLED=true to enable protection in development. This also makes rack-attack use an in-memory cache, which resets when the app restarts.

Without a working cache, the rate threshold for issuing a challenge will never be met. Alternatively, you can give the whole app a cache in development with something like config.cache_store = :memory_store in ./config/environments/development.rb.
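The cache matters because rack-attack counts requests by incrementing counters in its cache store. A default Rails development config usually uses :null_store unless caching has been toggled on with rails dev:cache. A null store keeps nothing, so the counter never climbs. The plain-Ruby sketch below models this with two toy stores. The classes and the threshold of 10 are hypothetical stand-ins, not the real ActiveSupport cache classes or our actual settings.

```ruby
# Toy stand-in for a cache that discards everything (like Rails' :null_store).
class ToyNullStore
  def increment(_key, _amount = 1)
    nil # nothing was stored, so there is never a previous value to add to
  end
end

# Toy stand-in for an in-process cache (like :memory_store); lost on restart.
class ToyMemoryStore
  def initialize
    @data = Hash.new(0)
  end

  def increment(key, amount = 1)
    @data[key] += amount
  end
end

# Hypothetical counter: a nil result from the store is treated as "first request".
def count_request(store, key)
  store.increment(key) || 1
end

# True if the count for key exceeds threshold at any point during `requests` calls.
def threshold_reached?(store, key, requests:, threshold: 10)
  requests.times.any? { count_request(store, key) > threshold }
end
```

With the null store, every request looks like the first one, which is why the challenge never triggers in a development setup without a cache.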

Disabling

If the Turnstile check is causing problems, disable it by setting the ENV variable CF_TURNSTILE_ENABLED to "false". Deleting the variable also works, since the default is false.
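Here is a minimal sketch of these on/off semantics. It assumes that only the exact string "true" enables protection. The helper name is hypothetical, and the real initializer may parse the value differently.

```ruby
# Hypothetical helper: protection is on only when CF_TURNSTILE_ENABLED is exactly "true".
# Unset, "false", or any other value leaves it off, matching the documented default.
def turnstile_enabled?(env = ENV)
  env.fetch("CF_TURNSTILE_ENABLED", "false") == "true"
end
```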

Configuration and Implementation

Original implementation PR: https://github.com/sciencehistory/scihist_digicoll/pull/2838

Configuration currently lives in a to_prepare block at the bottom of ./config/initializers/rack_attack.rb.

For all available configuration options, see the implementation in ./app/controllers/bot_detection_controller.rb.

The protected paths are configured there too, and the intent is to cover all search results pages. If you add more search results pages at new URLs (alternate views of search within collections, featured topics, etc.), you will need to adjust this configuration to protect them!
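The plain-Ruby sketch below shows what "protecting a path" amounts to: checking each request path against a list of patterns. The patterns are hypothetical placeholders, not our actual routes. The real list, and the format it uses, are in the to_prepare block of rack_attack.rb.

```ruby
# Hypothetical patterns for search results pages; the real list lives in
# ./config/initializers/rack_attack.rb and may use a different format.
PROTECTED_PATHS = [
  %r{\A/catalog/?\z},                   # hypothetical main search results path
  %r{\A/collections/[^/]+/search/?\z}   # hypothetical search-within-collection path
].freeze

# True if the request path (query string already stripped) should get the bot check.
def protected_path?(path, patterns = PROTECTED_PATHS)
  patterns.any? { |re| re.match?(path) }
end
```

Any search results page served at a URL that no pattern matches is simply not checked. That is why new search pages need a configuration change.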

You can configure the time period and request count that trigger a challenge. You can also configure how long a passed challenge remains valid before another challenge may be issued.
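Here is a minimal sketch of how those three settings could interact. It assumes a fixed-window counter per rate bucket and a per-browser timestamp of the last passed challenge. The class and parameter names are hypothetical; see bot_detection_controller.rb for the real options and mechanism.

```ruby
# Hypothetical gate: decides whether a request should be sent to the challenge page.
# Toy only: the counts hash never expires; a real implementation would use expiring cache keys.
class ToyRateGate
  def initialize(max_requests:, period:, pass_valid_for:)
    @max_requests = max_requests     # requests allowed per window before challenging
    @period = period                 # window length, in seconds
    @pass_valid_for = pass_valid_for # seconds a passed challenge exempts a browser
    @counts = Hash.new(0)            # [bucket, window index] => request count
  end

  # bucket:    rate-limit key (e.g. a client subnet)
  # passed_at: time (seconds) this browser last passed a challenge, or nil
  # now:       current time, in seconds
  def challenge?(bucket:, passed_at:, now:)
    # A still-valid pass skips the check entirely (and is not counted).
    return false if passed_at && now - passed_at < @pass_valid_for

    window = now / @period # integer division picks the fixed window
    @counts[[bucket, window]] += 1
    @counts[[bucket, window]] > @max_requests
  end
end
```

Example: ToyRateGate.new(max_requests: 10, period: 60, pass_valid_for: 24 * 3600) allows ten requests per subnet per minute. It challenges the eleventh, unless the browser passed a challenge within the last 24 hours. The 60-second period is an invented placeholder.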

For more sophisticated tracking, we could change the buckets (keys) for which rates are calculated. Right now the buckets are client IP subnets. They could instead take HTTP headers into account, or information looked up about the client IP. The check needs to stay fast, though, since it runs on every request.
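To illustrate subnet buckets, the sketch below turns a client IP into a subnet key using Ruby's standard ipaddr library. The /24 (IPv4) and /64 (IPv6) prefix lengths are assumptions chosen for illustration, not our configured values. Invalid input raises IPAddr::InvalidAddressError.

```ruby
require "ipaddr"

# Hypothetical prefix lengths; the real subnet sizes are whatever our
# bot detection configuration uses.
IPV4_PREFIX = 24
IPV6_PREFIX = 64

# Returns a rate-limit bucket key like "203.0.113.0/24" for a client IP string.
def subnet_bucket(ip_string)
  ip = IPAddr.new(ip_string)
  prefix = ip.ipv4? ? IPV4_PREFIX : IPV6_PREFIX
  "#{ip.mask(prefix)}/#{prefix}"
end
```

Keying on a subnet means a bot rotating through addresses in one block still shares a single counter. The trade-off is that legitimate users on the same network share it too. Header-based or lookup-based keys would replace only this kind of function, but any lookup adds per-request cost.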

Related content

SearchStax Solr
Heroku Operational Components Overview
DNS and SSL and CNAME management for Heroku sites
Microsoft SSO
Heroku developer setup
Heroku Proposal