Cloudflare Turnstile bot detection

We use Cloudflare Turnstile to limit automated bot traffic to our app. The background is written up in a blog post: https://bibwild.wordpress.com/2025/01/16/using-cloudflare-turnstile-to-protect-certain-pages-on-a-rails-app/

At present it is mainly search result pages that are protected in this way, as these are where we were having trouble: search pages (backed by Solr) are more resource-constrained, and bots were traversing every combination of facets along a basically limitless path. (Downloads of originals are also protected; see “Configuration and Implementation” below.)

After ~2 searches, a user may be redirected to a Turnstile challenge page, which in many cases will automatically redirect back to search within a few seconds. Users on a given browser should only see the challenge once per 24 hours. (These values are all configurable and subject to change.)

Cloudflare Turnstile Account, and credentials

We have a Cloudflare account under it@sciencehistory.org that contains our Turnstile configuration. Jonathan’s and Eddie’s personal @sciencehistory.org accounts should also be added to this account as team members, so that they can configure it too.

We have separate Turnstile “widgets” configured for staging and production. Each needs its allowed hostnames configured; the staging widget also includes localhost, so it can be used for testing in development.

Each Turnstile widget has a “site key” and a “secret key”, which can be found in the “settings” panel on the Turnstile dashboard. These need to be set in the app (e.g. as Heroku config vars) in the CF_TURNSTILE_SITEKEY and CF_TURNSTILE_SECRET_KEY ENV variables.

The keys can be changed/rotated in the Turnstile settings if needed, and then updated on (e.g.) Heroku.
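To show what the secret key is used for, here is a minimal sketch of the server-side token check against Cloudflare's siteverify endpoint. In our app this step is handled by the gem, so the helper name `turnstile_token_valid?` and the injectable `poster` argument are assumptions made for illustration, not the gem's API.

```ruby
require "json"
require "net/http"
require "uri"

# Cloudflare's documented endpoint for validating a Turnstile token.
SITEVERIFY_URL = URI("https://challenges.cloudflare.com/turnstile/v0/siteverify")

# Default transport: a real HTTP POST to Cloudflare, returning the response body.
DEFAULT_POSTER = ->(uri, params) { Net::HTTP.post_form(uri, params).body }

# Hypothetical helper (not the gem's API). `token` is the value the Turnstile
# widget submits from the browser in the "cf-turnstile-response" form field.
# `poster` is injectable so the check can be exercised without network access.
def turnstile_token_valid?(token, secret: ENV["CF_TURNSTILE_SECRET_KEY"], poster: DEFAULT_POSTER)
  return false if token.to_s.empty? || secret.to_s.empty?

  body = poster.call(SITEVERIFY_URL, { "secret" => secret, "response" => token })
  JSON.parse(body)["success"] == true
rescue JSON::ParserError
  # Treat an unparseable response as a failed validation.
  false
end
```

The site key, by contrast, is public: it is embedded in the challenge page so the browser widget can request a token.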

To test in development

Rate tracking requires rack-attack to have a working cache, which we don’t normally have in development, and the bot detection controls are off by default. Without a working cache, the rate gate that issues a challenge will never be met!

Set the env var CF_TURNSTILE_ENABLED=true to enable protection in dev and to use a memory cache (which resets on app restart).

Otherwise, to test in development, you will want something like config.cache_store = :memory_store in your ./config/environments/development.rb

Disabling

If the Turnstile check is causing a problem, it can be disabled by setting the ENV var CF_TURNSTILE_ENABLED to "false" (or by deleting it, as the default is false).
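The on/off semantics can be sketched roughly as follows. This is an illustration rather than the app's actual code: the helper names are hypothetical, and it assumes that only the exact string "true" enables the check.

```ruby
# Variables that must be present for protection to work once it is enabled.
REQUIRED_WHEN_ENABLED = %w[CF_TURNSTILE_SITEKEY CF_TURNSTILE_SECRET_KEY].freeze

# Hypothetical helper: protection is off unless the variable is exactly "true".
# An unset (deleted) variable falls back to "false", matching the default.
def turnstile_enabled?(env = ENV)
  env.fetch("CF_TURNSTILE_ENABLED", "false") == "true"
end

# Hypothetical helper: names of required variables that are missing or blank.
# Returns an empty list when protection is disabled, since nothing is needed.
def missing_turnstile_settings(env = ENV)
  return [] unless turnstile_enabled?(env)

  REQUIRED_WHEN_ENABLED.select { |name| env[name].to_s.empty? }
end
```

A check like `missing_turnstile_settings` could run at boot to catch a widget that was enabled without its keys.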

Configuration and Implementation

Currently implemented via the bot_challenge_page gem written by jrochkind.

Configuration is in ./config/initializers/bot_challenge_page.rb

Which paths are protected is configured here; we intend to include all search results. If you add more search results pages (alternate views of search-within collections, featured topics, etc.) at new URLs, you will have to adjust this configuration to protect them!

We also protect downloads of “originals” (usually TIFF), because the bandwidth costs of bots gone wild were too much for us even though our site was stable.
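As an illustration of why new URLs need to be added by hand, here is a minimal sketch of prefix-based path protection. The prefixes below are hypothetical placeholders, not our real list, and the gem's own configuration syntax may differ; the real settings live in the initializer named above.

```ruby
# Hypothetical path prefixes, for illustration only.
PROTECTED_PATH_PREFIXES = ["/catalog", "/collections", "/downloads/original"].freeze

# True when the request path is a protected prefix or sits underneath one.
# A page served from a prefix that is not listed is simply not protected.
def protected_path?(path)
  PROTECTED_PATH_PREFIXES.any? do |prefix|
    path == prefix || path.start_with?("#{prefix}/")
  end
end
```

With this kind of matching, a new search page at an unlisted URL would silently bypass the challenge until its prefix is added.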

You can configure the period and count before a challenge is triggered, and how long a “passed” challenge is good for before another challenge might be issued.
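How these three settings interact can be sketched with a small in-memory gate. This is an illustration only: the class and parameter names are hypothetical, and it simplifies by using one key for both rate counting and passes, whereas in the app a pass belongs to a browser while rates are counted per bucket.

```ruby
# Hypothetical in-memory gate, not the gem's implementation.
class ChallengeGate
  def initialize(count:, period:, pass_good_for:)
    @count = count                  # requests allowed per period before a challenge
    @period = period                # length of the counting window, in seconds
    @pass_good_for = pass_good_for  # how long a passed challenge stays valid, in seconds
    @hits = Hash.new { |hash, key| hash[key] = [] }
    @passed_at = {}
  end

  # Record that the client identified by `key` passed the challenge.
  def record_pass(key, now: Time.now)
    @passed_at[key] = now
  end

  # Returns true when this request should be sent to the challenge page.
  def challenge?(key, now: Time.now)
    passed = @passed_at[key]
    return false if passed && now - passed < @pass_good_for

    hits = @hits[key]
    hits.reject! { |time| now - time >= @period } # drop hits outside the window
    hits << now
    hits.size > @count
  end
end
```

For example, `ChallengeGate.new(count: 2, period: 60, pass_good_for: 86_400)` lets two requests through per minute, challenges the third, and then leaves the client alone for 24 hours after a pass.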

With additional configuration from the gem, we could go further and change the buckets/keys for which rates are calculated. Right now they are subnets; they could instead take account of HTTP headers, or of information looked up about the client IP. We want the check to be quick, though, since it happens on every request.
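A minimal sketch of a subnet-based bucket key, using Ruby's standard `ipaddr` library, shows why this check is cheap. The helper name and the prefix lengths (/24 for IPv4, /64 for IPv6) are assumptions for illustration; the gem's actual bucket sizes may differ.

```ruby
require "ipaddr"

# Hypothetical helper: map a client IP to the subnet it is counted under.
# Prefix lengths are assumed values, not taken from the gem's configuration.
def rate_bucket_key(ip_string, ipv4_prefix: 24, ipv6_prefix: 64)
  ip = IPAddr.new(ip_string)
  prefix = ip.ipv4? ? ipv4_prefix : ipv6_prefix
  # mask() zeroes the host bits, so every address in the subnet shares one key.
  "#{ip.mask(prefix)}/#{prefix}"
end

rate_bucket_key("192.0.2.77") # => "192.0.2.0/24"
```

Because the key is pure arithmetic on the address, it involves no lookups; a key built from external information about the client IP would need caching to stay this fast.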

Logging

A challenge is issued with a 403 HTTP status. This is NOT currently included in our main app logs (there were so many of them that they were filling up our log plan); instead, each one is recorded in a database table via the Rails model BotChallengedRequest. There isn’t currently an admin GUI for looking at it, it’s just in the DB, but it records much more than our typical logs do (full user-agent, referer, etc.).

An actual successful pass through the challenge will be logged with the message “Cloudflare Turnstile validation passed”.

Our current observation is that most “failures” aren’t logged, because they consist of an agent being challenged and simply never getting past the challenge.

While Cloudflare Turnstile also keeps some statistics, they are missing most of our challenge hits, maybe because the clients are not executing JavaScript at all and so never engage Turnstile.

Community

The #bots channel on the Code4Lib Slack is full of helpful peers. Join it at https://code4lib.org/slack