We use the Cloudflare Turnstile product to limit automated bot traffic to our app. Blog post describing it: https://bibwild.wordpress.com/2025/01/16/using-cloudflare-turnstile-to-protect-certain-pages-on-a-rails-app/

At present only search result pages are protected in this way, as these are where we were having trouble: search pages (backed by Solr) are more resource-constrained, and bots were traversing every combination of facets in a basically limitless path.

After ~10 searches, a user may be redirected to a Turnstile challenge page, which in many cases will automatically redirect back to search in a few seconds. Users on a given browser should only see the challenge once per 24 hours. (All of this is configurable and subject to change.)

...

Cloudflare Turnstile Account, and credentials

...

These can be changed/rotated in Turnstile settings if needed, and then reset on (e.g.) Heroku.

To test in development

Rate tracking requires rack-attack to have a working cache, which we don't normally have in development, so the rate gate to issue a challenge will never be met! We also need to enable the bot detection controls, which are off by default.

Set env CF_TURNSTILE_ENABLED=true to enable protection in dev and use a memory cache (resets on app restart).

If the cache still needs to be set up manually, you will want something like config.cache_store = :memory_store in your development config (./config/environments/development.rb in a standard Rails app).

Disabling

If the turnstile check is causing a problem, it can be disabled by setting ENV var CF_TURNSTILE_ENABLED to "false" (or deleting it, as default is false).
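As a rough illustration of the "default is false" behavior described above, an on/off flag like this is often read along the following lines. This is only a sketch: the helper name `turnstile_enabled?` is hypothetical, and the app's actual parsing of CF_TURNSTILE_ENABLED may differ.

```ruby
# Hypothetical helper, not the app's actual code.
# Treats the flag as enabled only when the value is exactly "true"
# (ignoring case and surrounding whitespace); a missing variable means disabled.
def turnstile_enabled?(env = ENV)
  env.fetch("CF_TURNSTILE_ENABLED", "false").to_s.strip.downcase == "true"
end
```

With a reading like this, both deleting the variable and setting it to "false" turn the check off.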

Configuration and Implementation

Original implementation PR: https://github.com/sciencehistory/scihist_digicoll/pull/2838

Configuration currently lives at the bottom of `./config/initializers/rack_attack.rb` (in a to_prepare block).

To see all possible things you can configure, see implementation at /app/controllers/bot_detection_controller.rb

Which paths are protected is configured here; we intend to include all search results. If you add more search results pages (alternate views of search-within collections, featured topics, etc.) at new URLs, you will have to adjust this configuration to protect them!

You can configure the period and count before a challenge is triggered, and how long a 'passed' challenge is good for before another challenge might be issued. You can configure what locations are protected by this check (those are the only locations that count toward the rate limit).
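To make the interaction of these three settings concrete, here is a minimal in-memory sketch. It is not the app's implementation: the class name `ChallengeGate`, its parameter names, and the default period are assumptions, and it uses a simple sliding window that is not necessarily how rack-attack counts requests. It also assumes requests made while a pass is still valid are not counted.

```ruby
# Hypothetical sketch; names and defaults are assumptions, not the app's config.
class ChallengeGate
  # limit:    requests allowed per period before a challenge is issued
  # period:   length of the counting window, in seconds
  # pass_ttl: how long a passed challenge stays valid, in seconds
  def initialize(limit: 10, period: 3600, pass_ttl: 24 * 3600)
    @limit = limit
    @period = period
    @pass_ttl = pass_ttl
    @hits = Hash.new { |hash, key| hash[key] = [] }
    @passed_at = {}
  end

  # Record that the client behind `key` passed a challenge at time `now`.
  def record_pass(key, now: Time.now.to_f)
    @passed_at[key] = now
  end

  # Count one request to a protected location; true means "issue a challenge".
  def challenge?(key, now: Time.now.to_f)
    passed = @passed_at[key]
    return false if passed && now - passed < @pass_ttl

    hits = @hits[key]
    hits << now
    hits.reject! { |t| now - t >= @period } # drop requests outside the window
    hits.size > @limit
  end
end
```

Only requests to protected locations would be passed to `challenge?`, which matches the note above that only those locations count toward the rate limit.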

A more sophisticated option would be to change the buckets/keys for which rates are calculated: right now they are subnets, but they could instead take account of HTTP headers, or information looked up about the client IP. We want the check to be quick, though, since it happens on every request.
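For reference, bucketing by subnet can be sketched with Ruby's standard-library IPAddr, which is cheap enough to run on every request. The function name `subnet_bucket` and the prefix lengths (/24 for IPv4, /64 for IPv6) are assumptions for illustration; the real keys are defined in the configuration mentioned above.

```ruby
require "ipaddr"

# Hypothetical helper: maps a client IP to a subnet-based rate-limit key.
# Prefix lengths are illustrative assumptions, not the app's actual values.
def subnet_bucket(ip_string, v4_prefix: 24, v6_prefix: 64)
  ip = IPAddr.new(ip_string)
  prefix = ip.ipv4? ? v4_prefix : v6_prefix
  # mask() zeroes the host bits, so all addresses in the subnet share one key
  "#{ip.mask(prefix)}/#{prefix}"
end
```

Two clients in the same subnet then share one counter, so a bot rotating through nearby addresses is still rate limited as one client.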