Module release: BadBot - automatically eliminating spam once and for all

Every webmaster has encountered spam. From account registrations to contact form submissions, spam bots are constantly on the prowl for new targets. The usual defense is one of the many available captcha tools, but these are far from perfect: services with warehouses full of real people solving captchas all day (for spam bots to then use) often render them useless. You can change the type of captcha or add a new one every day, but that only leads to a game of cat and mouse with the spammers targeting your site, wasted time on both sides, and annoyed users. The simple fact is that captchas are not effective at combating spam.

Other approaches to combating spam include blacklists of known spammers (such as stopforumspam.com), analysis of the submitted form's content, user agent string checks, and the like, but again, none of them works as well as one would like.

To reliably identify spam bots we need to find something the bots do not have (but regular users do), and that something is JavaScript. Writing a JS engine is no simple task, and executing a page's JS seriously slows down each page load, which is something spam bots simply cannot afford. If we can identify form submissions from clients that do not support JS, we can effectively block the spam bots. Sure, this will also affect users who have chosen to disable JS in their browsers, but the percentage of such users is very small, and I believe that inconveniencing a few users with an error message is an acceptable price to pay for eliminating spam.

Enter: Badbot.

Version 1.0 of Badbot supports checking for JS on the user registration form. The logic is quite straightforward:

  1. find a required field in the form
  2. intercept the form submission and fire an AJAX request to generate a token based on the value of said field and a secret salt
  3. populate the returned token into a hidden field in the form and programmatically re-submit the form
  4. generate the token again in the form's validation handler, and compare it against the value of the hidden field in the form
  5. if the values do not match, block the form submission and display an error

Without JS the AJAX request will not fire, the token will not be generated, and form validation will fail. Simple.
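
To make steps 2, 4 and 5 concrete, here is a minimal server-side sketch in Python. It is not the module's actual code (Badbot is a Drupal module); the function names, the example salt, and the choice of HMAC-SHA256 are all assumptions made for illustration. The only thing taken from the description above is that the token is derived from a required field's value plus a secret salt, and that the validation handler regenerates and compares it.

```python
import hashlib
import hmac

# Hypothetical secret salt; a real site would load this from private configuration.
SECRET_SALT = b"replace-with-a-long-random-secret"


def generate_token(field_value: str, salt: bytes = SECRET_SALT) -> str:
    """Step 2: derive a token from the required field's value and the salt.

    HMAC-SHA256 is an assumed choice; any keyed hash would serve the same role.
    """
    return hmac.new(salt, field_value.encode("utf-8"), hashlib.sha256).hexdigest()


def validate_submission(field_value: str, submitted_token: str,
                        salt: bytes = SECRET_SALT) -> bool:
    """Steps 4 and 5: regenerate the token and compare it to the hidden field.

    Returns False when the submission should be blocked.
    """
    expected = generate_token(field_value, salt)
    # Compare as bytes in constant time; encoding also avoids a TypeError
    # if a client sends non-ASCII characters in the hidden field.
    return hmac.compare_digest(expected.encode("utf-8"),
                               submitted_token.encode("utf-8"))
```

In this sketch, the AJAX endpoint would return generate_token(value) for the script to place in the hidden field, and the validation handler would call validate_submission(value, hidden_field). A client that never ran the script submits an empty or stale hidden field, so the comparison fails and the form is rejected.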

I implemented this technique on one of Adobe's Drupal-powered websites a couple of years ago. The site was receiving thousands of spam registrations a day, and none of the captchas made any lasting difference. A new captcha would stop the spam, but by the next day the spammers would tweak the bot and the registrations would start coming in again. It was obvious that there were real people solving the captchas. This JavaScript-based technique effectively blocked 100% of spam.

I'm long overdue for writing this module, but now that the basic version is up on Drupal.org, I encourage you to try it out and share your thoughts. Have an idea? Is there something I missed or did not consider? Hit the comments below!