Fighting Form Spammers
One of the most annoying developments on the Internet in recent years has been the rise of automated 'spambots' - autonomous virus-like programs that post a never-ending stream of rubbish to guestbooks and contact forms. This article describes a real-life example in which spammers are combated through a variety of means.
Form Validation
When setting up a form, consider the following points:
- which fields are going to be 'mandatory'?
- which fields need to be validated as numbers, dates, etc?
- is there any way for the form to be submitted with invalid data?
The first tool you have available is the ubiquitous JavaScript that can be used for client-side form validation. You should use JavaScript to prevent the form being submitted with missing or obviously invalid data. More information and scripts can be found in our articles on JavaScript Form Validation.
Because JavaScript can be disabled in the browser - and some browsers don't even run JavaScript - you need to always re-validate the form data using a server-side script before taking any action such as inserting into a database or sending an email.
This is an opportunity to use more powerful regular expressions to check that the format and values of dates and email addresses are valid. You can even check that the domain of an email address actually exists and has a mail server.
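As a rough illustration of that server-side pass, here is a minimal sketch in Python (the same logic carries over to PHP or any other server-side language). The field names "name", "email", "message" and "date" and the function validate_form are assumptions made up for this example, and the email pattern is deliberately loose. The DNS check on the email domain is not shown because it needs a resolver lookup.

```python
import re
from datetime import datetime

# Deliberately loose pattern: exactly one "@", no whitespace, a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_form(fields):
    """Return a dict of field name -> error message (empty if all valid)."""
    errors = {}

    # Mandatory fields (hypothetical names) must be present and non-blank.
    for name in ("name", "email", "message"):
        if not fields.get(name, "").strip():
            errors[name] = "required"

    email = fields.get("email", "").strip()
    if email and not EMAIL_RE.match(email):
        errors["email"] = "invalid format"

    # Optional date field; strptime rejects impossible dates such as 2023-02-30.
    date = fields.get("date", "").strip()
    if date:
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            errors["date"] = "invalid date"

    return errors
```

The script would only go on to send the email or insert into the database when the returned dict is empty.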
What you haven't yet achieved is blocking automated scripts from submitting your form.
Adding a CAPTCHA
To separate human users from automated scripts you need to add a Turing test to the form - something that is simple for a human user but difficult for an automated script to negotiate.
This is most often implemented by adding a CAPTCHA image to the form - an image that presents a series of digits or letters to be read and re-typed into a form field. You can find more detailed instructions, and source code, in the article Protecting forms using a CAPTCHA (using PHP).
At this point you've eliminated at least 95% of form spammers. Depending on the CAPTCHA you use there may be one or two scripts that are intelligent enough to decipher the graphic and complete the form anyway.
Identifying the Source
One option for combatting the remaining spammers - and the only option for very large websites with a lot of people attempting to compromise them - is to continue to change and upgrade the CAPTCHA graphic, or implement a different Turing test, such as presenting a simple equation that has to be solved before the form can be submitted.
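The "simple equation" variant can be sketched as follows, again in Python for illustration. The function names make_challenge and check_answer are hypothetical, and the sketch assumes the expected answer is kept server-side (in the visitor's session, for example) and never sent to the browser.

```python
import random


def make_challenge(rng=random):
    """Return (question, answer) for a small addition problem."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b


def check_answer(expected, submitted):
    """Compare the stored answer with the raw text the visitor typed."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Anything that is not a whole number counts as a wrong answer.
        return False
```

The question text is displayed next to an extra form field, and the submission is processed only if check_answer returns True for the value typed into that field.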
For smaller sites you might find that all the spam now getting through your form is coming from a single computer or netblock. In that case you have the option of blocking them from within your script or at the firewall.
The first option is simply a matter of comparing the visitor's IP address ($_SERVER['REMOTE_ADDR'] in PHP) against a blacklist. The firewall option is slightly more complicated.
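A script-level check of this kind might look like the following Python sketch. The networks listed are placeholders rather than real spam sources, and is_blocked is a name invented for this example. The argument is whatever your server-side environment provides as the remote address.

```python
import ipaddress

# Placeholder networks; substitute the addresses found in your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("10.225.176.0/23"),
    ipaddress.ip_network("192.0.2.44/32"),
]


def is_blocked(remote_addr):
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Design choice for this sketch: refuse anything that is not a valid IP.
        return True
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Matching against networks rather than single addresses means a whole netblock can be covered by one entry.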
First you need to analyze your log files to narrow down the problem IP addresses. This command will tell you who has submitted the form:
$ grep "POST /theform\.php" combined_log
You can improve on this using awk:
$ grep "POST /theform\.php" combined_log | awk '{print $1}' | sort | uniq -c | sort -n
In the real-life example that triggered this article the output of a similar command was the following:
1 XX.132.59.38
4 XXX.225.176.73
6 XXX.225.177.190
8 XXX.225.176.177
The first address in the list is a 'friendly', but the others - clearly from the same netblock - are associated with contact form spam. Using our subnet calculator, we input the lowest and highest of these 'unfriendly' IP addresses and come up with the CIDR block XXX.225.176.0/23, covering a range of 512 IP addresses.
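If no subnet calculator is to hand, the same arithmetic can be sketched in a few lines of Python. Because the real addresses are masked in this article, the usage example substitutes 10 for the XXX octet purely as a placeholder. covering_network is a hypothetical helper name.

```python
import ipaddress


def covering_network(addresses):
    """Return the smallest single CIDR block containing every address."""
    ips = [ipaddress.IPv4Address(a) for a in addresses]
    low, high = int(min(ips)), int(max(ips))
    # Every bit that differs between lowest and highest must be a host bit.
    host_bits = (low ^ high).bit_length()
    prefix = 32 - host_bits
    # Clear the host bits to get the network's base address.
    base = (low >> host_bits) << host_bits
    return ipaddress.IPv4Network((base, prefix))


net = covering_network(["10.225.176.73", "10.225.177.190", "10.225.176.177"])
print(net, net.num_addresses)  # 10.225.176.0/23 512
```

Note that the smallest covering block can be much wider than the addresses actually observed, so it is worth checking the size before blocking it.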
Blocking at the Firewall
The final step is to add a rule to your firewall using iptables:
# iptables -I INPUT 2 -p tcp -s XXX.225.176.0/23 --dport 80 -j REJECT --reject-with tcp-reset
Note: this command inserts the rule at position 2 in the INPUT chain, but you can change that as needed.
This inserts a rule that rejects all HTTP (port 80) traffic coming from the IP block that we've identified, answering each connection attempt with a TCP reset. Not only will they be prevented from submitting the form, but also from accessing other areas of the website. A site that is also served over HTTPS would need a matching rule for port 443.
In our case, this measure was necessary because it appeared that this netblock was spamming a number of different sites on our server - and a search showed that other servers and websites were being targeted as well.
The result has been a complete halt of form-based spam to all websites on our server!