System: Referer Spam from Microsoft Bing
With the demise of Live Search and the launch of Bing a few weeks ago, we were hopeful that Microsoft was also going to change its practice of spamming websites with fake search referrals. Until this week it looked like they had, but now we're seeing exactly the same pattern of abuse coming from the same IP addresses as the notorious 'LVSP' and 'QBHP' requests described earlier.
Evidence from the log files
Here's an example of the type of traffic we're seeing: referrals that appear to come from bing.com, but can't possibly be genuine searches given these one-word search terms.
HTTP_REFERER
This is how the HTTP_REFERER field appears for the fake traffic from Bing:
http://www.bing.com/search?q=about
http://www.bing.com/search?q=africa
http://www.bing.com/search?q=atherton
http://www.bing.com/search?q=australia
http://www.bing.com/search?q=backpacker
http://www.bing.com/search?q=beaded
http://www.bing.com/search?q=calls
http://www.bing.com/search?q=community
http://www.bing.com/search?q=coolbaroo
http://www.bing.com/search?q=corowa
http://www.bing.com/search?q=emergency
http://www.bing.com/search?q=family
http://www.bing.com/search?q=farmkeeper
http://www.bing.com/search?q=films
http://www.bing.com/search?q=glenelg
http://www.bing.com/search?q=health
http://www.bing.com/search?q=higher
http://www.bing.com/search?q=history
http://www.bing.com/search?q=links
http://www.bing.com/search?q=malua
http://www.bing.com/search?q=massacre
http://www.bing.com/search?q=member
http://www.bing.com/search?q=merimbula
http://www.bing.com/search?q=ocean
http://www.bing.com/search?q=policy
http://www.bing.com/search?q=president
http://www.bing.com/search?q=screen
http://www.bing.com/search?q=search
http://www.bing.com/search?q=selling
http://www.bing.com/search?q=simulations
http://www.bing.com/search?q=sister
http://www.bing.com/search?q=sisters
http://www.bing.com/search?q=street
http://www.bing.com/search?q=sydney
http://www.bing.com/search?q=wedding
http://www.bing.com/search?q=ylang
For comparison, here is a sample of the HTTP_REFERER strings for real search traffic:
http://www.bing.com/search?q=php++output+examples+good+looking+tables&filt=all&first=11&FORM=PERE
http://www.bing.com/search?q=escaping+characters+in+javascript&FORM=HPDTDF&src=IE-SearchBox
http://www.bing.com/search?q=atom+reader&go=&form=QBRE
http://www.bing.com/search?q=transitions+using+CSS+javascript&form=QBRE&qs=n
http://www.bing.com/search?q=art+of+web&form=QBLH&filt=all&qs=n
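The difference between the two lists can be expressed as a simple test. The following is a minimal Python sketch, not something we run in production: the function name looks_like_fake_bing_referer is made up for illustration, and the rule it applies (only a q parameter, containing a single word) is an assumption based on the samples above.

```python
from urllib.parse import urlsplit, parse_qs


def looks_like_fake_bing_referer(referer: str) -> bool:
    """Return True for a Bing search referer carrying only a one-word `q` parameter."""
    parts = urlsplit(referer)
    if parts.hostname != "www.bing.com" or parts.path != "/search":
        return False
    # keep_blank_values=True so that empty parameters such as "go=" still count
    params = parse_qs(parts.query, keep_blank_values=True)
    if set(params) != {"q"}:
        return False
    # parse_qs decodes "+" as a space, so multi-word searches split into several terms
    return len(params["q"][0].split()) == 1
```

Run against the samples above, a referer such as http://www.bing.com/search?q=about is flagged, while one with extra parameters such as &form=QBRE is not. A genuine one-word search stripped of its other parameters would also be flagged, so this is a heuristic rather than proof.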
IP addresses
As before, all the suspect traffic is coming from the same location, Microsoft Corporation in Redmond, WA:
65.55.104.*
65.55.107.*
65.55.109.*
65.55.110.*
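If you are analysing log files offline, the same address ranges can be checked with Python's standard ipaddress module. This is only a sketch: it assumes each wildcard above corresponds to a /24 network, and the names SUSPECT_NETWORKS and from_suspect_range are hypothetical.

```python
import ipaddress

# Assumption: each "65.55.x.*" wildcard above is treated as a /24 network.
SUSPECT_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "65.55.104.0/24",
        "65.55.107.0/24",
        "65.55.109.0/24",
        "65.55.110.0/24",
    )
]


def from_suspect_range(remote_addr: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in SUSPECT_NETWORKS)
```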
User agent
The user agents being used are interesting in that they mix up different versions of .NET and other components, but all seem to have double spaces where normally you would see a single space:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
If they carry on using the double spaces that might be another option for filtering out this traffic. We'll be monitoring the logs in any case and will report back any changes.
Update: So far, these are consistently the only requests in our server logs containing "MSIE 6.0;" surrounded by double spaces. If you wanted, you could use that as the blocking trigger in addition to, or instead of, the rules below. For example:
RewriteCond %{REMOTE_ADDR} ^65\.55\.(104|107|109|110|165|232)
RewriteCond %{HTTP_USER_AGENT} "  MSIE 6.0;  "
RewriteRule .* - [F]
Note that there are two spaces at each end of the quoted string!
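The same test is easy to apply when scanning log files after the fact. Here is a minimal Python sketch, with the hypothetical name has_double_spaced_msie. The marker string is built explicitly so that the two spaces on each side can't be lost when copying the code.

```python
# Two spaces, then "MSIE 6.0;", then two more spaces.
DOUBLE_SPACED_MSIE = " " * 2 + "MSIE 6.0;" + " " * 2


def has_double_spaced_msie(user_agent: str) -> bool:
    """Return True if the user agent contains 'MSIE 6.0;' with two spaces either side."""
    return DOUBLE_SPACED_MSIE in user_agent
```

Unlike the RewriteCond pattern, where the dot matches any character, this is a literal substring comparison. For the user agents shown above that difference shouldn't matter.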
Blocking referer spam from Bing
As shown in our previous article, it's possible to block this traffic from your website. The requests will still appear in the log files, but with a 403 Forbidden status, and the robot won't be able to go on to request subsequent files, which has been a problem in the past.
Here is the code we're using now to block both the old Live Search referer spam, which may or may not be relevant any more, and the new wave from Bing:
# block Microsoft referer spam
RewriteCond %{REMOTE_ADDR} ^65\.55\.(104|107|109|110|165|232)
RewriteCond %{HTTP_REFERER} (www\.bing|search\.live)\.com
RewriteCond %{HTTP_REFERER} !\&
RewriteRule .* - [F]
Most of this is explained in the preceding article. The fake requests from Bing are being blocked based on the fact that they contain only a single GET parameter and therefore do not contain the & character as would almost any 'real' search traffic.
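Before deploying rules like these, it can be useful to check which of your existing log entries they would have caught. The Python sketch below mirrors the three conditions using the same regular expressions. The function name would_be_forbidden is made up for illustration, and the code only approximates how mod_rewrite evaluates the rule.

```python
import re

# Same patterns as the RewriteCond lines; like mod_rewrite, re.search is unanchored
# unless the pattern itself contains an anchor such as "^".
IP_PATTERN = re.compile(r"^65\.55\.(104|107|109|110|165|232)")
REFERER_HOST_PATTERN = re.compile(r"(www\.bing|search\.live)\.com")


def would_be_forbidden(remote_addr: str, referer: str) -> bool:
    """Return True if all three conditions of the blocking rule hold."""
    return (
        IP_PATTERN.search(remote_addr) is not None
        and REFERER_HOST_PATTERN.search(referer) is not None
        and "&" not in referer  # equivalent of the negated "!\&" condition
    )
```

Because all three conditions must hold, a single-parameter Bing referral from any other address is left alone, as is a multi-parameter referral from the Microsoft ranges.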
What do you mean it's not spam?
There are a few alternative theories for why we might be seeing this traffic in the server logs and analytics. Some of them are reasonable, while others are seriously confused:
Microsoft is trying to detect cloaked websites
Cloaking is a black hat search engine optimization (SEO) technique in which the content presented to the search engine spider is different to that presented to the user's browser.
This is an excuse that Microsoft put out in relation to the Live Search referral spam. At first it may seem reasonable, until you ask why none of the other major search engines (Google, Yahoo, Ask, Cuil, ichiro, ...) feel the need to do the same. Are they all just that much smarter in coming up with a solution?
Perpetrators of this myth often follow up by warning that 'if you block these requests your website will receive a ranking penalty or not be indexed'. That may or may not be true, but it's a strange way to run a global search engine.
Search terms are being truncated by a browser bug
This is by far the most absurd theory I've heard to date. The suggestion is that even though someone may have searched for "back packer travel insurance", the request could arrive with just the word "travel" in the referrer string.
This completely ignores the fact, as shown above, that: a) all the other GET parameters normally associated with a search referral are not present; and b) all these requests come from inside Microsoft Corporation!
Now there may be some kind of 'anonymizer' letting people make searches without their location or actual search terms being revealed, but what's the point then in sending a search string at all?!
Conclusion
Occam's razor tells us that the simplest explanation is usually the best, so until we see any evidence to the contrary we have to assume that it's spam intended to inflate Bing's search numbers.
On one of our servers we are now blocking up to 300 instances of referrer spam from Microsoft per day. The ratio of real to fake traffic is almost 1:1, meaning that if you're not doing any filtering, you need to cut your numbers for Bing in half!
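The arithmetic behind that adjustment is simple enough to sketch in Python. The function name estimated_real_visits is hypothetical, and the default ratio of 1:1 is just the rough figure observed on our server, so substitute your own numbers.

```python
def estimated_real_visits(reported: int, real_parts: float = 1.0, fake_parts: float = 1.0) -> float:
    """Scale a reported referral count by the real share of a real:fake ratio."""
    return reported * real_parts / (real_parts + fake_parts)
```

With the default 1:1 ratio, a report showing 600 Bing referrals would correspond to roughly 300 real visits.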