Since unruly bots have been plaguing my blogs, I decided to set up a few bot traps to log them and notify me when they pay a visit.
My blogs aren’t big or popular, so I expected it might take days before anything showed up. To my surprise, not long after setting the traps up, I had already caught two!
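For anyone curious what a trap like this might look like, here is a minimal sketch in Python. It is not the exact setup I use. It assumes a made-up trap URL (`/bot-trap/`) that no human visitor would ever reach and that would also be disallowed in robots.txt, so only bots that ignore the rules end up requesting it. The names `TRAP_PATHS` and `check_trap`, and the `notify` callback, are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical trap URL; it would also be listed under Disallow in robots.txt,
# so well-behaved crawlers never request it.
TRAP_PATHS = {"/bot-trap/"}


def check_trap(path, ip, user_agent, hits, notify=print):
    """Log and report a request if it targets a trap path.

    Returns True when the request hit a trap, False otherwise.
    """
    if path not in TRAP_PATHS:
        return False
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "ip": ip,
        "user_agent": user_agent,
        "path": path,
    }
    hits.append(entry)  # keep a record of the visit
    notify(f"Bot trap hit: {ip} {user_agent!r}")  # stand-in for an email alert
    return True
```

In a real setup, `notify` would probably send an email rather than print, and `hits` would be a log file or a database table instead of a list in memory.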
The first bot is from Latvia, operating out of the Ad Technology SIA network.
Ad Technology SIA bot
IP: 188.92.73.175
user-agent string: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
This bot makes itself look like a browser, but it turns out that this IP is used by a known spammer who appears on Project Honeypot’s blacklists!
I didn’t need any more reason than that. I banned the entire network, and I’d suggest others do the same. The CIDR is 188.92.72.0/21.
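If you want to double-check that a given address really falls inside that range before banning it, Python's standard `ipaddress` module can do the math. This is just a sketch: the names `BANNED_NETWORKS` and `is_banned_ip` are my own, and an actual ban would normally live in the web server or firewall configuration rather than in a script.

```python
import ipaddress

# The network banned above; add more CIDR blocks to this list as needed.
BANNED_NETWORKS = [ipaddress.ip_network("188.92.72.0/21")]


def is_banned_ip(address: str) -> bool:
    """Return True if the address falls inside any banned network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BANNED_NETWORKS)
```

A /21 covers 2,048 addresses, here 188.92.72.0 through 188.92.79.255, so the bot's IP 188.92.73.175 is inside the block.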
The second bot is the well-known Baiduspider search engine bot from China:
Baiduspider
IPs: 119.63.198.107, 119.63.198.109
user-agent string: Baiduspider+(+http://www.baidu.jp/spider/)
I Googled Baiduspider and found that it has been a thorn in the sides of webmasters for years. Besides being badly behaved, it has also accessed sites from multiple IPs simultaneously while not identifying itself on concurrent connections. That seems pretty underhanded to me.
Since I can afford to, I banned Baiduspider from my site. Besides… who in China would want to read my blogs anyway?
While being indexed by search engine bots is generally a good thing, some of them hammer your site incessantly and, in the process, suck up a heck of a lot of bandwidth and increase server load.
With the banning of Yandex, cuil.com’s Twiceler and now Baiduspider, the resource load for my blogs has dropped significantly. I’m not on an unlimited hosting account, so the bytes matter. If my site resources are going to be used, I’d rather have them used by actual visitors and upstanding search engines.
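For bots that announce themselves honestly, matching on the user-agent string is usually enough. Here is a rough sketch of that check in Python. The token list and the function name `is_banned_agent` are my own illustration, not a copy of my actual server rules, and the lowercase tokens are an assumption about how each bot identifies itself.

```python
# Lowercase substrings assumed to appear in the user-agent strings
# of the three crawlers banned here.
BANNED_AGENT_TOKENS = ("baiduspider", "yandex", "twiceler")


def is_banned_agent(user_agent: str) -> bool:
    """Return True if the user-agent string contains a banned bot token."""
    ua = user_agent.lower()  # match case-insensitively
    return any(token in ua for token in BANNED_AGENT_TOKENS)
```

Keep in mind that this only catches bots that identify themselves. The Latvian bot above sends an ordinary browser string, so it slips past a check like this and has to be blocked by IP instead.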