Monday, April 12, 2010

Beware, Comment Spammers!

I had this great idea about how to fight comment spam. If you're not familiar with comment spam, you probably don't have your own blog and you think that "Kathryn" and "Patrick" who try to comment on this blog are just brain dead people. You might be right about the brain dead part, but I'm not sure they're really people.

Do you ever wonder why commenting on blogs can be such a hassle, or why so many blogs require moderation, or why many blogs don't accept comments on older posts, or forbid links in comments? It's because of comment spam. Spammers will submit comments such as "Your post is helpful and informative" or "We need to pay attention to the eco friend environment" that don't address the topic of the post in question. I'm not talking about targeted self-promotion here. It's not comment spam to link to an article you wrote on a similar topic, but it's definitely comment spam if you use a robot to do so. Or if you hire people in Asian boiler rooms to get around the CAPTCHAs that stop your robots.

It used to be that comment spam was done to improve the search engine ranking of websites. That motivation has largely gone away with the development of the "nofollow" attribute. Blogs such as "Go To Hellman" add rel="nofollow" to any links in the comment threads. This tells spidering robots not to follow the specified links and tells search engines to ignore the links for purposes of site ranking.
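To make the mechanism concrete, here is a minimal sketch of how a blog platform might force rel="nofollow" onto every link in a submitted comment. It is an illustration only, not how Blogger actually does it: the names `NofollowRewriter` and `add_nofollow` are hypothetical, and it assumes the comment arrives as a small HTML fragment. It uses only Python's standard `html.parser` module and silently drops HTML comments and doctypes, which seems acceptable for comment text.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, forcing rel="nofollow" on every <a> tag."""

    def __init__(self):
        # Keep entities such as &amp; as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closing=">"):
        if tag == "a":
            # Discard whatever rel value the commenter supplied.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(comment_html):
    """Return comment_html with rel="nofollow" on all of its links."""
    rewriter = NofollowRewriter()
    rewriter.feed(comment_html)
    rewriter.close()
    return "".join(rewriter.out)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com">my site</a>'))
```

A search engine that honors the attribute gives the linked site no ranking credit, which is the whole point. One that ignores it is not affected by this rewriting at all.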

I guess the people who have been leaving spam comments on my blog didn't get that memo. It's annoying to have to delete the comments, especially the ones in Chinese where links get hidden around the periods in "...". I went to the Blogger help pages to see if there's any way to report the abusive commenters (this blog restricts anonymous comments, so there's at least a user profile for every comment). There isn't. What's worse, Google tells you that if you don't remove those spam comments, your site's ranking will be hurt.

Then I had my bright idea. I clicked on one of the links left in the spam comment. Then I picked some keywords from the page and plugged them into Google to find the site. There, at the bottom of the search result, was an option: "Dissatisfied? Help us improve." Google is asking for feedback. I pasted in the URL for my comment spammer's site, and checked the radio button labeled "The results included spam." I clicked send, and my spammer's site was bound for Google oblivion!

Beware, comment spammers, I'm going to report you!

Though I felt good about it, I started to have doubts. A lot of these comment spammers seemed to be Asian; could it be that Asian search engines didn't get the nofollow memo either? Some quick googling confirmed my suspicion: China's leading search engine, Baidu, doesn't pay attention to the nofollow attribute! These comment spammers must be using my blog to juice their Baidu ranking!

Well, maybe not. I did a few searches in Baidu. Baidu is probably the worst internet search engine I've ever tried! Baidu gives really stupid results for my vanity search. Baidu doesn't index my blog, my website, or anything I've ever posted. Perhaps China has blacked out the entire Google network, including Blogger, and Baidu doesn't see it any more. Or perhaps "Go To Hellman" has been banned for its post on Qin Shi Huangdi. Baidu has spidered a page from WorldCat that mentions some other Eric Hellman, and has picked up blog mentions of me by John Blyberg and in Dear Author, but not much else. It's safe to assume that Baidu's strength is not English-language indexing.

So if Baidu doesn't index my blog, then spammers shouldn't be able to improve their Baidu rankings with comment spam in my blog. There must be some other motivation for the comments.

Another thing I noticed is that Baidu seems to be big on searching for MP3s and PDFs. It ranks sites like Rapidshare rather highly. Maybe Baidu and similar search engines spider websites like my blog to discover the MP3 files, the PDFs, and the video files that Baidu users are really looking for, and the intended audience of the spam comments is these content spiders. My blog has discussed ebooks, piracy and related topics, so maybe the spammers think it's a good source for links to content. Who knows?

Another possibility is that the spammers are trying to get bloggers themselves to visit their sites. "Patrick" from Madras is trying to sell "web templates". It turns out that his site has copied content from another site marketing web templates, which appear to me to be copies of other websites with much of the content stripped out. It's ironic: Patrick seems to be using a template for a web-template selling website to sell web templates.

After a few days, I checked back to see if the website I had complained about had been removed from Google or not. As it turns out, the site actually improved its Google ranking from #5 to #1 in my test search. So much for my career in comment spam scourgedom!

16 comments:

  1. I started to make a list of amusing spam comments like you mentioned (see it here: http://docs.google.com/View?id=dfr2jdcs_262gxmwmrfd)

    I had hundreds of spam comments on my blog every day. I noticed the vast majority were submitted for only one post, about ebooks (http://commonplace.net/2009/11/is-an-e-book-a-book/). Ever since I disabled comments for just that one post, I get very few spam comments.

  2. I get a couple comment spams a day on my blog. They're automatically detected and blocked though. There are a few tricks you can use to detect bot behaviour. I wrote about the one I use on my website here:

    https://secure.grepular.com/Blocking_Comment_Spam_Using_ModSecurity_and_Hidden_Fields

    This method hasn't let a single bot comment through in months.

  3. I've noticed that comment spammers find their targets using search terms such as inurl:"node" intext:"post a comment" -"comments are closed" writing service. Thus, the invisibility incantation to the right -->

  4. Stopping comment spammers is really easy: add a mod to your site that has a random text question a human user has to answer, as well as a CAPTCHA.
    Secondly, analyze your logfiles and traffic from IPs, and use Project Honey Pot to determine if they are spammers. Then block those IPs or ranges at the server level. Linux/Apache users can easily use mod_rewrite and .htaccess. Windows/IIS users simply use deny access.

  5. Captcha is now not such affective. Spammers are using De-captcha software to avoid that. I think the best option is question answer. Means asking a question whose answer can only retrieved by search engine.
    Investing in Property

  6. Captcha should be some advance now. There are some software that can decaptcha those images.There should some scrolling task or game which can only be played by mouse, that will help to reduce spammers.
    Property Investment

  7. Nice journal with terribly attention-grabbing and helpful data on your website. Thanks for sharing the journal and this nice data that is unquestionably planning to facilitate us...

    cheapest dedicated server host

  8. This is Good information about this topic..I like it..Aerial Advertising Australia..Keep it Up!

  9. This comment has been removed by the author.

  10. Wow this blog is awesome. Wish to see this much more like this. Thanks for sharing your information!
    Best website designing company in Laxmi Nagar

  11. It is a fantastic post – immense clear and easy to understand. I am also holding out for the sharks too that made me laugh. socila marketing seo solutions

  12. Yes i am totally agreed with this article and i just want say that this article is very nice and very informative article.I will make sure to be reading your blog more. You made a good point but I can't help but wonder, what about the other side? !!!!!!Thanks Ann Arbor Apartments

  13. A good blog always comes-up with new and exciting information and while reading I have feel that this blog is really have all those quality that qualify a blog to be a one. Crypto Site

  14. so happy to find good place to many here in the post, the writing is just great, thanks for the post. 소셜그래프

  15. I was surfing the Internet for information and came across your blog. I am impressed by the information you have on this blog. It shows how well you understand this subject. Best Cheap Air Flights.com

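The hidden-field trick that comment 2 mentions can be sketched roughly as follows. This is a guess at the general idea, not the commenter's actual ModSecurity setup: the form carries an extra field that is hidden from human visitors with CSS, so a person leaves it empty while a form-filling bot tends to fill it in. The field name `website_url` and the function `is_probable_bot` are hypothetical.

```python
# Hypothetical name for a form field that is hidden from humans with CSS.
HONEYPOT_FIELD = "website_url"


def is_probable_bot(form_data):
    """Return True when the hidden honeypot field was filled in.

    form_data is assumed to be a dict mapping submitted field names
    to string values, as most web frameworks provide.
    """
    value = form_data.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


if __name__ == "__main__":
    human = {"comment": "Interesting post.", HONEYPOT_FIELD: ""}
    robot = {"comment": "Your post is helpful and informative",
             HONEYPOT_FIELD: "http://spam.example"}
    print(is_probable_bot(human), is_probable_bot(robot))
```

A check like this does nothing against the hired humans mentioned in the post, since they see the same form a real reader does.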