Sunday, September 30, 2012
Eliminating comment spam on Blogger blogs
by Larry Geller
Over the past several months the number of spam comments generated by spambots or by human comment factories has increased drastically. While Blogger has some kind of comment spam filter, it seems to miss all of these. Each day I seem to be getting more of them.
To eliminate useless, malicious and spam comments, I moderate comments submitted to Disappeared News. So none of this comment-factory spam gets through, but wading through it is very annoying.
Here’s an example of one of these that came in today:
They’re pretty dumb, but more important: it would be a disservice to readers to let them through. The link could lead to an innocent ad page, or it could download malware into a reader’s computer. The link could be a porn site or some form of fraud. It could change a browser’s home page. You get the point.
I’ve found an easy way to stop them though. So far it works very well.
My regular email is filtered through a program called POPFile, which I have used for umpteen years. Blogger sends comments to be moderated via email, so I thought I would see if POPFile could recognize these spam factory comments. It can. Piece of cake. After a very short training (fewer than a dozen messages) it has figured it out.
The classification is done using a naïve Bayes algorithm. In other words, POPFile uses statistics to track which words are likely to appear in which messages. This means that POPFile will adapt to the kind of mail you receive and needs to be trained. Out of the box, it doesn't know anything about spam or how messages from your mother differ from those your friends send you. However, if you train it, it will soon learn how to tell these different kinds of messages apart.
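For the curious, here is a rough sketch of the idea in Python. It is not POPFile's actual code, just a minimal naïve Bayes classifier written under my own assumptions: the class name `NaiveBayesFilter`, the bucket names "spam" and "comment", and the four training messages are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Tiny naive Bayes text classifier (illustrative only, not POPFile's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> times seen
        self.message_counts = Counter()          # bucket -> messages trained
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Lowercase the text and keep runs of letters and apostrophes as words.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, bucket):
        """Record the words of one message under the given bucket."""
        words = self.tokenize(text)
        self.word_counts[bucket].update(words)
        self.message_counts[bucket] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the bucket with the highest log-probability for this text."""
        if not self.message_counts:
            raise ValueError("train the filter before classifying")
        words = self.tokenize(text)
        total_messages = sum(self.message_counts.values())
        # The extra 1 reserves a slot for words never seen in training.
        vocab_size = len(self.vocabulary) + 1
        scores = {}
        for bucket, seen in self.message_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # Start from how common the bucket is, then add one term per word.
            score = math.log(seen / total_messages)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[bucket] = score
        return max(scores, key=scores.get)


# Made-up training messages, standing in for the moderation emails.
spam_filter = NaiveBayesFilter()
spam_filter.train("Great post! I love your amazing blog, visit my site for cheap deals", "spam")
spam_filter.train("Wonderful article, amazing writing, click here for more great info", "spam")
spam_filter.train("I disagree about the rail project, the council vote was delayed", "comment")
spam_filter.train("Thanks for covering the hearing, the testimony was on Tuesday", "comment")

flattery_label = spam_filter.classify("Amazing blog, great info, visit my site")
genuine_label = spam_filter.classify("The council hearing was delayed")
print(flattery_label, genuine_label)
```

The more messages of each kind it is trained on, the better the word statistics get, which is why a real filter improves each time you correct one of its mistakes.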
Most of the messages contain flattery or effusive praise, so any spam filter that employs Bayesian filtering should work as well. POPFile is only one example of these programs.
So now I have my spam comments neatly stored in a Spam Comment folder in my email program, and legitimate comments remain untouched.
For me, moderation of comments is essential. I don’t think the crap that is often attached to unmoderated or poorly moderated sites, such as many newspaper websites, contributes to understanding. Hey, it’s my blog. You’d be surprised at the vitriolic crap that sometimes comes in. And then the author pounds me for being against the First Amendment or insults my mother or something like that.
So if you are a blogger moderating comments via email, see if the spam filter you may already be using can be trained to recognize blog spam. Or try POPFile or one of the other popular filters based on the Bayes algorithm.
Maybe my spammers will see this and go away.
You know, if you were to preview those porn sites you refer to, and maybe just let through the top 10 links, well, I'm just saying that would be helpful to busy readers.....I mean, as a service.....just saying...