Handling Spam

by Annika Backstrom in Meta, on 20 July 2005. It is tagged #Spam.

I really enjoy the comments on my blog. They're a useful feedback mechanism and a good way to keep in touch. Spam, though, is an obvious problem. I don't require registration, I don't use CAPTCHAs, and I don't require moderator approval before a comment goes live. Essentially, I've given spammers a big piece of my playground.

Have you ever seen compliment spam on this site? These comments contain no links, just a message to the tune of "I loved your site!" that is obviously automated. I can only assume they exist to muck up filters and make comment moderation hard, if not impossible.

For a long, long time I've been quarantining spam. I've added a comments_removed table to my WordPress database, and I've modified the standard "Delete" mechanisms to move comments into this holding pen instead. I can use this table for tracking, for statistics, and as a corpus for eventual Bayesian filtering. As a bonus, if I accidentally "delete" a comment, the result is non-destructive. Recently, I've taken things a step further.
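The quarantine idea boils down to "copy the row, then delete the original, as one unit." Below is a minimal sketch of that move using Python's built-in `sqlite3` module rather than WordPress's MySQL database. The two-table schema is a hypothetical, heavily simplified stand-in (real WordPress comment tables have many more columns), and `quarantine_comment` / `restore_comment` are illustrative names, not WordPress functions.

```python
import sqlite3

# Hypothetical, simplified schema. comments_removed mirrors comments exactly,
# so a row can be moved between them with INSERT ... SELECT *.
SCHEMA = """
CREATE TABLE comments (
    comment_ID INTEGER PRIMARY KEY,
    comment_author TEXT,
    comment_content TEXT
);
CREATE TABLE comments_removed (
    comment_ID INTEGER PRIMARY KEY,
    comment_author TEXT,
    comment_content TEXT
);
"""


def _move(conn: sqlite3.Connection, src: str, dst: str, comment_id: int) -> bool:
    """Move one comment row from table src to table dst in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        cur = conn.execute(
            f"INSERT INTO {dst} SELECT * FROM {src} WHERE comment_ID = ?",
            (comment_id,),
        )
        if cur.rowcount == 0:
            return False  # no such comment in src; nothing to move
        conn.execute(f"DELETE FROM {src} WHERE comment_ID = ?", (comment_id,))
    return True


def quarantine_comment(conn: sqlite3.Connection, comment_id: int) -> bool:
    """'Delete' a comment by moving it into the holding pen."""
    return _move(conn, "comments", "comments_removed", comment_id)


def restore_comment(conn: sqlite3.Connection, comment_id: int) -> bool:
    """Undo an accidental 'delete' by moving the comment back."""
    return _move(conn, "comments_removed", "comments", comment_id)
```

Because the copy and the delete share one transaction, a failure partway through leaves the comment where it was rather than losing it. That is what makes a mistaken "delete" recoverable.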

Every so often, I visit my blog's admin section and work backward through the spam a page at a time. WordPress helps me out: I can search for keywords ("poker," "holdem," "levitra"), then click "Invert Checkbox Selection" and "Delete". Eventually, I hit a wall of good comments and have to tread more carefully. Even worse is when a legitimate comment is posted in the midst of all the spam. I have to work around it for the rest of the session, until it's back with the rest of the good comments. This is not ideal. I can mark a comment as spam, so why can't I mark it as "ham" (the anti-spam, as it were)?
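The keyword pass amounts to "select every comment whose text contains one of these words." Here is a small sketch of that selection step. The `Comment` type and `select_for_deletion` function are hypothetical illustrations, not part of WordPress, and the keyword list is just the three examples above.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    comment_id: int
    content: str


# The example keywords from the post; a real list would be longer.
SPAM_KEYWORDS = ("poker", "holdem", "levitra")


def select_for_deletion(comments, keywords=SPAM_KEYWORDS):
    """Return IDs of comments whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        c.comment_id
        for c in comments
        if any(k in c.content.lower() for k in lowered)
    ]
```

A link-free compliment like "I loved your site!" matches none of the keywords, so it never gets selected. That is why compliment spam is hard to clear out with this approach.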

As of last week, my comment tables include a new column: comment_blessed. Set its value to "1", and the comment is hidden from the admin panel's comment list. Even this initial version has removed the spam/ham wall, since all the known-good comments are hidden during the deletion process. Future additions will include a "Bless" action so that I can bless a comment without accessing the database directly.
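To illustrate the idea, here is a rough sketch of the blessed flag, again using `sqlite3` as a stand-in for the WordPress database. It adds the column with a default of 0, so existing comments start out unblessed. The admin listing then skips blessed rows, and a small helper plays the role of the planned "Bless" action. The schema, `admin_comment_list`, and `bless_comment` are all hypothetical names chosen for this example.

```python
import sqlite3

# Simplified table, plus the new column added after the fact.
# DEFAULT 0 means every existing comment starts out unblessed.
SCHEMA = """
CREATE TABLE comments (
    comment_ID INTEGER PRIMARY KEY,
    comment_content TEXT
);
ALTER TABLE comments ADD COLUMN comment_blessed INTEGER NOT NULL DEFAULT 0;
"""


def admin_comment_list(conn: sqlite3.Connection, limit: int = 20, offset: int = 0):
    """One page of the cleanup list, newest first, with blessed (ham) comments hidden."""
    return conn.execute(
        "SELECT comment_ID, comment_content FROM comments "
        "WHERE comment_blessed = 0 "
        "ORDER BY comment_ID DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def bless_comment(conn: sqlite3.Connection, comment_id: int) -> bool:
    """Hypothetical 'Bless' action: mark one comment as known-good."""
    with conn:
        cur = conn.execute(
            "UPDATE comments SET comment_blessed = 1 WHERE comment_ID = ?",
            (comment_id,),
        )
    return cur.rowcount == 1
```

The flag only affects the cleanup view. Blessed comments stay published, so a good comment that lands in the middle of a spam run no longer has to be stepped around page after page.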

It's not ideal (I'm still getting spam, after all), but it's progress. Spammers beware.