11 March 2007

Spam Catch of the Day: 21 Spams Per Hour, and better spam handling

I activated the Akismet plugin for SOB a couple of months ago, after a long period of not using it. I had already tried, evaluated, and discarded the plugin back in November 2006 for a number of reasons, including an atrocious interface with abominable shortcomings. The single biggest problem was that it would not let me see more than 150 of the items in the spam filter queue, which is just unacceptable, since Akismet regularly misidentifies legitimate comments as spam. To give you an idea of how bad that is, I just skimmed over and deleted 509 spam comments today that had accumulated in the last 24 hours (an average rate of about 21 sph, or Spams Per Hour). With a 150-item limit, roughly 70% of the comments in the spam filter would have been out of reach when I wanted to double-check them for legitimacy. Unacceptable.
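As a rough check on those figures, here is a small sketch of the arithmetic. The only inputs are the numbers quoted above (509 caught comments, 24 hours, a 150-item display limit); the variable names are purely illustrative.

```python
# Figures quoted in the post; the variable names are illustrative only.
caught = 509        # spam comments accumulated in the queue
hours = 24          # period over which they accumulated
visible_cap = 150   # most items the old plugin interface would display

spams_per_hour = caught / hours         # average arrival rate
hidden = max(caught - visible_cap, 0)   # items that could not be reviewed
hidden_share = hidden / caught          # fraction of the queue out of reach

print(f"{spams_per_hour:.1f} sph, {hidden} hidden ({hidden_share:.1%})")
# prints: 21.2 sph, 359 hidden (70.5%)
```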

Since then, and before I reactivated the Akismet plugin, its developers solved that problem. You can now apparently view the entire queue. At the very least, I know for a fact that you can view more than 1,700 "caught" comments, because I have had more than 1,700 in the filter at one time in the last couple of months, and I was able to page through them all.

Unfortunately, the interface is still atrocious, even if the shortcomings aren't quite abominable any longer. The standard WordPress moderation queue is much easier to skim quickly and accurately, it actually allows deleting one page's worth of comments in moderation at a time, it has a far more acceptable rate of false positives since it judges by criteria of your own making rather than some distant server's blacklist-based heuristic, and so on. The ability to moderate a single page's worth of "caught" comments is especially nice, since that means that when I get done going through 1,700 comments (that's more than 30 pages), I don't have to go back to the first few pages again to make sure the comments that arrived while I was going through the queue aren't false positives. It also means that if I get halfway through the queue, but then have to leave to make a meeting or get some sleep or something, I'm secure in the knowledge that I've already deleted everything positively identified as spam, rather than having to start over from page one again. Having to choose between deleting everything and deleting nothing does not make for much of a user experience.

The reason I suffered this annoyance for the last couple of months, having turned on the Akismet plugin again, is simply that the moderation queue does not by default manage trackbacks and pingbacks at all. I started getting a lot of trackback and pingback spam, and needed to do something about that. After some research, I've finally decided to replace the Akismet plugin with the Trackback Validator Plugin. I'm going to see how this works out for me. Basically, it is meant to catch any trackback or pingback that claims to come from a page where there is, in fact, no link back to the weblog, which shows it is not a legitimate trackback or pingback. I have seen a couple of spammy trackbacks and pingbacks that do link back, and I'll have to see what happens to those when they hit the TBValidator as I test this out.
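The link-back test described above can be sketched in a few lines. This is not the Trackback Validator Plugin's code, only a minimal illustration of the idea, assuming the source page's HTML has already been fetched. The names `links_back` and `_LinkCollector` and the `example` hosts are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(page_html, blog_host):
    """Return True if any link in page_html points at blog_host."""
    blog_host = blog_host.lower()
    collector = _LinkCollector()
    collector.feed(page_html)
    for href in collector.hrefs:
        try:
            host = urlsplit(href).hostname  # None for relative links
        except ValueError:
            continue  # malformed URL: treat as not linking back
        if host and (host == blog_host or host.endswith("." + blog_host)):
            return True
    return False


# Hypothetical pages named in two incoming trackbacks.
legit = '<p>Replying to <a href="http://blog.example.com/?p=42">this post</a>.</p>'
spam = '<p><a href="http://pills.example.net/">cheap pills</a></p>'

print(links_back(legit, "blog.example.com"))  # True
print(links_back(spam, "blog.example.com"))   # False
```

A check like this only establishes that a link exists. A spammer who bothers to include a link back would pass it, which is exactly the case that still needs watching.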

I've found another trackback validation plugin, apparently based on this one, that places spam-test positives into the moderation queue. This strikes me as backwards: any page that is named in a trackback or pingback but doesn't link back seems obviously illegitimate, so it is the rest of the trackbacks and pingbacks that should go into moderation to catch false negatives, perhaps with whitelisting capability. I avoided the derivative version, then, and stuck with the original TBValidator.
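The routing preferred here (drop anything without a link back, hold everything else for moderation, and optionally let whitelisted sources straight through) might look something like the sketch below. It describes the policy only, not how either plugin is implemented. `Verdict`, `triage`, and the host names are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"      # no link back: treat as spam outright
    APPROVE = "approve"    # trusted source that does link back
    MODERATE = "moderate"  # links back, but still needs a human look


def triage(has_link_back, source_host, whitelist=frozenset()):
    """Decide what to do with one incoming trackback or pingback."""
    if not has_link_back:
        return Verdict.REJECT
    if source_host.lower() in whitelist:
        return Verdict.APPROVE
    return Verdict.MODERATE


trusted = frozenset({"friend.example.org"})  # hypothetical whitelist

print(triage(False, "pills.example.net", trusted))     # Verdict.REJECT
print(triage(True, "friend.example.org", trusted))     # Verdict.APPROVE
print(triage(True, "stranger.example.com", trusted))   # Verdict.MODERATE
```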

If this doesn't work, I'll just pick apart the operation of the derivative plugin to see if it provides any obvious hints on how to simply catch all trackbacks and pingbacks, and send them all into moderation, so I can sort them myself.

I've also considered writing my own captcha-type "Are you human?" validation plugin for comments, so that for a change I'd have something available that works without requiring readers/commenters to accept cookies, register, log in, have JavaScript enabled, or do something similarly limiting and ridiculous. I'm still hopeful, though, that I'll find something that Just Works, in a sane and facilitating fashion rather than a broken, tyrannical, counterproductive fashion. Wish me luck with the TBValidator.


  1. Good luck, and let us know how it works out for you!

    Comment by Sterling Camden — 11 March 2007 @ 04:56

  2. Thanks, Sterling. I shall.

    So far, it's working great. I can scan my moderation queue when I get notices via email by going through them in mutt (my mail user agent), which is much more precise and quickly scannable, and so far I haven't seen a single spamback get through the filter. It's working beautifully.

    Comment by apotheon — 12 March 2007 @ 01:40

  3. [...] Not long ago, in a post titled Spam Catch of the Day: 21 Spams Per Hour, and better spam handling, I discussed some of the problems I have with the Akismet spam filter plugin for WordPress. I also indicated that I would be using the Trackback Validator Plugin in concert with the WordPress built-in moderation queue to capture spam. [...]

    Pingback by Chad Perrin: SOB » Spam Catch of the Day: the effectiveness of my anti-spam solution (and a joke) — 20 March 2007 @ 12:36

