Chad Perrin: SOB

11 March 2007

Spam Catch of the Day: 21 Spams Per Hour, and better spam handling

Filed under: Geek,Metalog — apotheon @ 03:42

I activated the Akismet plugin for SOB a couple months ago, after a long period of not using it. Back in November 2006 I had already tried, evaluated, and discarded it for a number of reasons, including an atrocious interface with abominable shortcomings. The single biggest problem was that the plugin would not let me see more than 150 of the items in the spam filter queue, which is just unacceptable since Akismet regularly misidentifies legitimate comments as spam. To give you an idea of how bad that is, today I skimmed over and deleted 509 spam comments that had accumulated in the last 24 hours (an average rate of 21sph, or Spams Per Hour). With a 150-item limit, roughly 70% of the comments in the spam filter would have been out of reach when I wanted to double-check them for legitimacy. Unacceptable.
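For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures quoted above (509 comments, 24 hours, a 150-item view limit). The variable names are made up for illustration.

```python
# Figures taken from the paragraph above; names are illustrative only.
spam_count = 509      # spam comments accumulated in the queue
hours = 24            # period over which they accumulated
visible_limit = 150   # most items the old interface would display

# Average arrival rate in Spams Per Hour.
spams_per_hour = spam_count / hours

# Share of the queue that could not be reviewed under the old limit.
hidden_fraction = (spam_count - visible_limit) / spam_count

print(f"{spams_per_hour:.1f} spams per hour")
print(f"{hidden_fraction:.1%} of the queue hidden beyond the first {visible_limit}")
```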

Since then, and before I reactivated the Akismet plugin, they solved that problem. You can now apparently view the entire queue. At the very least, I know for a fact that you can view more than 1,700 “caught” comments, because I have had more than 1,700 in the filter at one time in the last couple months, and I was able to page through them all.

Unfortunately, the interface is still atrocious, even if the shortcomings aren’t quite abominable any longer. The standard WordPress moderation queue is much more conducive to efficient (and more accurate) skimming, it actually allows deletion of one page’s worth of comments in moderation at a time, it has a far more acceptable rate of false positives since it judges based on criteria of your own making rather than some distant server’s blacklist-based heuristic, and so on. The ability to moderate a single page’s worth of “caught” comments is especially nice, since that means that when I get done going through 1,700 comments (that’s more than 30 pages), I don’t have to go back to the first few pages again to make sure the comments that arrived while I was going through the queue aren’t false positives. It also means that if I get halfway through the queue, but then have to leave to make a meeting or get some sleep or something, I’m secure in the knowledge that I’ve already deleted everything positively identified as spam, rather than having to start over from page one again. Having to choose between deleting everything and deleting nothing is not much of a user experience.

The reason I suffered this annoyance for the last couple months, having turned on the Akismet plugin again, is simply that the moderation queue does not by default manage trackbacks and pingbacks at all. I started getting a lot of trackback and pingback spam, and needed to do something about that. After some research, I’ve finally decided to replace the Akismet plugin with the Trackback Validator Plugin. I’m going to see how this works out for me. Basically, it is meant to catch any trackbacks or pingbacks that link to pages where there is, in fact, no link back to the weblog, thus proving they’re not legitimate trackbacks or pingbacks. I have seen a couple of spammy trackbacks and pingbacks that do link back, and I’ll have to see what happens to those when they hit the TBValidator as I test this out.
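The link-back test described above can be sketched roughly as follows. This is only an illustration of the idea, not the Trackback Validator Plugin's actual code (WordPress plugins are written in PHP, and I have not read its source). The names `LinkCollector`, `links_back`, and `classify_trackback` are hypothetical, and the sketch assumes the referring page's HTML has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(source_html, blog_url):
    """Return True if the page contains a link to the weblog's host."""
    blog_host = urlparse(blog_url).netloc.lower()
    collector = LinkCollector()
    collector.feed(source_html)
    return any(urlparse(href).netloc.lower() == blog_host
               for href in collector.hrefs)


def classify_trackback(source_html, blog_url):
    """Hypothetical policy: no link back means the trackback is spam."""
    return "accept" if links_back(source_html, blog_url) else "spam"
```

A spammy page that does link back would pass this check, which is exactly the case that still needs watching.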

I’ve found another trackback validation plugin, apparently based on this one, that places spam-test positives into the moderation queue. This strikes me as backwards, since any page that doesn’t link back but is identified in a trackback or pingback seems obviously illegitimate, and in case of false negatives the rest of the trackbacks and pingbacks should go into moderation — perhaps with whitelisting capability. I avoided the derivative version, then, and stuck with the original TBValidator.
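The routing preferred here (reject anything that does not link back, hold the rest for moderation, and let known-good sources through) might look something like the sketch below. It is an assumption about how such a policy could be written, not code from either plugin. `triage_trackback`, its parameters, and the return labels are all hypothetical.

```python
from urllib.parse import urlparse


def triage_trackback(source_url, has_link_back, whitelist):
    """Decide what to do with an incoming trackback or pingback.

    source_url    -- URL of the page claiming to link to the weblog
    has_link_back -- result of a separate link-back check on that page
    whitelist     -- set of hostnames trusted enough to skip moderation
    """
    host = urlparse(source_url).netloc.lower()
    if not has_link_back:
        return "reject"      # no link back: obviously illegitimate
    if host in whitelist:
        return "publish"     # trusted source that does link back
    return "moderate"        # links back, but still needs a human look
```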

If this doesn’t work, I’ll just pick apart the operation of the derivative plugin to see if it provides any obvious hints on how to simply catch all trackbacks and pingbacks, and send them all into moderation, so I can sort them myself.

I’ve also considered writing my own captcha-type “Are you human?” validation plugin for comments, so that for a change I’d have something available that works without requiring readers/commenters to accept cookies, register, log in, have JavaScript enabled, or something similarly limiting and ridiculous, but I’m still hopeful that I’ll find something that Just Works, in a sane and facilitating fashion rather than a broken, tyrannical, counterproductive fashion. Wish me luck with the TBValidator.

Since I’ve started using a feed aggregator . . .

Filed under: Metalog — apotheon @ 01:58

Since I’ve started using a feed aggregator (the Google Reader, which is excellent for my purposes when combined with its widget on the personalized Google homepage), I’ve actually been keeping up with quite a few weblogs. It’s just not very easy to visit many weblogs individually; aggregation methodologies like the LiveJournal friends page are both too limited and too easily filled up with stuff that puts me to sleep; and every email aggregator I’ve come across is broken in some way. Of course, part of the reason for that is likely the fact that I use a text-only mail user agent rather than some bloated HTML-email client, and that’s not likely to change.

Since I’ve started using a feed aggregator, I’ve discovered that the net result is that I visit others’ weblogs a lot more often. I do so when I want to link to them from SOB, when I want to save what I’ve read to go over it again later, when I want to IM URLs to people I know, when I want to submit them to reddit, and so on. A good syndication feed promotes more page views on the weblog itself, at least where I’m concerned, much as statistical studies have shown that increased online music filetrading actually correlates with increased CD sales (though causation there is as yet sketchy, whereas it is not at all sketchy in my online reading habits).

Since I’ve started using a feed aggregator, I’ve discovered some factors that contribute to my continued reading, and to my likelihood to unsubscribe:

  1. Only providing excerpts in the feed, rather than the full text of a post, increases the likelihood I will unsubscribe, which means I’ll probably stop reading, and visiting, your weblog altogether. If I can’t get enough information from the syndicated text to determine whether it’s worth reading the rest, I’m likely to decide it’s not. Rather than prompting people to come to the website, your first-paragraph feed may in fact be driving people away. So much for that all-important ad revenue.
  2. Lack of effective spam blocking increases the likelihood that I will unsubscribe from a comment feed. This being the case, I’m glad that I bit the bullet and decided to go ahead and use a really user-unfriendly spam filter for SOB rather than using one with an actually useable, helpful interface that doesn’t prevent spam trackbacks from getting through.
  3. Link posts don’t make me want to read your weblog. I like the occasional link post, as long as it’s absolutely clear what’s on the other ends of those links and as long as the links are of interest to me. Cryptic descriptions, though, make me more likely to just ignore link posts entirely, since I just don’t have time to click through every single link on the web to see whether any of them are interesting. On the other hand, original, thoughtful content on subjects of interest to me does make me want to read your weblog. It even makes me want to check out your link posts. If your weblog consists of more link posts than original content, I’m likely to unsubscribe.
  4. The occasional analysis and introspection related to the broader weblogging social network is interesting stuff, especially if you discuss some theories of principle or provide data with interesting implications about the interconnectedness of things. Spending 90% of your time talking about your interactions with other webloggers, on the other hand, just gets boring after a while.
  5. Mislabeling your political leanings, talking up your affiliations and beliefs without actually talking about them in any substantive manner, or otherwise misrepresenting your weblog’s content such that it looks like it would be of interest to me based on politics when in fact it isn’t, is a great way to make me unsubscribe — even if some of the other content is interesting once in a while.

Since I’ve started using a feed aggregator, I’ve learned a lot about what not to do in my own weblog.

(note: I just unsubscribed from a few feeds today.)

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License