If we have learned anything from fighting unwanted email, it’s that filtering based on content is difficult at best and demands constant vigilance.
With Weblog comment spam, I wonder why we’re implementing the same techniques. Is it because they’re easier to do? I’d rather just ban the URLs that are being spammed, myself. Whether the URL shows up in the body of the comment or is the “author” URL, it ought to be checked against a filter.
This doesn’t mean that you have to add URLs on a spam-by-spam basis. I think you can reasonably employ word-checking within the URLs themselves. In dealing with referral spam, I’ve been finding that blocking by words inside the referral URL is the easiest solution for me to employ. If you set up a filter for “xxx”, “pharmacy”, “viagra”, and “adult”, you clear out a lot of URLs.
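To make the idea concrete, here is a minimal sketch in Python of checking only the URLs in a comment against a word list. It isn’t how any particular Weblog package does it; the function names, the word list, and the rough URL pattern are all my own assumptions for illustration.

```python
import re

# Hypothetical word list, borrowed from the examples in this entry.
SPAM_WORDS = ("xxx", "pharmacy", "viagra", "adult")

# Rough pattern for pulling http(s) URLs out of free text.
# This is an assumption for the sketch, not a complete URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def url_is_spammy(url, words=SPAM_WORDS):
    """Return True if any spam word appears anywhere in the URL."""
    lowered = url.lower()
    return any(word in lowered for word in words)


def comment_is_spammy(body, author_url="", words=SPAM_WORDS):
    """Check every URL in the comment body, plus the author's URL."""
    urls = URL_PATTERN.findall(body)
    if author_url:
        urls.append(author_url)
    return any(url_is_spammy(url, words) for url in urls)


if __name__ == "__main__":
    print(comment_is_spammy("Nice post!", author_url="http://cheap-pharmacy.example.com/"))
    print(comment_is_spammy("I start pharmacy school soon.", author_url="http://example.com/"))
```

Note that the second comment passes even though it contains a listed word, because only the URLs are examined and the rest of the text is left alone. Plain substring matching is blunt, though: “adult” would also catch a URL containing “adulthood”, so a real filter would probably want a way to whitelist.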
Of course, the next stage is obfuscated URLs, but I think you have to deal with those as they come along.
I’ll note that WordPress’s comment spam filtering does seem to scan URLs against your spam words.
[Note: this entry inspired by a running discussion with Jeff.]