SpamAssassin Bayes plugin brings me gift of 100% filter accuracy

Dec 2009 was the first month in which I achieved a 100% spam catch rate since I started filtering my own mail a few years ago. Thanks, SpamAssassin Bayes plugin! Merry Xmas to you as well!

------------------------------------
Stats for Dec 2009
------------------------------------
Ham	SpamC	SpamR	SpamM	HamC
160	392	1273	0	0
--------------------------------------------------------------
1825		Total messages
1665		Total Spam (Caught + Missed + Rejected)
91.23%		Spam as % of all mail
76.45%		% of Spam rejected by Postfix at SMTP time
0%		False positive rate (Ham misclassified as Spam)
0%		False negative rate (Spam misclassified as Ham)
100.00%		Spam catch rate (Spam filter accuracy)
--------------------------------------------------------------

Long version:

My company had switched to Google Apps for email, and we saw an increase in false negatives for spam sent to one of the email addresses published on our website. The volume (a few dozen per day) was unbearable enough that I asked our email guy to forward mail from that alias to my personal email address, to see if my anti-spam setup could do better than Google's. It did, but not well enough: my false negatives jumped from the usual 1-4 per month to 16 in Nov. The catch rate was still better than that of most commercial anti-spam solutions out there (see test results by PC World, test results by Network Computing or test results by PC Magazine), but seeing more than a couple of spams per month is too much for me, regardless of how good a rate it might represent:

------------------------------------
Stats for Nov 2009
------------------------------------
Ham	SpamC	SpamR	SpamM	HamC
175	767	687	16	0
--------------------------------------------------------------
1645		Total messages
1470		Total Spam (Caught + Missed + Rejected)
89.36%		Spam as % of all mail
46.73%		% of Spam rejected by Postfix at SMTP time
0%		False positive rate (Ham misclassified as Spam)
0.97%		False negative rate (Spam misclassified as Ham)
98.91%		Spam catch rate (Spam filter accuracy)
--------------------------------------------------------------
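
For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the percentages in these tables can be derived from the five raw counts. The class and function names are mine, not part of any tool. I am assuming that HamC counts ham wrongly caught as spam and that the false positive and false negative rates are taken against all messages, which is what the reported 0.97% for Nov (16 of 1645) appears to match.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCounts:
    ham: int            # Ham: legitimate mail delivered
    spam_caught: int    # SpamC: spam caught by the content filter
    spam_rejected: int  # SpamR: spam rejected by Postfix at SMTP time
    spam_missed: int    # SpamM: spam that reached the inbox
    ham_caught: int     # HamC: assumed to be ham misclassified as spam


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals; 0.0 when the denominator is zero."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def summarize(c: MonthlyCounts) -> dict:
    total_spam = c.spam_caught + c.spam_missed + c.spam_rejected
    # Assumption: HamC is counted separately from Ham in the message total.
    total = c.ham + c.ham_caught + total_spam
    return {
        "total_messages": total,
        "total_spam": total_spam,
        "spam_share": pct(total_spam, total),
        "rejected_at_smtp": pct(c.spam_rejected, total_spam),
        "false_positive_rate": pct(c.ham_caught, total),
        "false_negative_rate": pct(c.spam_missed, total),
        "spam_catch_rate": pct(total_spam - c.spam_missed, total_spam),
    }


nov_2009 = summarize(MonthlyCounts(ham=175, spam_caught=767,
                                   spam_rejected=687, spam_missed=16,
                                   ham_caught=0))
print(nov_2009)
```

Note that the catch rate is taken against total spam while the false negative rate is taken against all mail, so the two do not add up to 100%.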

The reason for the jump in false negatives was that my primary anti-spam tool (rejecting spam at SMTP time with Postfix) was hampered: this spam was forwarded to me after having been accepted by Google, so it arrived from Google's SMTP servers. Because Google's mail servers aren't listed in any RBL, my MX would accept the spam.
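
To illustrate why forwarding defeats this check, here is a rough Python sketch of how an RBL (DNSBL) lookup is typically formed: the connecting client's IPv4 octets are reversed, the list's zone is appended, and a DNS answer in 127.0.0.0/8 means "listed". The zone `dnsbl.example.org` and the function names are placeholders of mine, not the lists or code my Postfix setup actually uses.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 client against a DNSBL zone."""
    ip = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(client_ip: str, zone: str) -> bool:
    """Return True if the zone answers with a 127.x.x.x address for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(client_ip, zone))
    except OSError:
        # No such name (or lookup failure): treat as not listed.
        return False
    return answer.startswith("127.")


# 192.0.2.10 is a documentation-only address; the zone is hypothetical.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

The lookup only ever sees the IP of the server that connects to my MX. For forwarded mail that is Google's relay, not the original spam source, so the check passes no matter where the message started.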

Previously, so little spam got through that I didn't even bother with statistical analysis tools. But in Nov, I enabled and configured the Bayes plugin for SpamAssassin. It looks like this one piece was what I needed to achieve 100% spam filtering accuracy. Zero spam is a number I can live with!
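
For readers unfamiliar with the idea behind Bayesian filtering, here is a toy Python sketch. It is a textbook-style naive Bayes classifier over token presence with add-one smoothing, looking only at tokens that occur in the message. It is not SpamAssassin's implementation, which is more elaborate, and the class name and training messages are made up for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes spam scorer; for illustration only."""

    def __init__(self):
        # Number of training messages per class that contained each token.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        self.tokens[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        # Start from the smoothed prior odds of spam vs. ham.
        log_odds = math.log((self.messages["spam"] + 1) /
                            (self.messages["ham"] + 1))
        for tok in set(text.lower().split()):
            # Add-one smoothed P(token present | class).
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Convert log-odds back to a probability.
        return 1.0 / (1.0 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills buy now", "spam")
bayes.learn("buy cheap watches now", "spam")
bayes.learn("meeting notes for monday", "ham")
bayes.learn("lunch on monday", "ham")
print(bayes.spam_probability("buy cheap pills"))
print(bayes.spam_probability("monday meeting"))
```

The point is that the filter learns from my own mail, so it can catch spam whose sending server looks perfectly reputable, which is exactly the case the SMTP-time checks could not handle.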

When I have a spare minute, I'll see about complementing my existing toolkit with a solution based on Markovian discrimination, which, if you believe the Wikipedia article about it, might make for a more accurate statistical spam filter than the Bayesian method.

Update, Jan 13: I should note that after a few weeks, Google support finally came through with a solution that reduced false negatives from that particular email alias to zero. Aside from the slow resolution time, I'm impressed with the quality of their filter.

Update, Feb 12: Spoke too soon. So far in Feb, Google has let 11 spams through, and during the last two weeks of Jan it let through 10. While Google's spam filter may be better than those of some other free email providers, it still lets through an unacceptable amount of spam and falls well below what a properly configured SpamAssassin setup can achieve.
