Clearing SpamAssassin Bayes filter tokens



I recently had a problem where my SpamAssassin install started treating a lot of spam messages as ham (non-spam). Since these messages were hitting BAYES_00 with a score of -2.5, almost all of them were getting through my spam filter. They were all about stock quotes and were pretty obviously spam just from the text, yet the Bayesian filter kept passing them as ham. And since they kept getting through, the Bayesian filter kept learning them as ham.

To break this vicious cycle, you just need to clear out the Bayesian tokens. It’s very easy to do. As the root user (or whichever user owns the Bayes database that your spamd uses), type:

sa-learn --clear
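
If you would rather keep a copy of the old tokens before wiping them, sa-learn can dump the database to a text file first. The following is only a minimal sketch, not something tested on every setup: the reset_bayes function, the SA_LEARN variable, and the /root/bayes-backup.txt path are names made up for this example, while --dump magic, --backup, and --clear are standard sa-learn options.

```shell
#!/bin/sh
# Sketch only: SA_LEARN and BACKUP_FILE are hypothetical names for this example.
SA_LEARN="${SA_LEARN:-sa-learn}"
BACKUP_FILE="${BACKUP_FILE:-/root/bayes-backup.txt}"

reset_bayes() {
    # Print the current counters (nspam, nham, ntokens) for the record.
    "$SA_LEARN" --dump magic
    # Write a plain-text backup; stop here if the backup fails.
    "$SA_LEARN" --backup > "$BACKUP_FILE" || return 1
    # Wipe the Bayes database.
    "$SA_LEARN" --clear
}
```

Calling reset_bayes leaves you with the same empty database as running sa-learn --clear on its own. If you change your mind, the backup can be loaded again later with sa-learn --restore /root/bayes-backup.txt.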

This starts you fresh. By default, SpamAssassin won’t use the Bayes filter until it has learned at least 200 spam and 200 ham messages, so until you reach that level it will keep learning based on the other SpamAssassin detection rules.
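
To see how far along you are after a reset, sa-learn --dump magic prints the database counters, including nspam and nham. Here is a rough sketch that reads that output on stdin and compares both counters to the threshold. The bayes_ready function name is made up for this example, and it assumes the usual dump layout where the count is the third column and the counter name is the last word on the line.

```shell
#!/bin/sh
# Sketch only: bayes_ready is a hypothetical helper, not part of SpamAssassin.
# Reads `sa-learn --dump magic` output on stdin.
# Optional first argument: minimum message count (defaults to 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }   # number of spam messages learned
        $NF == "nham"  { nham  = $3 }   # number of ham messages learned
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            # Exit status 0 only when both counters reach the minimum.
            exit !(nspam >= min && nham >= min)
        }'
}
```

Typical use would be sa-learn --dump magic | bayes_ready, which prints the two counts and returns success once both have reached the minimum.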

Ideally, I would have sa-learn train on these spam messages. But since I use Outlook, and there is no “easy” way to have it interface with sa-learn, I find it easier to clean out the Bayes tokens every once in a while. SpamAssassin Coach is a plugin for Outlook which should connect to your spamd server and “learn” a message as ham or spam. In practice it did not work for me, though the project looks like it has a lot of potential.
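
For reference, if you can get the missed spam onto the server as an mbox file (which is exactly the part that is awkward from Outlook), training looks roughly like this. This is a sketch under that assumption: train_bayes, SA_LEARN, and the file paths are hypothetical names for this example, while --spam, --ham, and --mbox are standard sa-learn options.

```shell
#!/bin/sh
# Sketch only: SA_LEARN and train_bayes are hypothetical names for this example.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Usage: train_bayes SPAM_MBOX HAM_MBOX
train_bayes() {
    if [ "$#" -ne 2 ]; then
        echo "usage: train_bayes SPAM_MBOX HAM_MBOX" >&2
        return 2
    fi
    # Learn every message in the first mbox as spam.
    "$SA_LEARN" --spam --mbox "$1" || return 1
    # Learn every message in the second mbox as ham, to keep training balanced.
    "$SA_LEARN" --ham --mbox "$2"
}
```

You would call it as something like train_bayes /path/to/missed-spam.mbox /path/to/good-mail.mbox, as the same user whose Bayes database spamd reads.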

For more information on how Bayesian filtering works, check out this Wikipedia article.