Spamassassin: Difference between revisions

Latest revision as of 11:54, 19 October 2024

SpamAssassin is a mail filter that identifies spam. It applies a diverse range of tests to email headers and content to identify unsolicited bulk email, more commonly known as spam, and classifies each message using advanced statistical methods. Its modular architecture allows other technologies to be quickly wielded against spam, and it is designed for easy integration into virtually any email system.

This page gives an overview of how QmailToaster configures SpamAssassin.

Configuration and Rules

The SpamAssassin-Toaster uses the following configuration files:

/etc/mail/spamassassin/local.cf
/etc/mail/spamassassin/v310.pre
/etc/mail/spamassassin/v312.pre
/usr/share/spamassassin/*.cf

The local.cf file contains the basic settings: the score a message must reach before it is considered spam, what the subject line should be changed to when that score is reached (e.g. adding ***SPAM*** to the subject), and whether Bayes scoring should be used. These settings apply to all users on your system.
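As a rough sketch of those three settings, the hypothetical `write_local_cf` helper below writes a local.cf using the SpamAssassin options `required_score`, `rewrite_header` and `use_bayes`. The values are examples only, not necessarily what SpamAssassin-Toaster ships, so write to a scratch file and compare it with your existing /etc/mail/spamassassin/local.cf rather than overwriting it.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal local.cf containing the three basic settings.
# The values are illustrative; compare with your existing local.cf before using them.
write_local_cf() {
    local target=$1
    cat > "$target" <<'EOF'
# Score a message must reach to be considered spam
required_score 5.0
# Prefix the Subject: of messages that reach that score
rewrite_header Subject ***SPAM***
# Enable Bayesian scoring
use_bayes 1
EOF
}
```

For example, `write_local_cf /tmp/local.cf.example && diff /tmp/local.cf.example /etc/mail/spamassassin/local.cf` shows how the sketch differs from your current settings.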

The two .pre files tell SpamAssassin which plugins to load for applying different tests. Each plugin is loaded with a line in this format:

loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
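To see which plugins are currently loaded, you can collect the `loadplugin` lines from the .pre files. The `list_plugins` function below is a hypothetical helper, not part of SpamAssassin; it assumes the files live in /etc/mail/spamassassin unless you pass another directory, and it skips commented-out lines.

```shell
#!/bin/bash
# Hypothetical helper: print the plugins loaded by the .pre files in a directory
# (default /etc/mail/spamassassin). Lines starting with "#" are ignored because
# their first field is not exactly "loadplugin".
list_plugins() {
    local dir=${1:-/etc/mail/spamassassin}
    # The plugin name is the second field; an optional .pm path may follow it
    awk '$1 == "loadplugin" { print $2 }' "$dir"/*.pre | sort -u
}
```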

You can find a list of available plugins on CPAN. Installing a plugin using CPAN goes like this:

   # cpan
   cpan> install Mail::SpamAssassin::Plugin::URIDNSBL
   cpan> quit

Here's how to find out which Perl modules you have. If you are using the latest version of SpamAssassin-Toaster, everything you need should already be installed.
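One quick way to test for a particular Perl module is to ask perl to load it: `perl -MModule::Name -e 1` exits successfully only if the module is installed. The `check_perl_modules` wrapper below is a hypothetical convenience function, and the module names in the usage line are examples, not a definitive list of what SpamAssassin needs.

```shell
#!/bin/bash
# Hypothetical helper: report whether each Perl module named on the command line loads.
# Returns non-zero if any module is missing.
check_perl_modules() {
    local mod status=0
    for mod in "$@"; do
        # perl -M<Module> -e 1 exits 0 only if the module is installed and compiles
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "OK       $mod"
        else
            echo "MISSING  $mod"
            status=1
        fi
    done
    return $status
}
```

Usage: `check_perl_modules Mail::SpamAssassin Net::DNS DB_File`. Anything reported as MISSING can then be installed with CPAN as shown above.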

The /usr/share/spamassassin/*.cf files are rule sets designed for catching spam using your installed modules. How many points each rule adds to (or subtracts from) a message's spam score is set in 50_scores.cf. If you are, for instance, a pharmaceutical retailer, you probably want to lower the scores of the rules in the various drug-related .cf files.
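Rather than editing 50_scores.cf itself (files under /usr/share/spamassassin are typically replaced when the SpamAssassin package is upgraded), a common approach is to put a `score RULE_NAME value` line in local.cf, which overrides the stock score. The `set_score` helper below is a hypothetical sketch, and `SOME_RULE` is a placeholder: look up the actual rule names in the .cf files.

```shell
#!/bin/bash
# Hypothetical helper: set a rule's score in local.cf, replacing any earlier override.
# usage: set_score <local.cf> <RULE_NAME> <score>
set_score() {
    local cf=$1 rule=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    # Keep every line except an existing "score RULE ..." line, then append the new one
    awk -v r="$rule" '!($1 == "score" && $2 == r)' "$cf" > "$tmp" &&
        printf 'score %s %s\n' "$rule" "$value" >> "$tmp" &&
        cat "$tmp" > "$cf"    # cat into the file so its owner and permissions are kept
    rm -f "$tmp"
}
```

For example, `set_score /etc/mail/spamassassin/local.cf SOME_RULE 0.5`, followed by the lint check and restart described below.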

Some of the files are only used if the corresponding plugin is loaded. For instance, the rules in 25_uribl.cf only run if you have added

 loadplugin Mail::SpamAssassin::Plugin::URIDNSBL

to one of your .pre files.
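Enabling a plugin therefore means adding its `loadplugin` line to a .pre file. The hypothetical `enable_plugin` helper below does that only if none of the .pre files in the same directory already loads the plugin, so running it twice does not create duplicate lines. It does not install the Perl module itself; that is still a job for CPAN.

```shell
#!/bin/bash
# Hypothetical helper: ensure a plugin is loaded. Appends "loadplugin <name>" to the
# given .pre file unless some .pre file in the same directory already loads it.
# usage: enable_plugin /etc/mail/spamassassin/v310.pre Mail::SpamAssassin::Plugin::URIDNSBL
enable_plugin() {
    local pre=$1 plugin=$2
    # awk exits 0 only if an uncommented "loadplugin <plugin>" line exists
    if awk -v p="$plugin" '
            $1 == "loadplugin" && $2 == p { found = 1 }
            END { exit !found }' "$(dirname "$pre")"/*.pre 2>/dev/null; then
        echo "$plugin already loaded"
    else
        echo "loadplugin $plugin" >> "$pre"
        echo "added $plugin to $pre"
    fi
}
```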

You can find many alternative rule sets at Rules Emporium, and you might want to join a SpamAssassin mailing list to keep yourself up to date on the fight against spam while you are at it.

If you add rules (by creating a new .cf file in /usr/share/spamassassin), add a plugin to a .pre file so that new rules are applied, or make any other change to the SpamAssassin configuration files, you must check that the syntax is OK:

  # spamassassin -D --lint

If you see any errors, correct them before you restart the spamd service! The most likely errors are missing Perl modules; install them with CPAN as shown above.

After making any changes, restart the SpamAssassin service. You can do this using Jake's spamd script or by running:

   # qmailctl stop
   # qmailctl start
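The lint check and the restart can be combined so that a broken configuration is never restarted into service. The `safe_restart` function below is a hypothetical sketch: it relies on `spamassassin --lint` exiting with a non-zero status when it finds errors, and it restarts with the same `qmailctl stop` / `qmailctl start` commands shown above. `SA_BIN` and `QMAILCTL` are assumptions you can override, for example to do a dry run.

```shell
#!/bin/bash
# Hypothetical helper: lint the SpamAssassin configuration and restart only if it passes.
# SA_BIN and QMAILCTL are overridable assumptions (e.g. set both to "true" for a dry run).
SA_BIN=${SA_BIN:-spamassassin}
QMAILCTL=${QMAILCTL:-qmailctl}

safe_restart() {
    # spamassassin --lint exits with a non-zero status when the configuration has errors
    if ! "$SA_BIN" --lint; then
        echo "lint failed; run '$SA_BIN -D --lint' to see the errors" >&2
        return 1
    fi
    "$QMAILCTL" stop && "$QMAILCTL" start
}
```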

Bayesian Statistical Scoring

SpamAssassin can score messages based on the words they contain, because certain words are more likely to turn up in spam and others are more likely to show up in ham (legitimate mail).

For this to be effective you need to train SpamAssassin, which requires a collection of spam messages and a collection of ham messages. You can gather them by setting up two email accounts on your server, called spam@yourqmailtoaster.com and notspam@yourqmailtoaster.com, and sending your spam to one and your non-spam mail to the other. You might not want to send all of your real mail there, but the more ham SpamAssassin has for comparison, the better. Encourage your users to send spam to the spam address and any false positives to the not-spam address (by bouncing rather than forwarding; see below). You might want to implement Squirrelmail Spam Buttons to make this easier.

Now create a script that looks like this:

#!/bin/bash
# Spam Assassin Bayes Training
# WARNING: deletes every message in the spam and ham mailboxes after learning it.

DOMAIN=yourdomain
SPAM=your-spam-address   # mailbox name only, e.g. "spam" for spam@yourdomain
HAM=your-ham-address     # e.g. "notspam"
BASE=/home/vpopmail/domains/$DOMAIN

# learn --spam|--ham MAILBOX: feed Maildir/cur and Maildir/new to sa-learn, then empty them
learn() {
    local dir
    for dir in "$BASE/$2/Maildir/cur" "$BASE/$2/Maildir/new"; do
        [ -d "$dir" ] || continue
        # Run sa-learn as vpopmail so the tokens land in vpopmail's Bayes database
        # (the one the reset commands below clear), and delete the messages only
        # if sa-learn succeeded
        /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn "$1" "$dir" && /usr/bin/find "$dir" -type f -delete
    done
}

# Learn spam!
learn --spam "$SPAM"

# Learn ham!
learn --ham "$HAM"

Test it and use cron to run the script daily.
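One way to schedule it, assuming the script was saved as /usr/local/bin/sa-train.sh (a hypothetical path) and made executable, is to add a line to root's crontab. The hypothetical `add_cron_line` helper below appends the line only if it is not already present, so running it twice does no harm.

```shell
#!/bin/bash
# Hypothetical helper: copy a crontab from stdin to stdout, appending LINE only if
# an identical line is not already there (so running it twice adds nothing).
add_cron_line() {
    # Pass the line through the environment so awk does not interpret backslashes
    CRON_LINE=$1 awk '
        { print }
        $0 == ENVIRON["CRON_LINE"] { seen = 1 }
        END { if (!seen) print ENVIRON["CRON_LINE"] }'
}

# Example (hypothetical script path): run the training script every day at 02:30
#   crontab -l 2>/dev/null | add_cron_line '30 2 * * * /usr/local/bin/sa-train.sh' | crontab -
```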

NOTE: just to belabor the point a bit more, this script deletes all the mail in the ham and spam directories. Do not just run this on your own inbox!
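Bayes scores are not applied until the database has learned a minimum number of spam and ham messages (200 of each by default, controlled by the `bayes_min_spam_num` and `bayes_min_ham_num` options). The hypothetical `bayes_ready` function below reads the output of `sa-learn --dump magic` and compares the learned counts against that minimum; it assumes the usual layout, where the count is the third column of the `nspam` and `nham` lines.

```shell
#!/bin/bash
# Hypothetical helper: read "sa-learn --dump magic" output on stdin, print the learned
# counts, and exit 0 only if both reach the minimum (default 200).
bayes_ready() {
    local min=${1:-200}
    # The count is the 3rd field; the counter name (nspam / nham) is the last field
    awk -v min="$min" '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham = $3 }
        END {
            printf "spam learned: %d, ham learned: %d\n", spam, ham
            exit !(spam >= min && ham >= min)
        }'
}
```

Usage: `sudo -H -u vpopmail sa-learn --dump magic | bayes_ready`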

Bounce messages to feed SA Bayes

Jack Vickers' info per 2 Aug 2007:

Having users forward the messages to an account like that is "a bad thing to do", according to the people on the SpamAssassin mailing list. You need to bounce the messages to those addresses, not forward them. When forwarding, programs like Outlook rewrite the headers, so your Bayes database learns that the spam was sent by the user who forwarded it.

How to bounce/redirect mail
How to redirect/bounce mail for sa-learn

Further Info

SimScan is used by QmailToaster to run incoming mail through ClamAV and SpamAssassin. It is configured by the settings in /var/qmail/control/simcontrol. See Simscan for more details.
The SpamAssassin daemon is started by the /var/qmail/supervise/spamd/run script. See man spamd for other options you can set there.
SpamAssassin can be set up to check the body of messages against Spam URI Realtime Blocklists. See SURBL for more details.
You can also check incoming mail against Realtime Black Lists before the mail even reaches SpamAssassin. See RBLs for more details.

How to reset Spam Assassin Bayes Training

Run one of the following commands:

# su vpopmail -c 'sa-learn --clear'

If user vpopmail does not have a valid login shell, use:

# sudo -H -u vpopmail sa-learn --clear
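Clearing the database throws away everything it has learned. sa-learn also has `--backup` and `--restore` options, so a cautious approach is to dump the database to a text file first. The `reset_bayes` helper below is a hypothetical sketch; `RUN_AS` and `SA_LEARN` are assumptions you can override, and the backup path is yours to choose.

```shell
#!/bin/bash
# Hypothetical helper: back up vpopmail's Bayes database to a text file, then clear it.
# RUN_AS and SA_LEARN are overridable assumptions (e.g. for a dry run).
RUN_AS=${RUN_AS:-"sudo -H -u vpopmail"}
SA_LEARN=${SA_LEARN:-/usr/bin/sa-learn}

reset_bayes() {
    local backup=$1
    if [ -z "$backup" ]; then
        echo "usage: reset_bayes <backup-file>" >&2
        return 2
    fi
    # $RUN_AS is left unquoted on purpose so it splits into a command and its options
    # sa-learn --backup writes the whole database to stdout; stop if that fails
    $RUN_AS "$SA_LEARN" --backup > "$backup" || return 1
    $RUN_AS "$SA_LEARN" --clear
}

# To undo the reset later: sudo -H -u vpopmail sa-learn --restore <backup-file>
```

The backup file must be readable by vpopmail for the restore to work.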

User Submitted Scripts

Multi-domain sa-learn ham/spam script

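An enhancement of the "Spam Assassin Bayes Training" script above: it loops over every domain in /home/vpopmail/domains and looks for a spam mailbox and a ham mailbox named after the domain with the dots removed (e.g. testcom-spam@test.com and testcom-ham@test.com). Use at your own risk :)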

 #!/bin/bash
 ##
 ## Spam Assassin Bayes Training
 ##
 testrun=1 ## 1 = dry run (only counts messages); set to 0 to learn and delete for real
 cd /home/vpopmail/domains/
 for i in *
   do
    echo -en "DOMAIN:\t$i\t"
    pre=$(echo $i | /bin/sed s/'\.'//g)
    spampre="$pre-spam" ## test.com ==> testcom-spam@test.com
    hampre="$pre-ham"   ## test.com ==> testcom-ham@test.com
    ##
    ## Process SPAM for the current domain
    ##
    echo -en "\tS: "
    if [ -d $i/$spampre ]
      then
        spamcount=0;
        cd $i; cd $spampre; cd Maildir; cd cur
        for spam in $(/usr/bin/find ./ -type f)
          do
            let spamcount=$spamcount+1
            if [ $testrun -eq 0 ]
              then
                /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --spam $spam 1>/dev/null
                rm -f $spam
            fi
        done
        cd ..
        cd new
        for spam in $(/usr/bin/find ./ -type f)
          do
            let spamcount=$spamcount+1
            if [ $testrun -eq 0 ]
              then
                /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --spam $spam 1>/dev/null
                rm -f $spam
            fi
        done
        cd ..; cd ..; cd ..; cd ..
        echo -en $spamcount
      else
        echo -en "NA"
    fi
    ##
    ## Process HAM for the current domain
    ##
    echo -en "\tH: "
    if [ -d $i/$hampre ]
      then
        hamcount=0;
        cd $i; cd $hampre; cd Maildir; cd cur
        for ham in $(/usr/bin/find ./ -type f)
          do
            let hamcount=$hamcount+1
            if [ $testrun -eq 0 ]
              then
                /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --ham $ham 1>/dev/null
                rm -f $ham
            fi
        done
        cd ..; cd new; for ham in $(/usr/bin/find ./ -type f)
          do
            let hamcount=$hamcount+1
            if [ $testrun -eq 0 ]
              then
                /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --ham $ham 1>/dev/null
                rm -f $ham
            fi
        done
        cd ..; cd ..; cd ..; cd ..
        echo $hamcount
      else
        echo "NA"
    fi
 done
 ##
 ## Update the Bayes DB
 ##
 if [ $testrun -eq 0 ]
   then
     /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --sync
     /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn -u vpopmail --force-expire 1>/dev/null
 fi

Page created by Ebroch on 16 March 2024; latest revision 19 October 2024.