Velocity Reviews - Computer Hardware Reviews

Spam Filter Pattern Matching

 
 
mossoft
 
      01-30-2004
I use SpamAssassin as a spam detector, and the rules for the Bayes filter
appear to be Perl-based.
I need a rule that detects a subject like "Re: ABCDE,
random three words", where the ABCDE part can be between 2 and 8
uppercase characters. I came up with:

/Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i

Does this look about right to all you experts?

Ta.

M.
 
 
 
 
 
Dan Wilga
 
      01-30-2004
In article <(E-Mail Removed)>,
(E-Mail Removed) (mossoft) wrote:

> I use SpamAssassin as a spam detector, and the rules for the Bayes filter
> appear to be Perl-based.
> I need a rule that detects a subject like "Re: ABCDE,
> random three words", where the ABCDE part can be between 2 and 8
> uppercase characters. I came up with:
>
> /Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i


The one I wrote yesterday (but haven't tested yet) is:

^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}

I'd rather not assume the CAPS part will be from 2-8 chars, or that any
of the individual words will be from 1-20 chars.

In my experience, these subjects always have all lowercase alphas in the
three words after the comma, so using "." here is overkill, IMHO.
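To make the difference between the two patterns concrete, here is a small sketch that runs both against a few subject lines. The subjects are invented for illustration and are not real spam samples. One thing it shows: the /i modifier on the original pattern makes [A-Z] match lowercase letters as well, so the "upper case" requirement is lost, and without a leading ^ the pattern can match anywhere in the subject.

```perl
use strict;
use warnings;

# Pattern from the original question (note the /i modifier, no ^ anchor).
my $original = qr/Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i;

# Pattern from this reply: anchored and case-sensitive.
my $strict = qr/^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}/;

# Hypothetical subjects, made up for this example only.
my @subjects = (
    'Re: QXZTR, lamp window river',    # the spam shape described above
    'Re: hello, how are you today',    # ordinary lowercase reply
    'Fwd: Re: ABC, one two three',     # shape appears mid-subject
);

for my $s (@subjects) {
    printf "%-32s original=%s strict=%s\n", $s,
        ($s =~ $original ? 'hit' : 'miss'),
        ($s =~ $strict   ? 'hit' : 'miss');
}
```

The original pattern hits all three subjects, including the ordinary lowercase reply, while the anchored, case-sensitive one hits only the first.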

I've also found when writing regexps that \s is your friend. It's almost
always preferable to use \s (or even \s+) rather than assume the
character will be a literal space. It might be a tab or a carriage return.
Granted, that's not too likely in an email subject, but as a general habit
it pays off often and costs next to nothing.
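A quick sketch of the \s point, comparing a literal-space version of the pattern with a \s+ version. Both variants and the sample subjects are assumptions written for this example; neither is a tested SpamAssassin rule.

```perl
use strict;
use warnings;

# Same shape twice: once with literal spaces, once with \s+.
my $literal = qr/^Re: [A-Z][A-Z]+,( [a-z]+){3}/;
my $flex    = qr/^Re:\s+[A-Z][A-Z]+,(\s+[a-z]+){3}/;

# Hypothetical subjects: one cleanly spaced, one with a tab after
# "Re:" and a doubled space between two of the words.
my @subjects = (
    "Re: ABCD, one two three",
    "Re:\tABCD, one  two three",
);

for my $s (@subjects) {
    (my $shown = $s) =~ s/\t/\\t/g;    # make the tab visible when printing
    printf "%-28s literal=%s flex=%s\n", $shown,
        ($s =~ $literal ? 'hit' : 'miss'),
        ($s =~ $flex    ? 'hit' : 'miss');
}
```

The literal-space pattern only matches the cleanly spaced subject, while the \s+ pattern matches both.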

--
Dan Wilga (E-Mail Removed)
** Remove the -MUNGE in my address to reply **
 
 
 
 
 
Dan Wilga
 
      01-30-2004
In article <(E-Mail Removed)>,
Dan Wilga <(E-Mail Removed)> wrote:

> The one I wrote yesterday (but haven't tested yet) is:
>
> ^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}


No sooner had I written the above than I got a piece of spam with an
apostrophe in the three words at the end.

Perhaps this would work better:

^Re:\s[A-Z][A-Z]+,(\s[a-z\']+){3}
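Here is a short check of the apostrophe change against a few made-up subjects (assumptions for illustration, not the actual spam). Inside a Perl character class the backslash before the apostrophe is harmless but not required, so the sketch writes it as [a-z'].

```perl
use strict;
use warnings;

# Earlier version: the three trailing words are lowercase letters only.
my $letters_only = qr/^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}/;

# Revised version: apostrophes are allowed inside the words.
my $with_apostrophe = qr/^Re:\s[A-Z][A-Z]+,(\s[a-z']+){3}/;

# Hypothetical subjects for comparison.
my @subjects = (
    "Re: ABCD, one two three",     # plain lowercase words
    "Re: ABCD, don't look now",    # apostrophe in the first word
    "Re: ABCD, one two",           # only two words after the comma
);

for my $s (@subjects) {
    printf "%-26s letters_only=%s with_apostrophe=%s\n", $s,
        ($s =~ $letters_only    ? 'hit' : 'miss'),
        ($s =~ $with_apostrophe ? 'hit' : 'miss');
}
```

Neither pattern is anchored at the end, so a subject with more than three words after the comma would still match.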

--
Dan Wilga (E-Mail Removed)
** Remove the -MUNGE in my address to reply **
 