Velocity Reviews

revyakin 08-11-2003 01:13 AM

python spam filter: random words?
 
I know fighting spam is like fighting global worming, but still..
50% of spam I get these days contains a random combination of letters
at the end of the subject line. Has anyone tried using that feature in
antispam filters? Since Python is the only language I am more or less
fluent in as an amateur scripter, I was wondering if anyone in this
group has comments on this idea.
Also, is it trivial to make a Python script filter executable from a
generic mail program like OE or NS Messenger?
I am also wondering why spammers add that stuff to their subject lines
anyway.

Ben Finney 08-11-2003 01:28 AM

Re: python spam filter: random words?
 
On 10 Aug 2003 18:13:53 -0700, revyakin wrote:
> I know fighting spam is like fighting global worming, but still..
                                        ^^^^^^^^^^^^^^
Given that some spam contains e-mail worms, the typo is appropriate :-)

> 50% of spam I get these days contains a random combination of letters
> at the end of the subject line. Has anyone tried using that feature in
> antispam filters?


My experience has been that this practice is dropping off, since
Bayesian statistical-analysis filters will glide right by random words
as "not statistically significant".

What I'm seeing now is spam with words taken straight from the "likely
good" word lists of Bayesian filters :-)

> I am also wondering why spammers add that stuff to their subject lines
> anyway.


To defeat spam filters that check for the occurrence of a known spam
message they've seen before. As noted above, though, these are being
superseded by Bayesian word metric analysis.

--
Ben Finney
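
A minimal sketch of the point above about filters that look for a
message they have seen before. It assumes the filter simply hashes the
subject and body; the names fingerprint, known_spam and is_known_spam
are made up for illustration and do not come from any real filter.

```python
import hashlib


def fingerprint(subject, body):
    """Hash the subject and body; identical messages share a fingerprint."""
    text = subject + "\n" + body
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical store of fingerprints of spam already reported.
known_spam = {fingerprint("Cheap pills", "Buy now")}


def is_known_spam(subject, body):
    """Exact-match check: true only if this precise message was seen before."""
    return fingerprint(subject, body) in known_spam
```

With this kind of check, a few random letters appended to the subject
change the hash, so a resend of the same spam is no longer recognised.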

Sean 'Shaleh' Perry 08-11-2003 03:44 AM

Re: python spam filter: random words?
 
On Sunday 10 August 2003 18:28, Ben Finney wrote:
> What I'm seeing now is spam with words taken straight from the "likely
> good" word lists of Bayesian filters :-)
>


This was recently discussed on the spambayes list (the nifty Python
implementation of Paul Graham's ideas).

Apparently there are not enough uses of any such word to make it
statistically interesting, so spambayes ignores it. Or something like that.
See the thread there for full details.
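
A rough sketch of why a rarely seen token carries no weight in a
word-frequency filter. It is loosely in the style of Paul Graham's
essay, not spambayes's actual code; MIN_COUNT, UNKNOWN_PROB, the 0.01
and 0.99 clamps and the function names are all assumptions chosen for
illustration.

```python
from collections import Counter

MIN_COUNT = 5       # assumption: tokens seen fewer times than this are ignored
UNKNOWN_PROB = 0.5  # assumption: neutral score given to ignored tokens


def token_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how spammy a token is from training counts."""
    s = spam_counts.get(token, 0)
    h = ham_counts.get(token, 0)
    if s + h < MIN_COUNT:
        # Too few sightings to say anything: treat as neutral.
        return UNKNOWN_PROB
    spam_freq = s / max(n_spam, 1)
    ham_freq = h / max(n_ham, 1)
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that no single token is treated as absolute proof.
    return min(0.99, max(0.01, p))


def interesting_tokens(tokens, spam_counts, ham_counts, n_spam, n_ham, limit=15):
    """Return the tokens whose score is furthest from neutral, best first."""
    scored = []
    for tok in set(tokens):
        distance = abs(token_prob(tok, spam_counts, ham_counts, n_spam, n_ham) - 0.5)
        if distance > 0:
            scored.append((distance, tok))
    scored.sort(reverse=True)
    return [tok for _, tok in scored[:limit]]


# Toy training data, invented for the example.
spam_counts = Counter({"viagra": 40})
ham_counts = Counter({"python": 30})
```

A random string in the subject line has never been seen in training, so
it scores as neutral and drops out of the list of tokens that decide
the verdict.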



Terry Reedy 08-11-2003 04:19 PM

Re: python spam filter: random words?
 

"Marc Wilson" <marc@cleopatra.co.uk> wrote in message
news:pa1fjvck8jidlj2ne5e63esmd2sk2ndl6v@4ax.com...
> In comp.lang.python, revyakin@yahoo.com (revyakin) (revyakin) wrote in
> <fa06e058.0308101713.9679884@posting.google.com> ::
>
> |I know fighting spam is like fighting global worming, but still..
> |50% of spam I get these days contains a random combination of letters
> |at the end of the subject line. Has anyone tried using that feature in
> |antispam filters?
>
> How do you detect "random" letters? You can only (programmatically)
> determine that a character sequence is "random" if it doesn't appear in some
> sort of dictionary, and even there you have the risk of false positives due
> to typos, acronyms etc.


Looking at successive letter pairs would go a long way. Out of the
(26+space)**2 combinations, perhaps half occur in real words (i.e., 'qx'
is a giveaway). Using triples would allow inclusion of common
three-letter acronyms as legal.

Terry J. Reedy



Marc Wilson 08-12-2003 02:51 PM

Re: python spam filter: random words?
 
In comp.lang.python, "Terry Reedy" <tjreedy@udel.edu> (Terry Reedy) wrote
in <-CSdnbpTEIS0X6qiXTWJkQ@comcast.com>::

|> How do you detect "random" letters? You can only (programmatically)
|> determine that a character sequence is "random" if it doesn't appear in some
|> sort of dictionary, and even there you have the risk of false positives due
|> to typos, acronyms etc.
|
|Looking at successive letter pairs would go a long way. Out of the
|(26+space)**2 combinations, perhaps half occur in real words (i.e., 'qx'
|is a giveaway). Using triples would allow inclusion of common
|three-letter acronyms as legal.

For sale today on QXL.com....
--
Marc Wilson

Cleopatra Consultants Limited - IT Consultants
2 The Grange, Cricklade Street, Old Town, Swindon SN1 3HG
Tel: (44/0) 70-500-15051 Fax: (44/0) 870 164-0054
Mail: info@cleopatra.co.uk Web: http://www.cleopatra.co.uk
_________________________________________________________________
Try MailTraq at https://my.mailtraq.com/register.asp?code=cleopatra

