Velocity Reviews

Jeff Evans 03-31-2005 07:29 PM

Examining word scores in Thunderbird spam filter
 
I know that Thunderbird uses Bayesian filtering for its spam filter. Is
there any way to examine what probabilities TB has assigned to each word
in a form that is understandable by humans? Or is the model described at:

http://en.wikipedia.org/wiki/Bayesian_filtering

oversimplified? Also is it possible to view/modify the threshold?

Moz Champion 03-31-2005 11:59 PM

Re: Examining word scores in Thunderbird spam filter
 
You teach TB's Junk Mail Controls (JMC) by telling it which messages are
spam, and which messages it marked as spam are not.
That adjusts the threshold, modifying it on an ongoing basis.

Note: if you unmark a message JMC thought was spam, ALL the properties
used to determine it was spam are depreciated.


The 'threshold' is developed by each individual user, according to the
type and contents of the actual spam they receive versus the non-spam
they receive, so it varies accordingly. The actual threshold is
established when first activating JMC, and modified constantly every time
you mark another message as spam (or unmark one).
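
For readers following along, here is a minimal sketch of what "teaching" a count-based Bayesian filter could look like. It is a toy under stated assumptions, not Thunderbird's code: the class TinyTrainer, its method names, and the whitespace tokenizer are all hypothetical. In filters of this kind, marking a message typically changes per-word counts rather than a single cutoff; the thread does not confirm how Thunderbird handles this internally.

```python
from collections import Counter


class TinyTrainer:
    """Toy token-count store (hypothetical; not Thunderbird's implementation)."""

    def __init__(self):
        self.spam_tokens = Counter()  # word -> number of spam messages containing it
        self.ham_tokens = Counter()   # word -> number of non-spam messages containing it
        self.spam_messages = 0
        self.ham_messages = 0

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase, split on whitespace, one count per message.
        return set(text.lower().split())

    def mark_spam(self, text):
        self.spam_messages += 1
        self.spam_tokens.update(self.tokenize(text))

    def mark_not_spam(self, text):
        self.ham_messages += 1
        self.ham_tokens.update(self.tokenize(text))

    def unmark_spam(self, text):
        """Undo an earlier mark_spam and record the message as non-spam."""
        self.spam_messages = max(0, self.spam_messages - 1)
        self.spam_tokens.subtract(self.tokenize(text))
        # Unary plus keeps only tokens whose count is still positive.
        self.spam_tokens = +self.spam_tokens
        self.mark_not_spam(text)
```

In this sketch, unmarking a message lowers the spam count of every word in it, which is one way to read the note about all of a message's properties being depreciated.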

Jeff Evans 04-01-2005 02:36 AM

Re: Examining word scores in Thunderbird spam filter
 
Thanks for your response, but let me clarify a bit more. I understand
that as e-mail arrives, the filter calculates the Prob{Spam given Words}
according to the Words in the message, then based on that probability,
may mark it. What I'm interested in seeing is the Prob{Words given
Spam}, which I'm presuming is part of the "training" and is updated with
each successful or unsuccessful attempt. Basically, for curiosity's
sake, I'd just like to see how I have trained my filter by looking at
these values, if possible.
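
To make the distinction between Prob{Words given Spam} and Prob{Spam given Words} concrete, here is a minimal single-word sketch using Bayes' rule. The function names and the choice to estimate the prior from message counts are assumptions made for illustration; nothing here is taken from Thunderbird itself.

```python
def p_word_given_class(word_count, message_count):
    """Estimate Prob{Word given Class}: fraction of the class's messages containing the word."""
    if message_count == 0:
        return 0.0
    return word_count / message_count


def p_spam_given_word(spam_with_word, spam_total, ham_with_word, ham_total):
    """Estimate Prob{Spam given Word} via Bayes' rule, with the prior taken from message counts."""
    total = spam_total + ham_total
    if total == 0:
        raise ValueError("no training messages")
    p_spam = spam_total / total
    p_ham = ham_total / total
    p_w_spam = p_word_given_class(spam_with_word, spam_total)
    p_w_ham = p_word_given_class(ham_with_word, ham_total)
    evidence = p_w_spam * p_spam + p_w_ham * p_ham
    if evidence == 0:
        return p_spam  # word never seen in training: fall back to the prior
    return p_w_spam * p_spam / evidence
```

With 100 spam and 100 non-spam training messages, a word seen in 30 spam and 5 non-spam messages has Prob{Word given Spam} = 0.30, while Prob{Spam given Word} works out to 0.15 / 0.175, roughly 0.86. The first number is what training stores; the second is what the filter derives from it at classification time.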

Moz Champion 04-01-2005 08:42 AM

Re: Examining word scores in Thunderbird spam filter
 
You can see the various attributes added by looking at
training.dat in a text editor (it's located in your profile folder).
Each 'update' or modification may include several words or word groups
that increment or decrement the ratio, according to your usage.

Whether or not that will aid you in any meaningful manner, though, I
can't say. It's only when the full training.dat file is taken into
consideration that JMC determines whether or not a message is spam.

For example, here's a copy of the first few entries in my training.dat file:

pokc510016
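
If training.dat looks like noise in a text editor, it may be a binary file rather than text. Below is a rough sketch of a reader for it. The layout is an assumption, not something confirmed in this thread: a 4-byte magic marker, two big-endian 32-bit message counts (good, then junk), then two token tables, each a 32-bit token count followed by (count, length, bytes) records. Check it against the Thunderbird source for your version before trusting the output; all function names here are hypothetical.

```python
import io
import struct

# Assumed magic marker at the start of the file (an assumption, not confirmed here).
ASSUMED_MAGIC = b"\xfe\xed\xfa\xce"


def _read_u32(stream):
    """Read one big-endian unsigned 32-bit integer."""
    data = stream.read(4)
    if len(data) != 4:
        raise ValueError("unexpected end of file")
    return struct.unpack(">I", data)[0]


def _read_tokens(stream):
    """Read one token table: a token count, then (count, length, bytes) records."""
    tokens = {}
    for _ in range(_read_u32(stream)):
        count = _read_u32(stream)
        length = _read_u32(stream)
        word = stream.read(length)
        if len(word) != length:
            raise ValueError("truncated token")
        tokens[word.decode("utf-8", errors="replace")] = count
    return tokens


def parse_training(stream):
    """Return (good_messages, junk_messages, good_tokens, junk_tokens)."""
    if stream.read(4) != ASSUMED_MAGIC:
        raise ValueError("file does not match the assumed layout")
    good_messages = _read_u32(stream)
    junk_messages = _read_u32(stream)
    good_tokens = _read_tokens(stream)
    junk_tokens = _read_tokens(stream)
    return good_messages, junk_messages, good_tokens, junk_tokens


def parse_training_bytes(data):
    """Convenience wrapper for data already read into memory."""
    return parse_training(io.BytesIO(data))
```

To try it, open a copy of the file in binary mode (open(path, "rb")) and pass the handle to parse_training. Dividing a word's junk count by the junk message count then gives a rough Prob{Word given Spam} for that word, provided the assumed layout holds.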

