Examining word scores in Thunderbird spam filter

Discussion in 'Firefox' started by Jeff Evans, Mar 31, 2005.

  1. Jeff Evans

    Jeff Evans Guest

    I know that Thunderbird uses Bayesian filtering for its spam filter. Is
    there any way to examine what probabilities TB has assigned to each word
    in a form that is understandable by humans? Or is the model described at:

    http://en.wikipedia.org/wiki/Bayesian_filtering

    oversimplified? Also is it possible to view/modify the threshold?
     
    Jeff Evans, Mar 31, 2005
    #1

  2. Moz Champion

    Moz Champion Guest

    You teach TB's JMC (Junk Mail Controls) by telling it which messages are
    spam, and which messages it marked as spam are not.
    That adjusts the threshold, modifying it on an ongoing basis.

    Note: if you unmark a message JMC thought was spam, ALL the properties
    used to determine it was spam are depreciated.


    The 'threshold' is developed by each individual user, according to the
    type and contents of the actual spam they receive versus the non-spam
    they receive, so it varies accordingly. The actual threshold is
    established when first activating JMC, and modified constantly every time
    you mark another message as spam (or unmark one).
     
    Moz Champion, Apr 1, 2005
    #2

  3. Jeff Evans

    Jeff Evans Guest

    Thanks for your response, but let me clarify a bit more. I understand
    that as e-mail arrives, the filter calculates the Prob{Spam given Words}
    according to the Words in the message, then based on that probability,
    may mark it. What I'm interested in seeing is the Prob{Words given
    Spam}, which I'm presuming is part of the "training" and is updated with
    each successful or unsuccessful attempt. Basically, for curiosity's
    sake, I'd just like to see how I have trained my filter by looking at
    these values, if possible.
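
    To make the two quantities concrete, here is a minimal Python sketch,
    assuming a plain naive Bayes model with per-word message counts. It is
    not necessarily what Thunderbird does internally: the counts, the
    add-one smoothing, the 0.9 cutoff, and every name below are made up
    for illustration.

```python
from math import prod

# Hypothetical training data: number of messages of each class that
# contained each word, plus the total number of messages per class.
spam_counts = {"viagra": 40, "meeting": 2, "free": 30}
ham_counts = {"viagra": 1, "meeting": 50, "free": 10}
n_spam, n_ham = 100, 100

THRESHOLD = 0.9  # hypothetical cutoff, not a Thunderbird setting


def p_word_given(word, counts, n_msgs):
    """Prob{Word given class}, with add-one smoothing to avoid zeros."""
    return (counts.get(word, 0) + 1) / (n_msgs + 2)


def p_spam_given_words(words):
    """Prob{Spam given Words} via Bayes' rule, treating words as independent."""
    prior_spam = n_spam / (n_spam + n_ham)
    prior_ham = 1 - prior_spam
    like_spam = prod(p_word_given(w, spam_counts, n_spam) for w in words)
    like_ham = prod(p_word_given(w, ham_counts, n_ham) for w in words)
    evidence = like_spam * prior_spam + like_ham * prior_ham
    return like_spam * prior_spam / evidence


def is_spam(words):
    """Mark the message when the posterior reaches the cutoff."""
    return p_spam_given_words(words) >= THRESHOLD


print(p_spam_given_words(["viagra", "free"]))
print(p_spam_given_words(["meeting"]))
```

    In this sketch, training only changes the counts (and therefore the
    Prob{Word given Spam} values), while the cutoff is a separate fixed
    number.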
     
    Jeff Evans, Apr 1, 2005
    #3
  4. Moz Champion

    Moz Champion Guest

    You can see the various attributes added by looking at
    training.dat in a text editor (it's located in your profile folder).
    Each 'update' or modification may include several words or word groups
    that increment or decrement the ratio, according to your usage.

    Whether or not that will aid you in any meaningful manner, though, I
    can't say. It's only when the full training.dat file is taken into
    consideration that JMC determines whether or not a message is spam.

    For example, here's a copy of the first few entries in my training.dat file:

    pokc510016
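
    The sample above suggests training.dat is a binary file, which would
    explain why a text editor shows little that is readable. Below is a
    rough Python sketch for dumping such a file. It ASSUMES the following
    layout, which has not been checked against the Thunderbird source: a
    4-byte magic cookie (FE ED FA CE), big-endian 32-bit good and bad
    message counts, then a good-token table and a bad-token table, each a
    32-bit token count followed by (count, length, bytes) records. All
    function names are hypothetical, and the sketch is exercised only on
    synthetic data built in the same assumed layout.

```python
import io
import struct

MAGIC = b"\xfe\xed\xfa\xce"  # assumed magic cookie, not verified


def read_u32(f):
    """Read one big-endian unsigned 32-bit integer."""
    data = f.read(4)
    if len(data) != 4:
        raise ValueError("unexpected end of file")
    return struct.unpack(">I", data)[0]


def read_tokens(f):
    """Read one token table: a token count, then (count, length, bytes) records."""
    tokens = {}
    for _ in range(read_u32(f)):
        count = read_u32(f)
        length = read_u32(f)
        raw = f.read(length)
        if len(raw) != length:
            raise ValueError("truncated token")
        tokens[raw.decode("utf-8", "replace")] = count
    return tokens


def parse_training(f):
    """Parse a file object under the assumed layout described above."""
    if f.read(4) != MAGIC:
        raise ValueError("file does not match the assumed layout")
    good_msgs = read_u32(f)
    bad_msgs = read_u32(f)
    good = read_tokens(f)
    bad = read_tokens(f)
    return good_msgs, bad_msgs, good, bad


def pack_tokens(tokens):
    """Build a synthetic token table in the same assumed layout."""
    out = struct.pack(">I", len(tokens))
    for word, count in tokens.items():
        raw = word.encode("utf-8")
        out += struct.pack(">II", count, len(raw)) + raw
    return out


# Synthetic sample: 10 good messages, 5 bad, one token in each table.
sample = (MAGIC + struct.pack(">II", 10, 5)
          + pack_tokens({"meeting": 7}) + pack_tokens({"free": 4}))
good_msgs, bad_msgs, good, bad = parse_training(io.BytesIO(sample))
print(good_msgs, bad_msgs, good, bad)
```

    To try it on a real file, open a copy of training.dat in binary mode
    (open(path, "rb")) and pass the file object to parse_training. If the
    magic cookie check fails, the layout assumption is wrong for that
    version.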
     
    Moz Champion, Apr 1, 2005
    #4
