Velocity Reviews > Newsgroups > Programming > Ruby > [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python

[ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python

 
 
Matt Mower
04-15-2005
Hi folks,

I've recently released "Bishop", a Ruby port of the "Reverend" Bayesian
classifier written in Python. Bishop 0.3.0 is available as a Gem and
from RubyForge:

http://rubyforge.org/projects/bishop/

Bishop is a reasonably direct port of the original Python code; bug
reports and suggestions for improving the structure of the code are
welcome.

Bishop includes both the Robinson and Robinson-Fisher algorithms for
classification. I presume they were correctly implemented in Reverend,
and I aim to verify this in my own use of the code.

Support is included for saving/loading the trained classifier to/from YAML.

An example of using Bishop:

require 'bishop'

b = Bishop::Bayes.new
b.train( "ham", "a great message from a close friend" )
b.train( "spam", "buy viagra here" )
puts b.guess( "would a friend send you a viagra advert?" )

=> [ [ "ham", <prob> ], [ "spam", <prob> ] ]

Bishop defaults to the Robinson algorithm. To use a different
algorithm, construct the classifier with a block that calls the
chosen algorithm:

Bishop::Bayes.new { |probs,ignore| Bishop::robinson_fisher( probs, ignore ) }
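As a toy sketch of how a constructor block like that can select the
combining algorithm (a hypothetical ToyBayes class invented here, not
Bishop's actual internals or math):

```ruby
# Hypothetical sketch, not Bishop's implementation: a classifier that
# stores a constructor block and calls it to combine per-word
# probabilities, falling back to a stand-in default when no block is given.
class ToyBayes
  def initialize(&combiner)
    # Default combiner: a plain average, standing in for Bishop's
    # default Robinson algorithm.
    @combiner = combiner || lambda { |probs| probs.sum / probs.size.to_f }
  end

  # Combine a list of per-word probabilities into one score.
  def combine(probs)
    @combiner.call(probs)
  end
end

default   = ToyBayes.new
pessimist = ToyBayes.new { |probs| probs.min }  # swap in another strategy

default.combine([0.25, 0.75])    # => 0.5, the default average
pessimist.combine([0.25, 0.75])  # => 0.25, the block is used instead
```

The point of the block-based design is that the combining strategy is
chosen once, at construction time, and the rest of the classifier never
needs to know which algorithm is in play.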

To save to a YAML file:

b.save "myclassifier.yaml"

To load from a YAML file:

b.load "myclassifier.yaml"

You can uniquely identify training items:

b.train( "ham", "friends don't let friends develop on shared
hosting", "<(E-Mail Removed)>" )

And you can untrain items:

b.untrain( <pool>, <item>[, <uid> ] )

I'm using this in a project of my own and would welcome any feedback
or suggested improvements.

Regards,

Matt

--
Matt Mower :: http://matt.blogs.it/



 
 
 
 
 
Douglas Livingstone
04-15-2005
On 4/15/05, Matt Mower <(E-Mail Removed)> wrote:
> Hi folks,
>
> I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
> classifier written in Python. Bishop-0.3.0 is available


Could this be combined with http://rubyforge.org/projects/classifier/ ?

It looks like they both have a similar syntax:

classifier.train :symbol, "content"

Would the method_missing syntax be easy to add to Bishop? Would
untrain be easy to add to projects/classifier? From what I've seen of
them so far, it sounds like the answer to both would be yes. If they
had the same API, they could live in the same module, so that swapping
filter types would be as simple as changing the Classifier::XXX.new
line.
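For what it's worth, the method_missing form could be layered over any
backend exposing train(pool, item). A hypothetical sketch (the class
and backend names here are invented, taken from neither library):

```ruby
# Hypothetical sketch: a thin facade accepting both the explicit
# train(:pool, "text") style and a dynamic train_pool "text" style,
# delegating to any backend that responds to train(pool, item).
# Neither Bishop's nor Classifier's real internals are shown.
class ClassifierFacade
  def initialize(backend)
    @backend = backend
  end

  # Explicit style: facade.train(:ham, "text")
  def train(pool, item)
    @backend.train(pool.to_s, item)
  end

  # Dynamic style: facade.train_ham "text" delegates to train("ham", ...)
  def method_missing(name, *args)
    if name.to_s =~ /\Atrain_(\w+)\z/
      train($1, *args)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("train_") || super
  end
end

# Minimal recording backend used only for this demonstration.
class RecordingBackend
  attr_reader :calls
  def initialize; @calls = []; end
  def train(pool, item); @calls << [pool, item]; end
end

backend = RecordingBackend.new
facade  = ClassifierFacade.new(backend)
facade.train(:ham, "a great message")  # explicit form
facade.train_spam "buy viagra here"    # method_missing form
```

Both calls end up as the same train(pool, item) invocation on the
backend, which is what would let the two APIs converge.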

Cheers,
Douglas



 
 
 
 
 
gabriele renzi
04-15-2005
Douglas Livingstone wrote:
> On 4/15/05, Matt Mower <(E-Mail Removed)> wrote:
>
>>Hi folks,
>>
>>I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
>>classifier written in Python. Bishop-0.3.0 is available

>
>
> Could this be combined with http://rubyforge.org/projects/classifier/ ?


+1 on this question/suggestion.
There may be reasons to have two different libraries, but IMVHO it
would be better to have one slightly bigger library sharing APIs and
services while keeping the useful differences.
 
 
Jaypee
04-18-2005
Matt Mower wrote:
> Hi folks,
>
> I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
> classifier written in Python. Bishop-0.3.0 is available as a Gem and
> from RubyForge

....
>
> Regards,
>
> Matt
>

Hello Matt,

Thank you for this useful library.
I am trying to use it to analyse the draft text of the European
constitution (Is it social? Liberal? Respectful of human rights?). I am
doing this for myself, just out of curiosity; there is no responsibility
or liability involved in the usage of the classifier or in the result.
I'd like to know how the classifier behaves when two different sets of
words are submitted in two successive "train" invocations for a given
category. Does the second invocation reset the training, or does it
accumulate the "experience" progressively?

Thanks again ...
Jean-Pierre
 
 
Matt Mower
04-19-2005
On 4/16/05, gabriele renzi <(E-Mail Removed)> wrote:
> Douglas Livingstone wrote:
> > On 4/15/05, Matt Mower <(E-Mail Removed)> wrote:
> >
> >>Hi folks,
> >>
> >>I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
> >>classifier written in Python. Bishop-0.3.0 is available

> >
> >
> > Could this be combined with http://rubyforge.org/projects/classifier/ ?

>
> +1 on this question/suggestion.
> There may be reasons to have two different libraries, but IMVHO it
> would be better to have one slightly bigger library sharing APIs,
> services and keeping the useful differences.
>


I thought it was about time I responded to this.

If I had known Lucas was working on his classifier library before I
did the port of Reverend, I probably wouldn't have bothered. However, I
have done it, I am using it in another project of my own, and I have had
some ideas about possible future developments.

One example is to build a version which runs directly from a SQL
database (possibly using ActiveRecord). I'm also interested in new
algorithms and in possible improvements to support classifying RSS
items within a tag space.

None of which precludes rolling Bishop and Classifier into one project.

However, right now I'd like to keep control of Bishop and stay free to
make possibly incompatible changes to the API or implementation.
Similarly, Lucas may have his own plans for how he wants to see
Classifier develop.

I don't see the harm in having two projects, and what I've suggested to
Lucas is that we compare notes periodically and see if it makes sense
to merge them. If a lot of users of the libraries made a fuss, that
would also affect my opinion.

Regards,

Matt

---
Matt Mower :: http://matt.blogs.it/



 
 
Matt Mower
04-19-2005
Hi Jean-Pierre,

On 4/18/05, Jaypee <(E-Mail Removed)> wrote:
> Thank you for this useful library.


You're welcome.

> I am trying to use it to analyse the draft text of the European
> constitution (Is it social? Liberal? Respectful of human rights?)
> [..snip..]
> I'd like to know how the classifier behaves when two different sets of
> words are submitted in two successive "train" invocations for a given
> category. Does the second invocation reset the training, or does it
> accumulate the "experience" progressively?
>


You're right when you say it accumulates. Further training supplies
more evidence to the classifier about which words are associated with
which categories. It uses this evidence to work out conditional
probabilities, which are then combined to make a guess about the
appropriate category for an item.

There is an #untrain method if you want to remove previously trained
information.
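A toy illustration of that accumulation (a plain word-count table
invented for this example; Bishop's real data structures and
probability calculations are not shown):

```ruby
# Toy word-count pool showing how successive train calls accumulate
# evidence rather than replacing it, and how untrain removes it again.
# This only mirrors the counting behaviour, not Bishop's internals.
class ToyPool
  def initialize
    @counts = Hash.new(0)  # word => number of times seen in this pool
  end

  def train(text)
    text.downcase.split.each { |word| @counts[word] += 1 }
  end

  def untrain(text)
    text.downcase.split.each do |word|
      @counts[word] -= 1
      @counts.delete(word) if @counts[word] <= 0
    end
  end

  def count(word)
    @counts[word]
  end
end

pool = ToyPool.new
pool.train("social rights liberal")
pool.train("social rights")   # second call adds to, not replaces, the first
pool.count("social")          # => 2
pool.untrain("social rights")
pool.count("social")          # => 1
```

Each training pass simply adds to the per-category evidence, which is
why more training text generally sharpens the classifier's guesses.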

Regards,

Matt

--
Matt Mower :: http://matt.blogs.it/



 
 
gabriele renzi
04-19-2005
Matt Mower wrote:

<snip all>
thanks for taking the time to answer. I can understand your reasons,
and I'm glad to know there is at least some contact between the hackers
on similar projects. Thanks to you both!
 
 
Jaypee
04-19-2005
Matt Mower wrote:
> Hi Jean-Pierre,
>
> On 4/18/05, Jaypee <(E-Mail Removed)> wrote:
>
>>Thank you for this useful library.

>
>
> You're welcome.
>
>
>>I am trying to use it to analyse the draft text of the European
>>constitution (Is it social? Liberal? Respectful of human rights?)
>>[..snip..]
>>I'd like to know how the classifier behaves when two different sets of
>>words are submitted in two successive "train" invocations for a given
>>category. Does the second invocation reset the training, or does it
>>accumulate the "experience" progressively?
>>

>
>
> You're right when you say it accumulates. Further training supplies
> more evidence to the classifier about which words are associated with
> which categories. It uses this evidence to work out conditional
> probabilities, which are then combined to make a guess about the
> appropriate category for an item.
>
> There is an #untrain method if you want to remove previously trained
> information.
>
> Regards,
>
> Matt
>

Thank you,
Jean-Pierre
 
 
Lucas Carlson
04-21-2005
The subversion trunk of projects/classifier (see
http://rufy.com/svn/classifier/trunk) has the untrain method in it.
This will be released soon as Classifier 1.2.

 


Similar Threads
- "ruby classifier" by Ryo Fojiba in Ruby (5 replies, last post 04-10-2007 07:43 PM)
- "to david bishop" by dilou in VHDL (4 replies, last post 04-06-2006 01:34 AM)
- "classifier lsi and ruby gsl" by Tom Reilly in Ruby (2 replies, last post 05-05-2005 12:03 AM)
- "[ANN] Classifier 1.2 with Bayesian and NEW LSI classification" by Lucas Carlson in Ruby (4 replies, last post 04-25-2005 06:54 PM)
- "Lost 2x tele lens. Bishop Ca. area 12/13. Reward" by Jesse in Digital Photography (0 replies, last post 12-15-2003 07:46 AM)


