bayesian classifiers in ruby?

 
 
Giles Bowkett
 
      11-01-2006
I'm researching existing Bayesian classifiers in Ruby -- it looks as
if there are two, one called Bishop, a Python port, and another called
Classifier.

Has anybody worked with them? Any upsides, downsides? Both theoretical
and practical perspectives. Partly to expand my brain and partly for
the sake of putting some real software together.

--
Giles Bowkett
http://www.gilesgoatboy.org

 
Jaypee
 
      11-01-2006
Giles Bowkett wrote:
> I'm researching existing Bayesian classifiers in Ruby -- it looks as
> if there are two, one called Bishop, a Python port, and another called
> Classifier.
>
> Has anybody worked with them? Any upsides, downsides? Both theoretical
> and practical perspectives. Partly to expand my brain and partly for
> the sake of putting some real software together.
>

I have used Bishop to classify the 488 articles of the draft European
Constitution, and it was helpful.
Bishop is very simple: it consists of just one file and a couple of
classes. There is room for improvement in the way it tokenizes source
code.
Classifier is more complex, with multiple files and more classes.
Moreover, it can use the GNU Scientific Library to perform its calculations.
I recently tried to use both of them to help a teacher classify
CS homework and analyze how many different solutions the students
had come up with. I used a simple example: printing the numbers from 0 to
10 in C. In one version I used a "for" loop, and in another a
"while" loop. Then I tested a candidate program that used a while loop with
different variable names. Bishop's guess was that it was more like the
'for' loop. Not very conclusive.
Classifier did no better, in that it failed to tokenize the C source
text: it tried to stem a keyword and failed.

As long as you analyze natural language, both seem suitable. They differ
in complexity under the hood, but both have a very simple interface:
define a category and train it, then use the guess interface to
evaluate candidates.
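
For anyone who wants to see the shape of that train/guess interface, here is a minimal sketch of a naive Bayes classifier in plain Ruby. It is not the code of Bishop or Classifier; the class name TinyBayes, its method names, and the simple word tokenizer are all made up for illustration, and the scoring uses add-one (Laplace) smoothing as an assumption.

```ruby
# Hypothetical minimal naive Bayes classifier.
# This is NOT Bishop's or Classifier's actual code or API.
class TinyBayes
  def initialize
    # category name => (token => number of times seen in that category)
    @counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
    # category name => number of training texts
    @docs = Hash.new(0)
  end

  # Very simple tokenizer (an assumption): lowercase runs of letters,
  # digits and underscores. Replace it for HTML mail or source code.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9_]+/)
  end

  # Record one training text under the given category.
  def train(category, text)
    @docs[category] += 1
    tokenize(text).each { |token| @counts[category][token] += 1 }
  end

  # Return the trained category with the highest log-probability.
  def guess(text)
    vocabulary = @counts.values.flat_map(&:keys).uniq.size
    total_docs = @docs.values.sum.to_f
    scores = @docs.keys.map do |category|
      total_tokens = @counts[category].values.sum
      # Start from the log of the category prior.
      score = Math.log(@docs[category] / total_docs)
      tokenize(text).each do |token|
        # Add-one smoothing keeps unseen tokens from zeroing the score.
        score += Math.log((@counts[category][token] + 1.0) /
                          (total_tokens + vocabulary))
      end
      [category, score]
    end
    scores.max_by { |_, score| score }.first
  end
end

filter = TinyBayes.new
filter.train("spam", "buy cheap pills now")
filter.train("ham", "meeting notes attached for review")
puts filter.guess("cheap pills")   # prints "spam"
```

With so little training data the result depends almost entirely on which tokens overlap, which is probably why the for/while experiment above was inconclusive: most tokens in two short C programs are the same.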

J-P
 
Tom Reilly
 
      11-01-2006
I wrote a Bayesian classifier to classify nursing home calls. If you
want, I can email you the source. It is about 95% accurate.
Tom Reilly




 
Giles Bowkett
 
      11-02-2006
Definitely! That would be very cool.

On 11/1/06, Tom Reilly <(E-Mail Removed)> wrote:
> I wrote a bayesian classifier to classify nursing home calls. If you
> want I can email you the source. It works with about a 95% accuracy.
> Tom Reilly



--
Giles Bowkett
http://www.gilesgoatboy.org

 
Giles Bowkett
 
      11-02-2006
> As long as you analyze natural language, both seem suited, although with
> different degrees of complexity under the hood, both have a very simple
> interface: define a category and train it. Then a guess interface to
> evaluate candidates.


I'm hoping to develop yet another spam filter, so I can only say I'm
sort of analyzing natural language. Not all of it is natural language;
some of it is code. In the Paul Graham thing where
he came up with this idea, if I remember right, he said that a font
tag with the color red turned out to be the single most reliable
indicator of spam. Obviously in HTML e-mail there are going to be
similar trends. However, if the tokenizer is the only problem, that may
be something I can change without too much stress.
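
As a rough idea of what such a tokenizer change could look like, here is a sketch that keeps HTML tag names and attribute values (such as a color code) as tokens instead of throwing them away. The method name tokenize_mail is made up, and the rules are assumptions loosely modeled on what I recall of Paul Graham's description: letters, digits, dashes, apostrophes and dollar signs belong to tokens, all-digit tokens and HTML comments are ignored. It is not how Bishop or Classifier tokenize.

```ruby
# Hypothetical tokenizer sketch for HTML e-mail; the rules are assumptions.
def tokenize_mail(source)
  # Drop HTML comments, which can be used to split words invisibly.
  text = source.gsub(/<!--.*?-->/m, "")
  # Letters, digits, dashes, apostrophes and dollar signs stay inside a
  # token; everything else (angle brackets, quotes, '=', '#', whitespace)
  # acts as a separator, so tag names and attribute values become tokens.
  tokens = text.downcase.scan(/[a-z0-9$'-]+/)
  # Discard tokens that consist only of digits.
  tokens.reject { |token| token.match?(/\A\d+\z/) }
end

p tokenize_mail(%q{<font color="#FF0000">Buy now!</font> <!-- x --> 100 $50})
```

On that sample line the red font color survives as the token "ff0000", which is the kind of feature a word-only tokenizer would lose.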

--
Giles Bowkett
http://www.gilesgoatboy.org

 
Booker C. Bense
 
      11-02-2006
-----BEGIN PGP SIGNED MESSAGE-----

In article <2d81dedb0611021036t5e47d35ex55c294e634873b59@mail.gmail.com>,
Giles Bowkett <(E-Mail Removed)> wrote:
>> As long as you analyze natural language, both seem suited, although with
>> different degrees of complexity under the hood, both have a very simple
>> interface: define a category and train it. Then a guess interface to
>> evaluate candidates.

>
>I'm hoping to develop yet another spam filter. in that sense I can
>only say I'm sort of analyzing natural language. Not all of it is
>natural language, some of it is code. In the Paul Graham thing where
>he came up with this idea, if I remember right, he said that a font
>tag with the color red turned out to be the single most reliable
>indicator of spam. Obviously in HTML e-mail there are going to be
>similar trends. However if the tokenizer is the only problem that may
>be something I can change without too much stress.
>


Long ago, I wrote an interface to the ifile program, and I use
that in my spam/email filtering. ifile is abandonware at the
moment. I think I posted it on the Ruby mailing list at some
point; you might try searching for it.


_ Booker C. Bense


-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBRUpDBWTWTAjn5N/lAQFx9QP+NqHWWcudTBnJK3u2qofqheu6p0hJ3W2I
L6elwknvioDWRuwWO/rksM2DZXwQ6trTHkpEnh0REEsWGl6n683ckuYBbr/ElVA2
9SfGWM0cXspEVX6Xsx/xFsnpF8mdF6le6SdxSEHr0HGhq+8NY1HFoLSOEKdEIBo6
p2sZwJ6+94Q=
=1IG0
-----END PGP SIGNATURE-----
 