Velocity Reviews - Computer Hardware Reviews



n-gram based & edit distance based comparisons

 
 
Ezee
 
      07-26-2005
Hi

Can anybody suggest a useful approach for performing n-gram-based
and edit-distance-based comparisons between words? (The words are
Strings stored in Vectors.)

Thanks in advance.
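For the edit-distance part of the question, a common choice is the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions that turns one word into the other. The class below is a minimal sketch using the standard two-row dynamic-programming table. The class name Levenshtein and the similarityPercent helper (which normalizes by the length of the longer word) are illustrative choices, not a standard Java API.

```java
import java.util.Vector;

/** Minimal Levenshtein (edit) distance sketch; class and method names are hypothetical. */
public class Levenshtein {

    /** Number of single-character insertions, deletions and substitutions turning a into b. */
    public static int distance(String a, String b) {
        // prev[j] = distance between the first i-1 chars of a and the first j chars of b
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution or match
            }
            int[] tmp = prev; // swap rows: the row just filled becomes "prev"
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 100 means identical, 0 means every position must change. */
    public static double similarityPercent(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 100.0; // two empty strings are identical
        }
        return 100.0 * (maxLen - distance(a, b)) / maxLen;
    }

    public static void main(String[] args) {
        Vector<String> words = new Vector<>();
        words.add("peace");
        words.add("piece");
        System.out.println(distance(words.get(0), words.get(1)));          // prints 2
        System.out.println(similarityPercent(words.get(0), words.get(1))); // prints 60.0
    }
}
```

With this normalization, peace and piece come out 60% similar (edit distance 2 over a length of 5); other normalizations give other percentages.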

 
 
 
 
 
Harald
 
      07-26-2005
"Ezee" <(E-Mail Removed)> writes:

> Can anybody suggest a useful approach for performing n-gram-based
> and edit-distance-based comparisons between words? (The words are
> Strings stored in Vectors.)


For n-grams: put each word's n-grams into a java.util.Set and use set intersection to find the n-grams the words share.
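A minimal sketch of that idea, assuming n-grams are runs of n adjacent characters. The class and method names are hypothetical. Because ngrams() builds a fresh set on every call, Set.retainAll can intersect it in place without touching anything else.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch: collect each word's n-grams in a Set and intersect the sets. */
public class NGramOverlap {

    /** Contiguous n-grams of word, e.g. ngrams("piece", 3) = {pie, iec, ece}. */
    public static Set<String> ngrams(String word, int n) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order for readable output
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    /** The n-grams that occur in both words. */
    public static Set<String> common(String a, String b, int n) {
        Set<String> shared = ngrams(a, n);
        shared.retainAll(ngrams(b, n)); // set intersection, applied to the fresh set
        return shared;
    }

    public static void main(String[] args) {
        // night: ni ig gh ht; nacht: na ac ch ht
        System.out.println(common("night", "nacht", 2)); // prints [ht]
    }
}
```

The size of the returned set is the raw overlap count; words shorter than n simply have no n-grams.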

Your question is not exactly specific. Are you looking for code
examples, pointers to algorithm descriptions, class libraries, ...?

Harald.

--
---------------------+---------------------------------------------
Harald Kirsch (@home)|
Java Text Crunching: http://www.ebi.ac.uk/Rebholz-srv/whatizit/software
 
 
 
 
 
Ezee
 
      07-27-2005


Harald wrote:

> Your question is not exactly specific. Are you looking for code
> examples, pointers to algorithm descriptions, class libraries, ...?


I am looking for code examples. I have to compare two words, e.g.
peace and piece, on the basis of their n-grams, i.e. if we consider
all 3-grams of these two words (peace = pea pec pee eac eae ace) and
(piece = pie pic pie iec ice). Even if the two words are not
identical, compared on the basis of n-grams they are similar to some
extent, and I need to calculate this degree of similarity (perhaps as
a percentage, e.g. piece is 60% similar to peace). I am not sure
whether "n-gram based comparison" is the right name for this...

Ezee
--
You can't run away forever,
But there's nothing wrong with getting a good head start.

 
 
Ingo R. Homann
 
      07-27-2005
Hi,

Ezee wrote:
> I am looking for code examples. I have to compare two words, e.g.
> peace and piece, on the basis of their n-grams, i.e. if we consider
> all 3-grams of these two words (peace = pea pec pee eac eae ace) and
> (piece = pie pic pie iec ice). ... I am not sure
> whether "n-gram based comparison" is the right name for this...


AFAIK, it is only called an n-gram if you take adjacent letters (for N=3):

piece = pie iec ece
peace = pea eac ace

(In this particular case, piece and peace have no 3-grams in common at all.)
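To turn shared n-grams into the percentage asked for above, one option (an assumption here, not something proposed in the thread) is the Dice coefficient: 2 * (number of shared n-grams) / (number of n-grams in the first word + number in the second). The sketch below, with a hypothetical class name, reproduces Ingo's observation: with N=3 piece and peace share nothing, while with N=2 they share only "ce".

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: Dice coefficient over contiguous n-gram sets, as a percentage. */
public class NGramSimilarity {

    static Set<String> ngrams(String word, int n) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    /** 100 * 2 * |shared| / (|A| + |B|), where A and B are the words' n-gram sets. */
    public static double dicePercent(String a, String b, int n) {
        Set<String> gramsA = ngrams(a, n);
        Set<String> gramsB = ngrams(b, n);
        if (gramsA.isEmpty() && gramsB.isEmpty()) {
            // both words are shorter than n, so there are no n-grams to compare
            return a.equals(b) ? 100.0 : 0.0;
        }
        Set<String> shared = new HashSet<>(gramsA);
        shared.retainAll(gramsB); // intersection on a copy, gramsA stays intact
        return 100.0 * 2 * shared.size() / (gramsA.size() + gramsB.size());
    }

    public static void main(String[] args) {
        System.out.println(dicePercent("piece", "peace", 3)); // prints 0.0  (pie iec ece vs pea eac ace)
        System.out.println(dicePercent("piece", "peace", 2)); // prints 25.0 (only "ce" is shared)
    }
}
```

Jaccard similarity (shared n-grams divided by all distinct n-grams of both words) is an equally common alternative; the choice of N matters at least as much as the formula.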

As for your question: sorry, I don't have source code for that, and I
don't think there is an official standard package for it, but it
should be easy to implement yourself, or to find by googling for
"java" and "ngram".

Ciao,
Ingo

 
 
HK
 
      07-27-2005


Ezee wrote:
> Harald wrote:
>
> > Your question is not exactly specific. Are you looking for code
> > examples, pointers to algorithm descriptions, class libraries, ...?

>
> I am looking for code examples. I have to compare two words, e.g.
> peace and piece, on the basis of their n-grams, i.e. if we consider
> all 3-grams of these two words (peace = pea pec pee eac eae ace) and
> (piece = pie pic pie iec ice). Even if the two words are not
> identical, compared on the basis of n-grams they are similar to some
> extent, and I need to calculate this degree of similarity (perhaps as
> a percentage, e.g. piece is 60% similar to peace). I am not sure
> whether "n-gram based comparison" is the right name for this...


Well, then at least for the n-gram approach my previous
comment applies. For each word, create its n-grams
and put them in a Set. Then use set intersection to find
the common ones, and count them to get a ranking. If you
want, you can weight the n-grams by how often they occur
among the unique words of a corpus: the more frequent an
n-gram, the less decisive it should be.
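One way to implement the weighting described above, sketched under assumptions: each n-gram gets an IDF-style weight log((N + 1) / (df + 1)), where N is the number of unique corpus words and df the number of those words containing the n-gram, and a pair of words scores the sum of the weights of their shared n-grams. The formula, the +1 smoothing and the class name WeightedNGrams are illustrative choices, not taken from the thread or from the linked paper.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.Vector;

/** Sketch: shared n-grams weighted by how rare they are among the unique words of a corpus. */
public class WeightedNGrams {

    private final int n;
    private final int corpusSize;                                 // number of unique corpus words
    private final Map<String, Integer> docFreq = new HashMap<>(); // n-gram -> number of words containing it

    public WeightedNGrams(Collection<String> corpus, int n) {
        this.n = n;
        Set<String> unique = new HashSet<>(corpus); // count each distinct word once
        this.corpusSize = unique.size();
        for (String word : unique) {
            // ngrams() returns a Set, so a word adds at most 1 to each n-gram's count
            for (String gram : ngrams(word, n)) {
                docFreq.merge(gram, 1, Integer::sum);
            }
        }
    }

    /** Contiguous n-grams of word. */
    static Set<String> ngrams(String word, int n) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    /** Frequent n-grams weigh little; the +1 smoothing keeps unseen n-grams finite. */
    public double weight(String gram) {
        int df = docFreq.getOrDefault(gram, 0);
        return Math.log((corpusSize + 1.0) / (df + 1.0));
    }

    /** Sum of the weights of the n-grams both words share (higher means more similar). */
    public double score(String a, String b) {
        Set<String> shared = ngrams(a, n);
        shared.retainAll(ngrams(b, n));
        double total = 0.0;
        for (String gram : shared) {
            total += weight(gram);
        }
        return total;
    }

    public static void main(String[] args) {
        // A toy corpus held in a Vector, as in the original question
        Vector<String> corpus = new Vector<>();
        for (String w : new String[] {"peace", "piece", "place", "space", "grace", "peach"}) {
            corpus.add(w);
        }
        WeightedNGrams model = new WeightedNGrams(corpus, 2);
        System.out.println(model.weight("ce") < model.weight("pe"));                       // prints true
        System.out.println(model.score("peace", "peach") > model.score("peace", "place")); // prints true
    }
}
```

Because "ac" and "ce" occur in nearly every word of this toy corpus, peace scores higher against peach (shared pe, ea, ac) than against place (shared ac, ce), even though both pairs share at least two bigrams.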

See, for example, http://www.cs.ualberta.ca/~lindek/papers/sim.pdf

Harald.

 
 
Ezee
 
      07-27-2005



Thanks for the help. I'm going to try it, and if I need more help,
perhaps I will bother you again.
Ezee

 
 
timjowers@gmail.com
 
      07-27-2005

Ezee,

Search SourceForge for this. If you don't find anything, add it to an
appropriate project or create a new one.

Tim

 
 
 
 