Re: String similarity

 
 
Tim Churches
      10-10-2003
Luca Montecchiani <(E-Mail Removed)> wrote:
>
> Introduction
> ------------
> The need to find files with similar names pushed me to write a
> utility that, unlike the others, compares files by name rather than
> by content. I started by adding this functionality to a C program
> for Unix called "fdupes", which gave me good performance and good
> precision.
> The algorithm I chose for comparing strings was the
> Levenshtein distance.
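
For readers who have not met it, the Levenshtein distance mentioned in the quoted post is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is only a minimal Python sketch of the textbook dynamic-programming version, not the code from the fdupes patch; the function names `levenshtein` and `similarity` and the normalisation by the longer length are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: scale the distance to a 0..1 score (1 = identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("report_v1.txt", "report_v2.txt"))
    print(similarity("report_v1.txt", "report_v2.txt"))
```

Keeping only two rows of the table bounds memory by the length of the shorter string, but the running time is still proportional to the product of the two lengths, which matters when every file name is compared with every other.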


This starts to look like a probabilistic record linkage (matching) problem. See the
Febrl project at http://datamining.anu.edu.au/projects/linkage.html - amongst
other things it contains a library of string comparators written in Python.
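
As a rough illustration of the pairwise comparison step only (not the probabilistic part of record linkage, and not Febrl's own comparators), here is a sketch that scores every pair of file names with `difflib.SequenceMatcher` from the standard library and keeps the pairs above a cut-off. The function name `similar_pairs`, the sample file names, and the 0.8 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.8):
    """Return (name_a, name_b, score) for every pair scoring >= threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # ratio() is 2 * matches / total length, so it lies between 0 and 1
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, score))
    # best matches first
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    files = ["holiday_2003.jpg", "holiday-2003.jpg", "invoice.pdf"]
    for a, b, score in similar_pairs(files):
        print(f"{score:.3f}  {a}  {b}")
```

Comparing all pairs is quadratic in the number of names, so for large directories some form of pre-grouping (for example by extension or first letters) would likely be needed before scoring.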

Tim C


 
Luca Montecchiani
      10-10-2003
Tim Churches wrote:

> This starts to look like a probabilistic record linkage (matching) problem. See the
> Febrl project at http://datamining.anu.edu.au/projects/linkage.html - amongst
> other things it contains a library of string comparators written in Python.


Thanks for the link; stringcmp.py contains some cool code that I'll try later.
Unfortunately it won't gain me any speed, but it does give me another bunch of
algorithms to improve the quality of the results.

ciao,
luca

 