Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Levenshtein_distance and recreate the string

Reply
Thread Tools

Levenshtein_distance and recreate the string

 
 
Rick DeNatale
Guest
Posts: n/a
 
      04-30-2008
On Wed, Apr 30, 2008 at 8:25 AM, Vidar Hokstad <(E-Mail Removed)> wrote:

> Several people have given great answers as to why this isn't possible,
> but let me give you another reason why, just
> because I feel like it, and it's a useful thing to keep in mind.
>
> Whenever you get a problem like this, a good way of approaching it if
> you don't understand the algorithm is to think:
>
> - If it _was_ possible, would that allow you to make impossibly
> efficient compression algorithms?
>
> In this case, the answer is that yes it would: You could pick a fixed
> S2, for example an empty string, calculate the
> Levensthein distance from S1 to the fixed S2. The distance would be
> proportional to the length of S1. In fact, if
> you choose S2 to be the empty string, then the distance would be equal
> to the length of S1 (it'd take one deletion
> per position in S1 to end up with an empty string)
>
> In other words, you'd "compress" your test string "BRUY" down to the
> number 4, but more importantly that "compression"
> method would compress any string of any length, which is obviously
> impossible (the reason this is impossible is that
> if it was possible you could apply it to it's own output over and over
> until you had compressed an arbitrary input
> down to a bit).
>
> In this case it's not a very hard problem to solve, but I find that
> for a large number of questions about data
> transformations, it's helpful to think about it in terms of
> compression because the answers often become blatantly
> obvious once you restate them that way.


Another thing to consider is why it's called a distance rather than,
say, a difference.

Consider this analogy.

Given that the highway distance between Raleigh, NC, and Washington,
DC is 250 miles. Given Washington, DC, and 250 miles, can I uniquely
identify Raleigh, NC.

No since the highway distance between Washington, DC and New York, NY
(as well as many other places) is also 250 miles.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

 
Reply With Quote
 
 
 
 
Ken Bloom
Guest
Posts: n/a
 
      05-01-2008
On Tue, 29 Apr 2008 20:56:30 -0500, Phillip Gawlowski wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> David A. Black wrote:
> |
> | But isn't it the same for RUBY -> RUBE and RUBA -> RUBE ? In which |
> case, if you had RUBE and the L. distance, you could not recreate |
> RUBY; you'd have a whole set of strings that were exactly that |
> distance from RUBE.
>
> Good point. I don't know if the L. distance is unique for each
> possibility or not.


It's not unique. Any single deletion has edit distance 1, any single
insertion has edit distance 1.

Thus, the strings that have edit distance 1 from "RUBY" include (but are
not limited to):

#delete a letter
RUB
RUY
RBY
UBY
#add the letter A
ARUBY
RAUBY
RUABY
RUBAY
RUBYA
#add the letter B
BRUBY
RBUBY
RUBBY
RUBYB
....

That's not to say there aren't reasons for recreating all of these. For
Mengmeng Du[1], recreating all of these and checking for their existance
in a Bloom filter was orders of magnitude faster than computing the edit
distance between a query and every entry in a 200,000 entry list of last
names. (Just a paper I happen to be reading this week.)

--Ken

[1] Du, Mengmeng (2005) "Approximate Name Matching" Master's Degree
Project, Royal Institute of Technology, Stockholm, Sweden

--
Ken (Chanoch) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Damerau-Levenshtein_distance Jeremy Ruby 2 05-18-2007 08:40 AM
How recreate "registry.dat" file? Martin Leese Firefox 7 08-01-2005 12:41 PM
Recreate Outlook Web Access 2003 Look and Feel kevinsaucier ASP .Net 0 05-12-2005 03:29 AM
Text files read multiple files into single file, and then recreate the multiple files googlinggoogler@hotmail.com Python 4 02-13-2005 05:44 PM
Recreate msdn drill down look and feel Mark ASP .Net 2 02-23-2004 09:37 PM



Advertisments