Newbie asking, interesting question

 
 
Wondering
 
      02-04-2005
I'm struggling to learn Perl, with some degree of success. I have a
question that's a bit more advanced than I am, but I hope someone can
help (thanks in advance to all who read this and bigger thanks to
responders).

I'm trying to match name and address records in a large (~300,000
record) database with potential new records to avoid duplicates. Anyone
who has tried this knows that there are problems with exact matching,
especially if no convention has been followed for entering data.
(Consider all the possible variations of "avenue" - "avenue", "av",
"ave", etc., and when you consider drive, boulevard, etc. and all their
possible abbreviations, you begin to get the picture). So, I want to be
able to extract just the numeric characters in a string so I can do
the matching on those (it's fuzzy, but with other fields being
considered, too, we can get a fairly high matching rate). Anyone know
how to extract just the numeric characters?
I'll also accept any other ideas for doing the match.

 
 
 
 
 
Wondering
 
      02-04-2005
Right on. I know tr from *nix, just didn't occur to me to use it for
this. Big thanks!

 
 
 
 
 
Tad McClellan
 
      02-04-2005
Wondering <(E-Mail Removed)> wrote:

> Subject: Newbie asking, interesting question



Please put the subject of your article in the Subject of your article.

Your article was not about a newbie asking interesting questions.


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
 
Anno Siegel
 
      02-06-2005
Wondering <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> I'm struggling to learn Perl, with some degree of success. I have a
> question that's a bit more advanced than I am, but I hope someone can
> help (thanks in advance to all who read this and bigger thanks to
> responders).
>
> I'm trying to match name and address records in a large (~300,000
> record) database with potential new records to avoid duplicates. Anyone
> who has tried this knows that there are problems with exact matching,
> especially if no convention has been followed for entering data.
> (Consider all the possible variations of "avenue" - "avenue", "av",
> "ave", etc., and when you consider drive, boulevard, etc. and all their
> possible abbreviations, you begin to get the picture). So, I want to be
> able to extract just the numeric characters in a string so I can do
> the matching on those (it's fuzzy, but with other fields being
> considered, too, we can get a fairly high matching rate). Anyone know
> how to extract just the numeric characters?


tr/0..9//cd;

That will delete everything except digits.

> I'll also accept any other ideas for doing the match.


There's the Soundex method with a corresponding standard module
Text::Soundex. It tries to map words so that similar-sounding ones
map to the same thing. It may also map different-sounding words to
the same thing, but you're not overly concerned about false positives.
Your fields may need some pre-processing (as breaking into words in
a useful way).
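
A rough sketch of what that could look like, assuming Text::Soundex is
available (on newer Perls it may need to be installed from CPAN). The
helper name soundex_key and the split on non-letters are only
illustrative choices for the pre-processing step, not a recommendation
for your particular data:

```perl
use strict;
use warnings;
use Text::Soundex;   # exports soundex() by default

# Hypothetical helper: break a field into words, then Soundex-code each word.
sub soundex_key {
    my ($field) = @_;
    # Split on anything that is not a letter and drop empty pieces.
    my @words = grep { length } split /[^A-Za-z]+/, $field;
    return join ' ', map { soundex($_) } @words;
}

print soundex_key('Robert Smith'), "\n";   # prints R163 S530
print soundex_key('Rupert Smyth'), "\n";   # prints R163 S530
```

Both names map to the same key, which is the kind of false positive
mentioned above, so the key is best treated as one signal among several.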

Anno
 
 
Tad McClellan
 
      02-06-2005
Anno Siegel <(E-Mail Removed)-berlin.de> wrote:

> tr/0..9//cd;
>
> That will delete everything except digits.



Make that

tr/0-9//cd;

please.
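
For the record, a minimal sketch of how that might be used in a script.
The sub name digits_only and the sample address are made up for
illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string with only its digits left.
sub digits_only {
    my ($text) = @_;
    # c complements the search list, d deletes every matched character,
    # so everything that is not 0 through 9 is removed.
    $text =~ tr/0-9//cd;
    return $text;
}

print digits_only('1234 N. Main Ave., Apt 5'), "\n";   # prints 12345
```

With 0..9 the search list is just the characters 0, . and 9, because tr
only treats a hyphen as a range operator, so most digits would be
deleted along with the letters.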

--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
 
Anno Siegel
 
      02-06-2005
Tad McClellan <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Anno Siegel <(E-Mail Removed)-berlin.de> wrote:
>
> > tr/0..9//cd;
> >
> > That will delete everything except digits.

>
>
> Make that
>
> tr/0-9//cd;
>
> please.


Yes. Oh boy. Looks like I violated the copy/paste rule.

Anno
 