Velocity Reviews - Computer Hardware Reviews

Re: String parsing (2 questions)

Rainer Weikusat
Ben Morrow <(E-Mail Removed)> writes:


> Parsing names is extremely difficult, because they are extremely
> variable. If at all possible you want to design your systems so that you
> don't need to, and instead ask your users questions like 'what is your
> full name' and 'how would you like us to address you'.
>
> In this case, how would you distinguish between these two?
>
>     Van Horn Tim
>     Watson Mary Jane
>
> Can you make a complete list of 'von's which might start a two-part
> surname, or do you have to handle cases like married women who have
> taken both surnames without a hyphen (and, potentially, their
> children)?

The solution to this is to define a grammar for 'supported name
formats' which catches the expected cases and live with the fact that
any heuristic fails in some situations. This means that 'Watson Mary
Jane' may have to decide if he is 'Watson M Jane' or if she is 'Mary J
Watson' or any other permutation of the given set of letter and
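
A minimal sketch of what such a grammar could look like, written as a Perl regex. Everything specific in it is an assumption made for illustration: the particle list (van, von, de), the single supported format 'SURNAME GIVEN [MIDDLE]' and the name parse_name are hypothetical, not something taken from the thread. Input which doesn't fit the format is rejected instead of guessed at.

```perl
use strict;
use warnings;

# Hypothetical list of particles which may start a two-part surname.
# A real list would have to be derived from the data actually handled.
my $particle = qr/(?:van|von|de)/i;

# A 'word': a letter followed by letters, apostrophes or hyphens.
my $word = qr/[[:alpha:]][[:alpha:]'-]*/;

# Assumed supported format: SURNAME GIVEN [MIDDLE], where SURNAME is
# one word, optionally preceded by one of the particles above.
my $name = qr/^\s*((?:$particle\s+)?$word)\s+($word)(?:\s+($word))?\s*$/;

# Returns a hash reference for supported input; returns nothing
# (undef in scalar context) for anything the grammar doesn't cover.
sub parse_name {
    my ($input) = @_;
    my ($surname, $given, $middle) = $input =~ $name
        or return;
    return { surname => $surname, given => $given, middle => $middle };
}
```

With these assumptions, parse_name('Van Horn Tim') yields the surname 'Van Horn' and the given name 'Tim', while parse_name('Watson Mary Jane') yields the surname 'Watson', the given name 'Mary' and the middle name 'Jane'. Someone whose unhyphenated double surname starts with a word not on the particle list will still be parsed wrongly, which is exactly the kind of failure the heuristic has to live with.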
