Velocity Reviews - Computer Hardware Reviews

understanding regexp, Text::ParseWords

 
 
ccc31807
 
      11-05-2010
This is copied from Text::ParseWords. It appears in the function
parse_line(delimiter, boolean, string). I understand most of this, but
need some help understanding some of it. This appears in a loop:
while (length($line)) {
and parses a line with this call:
my ($f, $m, $l) = parse_line(/,/, 0, $line)
where line will be like this:
"Barack","Hussein","Obama"
I have numbered the lines for reference.

<quote>
# This pattern is optimised to be stack conservative on older perls.
# Do not refactor without being careful and testing it on very long strings.
# See Perl bug #42980 for an example of a stack busting input.
1 $line =~ s/^
2 (?:
# double quoted string
3 (") # $quote
4 ((?>[^\\"]*(?:\\.[^\\"]*)*))" # $quoted
5 | # --OR--
# singe quoted string
6 (') # $quote
7 ((?>[^\\']*(?:\\.[^\\']*)*))' # $quoted
8 | # --OR--
# unquoted string
9 ( # $unquoted
10 (?:\\.|[^\\"'])*?
11 )
# followed by
12 ( # $delim
13 \Z(?!\n) # EOL
14 | # --OR--
15 (?-x:$delimiter) # delimiter
16 | # --OR--
17 (?!^)(?=["']) # a quote
18 )
)//xs or return; # extended layout
my ($quote, $quoted, $unquoted, $delim) = (($1 ? ($1,$2) : ($3,$4)),
$5, $6);
</quote>
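
For orientation, here is a minimal sketch of calling `parse_line` on the sample line. It assumes the first argument is given as a string, since the module interpolates it into the pattern as `$delimiter`; a bare `/,/` in the argument list would be evaluated as a match against `$_` before the call is made. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{"Barack","Hussein","Obama"};

# Arguments: delimiter pattern (as a string), keep flag, line to split.
# With keep = 0 the surrounding quotes are removed from each field.
my ($f, $m, $l) = parse_line(',', 0, $line);

print "first=$f middle=$m last=$l\n";
```

Passing a true value as the second argument would keep the quotes around each field instead.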

Thanks, CC.
 
 
 
 
 
sln@netherlands.com
 
      11-05-2010
On Fri, 5 Nov 2010 07:41:10 -0700 (PDT), ccc31807 wrote:

>This is copied from Text::ParseWords. It appears in the function
>parse_line(delimiter, boolean, string). I understand most of this, but
>need some help understanding some of it. This appears in a loop:
> while (length($line)) {
>and parses a line with this call:
> my ($f, $m, $l) = parse_line(/,/, 0, $line)
>where line will be like this:
> "Barack","Hussein","Obama"
>I have numbered the lines for reference.
>


What is it you want to understand about it?
It's basically three alternatives that each peel a chunk off the front of
the line: a double-quoted string, a single-quoted string, or unquoted text
followed by a delimiter, the end of the line, or a quote.

-sln

-------------------------
use strict;
#use warnings;

my @lines = (
    q{ "Barack", "Hussein", "Obama" },
    q{ "Bar'a'ck", "test", hello, "Hussein", 'Obama" },
    q{ 'Bar'a'ck", "test", hello, "Hussein", 'Obama" },
);

my $delimiter = ',';
print "\n";

for my $line (@lines) {
    print "** start line = [$line]\n\n";
    while (length($line)) {

        $line =~ s/^
            (?:
                # double quoted string
                (")                             # $quote
                ((?>[^\\"]*(?:\\.[^\\"]*)*))"   # $quoted
            |   # --OR--
                # singe quoted string
                (')                             # $quote
                ((?>[^\\']*(?:\\.[^\\']*)*))'   # $quoted
            |   # --OR--
                # unquoted string
                (                               # $unquoted
                    (?:\\.|[^\\"'])*?
                )
                # followed by
                (                               # $delim
                    \Z(?!\n)                    # EOL
                |   # --OR--
                    (?-x:$delimiter)            # delimiter
                |   # --OR--
                    (?!^)(?=["'])               # a quote
                )
            )//xs or last;                      # extended layout

        my ($quote, $quoted, $unquoted, $delim) = (($1 ? ($1,$2) : ($3,$4)), $5, $6);
        print "quote= <$quote> quoted= <$quoted> unquoted= <$unquoted> delim= <$delim>\n";
        print " <$line>\n";
    }
    print "end line = [$line]\n",'-'x20,"\n\n";
}

__END__
Output:

** start line = [ "Barack", "Hussein", "Obama" ]

quote= <> quoted= <> unquoted= < > delim= <>
<"Barack", "Hussein", "Obama" >
quote= <"> quoted= <Barack> unquoted= <> delim= <>
<, "Hussein", "Obama" >
quote= <> quoted= <> unquoted= <> delim= <,>
< "Hussein", "Obama" >
quote= <> quoted= <> unquoted= < > delim= <>
<"Hussein", "Obama" >
quote= <"> quoted= <Hussein> unquoted= <> delim= <>
<, "Obama" >
quote= <> quoted= <> unquoted= <> delim= <,>
< "Obama" >
quote= <> quoted= <> unquoted= < > delim= <>
<"Obama" >
quote= <"> quoted= <Obama> unquoted= <> delim= <>
< >
quote= <> quoted= <> unquoted= < > delim= <>
<>
end line = []
--------------------

** start line = [ "Bar'a'ck", "test", hello, "Hussein", 'Obama" ]

quote= <> quoted= <> unquoted= < > delim= <>
<"Bar'a'ck", "test", hello, "Hussein", 'Obama" >
quote= <"> quoted= <Bar'a'ck> unquoted= <> delim= <>
<, "test", hello, "Hussein", 'Obama" >
quote= <> quoted= <> unquoted= <> delim= <,>
< "test", hello, "Hussein", 'Obama" >
quote= <> quoted= <> unquoted= < > delim= <>
<"test", hello, "Hussein", 'Obama" >
quote= <"> quoted= <test> unquoted= <> delim= <>
<, hello, "Hussein", 'Obama" >
quote= <> quoted= <> unquoted= <> delim= <,>
< hello, "Hussein", 'Obama" >
quote= <> quoted= <> unquoted= < hello> delim= <,>
< "Hussein", 'Obama" >
quote= <> quoted= <> unquoted= < > delim= <>
<"Hussein", 'Obama" >
quote= <"> quoted= <Hussein> unquoted= <> delim= <>
<, 'Obama" >
quote= <> quoted= <> unquoted= <> delim= <,>
< 'Obama" >
quote= <> quoted= <> unquoted= < > delim= <>
<'Obama" >
end line = ['Obama" ]
--------------------

** start line = [ 'Bar'a'ck", "test", hello, "Hussein", 'Obama" ]

quote= <> quoted= <> unquoted= < > delim= <>
<'Bar'a'ck", "test", hello, "Hussein", 'Obama" >
quote= <'> quoted= <Bar> unquoted= <> delim= <>
<a'ck", "test", hello, "Hussein", 'Obama" >
quote= <> quoted= <> unquoted= <a> delim= <>
<'ck", "test", hello, "Hussein", 'Obama" >
quote= <'> quoted= <ck", "test", hello, "Hussein", > unquoted= <> delim= <>
<Obama" >
quote= <> quoted= <> unquoted= <Obama> delim= <>
<" >
end line = [" ]
--------------------
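
One difference from the module is worth keeping in mind when reading the output: the script uses `or last`, so it stops and shows the leftover text, whereas the quoted source uses `or return`, which makes `parse_line` abandon the whole line. A small sketch of that, assuming the stock Text::ParseWords and with input strings made up for illustration:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Balanced quotes: every field is returned.
my @good = parse_line(',', 0, q{"test",hello,"Hussein"});

# The last field opens with ' but closes with ", so the substitution
# fails part-way through and parse_line returns an empty list.
my @bad = parse_line(',', 0, q{"test",hello,'Obama"});

print scalar(@good), " fields from the balanced line\n";
print scalar(@bad),  " fields from the unbalanced line\n";
```

So a caller that gets an empty list back from a non-empty line can treat it as a sign of unbalanced quotes.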

 
 
 
 
 
ccc31807
 
      11-05-2010
On Nov 5, 1:57 pm, sln@netherlands.com wrote:
> What is it you want to understand about it?


Line 2 -- the (?: construct
Lines 4, 7, 10 -- same thing
Line 13 -- \Z(?!\n)
Line 15 -- (?-x:$delimiter)
$delimiter would be the COMMA character
Line 17 -- (?!^)(?=["'])
the ["'] means either one quote or one double-quote

Thanks, CC.
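
Each of the constructs listed above can be tried in isolation. The following is only a sketch with made-up test strings, not code from Text::ParseWords, and the variable names are illustrative; the line numbers in the comments refer to the numbered listing in the first post.

```perl
use strict;
use warnings;

# Line 2: (?: ... ) groups without capturing, so it does not use up $1, $2, ...
my ($only) = "ab" =~ /(?:a)(b)/;    # the first *capturing* group is (b)

# Lines 4 and 7: (?> ... ) is an atomic group. Once it has matched,
# the engine will not backtrack into it to try a shorter match.
my $plain  = "aaab" =~ /(?:a*)ab/ ? 1 : 0;    # matches: a* gives back one 'a'
my $atomic = "aaab" =~ /(?>a*)ab/ ? 1 : 0;    # fails: a* keeps every 'a'

# Line 10: the trailing *? is a lazy quantifier; it takes as little as possible.
my ($lazy)   = "a,b,c" =~ /^(.*?),/;    # stops at the first comma
my ($greedy) = "a,b,c" =~ /^(.*),/;     # runs to the last comma

# Line 13: \Z matches at the very end or just before a final newline;
# adding (?!\n) rules out the second case.
my $z_loose  = "abc\n" =~ /abc\Z/       ? 1 : 0;
my $z_strict = "abc\n" =~ /abc\Z(?!\n)/ ? 1 : 0;

# Line 15: (?-x: ... ) switches /x off inside the group, so whitespace
# or '#' in the interpolated delimiter is taken literally.
my $delimiter = ' ';
my $x_on  = "a b" =~ /a $delimiter b/x       ? 1 : 0;    # space dropped as layout
my $x_off = "a b" =~ /a (?-x:$delimiter) b/x ? 1 : 0;    # space matched literally

# Line 17: (?!^) means "not at the start of the string" and (?=["']) means
# "the next character is a quote"; neither consumes any text.
my $at_start = q{"x} =~ /(?!^)(?=["'])/ ? 1 : 0;
my $inside   = q{x"} =~ /(?!^)(?=["'])/ ? $-[0] : -1;    # offset of the match

print "non-capturing: $only\n";
print "plain=$plain atomic=$atomic\n";
print "lazy=$lazy greedy=$greedy\n";
print "Z loose=$z_loose strict=$z_strict\n";
print "x on=$x_on off=$x_off\n";
print "quote at start=$at_start inside offset=$inside\n";
```

In the quoted pattern, the atomic groups on lines 4 and 7 are presumably what the "stack conservative" comment refers to, since they stop the engine from retrying shorter quoted strings on long input.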
 