Velocity Reviews


sam 12-23-2004 07:22 AM

reg expression with input line
 
Hi,

I would like to write a Perl script to parse each line read from a text
file.
I ended up with some Perl code, shown below:

($prodcode,$custname,$qty,$cost,$date,$prodname) =
/^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
+([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
"12031361 ABC3 567.00
5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
xbch)\xa4\xe9\xa5\xce12x20's";

print "Result:
".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
$date eq "" or $prodname eq "") {
print "Failed to parse input file.\n";
exit;
}

But the parser failed to parse the input text; it returns an empty string.
What is wrong with the above code, especially the parser I created for
parsing the $date?

Thanks
Sam

Arndt Jonasson 12-23-2004 09:47 AM

Re: reg expression with input line
 

sam <sam.wun@authtec.com> writes:
>
> I would like to write a perl script to parse each line read from a
> text file.
> I ended up some perl code as shown below:
>
> ($prodcode,$custname,$qty,$cost,$date,$prodname) =
> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
> +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
> "12031361 ABC3 567.00
> 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
> xbch)\xa4\xe9\xa5\xce12x20's";
>
> print "Result:
> ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname
> . "\n";
>
> if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
> $date eq "" or $prodname eq "") {
> print "Failed to parse input file.\n";
> exit;
> }
>
> But the parser failed to parse the input text, it returns empty string.
> What is wrong with the above code, especially the parser I created for
> parsing the $date.


To begin with, you should ask perl for warnings, either with the -w
option, or with the directive "use warnings;". Then it will tell you
that you get uninitialized values on the "print" line. Your test already
shows that, but you will see that in fact all variables are uninitialized
(meaning their value is 'undef').

It also tells you "Useless use of a constant in void context". It points
out the line where the statement starts, not the place where the constant
starts, but there is only one constant here anyway, and it's the data
string.

The immediate suspicion is that
($var) = /regexp/, "string";
may not be the way to ask perl to match a string with a regexp. And
it isn't. Look it up and you'll see that it is
($var) = "string" =~ /regexp/;

Now that still won't work, because you only get a list from a regexp
if you ask for all matches, which you do with the 'g' modifier. So
you want
($var) = "string" =~ /regexp/g;

The parenthesized items in your regexp match their counterpart in the
string, so after rewriting as I described, it will work.


I don't see much of a parser to parse $date. [0-9]+ seems to work here
for extracting that part of the string, as long as you're sure that
the character following it is not a digit. You can use \d instead
of [0-9]; for plain ASCII digits like these it means the same thing.
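
One way to make the date capture robust against a product name that starts with a digit is to fix its width. The following is a minimal sketch, assuming the date is always eight digits in YYYYMMDD form (an inference from the sample line, not something the original post states). The sample string and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Illustrative tail of an input line: cost, date, then a product name
# that starts with a digit (hypothetical data).
my $tail = "5177.6620041127" . "12x20's";

# Greedy [0-9]+ also swallows the leading digits of the product name.
my ($cost_a, $date_a, $name_a) =
    $tail =~ /^([0-9]+\.[0-9][0-9])([0-9]+)(.*)/;

# Fixed-width capture: exactly eight digits, assumed to be YYYYMMDD.
my ($cost_b, $year, $month, $day, $name_b) =
    $tail =~ /^([0-9]+\.[0-9][0-9])([0-9]{4})([0-9]{2})([0-9]{2})(.*)/;

print "greedy: date=$date_a name=$name_a\n";
print "fixed:  date=$year-$month-$day name=$name_b\n";
```

With the greedy pattern the date absorbs the "12" that belongs to the product name; the fixed-width pattern keeps the two fields apart.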

Anno Siegel 12-23-2004 10:00 AM

Re: reg expression with input line
 
sam <sam.wun@authtec.com> wrote in comp.lang.perl.misc:
> Hi,
>
> I would like to write a perl script to parse each line read from a text
> file.
> I ended up some perl code as shown below:
>
> ($prodcode,$custname,$qty,$cost,$date,$prodname) =
> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
> +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,


Up to here, it looks like a regex of sorts, but what is this:

> "12031361 ABC3 567.00
> 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
> xbch)\xa4\xe9\xa5\xce12x20's";


> print "Result:
> ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";


Use string interpolation, not concatenation if there are lots of
variables. Better yet, collect the result in an array @data, then
say

print "Result: ", join( ',', @data), "\n";

> if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
> $date eq "" or $prodname eq "") {
> print "Failed to parse input file.\n";
> exit;
> }


...and this could be written

print "Failed to parse input file.\n" if grep length() == 0, @data;

> But the parser failed to parse the input text, it returns empty string.
> What is wrong with the above code, especially the parser I created for
> parsing the $date.


Which part of the regex is supposed to parse a date, and in what format?
What does the input data look like anyway? It's probably possible to
infer that from the (mangled) code you've given, but I'm not going to.

Anno
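
A small sketch of the array-based approach suggested above. The sample lines and the shortened pattern are stand-ins, not the original data. One caveat that seems worth checking: when the match fails, the array is empty, so a grep over it finds no zero-length fields. The sketch therefore also tests whether the array has any elements at all.

```perl
use strict;
use warnings;

# Simplified, hypothetical input lines (not the original data).
my $good = "12031361 ABC3 567.00";
my $bad  = "this line does not match";

my $pattern = qr/^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])/;

my @data   = $good =~ $pattern;   # three captured fields
my @nodata = $bad  =~ $pattern;   # empty list: the match failed

my $result = "Result: " . join(',', @data) . "\n";
print $result;

# A failed match leaves the array empty, so check for that as well as
# looking for zero-length fields.
my $good_ok = @data   && !grep { length($_) == 0 } @data;
my $bad_ok  = @nodata && !grep { length($_) == 0 } @nodata;

print "Failed to parse input file.\n" unless $bad_ok;
```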

Anno Siegel 12-23-2004 10:09 AM

Re: reg expression with input line
 
Arndt Jonasson <do-not-use@invalid.net> wrote in comp.lang.perl.misc:

[...]

> ($var) = "string" =~ /regexp/;
>
> Now that still won't work, because you only get a list from a regexp
> if you ask for all matches, which you do with the 'g' modifier. So


That is not true. /g is only needed when the regex doesn't capture
anything. If it does, the captures will be delivered in list context.

Anno
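
The behaviour of a match in list context can be checked with a short sketch. The string and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "qty 12 cost 34";

# Capture groups, no /g: list context yields the captures of the match.
my @captures = $text =~ /([a-z]+) ([0-9]+)/;

# No capture groups, with /g: list context yields every matched substring.
my @all_numbers = $text =~ /[0-9]+/g;

# No capture groups, no /g: a successful match yields the list (1).
my @flag = $text =~ /[0-9]+/;

print "captures:    @captures\n";
print "all numbers: @all_numbers\n";
print "flag:        @flag\n";
```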

Arndt Jonasson 12-23-2004 10:23 AM

Re: reg expression with input line
 

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) writes:
> Arndt Jonasson <do-not-use@invalid.net> wrote in comp.lang.perl.misc:
>
> [...]
>
> > ($var) = "string" =~ /regexp/;
> >
> > Now that still won't work, because you only get a list from a regexp
> > if you ask for all matches, which you do with the 'g' modifier. So

>
> That is not true. /g is only needed when the regex doesn't capture
> anything. If it does, the captures will be delivered in list context.


Oops. I'm sorry for being misleading. Clearly described in the regexp
section, too...


Brian McCauley 12-23-2004 12:38 PM

Re: reg expression with input line
 


sam wrote:

> I ended up some perl code as shown below:
>
> ($prodcode,$custname,$qty,$cost,$date,$prodname) =
> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
> +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
> "12031361 ABC3 567.00
> 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
>
> xbch)\xa4\xe9\xa5\xce12x20's";


What are you expecting the comma operator in the above code to do?
Where did you get this expectation? Compare your expectation to what
comma actually does (RTFM). Compare it also to the =~ operator which
does do what I'm guessing you think the comma does, but its operands
are the other way around.
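
To make the difference concrete, here is a minimal sketch with a made-up input string. Warnings are switched off inside the first block only so that the faulty form runs quietly; the variable names are illustrative.

```perl
use strict;
use warnings;

my ($with_comma, $with_binding);

{
    # Faulty form. Assignment binds tighter than the comma, so this parses
    # as (($with_comma) = /([0-9]+)/), "12031361": the match runs against
    # $_ and the string constant is evaluated and thrown away.
    no warnings;
    local $_;                      # $_ is undefined here, so nothing matches
    ($with_comma) = /([0-9]+)/, "12031361";
}

# Binding operator: the string on the left is matched by the regex on the right.
($with_binding) = "12031361" =~ /([0-9]+)/;

print "comma:   ", (defined $with_comma ? $with_comma : "undef"), "\n";
print "binding: $with_binding\n";
```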

You should always compile Perl with strictures and warnings enabled.
Perl would then have told you something was wrong.

You should always declare all variables as lexically scoped in the
smallest applicable scope. This means there's a 95% chance that you
should have had a my() in there.

> print "Result:
> ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname .
> "\n";


Why have you obfuscated this?

print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";

>
> if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
> $date eq "" or $prodname eq "") {
> print "Failed to parse input file.\n";
> exit;
> }


There is no way any of those variables except $prodname can be an empty
string. If the match succeeds, then all the others must be non-empty,
as none of the other captures can match the empty string. If the
match fails, then all the variables will be undefined. Although (undef
eq '') is true, it makes your code clearer if you test definedness with
defined(). (It also avoids a warning.) It is also only necessary to
check the definedness of one of the variables. Better still, just use
the return value of the list assignment, which will be true if
the match succeeded.

> But the parser failed to parse the input text, it returns empty string.


This is nonsense; there is no return value from your code.

> What is wrong with the above code, especially the parser I created for
> parsing the $date.


The parser you created for parsing $date was not included in the code
you posted, so we can't possibly comment.

[ Please excuse the line-wrap damage in the following ]

#!/usr/bin/perl
use strict;
use warnings;

$_= "12031361 ABC3 567.00
5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\xbch)\xa4\xe9\xa5\xce12x20's";

if ( my($prodcode,$custname,$qty,$cost,$date,$prodname) =
/^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
+([0-9]+\.[0-9][0-9])([0-9]+)(.*)/ ) {

print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";
} else {
print "Failed to parse input file.\n";
exit;
}
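
For a copy that survives line wrapping, the same approach can be sketched with the /x modifier, which lets the pattern span several lines. This is a variation under stated assumptions, not the code as posted: the sample data is shortened to plain ASCII, an in-memory filehandle stands in for the real input file, and parse_line is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the six fields in list context,
# or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^ ([0-9\-]+)                # product code
                     [ ]+ ([A-Za-z0-9\-]+)       # customer name
                     [ ]+ ([0-9]+\.[0-9][0-9])   # quantity
                     [ ]+ ([0-9]+\.[0-9][0-9])   # cost
                     ([0-9]+)                    # date digits follow the cost directly
                     (.*)                        # product name
                   /x;
}

# Shortened ASCII sample standing in for the real file contents.
my $input = "12031361 ABC3 567.00 5177.6620041127widget\n"
          . "not a valid record\n";

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my (@parsed, @failed);
while (my $line = <$fh>) {
    chomp $line;
    # List assignment in boolean context is true only if the match succeeded.
    if (my @fields = parse_line($line)) {
        push @parsed, join(',', @fields);
    } else {
        push @failed, $line;
    }
}
close $fh;

print "Result: $_\n" for @parsed;
print "Failed to parse: $_\n" for @failed;
```

Under /x, literal spaces in the pattern have to be written as [ ] (or escaped), which is why the separators look different from the original one-line regex.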



sam 12-23-2004 02:59 PM

Re: reg expression with input line
 
Anno Siegel wrote:

> sam <sam.wun@authtec.com> wrote in comp.lang.perl.misc:
>
>>Hi,
>>
>>I would like to write a perl script to parse each line read from a text
>>file.
>>I ended up some perl code as shown below:
>>
>>($prodcode,$custname,$qty,$cost,$date,$prodname) =
>> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
>>+([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,

>
>
> Up to here, it looks like a regex of sorts, but what is this:
>
>
>> "12031361 ABC3 567.00
>>5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
>>xbch)\xa4\xe9\xa5\xce12x20's";

>
>
>>print "Result:
>>".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

>
>
> Use string interpolation, not concatenation if there are lots of
> variables. Better yet, collect the result in an array @data, then
> say
>
> print "Result: ", join( ',', @data), "\n";
>
>
>>if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
>>$date eq "" or $prodname eq "") {
>> print "Failed to parse input file.\n";
>> exit;
>>}

>
>
> ...and this could be written
>
> print "Failed to parse input file.\n" if grep length() == 0, @data;
>

Thanks very much. This is very helpful indeed.

Thanks
Sam

>
>
> Anno



All times are GMT.
