reg expression with input line

 
 
sam
 
      12-23-2004
Hi,

I would like to write a Perl script to parse each line read from a text
file.
I ended up with some Perl code as shown below:

($prodcode,$custname,$qty,$cost,$date,$prodname) =
/^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
+([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
"12031361 ABC3 567.00
5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
xbch)\xa4\xe9\xa5\xce12x20's";

print "Result:
".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
$date eq "" or $prodname eq "") {
print "Failed to parse input file.\n";
exit;
}

But the parser fails to parse the input text; it returns an empty string.
What is wrong with the above code, especially the part I created for
parsing $date?

Thanks
Sam
 
 
 
 
 
Arndt Jonasson
 
      12-23-2004

sam <(E-Mail Removed)> writes:
>
> I would like to write a perl script to parse each line read from a
> text file.
> I ended up some perl code as shown below:
>
> ($prodcode,$custname,$qty,$cost,$date,$prodname) =
> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
> +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
> "12031361 ABC3 567.00
> 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
> xbch)\xa4\xe9\xa5\xce12x20's";
>
> print "Result:
> ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname
> . "\n";
>
> if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
> $date eq "" or $prodname eq "") {
> print "Failed to parse input file.\n";
> exit;
> }
>
> But the parser failed to parse the input text, it returns empty string.
> What is wrong with the above code, especially the parser I created for
> parsing the $date.


To begin with, you should ask perl for warnings, either with the -w
option, or with the directive "use warnings;". Then it will tell you
that you get uninitialized values on the "print" line. Your test already
shows that, but you will see that in fact all variables are uninitialized
(meaning their value is 'undef').

It also tells you "Useless use of a constant in void context". It points
out the line where the statement starts, not the place where the constant
starts, but there is only one constant here anyway, and it's the data
string.

The immediate suspicion is that
($var) = /regexp/, "string";
may not be the way to ask perl to match a string with a regexp. And
it isn't. Look it up and you'll see that it is
($var) = "string" =~ /regexp/;

Now that still won't work, because you only get a list from a regexp
if you ask for all matches, which you do with the 'g' modifier. So
you want
($var) = "string" =~ /regexp/g;

The parenthesized items in your regexp match their counterpart in the
string, so after rewriting as I described, it will work.


I don't see much of a parser to parse $date. [0-9]+ seems to work here
for extracting that part of the string, as long as you're sure that
the first following character is not a digit. You can use \d instead
of [0-9], it means the same thing.
 
 
 
 
 
Anno Siegel
 
      12-23-2004
sam <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Hi,
>
> I would like to write a perl script to parse each line read from a text
> file.
> I ended up some perl code as shown below:
>
> ($prodcode,$custname,$qty,$cost,$date,$prodname) =
> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
> +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,


Up to here, it looks like a regex of sorts, but what is this:

> "12031361 ABC3 567.00
> 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
> xbch)\xa4\xe9\xa5\xce12x20's";


> print "Result:
> ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";


Use string interpolation, not concatenation if there are lots of
variables. Better yet, collect the result in an array @data, then
say

print "Result: ", join( ',', @data), "\n";

> if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
> $date eq "" or $prodname eq "") {
> print "Failed to parse input file.\n";
> exit;
> }


...and this could be written

print "Failed to parse input file.\n" if grep length() == 0, @data;
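
One caveat with the grep test: a failed match in list context returns an empty list, so @data would be empty and grep would find nothing to complain about. A minimal sketch that checks for both cases follows; the sample record and its "widget" product name are made up for illustration, and the field layout is only assumed from the regex in the question.

```perl
use strict;
use warnings;

# Hypothetical record, laid out like the one in the question.
my $line = "12031361 ABC3 567.00 5177.6620041127widget";

# In list context the match returns the six captures, or () on failure.
my @data = $line =~
    /^([0-9-]+) +([A-Za-z0-9-]+) +(\d+\.\d\d) +(\d+\.\d\d)(\d+)(.*)/;

# Fail on "no match at all" as well as on any empty field.
if ( !@data or grep { length($_) == 0 } @data ) {
    print "Failed to parse input file.\n";
} else {
    print "Result: ", join( ',', @data ), "\n";
}
```

With this sample line, the script prints the six fields separated by commas.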

> But the parser failed to parse the input text, it returns empty string.
> What is wrong with the above code, especially the parser I created for
> parsing the $date.


Which part of the regex is supposed to parse a date, and in what format?
What does the input data look like anyway? It's probably possible to
infer that from the (mangled) code you've given, but I'm not going to.

Anno
 
 
Anno Siegel
 
      12-23-2004
Arndt Jonasson <(E-Mail Removed)> wrote in comp.lang.perl.misc:

[...]

> ($var) = "string" =~ /regexp/;
>
> Now that still won't work, because you only get a list from a regexp
> if you ask for all matches, which you do with the 'g' modifier. So


That is not true. /g is only needed when the regex doesn't capture
anything. If it does, the captures will be delivered in list context.
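
A small sketch of the three cases, using a made-up sample string and made-up variable names:

```perl
use strict;
use warnings;

my $line = "2004-12-23 42";    # made-up sample data

# Captures, no /g: list context returns the captured groups.
my ( $year, $count ) = $line =~ /^(\d{4})-\d\d-\d\d +(\d+)/;
print "$year $count\n";

# No captures, with /g: list context returns every match.
my @numbers = $line =~ /\d+/g;
print "@numbers\n";

# No captures, no /g: list context returns (1) on success.
my @flag = $line =~ /\d+/;
print "@flag\n";
```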

Anno
 
 
Arndt Jonasson
 
      12-23-2004

(E-Mail Removed) (Anno Siegel) writes:
> Arndt Jonasson <(E-Mail Removed)> wrote in comp.lang.perl.misc:
>
> [...]
>
> > ($var) = "string" =~ /regexp/;
> >
> > Now that still won't work, because you only get a list from a regexp
> > if you ask for all matches, which you do with the 'g' modifier. So

>
> That is not true. /g is only needed when the regex doesn't capture
> anything. If it does, the captures will be delivered in list context.


Oops. I'm sorry for being misleading. Clearly described in the regexp
section, too...

 
 
Brian McCauley
 
      12-23-2004


sam wrote:

> I ended up some perl code as shown below:
>
> ($prodcode,$custname,$qty,$cost,$date,$prodname) =
> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
> +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,
> "12031361 ABC3 567.00
> 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
> xbch)\xa4\xe9\xa5\xce12x20's";


What are you expecting the comma operator in the above code to do?
Where did you get this expectation? Compare your expectation to what
comma actually does (RTFM). Compare it also to the =~ operator which
does do what I'm guessing you think the comma does, but its operands
are the other way around.

You should always compile Perl with strictures and warnings enabled.
Perl would then have told you something was wrong.

You should always declare all variables as lexically scoped in the
smallest applicable scope. This means there's a 95% chance that you
should have had a my() in there.

> print "Result:
> ".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname .
> "\n";


Why have you obfuscated this?

print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";

>
> if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
> $date eq "" or $prodname eq "") {
> print "Failed to parse input file.\n";
> exit;
> }


There is no way any of those variables except $prodname can be an empty
string. If the match succeeds then all the others must be non-empty,
as none of the other captures could match the empty string. If the
match fails then all the variables will be undefined. Although (undef
eq '') is true, it makes your code clearer if you test definedness with
defined(). (It also avoids a warning.) It is also only necessary to
check the definedness of one of the variables. Better still, just use
the return value of the list assignment statement, which will be true if
the match succeeded.
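
To illustrate that last point, here is a minimal sketch with made-up inputs and a hypothetical $num variable. It assumes nothing beyond core Perl: a list assignment in boolean context yields the number of elements on its right-hand side, which is 0 when the match fails.

```perl
use strict;
use warnings;

my @report;
for my $input ( "abc 123", "abc" ) {    # made-up inputs
    if ( my ($num) = $input =~ /(\d+)/ ) {
        push @report, "$input: matched, captured $num";
    } else {
        # A failed match returns an empty list, so the list assignment
        # is false and $num stays undefined.
        push @report, "$input: no match, \$num is "
            . ( defined $num ? "defined" : "undef" );
    }
}
print "$_\n" for @report;
```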

> But the parser failed to parse the input text, it returns empty string.


This is nonsense; there is no return value from your code.

> What is wrong with the above code, especially the parser I created for
> parsing the $date.


The parser you created for parsing $date was not included in the code
you posted so we can't possibly comment.

#!/usr/bin/perl
use strict;
use warnings;

# Sample record; the trailing product name contains non-ASCII bytes.
$_ = "12031361 ABC3 567.00 5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\xbch)\xa4\xe9\xa5\xce12x20's";

# In list context a successful match returns the captures; the list
# assignment is false if the match failed.
if ( my ($prodcode, $custname, $qty, $cost, $date, $prodname) =
        /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9]) +([0-9]+\.[0-9][0-9])([0-9]+)(.*)/ ) {
    print "Result: $prodcode,$custname,$qty,$cost,$date,$prodname\n";
} else {
    print "Failed to parse input file.\n";
    exit;
}


 
 
sam
 
      12-23-2004
Anno Siegel wrote:

> sam <(E-Mail Removed)> wrote in comp.lang.perl.misc:
>
>>Hi,
>>
>>I would like to write a perl script to parse each line read from a text
>>file.
>>I ended up some perl code as shown below:
>>
>>($prodcode,$custname,$qty,$cost,$date,$prodname) =
>> /^([0-9\-]+) +([A-Za-z0-9\-]+) +([0-9]+\.[0-9][0-9])
>>+([0-9]+\.[0-9][0-9])([0-9]+)(.*)/,

>
>
> Up to here, it looks like a regex of sorts, but what is this:
>
>
>> "12031361 ABC3 567.00
>>5177.6620041127\xbd\xba\xa6w\xc5@\xb9\xea\xb4f\xc5\xd6\xa5\xa9(\xacX\xb2n\xb4\xd6\
>>xbch)\xa4\xe9\xa5\xce12x20's";

>
>
>>print "Result:
>>".$prodcode.",".$custname.",".$qty.",".$cost.",".$date.",".$prodname . "\n";

>
>
> Use string interpolation, not concatenation if there are lots of
> variables. Better yet, collect the result in an array @data, then
> say
>
> print "Result: ", join( ',', @data), "\n";
>
>
>>if ($prodcode eq "" or $custname eq "" or $qty eq "" or $cost eq "" or
>>$date eq "" or $prodname eq "") {
>> print "Failed to parse input file.\n";
>> exit;
>>}

>
>
> ...and this could be written
>
> print "Failed to parse input file.\n" if grep length() == 0, @data;
>

Thanks very much. This is very helpful indeed.

Thanks
Sam

>
>
> Anno

 