Need expert help matching a line

 
 
Ramon F Herrera
      09-08-2009

This is really a parsing question, but I figure that nobody knows more
about regex and pattern matching than Perl programmers.

I have many files which contain multiple lines of variable-value pair
assignments. I need to break down each line into its 3 constituent
components.

Variable Name = Variable Value

IOW, each line contains 3 parts:

VariableName
Equal Sign
VariableValue

As opposed to the variable names used by many programming languages,
my variable names accept embedded spaces.

Here are some examples of the lines I am trying to match:

My Favorite Baseball Player = George Herman "Babe" Ruth
What did your do on Christmas = I rested, computed the % mortgage and
visited my brother + sister.
Favorite Curse = That umpire is a #&*%!

What I need is a way to specify valid characters.

VariableName: Alphanumeric (and perhaps underscore), blank space.
VariableValue: Pretty much anything is valid on the RHS except an '='
sign (I guess)

Thanks for your kind assistance.

-Ramon

 
Ramon F Herrera
      09-08-2009
On Sep 8, 8:23 am, Ramon F Herrera <(E-Mail Removed)> wrote:
> <snip>


Just to make the exercise a little harder -and fun- the assignment
syntax should be able to support continuation lines, where the RHS is
very long:

Describe your summer vacation = Well, we traveled to the beach
  and to the mountains, and debated whether we
  should go to the Grand Canyon and Niagara falls.
  The GPS you gave me turned out to be very useful!

A continuation line always starts with blank space.

TIA,

-Ramon
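
The continuation rule described above could be handled with plain string operations before any regex is involved. The following is only a minimal C++ sketch under stated assumptions: a line that starts with a space or tab continues the previous value, the first '=' on any other line separates name from value, and lines without '=' are skipped. The names `trim` and `parse_assignments` are hypothetical and not part of any library.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Strip leading and trailing blanks (spaces and tabs) from a string.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: collect "name = value" records from a block of text,
// folding lines that start with blank space into the previous value.
std::vector<std::pair<std::string, std::string>>
parse_assignments(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> records;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const bool continuation =
            !line.empty() && (line[0] == ' ' || line[0] == '\t');
        if (continuation) {
            const std::string extra = trim(line);
            // Ignore stray continuation lines that precede any assignment.
            if (!records.empty() && !extra.empty()) {
                std::string& value = records.back().second;
                if (!value.empty()) value += ' ';
                value += extra;
            }
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not an assignment; skip it
        // Split at the first '=' only, so later '=' signs stay in the value.
        records.emplace_back(trim(line.substr(0, eq)),
                             trim(line.substr(eq + 1)));
    }
    return records;
}
```

Continuation lines are joined with a single space here; keeping the original line breaks instead would be an equally reasonable choice.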

 
Lucius Sanctimonious
      09-08-2009
On Sep 8, 9:28 am, Don Piven <(E-Mail Removed)> wrote:
> Ramon F Herrera wrote:
> > This is really a parsing question, but I figure that nobody knows more
> > about regex and pattern matching than Perl programmers.
>
> perlre (the manpage for Perl regular expressions) is your friend.
> Seriously. It will answer all the questions you raised.



Thanks Don, seriously.

You are essentially telling me to RTFM. I have already RTFM.

The question remains open...

Thx,

-Ramon

 
Charlton Wilbur
      09-08-2009
>>>>> "LS" == Lucius Sanctimonious <(E-Mail Removed)> writes:

LS> You are essentially telling me to RTFM. I have already RTFM.

Your question shows no evidence of this.

LS> The question remains open...

Post what you've already tried, and let us know what you're having
problems with.

Also, review the posting guidelines that are posted here frequently, or
online at http://www.rehabitation.com/clpmisc/...uidelines.html --
they're a summary of what works best if you really want to get help,
instead of just wanting to stir up drama.

Charlton



--
Charlton Wilbur
http://www.velocityreviews.com/forums/(E-Mail Removed)
 
ccc31807
      09-08-2009
CODE:
use strict;
use warnings;

my ($var, $val);
my %variables;
while (<DATA>)
{
    chomp;
    # Split on the first '=' only, so an '=' inside the value is preserved.
    if (/=/) { ($var, $val) = split /=/, $_, 2; }
    # A continuation line starts with blank space; append it to the value.
    elsif (/^ +\w+/) { $val .= $_; }
    else { next; }
    $var =~ s/^\s+//;
    $var =~ s/\s+$//;
    $variables{$var} = $val;
}

foreach my $key (keys %variables) { print "$key => $variables{$key}\n"; }
exit(0);

__DATA__
My Favorite Baseball Player = George Herman "Babe" Ruth
What did your do on Christmas = I rested, computed the % mortgage and
 visited my brother + sister.
Describe your summer vacation = Well, we traveled to the beach
 and to the mountains, and debated whether we
 should go to the Grand Canyon and Niagara falls.
 The GPS you gave me turned out to be very useful!
Favorite Curse = That umpire is a #&*%!

OUTPUT:
My Favorite Baseball Player => George Herman "Babe" Ruth
Describe your summer vacation => Well, we traveled to the beach and
to the mountains, and debated whether we should go to the Grand
Canyon and Niagara falls. The GPS you gave me turned out to be very
useful!
Favorite Curse => That umpire is a #&*%!
What did your do on Christmas => I rested, computed the % mortgage
and visited my brother + sister.
 
sln@netherlands.com
      09-08-2009
On Tue, 8 Sep 2009 09:58:56 -0700 (PDT), ccc31807 <(E-Mail Removed)> wrote:

>CODE:
>use strict;
>use warnings;
>
>my ($var, $val);

= ('','');
>my %variables;
>while (<DATA>)
>{
> chomp;
> if (/=/) { ($var, $val) = split /=/; }
> elsif (/^ +\w+/) { $val .= $_; }
> else { next; }
> $var =~ s/^\s+//;
> $var =~ s/\s+$//;
> $variables{$var} = $val;
>}
>
>foreach my $key (keys %variables) { print "$key => $variables{$key}
>\n"; }
>exit(0);
>

Looks good. I like the way you did this.
It might need an initial-condition check, so that a continuation line
seen before any assignment is ignored:
elsif (/^ +\w+/ and length($var)) { $val .= $_; }

-sln
 
sln@netherlands.com
      09-08-2009
On Tue, 8 Sep 2009 05:23:32 -0700 (PDT), Ramon F Herrera <(E-Mail Removed)> wrote:

<snip>

-sln

use strict;
use warnings;

my $buf = '';

while (<DATA>)
{
    if (/=/ or eof) {
        if ($buf =~ /\s*([\w ]+)\s*=\s*((?:.+(?:\n .+)*)|)/)
        {
            my ($var,$val) = ($1,$2);
            $val =~ s/\n +/\n/g;
            print "$var => $val\n\n";
        }
        $buf = '';
    }
    $buf .= $_;
}
__DATA__

My Favorite Baseball Player = George Herman =  "Babe" Ruth
What did your do on Christmas = I rested, computed the % mortgage and
 visited my brother + sister.
 asdfasdf=
Favorite Curse = That umpire is a #&*%!
errnngsf
sngdnsdg
Describe your summer vacation = Well, we traveled to the beach
  and to the mountains, and debated whether we
  should go to the Grand Canyon and Niagara falls.
  The GPS you gave me turned out to be very useful!

 
Ramon F Herrera
      09-08-2009
On Sep 8, 1:35 pm, (E-Mail Removed) wrote:
> <snip>



Thank you, sln!

I have to clarify that my program is not written in Perl (language
that I haven't used in ages) but in C++. The reason I posted my
question in this NG will be understood by reading this:

http://www.boost.org/doc/libs/1_40_0...x/syntax.html

I am sticking with the default (Perl) Regex syntax.

This is the relevant code that I have so far. As you can see it is
rather simplistic. I am not implementing continuation lines yet.

const string variable = "([\\w ]+)";
const char equal_sign = '=';
const string value = "([\\w ]+)";

const string assignment = variable + equal_sign + value;

The question that I have is this: how do I restrict the LHS to begin
with an alphabetic character? IOW: The LHS may contain blanks but
a blank cannot be the first character of the line. I will also be
accepting digits, periods and underscores on the LHS but again, the
variable name cannot begin with any of them.

TIA,

-Ramon

 
Ramon F Herrera
      09-08-2009
On Sep 8, 6:20 pm, Ramon F Herrera <(E-Mail Removed)> wrote:
> <snip>

I have made some progress here:

const string variable = "(\\w+[\\w\\d\\. ]*)";
const char equal_sign = '=';
const string value = "(.+)";

I think the above will cover most real cases, but I am not sure what
will happen if the RHS contains an '=' sign.

-RFH
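
One way to settle the '=' question is to run the three pieces above against a line whose value contains a second '='. The sketch below is hedged: it uses std::regex (ECMAScript syntax) as a stand-in for Boost.Regex, on the assumption that both treat these particular patterns the same way, and `match_assignment` and `value_pattern` are hypothetical names introduced only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: apply the patterns from the post above to one line.
// std::regex (ECMAScript syntax) is used here as a stand-in for Boost.Regex.
bool match_assignment(const std::string& line,
                      std::string& name, std::string& value) {
    const std::string variable = "(\\w+[\\w\\d\\. ]*)";
    const char equal_sign = '=';
    const std::string value_pattern = "(.+)";
    const std::regex assignment(variable + equal_sign + value_pattern);

    std::smatch m;
    // regex_match requires the whole line to match the pattern.
    if (!std::regex_match(line, m, assignment)) return false;
    name = m[1].str();   // cannot contain '=', so it stops at the first one
    value = m[2].str();  // everything after the first '=', later '=' included
    return true;
}
```

Because the name class has no '=' in it, the first '=' on the line is always taken as the separator and any later '=' ends up in the value. Both captures keep the blanks around the '=' sign, so trimming them afterwards (or adding `\\s*` on either side of the '=') is left as a design choice.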

 
sln@netherlands.com
      09-08-2009
On Tue, 8 Sep 2009 15:20:04 -0700 (PDT), Ramon F Herrera <(E-Mail Removed)> wrote:

>On Sep 8, 1:35 pm, (E-Mail Removed) wrote:
>> On Tue, 8 Sep 2009 05:23:32 -0700 (PDT), Ramon F Herrera <(E-Mail Removed)> wrote:
>>

<snip>
>I have to clarify that my program is not written in Perl (language
>that I haven't used in ages) but in C++. The reason I posted my
>question in this NG will be understood by reading this:
>
>http://www.boost.org/doc/libs/1_40_0...ex/syntax.html
>
>I am sticking with the default (Perl) Regex syntax.
>

This uses Perl 5.8 as a reference to describe the syntax. Is that the library default?
http://www.boost.org/doc/libs/1_40_0...rl_syntax.html

>This is the relevant code that I have so far. As you can see it is
>rather simplistic. I am not implementing continuation lines yet.
>
>const string variable = "([\\w ]+)";
>const char equal_sign = '=';
>const string value = "([\\w ]+)";
>
>const string assignment = variable + equal_sign + value;
>
>The question that I have is this: how do I restrict the LHS to begin
>with an alphabetic characters? IOW: The LHS may contain blanks but
>they cannot be the first character of the line. I will also be
>accepting digits, periods and underscores on the LHS but again, the
>variable name cannot begin with any of them.
>

Adding those restrictions just requires this:
const string variable = "([a-zA-Z][\\w. ]+)";

There is nothing in the regex that is not in Perl 5.8, if
that's what they will be using.

-sln
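
The first-character restriction in the pattern above can be checked in isolation. This is a small sketch only, again assuming std::regex (ECMAScript syntax) as a stand-in for Boost.Regex; `is_valid_name` is a hypothetical helper, not a library function.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `name` starts with a letter and continues
// with one or more word characters, periods or blanks.
bool is_valid_name(const std::string& name) {
    // Same pattern as suggested above, without the capture group.
    static const std::regex pattern("[a-zA-Z][\\w. ]+");
    // regex_match only succeeds if the entire string fits the pattern.
    return std::regex_match(name, pattern);
}
```

Note that the trailing `+` demands at least one character after the leading letter, so a one-letter name such as "x" is rejected; changing `+` to `*` would accept it.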
 