Velocity Reviews - Computer Hardware Reviews

Should be a simple parsing problem

 
 
Tim
      04-07-2007
Hello all, I am trying to get a simple Perl script working to
transform some data, but have an incredibly onerous solution that
looks more like Visual Basic than Perl, with lots of conditional
statements, while loops and the shift function. Something inside
doesn't feel right about that, plus I think I am missing out on
expanding my limited Perl knowledge.

My data is in the general form:
_________________________
TEXT {
title: test
font: script
}

POLYGON {
name: foo
type: good
POINTS 3 {
4,3
2,16
633,2
}
}

JUNK {
title: nothing
}

POLYGON {
name: foo2
type: bad
POINTS 2 {
7,9
3,2
}

}

Now I want to extract the points where the polygon type is 'good', so my
output would be:
4,3
2,16
633,2

I can get Text::Balanced to work on a single line, but don't know how
to elegantly parse down to the fields I need. Any thoughts would be
greatly appreciated. I'm not committed to Text::Balanced, but it seems
like it should work.
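One way to avoid line-by-line bookkeeping is to parse every block into a nested data structure first and then query it. The sketch below is only one possible approach and assumes each line is one of: an opening line such as `POLYGON {` or `POINTS 3 {`, a `key: value` pair, a bare data line such as `4,3`, or a lone `}`. The subs `parse_blocks` and `good_points` and the hash layout (`name`, `fields`, `lines`, `children`) are hypothetical, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the block format into a tree of hashes.
# Assumes one item per line: "NAME [args] {", "key: value", "}", or a bare
# data line such as "4,3". Blank lines are ignored.
sub parse_blocks {
    my ($fh) = @_;
    my $root  = { name => 'ROOT', fields => {}, lines => [], children => [] };
    my @stack = ($root);                      # innermost open block is last
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;              # trim (also drops the newline)
        next if $line eq '';
        if ($line =~ /^(\w+)\b.*\{$/) {       # opening line, e.g. "POINTS 3 {"
            my $node = { name => $1, fields => {}, lines => [], children => [] };
            push @{ $stack[-1]{children} }, $node;
            push @stack, $node;
        }
        elsif ($line eq '}') {                # close the innermost block
            die "unbalanced '}'\n" if @stack == 1;
            pop @stack;
        }
        elsif ($line =~ /^(\w+):\s*(.*)$/) {  # "key: value"
            $stack[-1]{fields}{$1} = $2;
        }
        else {                                # bare data line
            push @{ $stack[-1]{lines} }, $line;
        }
    }
    die "unclosed block\n" if @stack > 1;
    return $root;
}

# Collect the point lines of every top-level POLYGON whose type is 'good'.
sub good_points {
    my ($root) = @_;
    my @points;
    for my $poly (grep { $_->{name} eq 'POLYGON' } @{ $root->{children} }) {
        next unless ($poly->{fields}{type} // '') eq 'good';
        for my $pts (grep { $_->{name} eq 'POINTS' } @{ $poly->{children} }) {
            push @points, @{ $pts->{lines} };
        }
    }
    return @points;
}

my $sample = <<'END';
TEXT {
title: test
font: script
}

POLYGON {
name: foo
type: good
POINTS 3 {
4,3
2,16
633,2
}
}

JUNK {
title: nothing
}

POLYGON {
name: foo2
type: bad
POINTS 2 {
7,9
3,2
}

}
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print "$_\n" for good_points(parse_blocks($fh));   # prints 4,3 / 2,16 / 633,2, one per line
```

Once the tree exists, other queries (for example, all `name` fields of bad polygons) need no new parsing code, and the `die` calls catch unbalanced braces instead of silently misreading them.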

best,

Tim

 
Martijn Lievaart
      04-07-2007
On Sat, 07 Apr 2007 04:21:22 -0700, Tim wrote:

> Now I want to extract the points where the polygon type is 'good' so my
> output would be:
> 4,3
> 2,16
> 633,2


OTTOMH:

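while (<>) {
    /^POLYGON\s*\{/ or next;
    my $good = 0;                         # becomes true once "type: good" is seen
    while (<>) {
        /^\s*type:\s*good\s*$/ and $good = 1;
        if (/^\s*POINTS\s+\d+\s*\{/) {
            while (<>) {                  # handle the points the same way
                /^\s*\}\s*$/ and last;    # end of the POINTS block
                print if $good;
            }
            next;
        }
        /^\s*\}\s*$/ and last;            # end of the POLYGON block
    }
}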

This assumes your input is always formatted the same way, is always
well-formed, and that type: comes before POINTS.
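If `type:` may also appear after the POINTS block, one option is to buffer each polygon's points and decide only when the POLYGON closes. This is a minimal sketch under the same line-oriented assumptions as the loop above; the sub name `good_polygon_points` is hypothetical, and it reads from a filehandle argument instead of `<>` so it can be tried on an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical helper: same nested-loop idea, but the points are buffered
# so "type:" may come before or after the POINTS block.
sub good_polygon_points {
    my ($fh) = @_;
    local $_;                                 # don't clobber the caller's $_
    my @result;
    while (<$fh>) {
        /^\s*POLYGON\s*\{/ or next;
        my ($type, @points) = ('');
        while (<$fh>) {
            /^\s*type:\s*(\S+)/ and $type = $1;
            if (/^\s*POINTS\s+\d+\s*\{/) {
                while (<$fh>) {
                    /^\s*\}\s*$/ and last;    # end of the POINTS block
                    chomp;
                    push @points, $_;
                }
                next;
            }
            /^\s*\}\s*$/ and last;            # end of the POLYGON block
        }
        push @result, @points if $type eq 'good';   # decide at the closing brace
    }
    return @result;
}

my $data = <<'END';
POLYGON {
name: late
POINTS 2 {
1,1
5,8
}
type: good
}

POLYGON {
type: bad
POINTS 1 {
9,9
}
}
END

open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "$_\n" for good_polygon_points($fh);   # prints 1,1 and 5,8
```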

HTH
M4

 
Tad McClellan
      04-07-2007
Tim <(E-Mail Removed)> wrote:

> Now I want to extract the points where the polygon type is 'good' so my
> output would be:
> 4,3
> 2,16
> 633,2



--------------------------
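#!/usr/bin/perl
use warnings;
use strict;
use Text::Balanced 'extract_bracketed';

local $/ = ''; # enable paragraph mode, see perlvar.pod

while ( <DATA> ) {
    # //g in scalar context leaves pos($_) just after "POINTS <n>"
    next unless /type: good.*POINTS\s+\d+/gs;
    # with no arguments, extract_bracketed() works on $_ starting at pos($_)
    my $bracketed = extract_bracketed();
    $bracketed =~ s/^\{\s*//;     # drop the opening brace
    $bracketed =~ s/\s*\}\z//;    # drop the closing brace
    print "$bracketed\n";
}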


__DATA__
TEXT {
title: test
font: script
}

POLYGON {
name: foo
type: good
POINTS 3 {
4,3
2,16
633,2
}
}

JUNK {
title: nothing
}

POLYGON {
name: foo2
type: bad
POINTS 2 {
7,9
3,2
}

}
--------------------------
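The loop above depends on paragraph mode: with `$/` set to the empty string, each read returns everything up to the next blank line (consecutive blank lines count as one), so each block normally arrives as one record. A blank line inside a block splits it, as happens in the second POLYGON of the sample; the loop still works because the `type: good` block is intact. A small sketch to see the records; the sub name `paragraphs` and the trimmed-down input are illustrative only.

```perl
use strict;
use warnings;

# Read a string in paragraph mode and return the records it yields.
sub paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = '';                  # paragraph mode: a record ends at blank line(s)
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Trimmed-down version of the sample: the second POLYGON has a blank line
# before its final "}", so paragraph mode splits it into two records.
my $sample = "POLYGON {\ntype: good\n}\n\n"
           . "POLYGON {\ntype: bad\nPOINTS 2 {\n7,9\n}\n\n}\n";

my @records = paragraphs($sample);
printf "%d records\n", scalar @records;    # prints "3 records"
print "--- record ---\n$_" for @records;
```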


--
Tad McClellan SGML consulting
http://www.velocityreviews.com/forums/(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Tim
      04-08-2007
On Apr 7, 9:21 am, Martijn Lievaart <(E-Mail Removed)> wrote:


This is great, and very instructive. Thank you so much! What I
wasn't understanding is how you can nest while(<>) statements. I have
to think a bit more about what is really going on there, but this is
what I was looking for: code that works more in line with how I think,
instead of going line by line and doing careful bookkeeping. I also
need to look up the 'and last' statement and see what it is doing.
Thanks again.

Tim

 
Martijn Lievaart
      04-08-2007
On Sat, 07 Apr 2007 19:27:38 -0700, Tim wrote:

> This is great, and very instructive. Thank you so much, so what I wasn't
> understanding is how you can nest while(<>) statements. I have to think
> a bit more about what is really going on there, but this is what I was
> looking for: code that works more in line with how I think instead of
> going line-by line and doing careful book-keeping. I also need to look
> up the 'and last' statement and see what that is doing. Thanks again.


I like Tad's solution much better, but FWIW:

- Yes, you can nest while (<>) like this. Normally it is more trouble than
it's worth, but in this case it is appropriate. Just remember that every
while (<>) reads from the same filehandle, so when you reach another
while (<>) (or return to an outer one), it carries on where the previous
one left off.

- The "<condition> and last;" construct is another way of saying "if
(<condition>) { last; }", only shorter: because "and" short-circuits, last
runs only when the condition is true. Often seen like this:

# parse config file
while (<$fh>) {
    /^\s*$/ and next;    # skip empty lines
    /^\s*#/ and next;    # skip comments
    # ...
}
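To see that nested while loops share one read position, and how "COND and next;" and "COND and last;" behave, here is a small sketch over an in-memory filehandle. The sub `block_bodies` and its input are made up for illustration: the outer loop skips blank lines and comments, and each time it sees a `NAME {` header the inner loop consumes lines from the same handle up to the closing `}`, after which the outer loop resumes right after that brace.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the body lines of each "NAME {" block.
sub block_bodies {
    my ($fh) = @_;
    local $_;
    my %body;
    while (<$fh>) {
        /^\s*$/ and next;                # skip empty lines
        /^\s*#/ and next;                # skip comments
        /^(\w+)\s*\{/ or next;           # only a block header starts a block
        my $name = $1;
        while (<$fh>) {                  # same filehandle, same position
            /^\s*\}\s*$/ and last;       # same as: if (/^\s*\}\s*$/) { last; }
            chomp;
            push @{ $body{$name} }, $_;
        }
    }
    return \%body;
}

my $text = "# comment\n\nA {\n1\n2\n}\nB {\n3\n}\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $bodies = block_bodies($fh);
print "$_: @{ $bodies->{$_} }\n" for sort keys %$bodies;   # prints "A: 1 2" then "B: 3"
```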

HTH,
M4
 

Similar Threads
Thread Thread Starter Forum Replies Last Post
What libraries should I use for MIME parsing, XML parsing, and MySQL ? John Levine Ruby 0 02-02-2012 11:15 PM
This should be simple but Simple left the building. ft310 Javascript 2 06-29-2008 08:33 AM
Simple sscanf parsing problem Timo C Programming 4 06-28-2008 02:11 PM
XML::Simple Parsing with Attributes problem John Perl Misc 1 02-03-2006 10:47 PM
XML::Simple Parsing with Attributes problem John Perl Misc 1 02-03-2006 04:39 PM


