Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > newbie help


newbie help

 
 
Ram
      02-03-2004
How do I search for just the ordsts start (<ordsts>) and end tags (</ordsts>)
and the data between them, and get just the last matched one? I would also
need an idea of how to get the last two matches.

Thanks for the pointers.


Sample Input file:
<logos>
<ordsts>
<gname>
</gname>
</ordsts>
<ordadd>
<aname>
</aname>
</ordadd>
</logos>
<customer>
<contact>
<pname>
</pname>
</contact>
<ordsts>
<name>
</name>
</ordsts>
<shipname>
<sname>
</sname>
</shipname>
</customer>
<ordsts>
<doc_hdr>
<type_code>ORDSTS</type_code>
<type_suffix>LE</type_suffix>
<direction>IN</direction>
</doc_hdr>
<ord_keys>
<ordno>200000</ordno>
</ord_keys>
<req_obj>
<obj>order_header</obj>
<obj>order_line</obj>
</req_obj>
</ordsts>
<order> <doc_hdr> <type_code>ORDER</type_code>
<type_suffix>LE</type_suffix> <direction>IN</direction>
<client_data>User Supplied Data</client_data> <client_id>User Supplied Data</client_id>
<correlation_id>414D51204C455553434330332020202040001EEE00042583</correlation_id>
<response_channel>CC.ORDER.REPLY</response_channel>
<correlation_id>41,4d,51,20,4c,45,55,53,43,43,30,33,20,20,20,20,40,0,1e,ee,0,4,25,83,</correlation_id>
<response_channel>LEUSCS01::CC.ORDER.REPLY.CS.S. Q</response_channel> </doc_hdr>
<customer> <cus_num>3374831</cus_num>
<bill_to> <contact> <con_num>2</con_num> </contact> </bill_to>
<ship_to> <address> <adr_num>1</adr_num> </address>
<taxwaregeocode> <geocode>331003600</geocode></order>
<ordsts> <doc_hdr> <type_code>ORDER</type_code>
<type_suffix>LE</type_suffix> <direction>IN</direction>
<client_data>User Supplied Data</client_data> <client_id>User Supplied Data</client_id>
<correlation_id>414D51204C455553434330332020202040001EEE00042583</correlation_id>
<response_channel>CC.ORDER.REPLY</response_channel>
<correlation_id>41,4d,51,20,4c,45,55,53,43,43,30,33,20,20,20,20,40,0,1e,ee,0,4,25,83,</correlation_id>
<response_channel>LEUSCS01::CC.ORDER.REPLY.CS.S. Q</response_channel> </doc_hdr>
<customer> <cus_num>3374831</cus_num>
<bill_to> <contact> <con_num>2</con_num> </contact> </bill_to>
<ship_to> <address> <adr_num>1</adr_num> </address>
<taxwaregeocode> <geocode>331003600</geocode></ordsts>


 
Gunnar Hjalmarsson
      02-03-2004
Ram wrote:
> How do I search for just the ordsts start (<ordsts>) and end
> tags (</ordsts>) and the data between them, and get just the last
> matched one?


Assuming the data is in $_:

my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

> I would also need an idea of how to get the last two matches.


I leave that as an exercise for you.
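For the "last two matches" part, here is one possible approach, offered as a sketch rather than as Gunnar's intended answer: collect every block with a non-greedy global match in list context and keep the tail of the resulting list. The helper names `ordsts_blocks` and `last_n_ordsts` and the shortened sample data are hypothetical, and the non-greedy `.*?` assumes `<ordsts>` elements never nest (true of the sample posted above).

```perl
use strict;
use warnings;

# Hypothetical helper: return every <ordsts>...</ordsts> block in $text,
# in document order. The non-greedy .*? stops at the first </ordsts>
# after each <ordsts>, which is only safe because the blocks don't nest.
# /s lets . match newlines, so blocks may span several lines.
sub ordsts_blocks {
    my ($text) = @_;
    return $text =~ /(<ordsts>.*?<\/ordsts>)/gs;
}

# Hypothetical helper: return the last $n blocks (fewer if there are fewer).
sub last_n_ordsts {
    my ($text, $n) = @_;
    my @all   = ordsts_blocks($text);
    my $start = @all > $n ? @all - $n : 0;
    return @all[$start .. $#all];
}

# Shortened, made-up stand-in for the input file.
my $data = <<'END';
<logos>
<ordsts>
<gname>
</gname>
</ordsts>
</logos>
<ordsts>
<name>
</name>
</ordsts>
<ordsts><ordno>200000</ordno></ordsts>
END

my ($last)   = last_n_ordsts($data, 1);
my @last_two = last_n_ordsts($data, 2);
print "Last match:\n$last\n\n";
print "Last two matches:\n", join("\n--\n", @last_two), "\n";
```

On non-nested data like this, `last_n_ordsts($data, 1)` yields the same block as the one-liner above.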

> Thanks for the pointers.


http://www.perldoc.com/perl5.8.0/pod/perlre.html

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
J Krugman
      02-03-2004
In <bvp3d5$ujeo2$(E-Mail Removed)-berlin.de> Gunnar Hjalmarsson <(E-Mail Removed)> writes:

>Assuming the data is in $_:


> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;


Why doesn't this match everything between the very first <ordsts>
in the file and the last </ordsts>? Isn't the regexp engine supposed
to give the longest match?

jill


 
Gunnar Hjalmarsson
      02-03-2004
J Krugman wrote:
> Gunnar Hjalmarsson writes:
>> Assuming the data is in $_:
>>
>> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

>
> Why doesn't this match everything between the very first <ordsts> in
> the file and the last </ordsts>?


Because the first .* is greedy.

> Isn't the regexp engine supposed to give the longest match?


Nope.

Please read about greediness in perldoc perlre.
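To illustrate the point about greediness, here is a small sketch comparing three variants of the pattern on a made-up three-block string (`$text` is an illustrative assumption, not Ram's data). Perl's backtracking engine reports a match at the leftmost starting position where the whole pattern can succeed; greedy and non-greedy quantifiers then decide how far each piece reaches. It does not search for the longest possible match overall.

```perl
use strict;
use warnings;

my $text = "<ordsts>A</ordsts>\n<ordsts>B</ordsts>\n<ordsts>C</ordsts>\n";

# 1. No leading .*: the match starts at the first <ordsts>, and the
#    greedy inner .* then runs on to the LAST </ordsts>.
my ($first_to_last) = $text =~ /(<ordsts>.*<\/ordsts>)/s;

# 2. Leading greedy .*: it first swallows the whole string, then gives
#    characters back only until <ordsts>.*</ordsts> can still match,
#    so the capture begins at the LAST <ordsts>.
my ($last_block) = $text =~ /.*(<ordsts>.*<\/ordsts>)/s;

# 3. Non-greedy inner .*?: starting at the first <ordsts>, stop at the
#    nearest </ordsts>, which gives only the first block.
my ($first_block) = $text =~ /(<ordsts>.*?<\/ordsts>)/s;

print "first_to_last: $first_to_last\n";
print "last_block:    $last_block\n";
print "first_block:   $first_block\n";
```

Variant 1 is the "first <ordsts> to last </ordsts>" result Jill expected; the leading greedy `.*` in variant 2 is what pushes the capture to the last block.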

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
J Krugman
      02-03-2004
In <bvp7nu$v8rpc$(E-Mail Removed)-berlin.de> Gunnar Hjalmarsson <(E-Mail Removed)> writes:

>J Krugman wrote:
>> Gunnar Hjalmarsson writes:
>>> Assuming the data is in $_:
>>>
>>> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

>>
>> Why doesn't this match everything between the very first <ordsts> in
>> the file and the last </ordsts>?


>Because the first .* is greedy.


OK, I missed that. Thanks.

jill
 
Ram
      02-04-2004
This string does not match if <ordsts> and </ordsts> have child tags spread
across multiple lines.

If I stick this at the end of the file, it does not match:
<ordsts>
<gname>
</gname>
</ordsts>
But it matches:
<ordsts> <gname> </gname> </ordsts>

In my case, it should match both, including the child tags.

Thanks!!



"Gunnar Hjalmarsson" <(E-Mail Removed)> wrote in message
news:bvp3d5$ujeo2$(E-Mail Removed)-berlin.de...
> Ram wrote:
> > How do I search for just the ordsts start (<ordsts>) and end
> > tags (</ordsts>) and the data between them, and get just the last
> > matched one?

>
> Assuming the data is in $_:
>
> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;
>
> > I would also need an idea of how to get the last two matches.

>
> I leave that as an exercise for you.
>
> > Thanks for the pointers.

>
> http://www.perldoc.com/perl5.8.0/pod/perlre.html
>
> --
> Gunnar Hjalmarsson
> Email: http://www.gunnar.cc/cgi-bin/contact.pl
>



 
Chris
      02-04-2004
Ram wrote:
> How do I search for just the ordsts start (<ordsts>) and end tags (</ordsts>)
> and the data between them, and get just the last matched one? I would also
> need an idea of how to get the last two matches.
>
> Thanks for the pointers.
>
> [snipped sample XML]


If this is XML, as it appears to be, you might parse it more reliably, and get
better overall mileage, by using XML::Simple or one of its close cousins.

(Wondering if this is the "Ram" that *I* know. If so, I hope you are
doing well.)

Chris
-----
Chris Olive
chris -at- technologEase -dot- com
http://www.technologEase.com
(pronounced "technologies")

 
Gunnar Hjalmarsson
      02-04-2004
[ Please do not top post! ]

Ram wrote:
> Gunnar Hjalmarsson wrote:
>>
>> Assuming the data is in $_:
>>
>> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;

>
> This string does not match if <ordsts> and </ordsts> have child
> tags spread across multiple lines.


It's not a string, it's a regular expression, and it does match over
multiple lines.

> If I stick this at the end of the file, it does not match:
> <ordsts>
> <gname>
> </gname>
> </ordsts>
> But it matches:
> <ordsts> <gname> </gname> </ordsts>


Would you mind showing us the code that led you to that conclusion?

> In my case, it should match both, including the child tags.


And my suggestion does that perfectly well.

Have you begun to study perldoc perlre yet? You'd better do so right
away, and don't forget to read about the /s modifier.
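A minimal sketch of what /s changes, using a block like the one Ram appended to his file (inlined here as a string, which is an assumption; his real input was not posted). Without /s, `.` does not match a newline, so `.*` cannot cross line breaks; with /s it can.

```perl
use strict;
use warnings;

# A multi-line block like the one Ram appended to the end of the file.
my $text = "<ordsts>\n<gname>\n</gname>\n</ordsts>\n";

# Without /s, '.' does not match "\n", so .* stops at the first line
# break and the pattern fails on a block spread over several lines.
my $without_s = $text =~ /<ordsts>.*<\/ordsts>/ ? 1 : 0;

# With /s, '.' also matches "\n", so the same pattern spans the lines.
my $with_s = $text =~ /<ordsts>.*<\/ordsts>/s ? 1 : 0;

print "without /s: ", ($without_s ? "match" : "no match"), "\n";
print "with /s:    ", ($with_s    ? "match" : "no match"), "\n";
```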

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
gnari
      02-04-2004
"Ram" <(E-Mail Removed)> wrote in message
news:bvr8mb$jaj$(E-Mail Removed)...

[note: if you do not top-post, we are more likely to want to help.
It is annoying when you put your follow-up at the top of your message,
quoting the message you are replying to under it (in this case in whole)]


> This string does not match if <ordsts> and </ordsts> have child tags
> spread across multiple lines.
> ...


> "Gunnar Hjalmarsson" <(E-Mail Removed)> wrote in message
> news:bvp3d5$ujeo2$(E-Mail Removed)-berlin.de...
> >
> > Assuming the data is in $_:


key sentence, perhaps?

> >
> > my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;


are you matching one line at a time?
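If the input is read with `while (<FH>)`, `$_` only ever holds one line, so a block whose start and end tags sit on different lines can never match, while the one-line version can. That would fit what Ram describes, though his actual code was not posted. A sketch under that assumption, with hypothetical helper names and an in-memory filehandle (Perl 5.8 and later) standing in for the real file:

```perl
use strict;
use warnings;

my $data = "<ordsts>\n<gname>\n</gname>\n</ordsts>\n";

# Hypothetical helper: count matches of Gunnar's pattern when the input
# is read one line at a time (the default, since $/ is "\n").
sub matches_line_by_line {
    my ($fh) = @_;
    my $count = 0;
    while (<$fh>) {
        # $_ holds a single line here, so a block whose tags are on
        # different lines can never match, /s or not.
        $count++ if /.*(<ordsts>.*<\/ordsts>).*/s;
    }
    return $count;
}

# Hypothetical helper: slurp the whole input into $_ first, which is
# what "assuming the data is in $_" requires.
sub last_match_slurped {
    my ($fh) = @_;
    local $/;            # undef record separator: one read returns everything
    local $_ = <$fh>;
    my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;
    return $lastmatch;
}

open my $fh1, '<', \$data or die "open: $!";
print "line by line: ", matches_line_by_line($fh1), " match(es)\n";

open my $fh2, '<', \$data or die "open: $!";
my $found = last_match_slurped($fh2);
print "slurped:      ", (defined $found ? $found : 'no match'), "\n";
```

Undefining `$/` (here with `local $/;` so the change does not leak out of the sub) makes a single `<$fh>` return the whole file.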

gnari



 
James Willmore
      02-04-2004
[please don't top post - reordered to proper format]
On Wed, 04 Feb 2004 11:05:06 -0600, Ram wrote:
> "Gunnar Hjalmarsson" <(E-Mail Removed)> wrote in message
> news:bvp3d5$ujeo2$(E-Mail Removed)-berlin.de...
>> Ram wrote:
>> > How do I search for just the ordsts start (<ordsts>) and end
>> > tags (</ordsts>) and the data between them, and get just the last
>> > matched one?

>>
>> Assuming the data is in $_:
>>
>> my ($lastmatch) = /.*(<ordsts>.*<\/ordsts>).*/s;
>>
>> > I would also need an idea of how to get the last two matches.

>>
>> I leave that as an exercise for you.
>>
>> > Thanks for the pointers.

>>
>> http://www.perldoc.com/perl5.8.0/pod/perlre.html
>>
>> --
>> Gunnar Hjalmarsson
>> Email: http://www.gunnar.cc/cgi-bin/contact.pl
>>

> This string does not match if <ordsts> and </ordsts> have child tags
> spread across multiple lines.
>
> If I stick this at the end of the file, it does not match:
> <ordsts>
> <gname>
> </gname>
> </ordsts>
> But it matches:
> <ordsts> <gname> </gname> </ordsts>
>
> In my case, it should match both, including the child tags.


I'd follow the suggestion offered by Chris Olive - use an XML module to
parse your data. It will save you lots of time and effort - and reduce
the amount of "mistakes" made in parsing. Right now, if someone changes
the format of the file, you'll have to go through a similar exercise
again in the future.

Again, it's just a suggestion.

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
You never know how many friends you have until you rent a house
on the beach.

 

Similar Threads
Thread Thread Starter Forum Replies Last Post
newbie with newbie questions JohnE ASP .Net 3 08-17-2009 10:10 PM
VONAGE Newbie w/newbie question New_kid@nowhere.new VOIP 0 08-11-2007 01:40 PM
another newbie question from another newbie.... Lee UK VOIP 4 05-17-2005 04:10 PM
newbie: cisco vlan newbie question No Spam Cisco 3 06-07-2004 10:02 AM
Newbie! I'm a newbie! What's wrong with this program? Id0x Python 4 07-20-2003 11:40 PM


