Regular Expression for XML Parsing

 
 
tushar.saxena@gmail.com
      12-27-2007
Hi,

I have a set of XML files from which I need to extract some data. The
format of the file is as follows :

<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>

I need to extract the DATA part of the xml structure

Note : tag3 can be contained either within tag1 or tag2, but I need to
extract data only from tag1. i.e. DATA1 should be extracted, but not
DATA2

If I want to get both DATA1 and DATA2 I can use a simple regex like :

if (($_ =~ /<tag3>(\w+)<\/tag3>/g))
{
print $1
}

But if I try to get only DATA1 (embedded within tag1) I try using
something like this, but am unable to get it to work

if (($_ =~ /<tag1>[\n\s\S\w\W]*<tag2>(\w+)<\/tag2>[\n\s\S\w\W]*<\/tag1>/g))
{
print $1
}

In this second case, the match itself fails.

Any help would be appreciated !
 
Jürgen Exner
      12-27-2007
(E-Mail Removed) wrote:
>I have a set of XML files
>I need to extract the DATA part of the xml structure
>If I want to get both DATA1 and DATA2 I can use a simple regex like :


It's a bad idea in the first place. XML is not a regular language, so why
would you use regular expressions to parse it?

>Any help would be appreciated !


Use a tool that is designed to parse XML, e.g. any of the XML parser
modules on CPAN.

jue
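
For illustration, here is a minimal sketch of the parser-module approach using XML::LibXML from CPAN. It is only one of several possible modules, not necessarily the one meant above. The <root> wrapper element is an assumption added so that the sample from the question has a single top-level element.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample data from the question. The <root> wrapper is an assumption:
# the data as posted has no single top-level element.
my $xml = <<'END';
<root>
<tag1>
<tag3>DATA1</tag3>
</tag1>
<tag2>
<tag3>DATA2</tag3>
</tag2>
</root>
END

my $doc = XML::LibXML->load_xml( string => $xml );

# The XPath //tag1/tag3 selects tag3 elements that are direct children
# of a tag1 element, so the tag3 inside tag2 is not selected.
my @data = map { $_->textContent } $doc->findnodes('//tag1/tag3');

print "$_\n" for @data;
```

The XPath expression carries the "only inside tag1" condition, so no regex has to span lines or track nesting.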
 
patriknym@hotmail.com
      12-27-2007
On 27 Dec, 20:59, (E-Mail Removed) wrote:
> Hi,
>
> I have a set of XML files from which I need to extract some data. The
> format of the file is as follows :
>
> <tag1>
> <tag3>DATA1</tag3>
> </tag1>
>
> <tag2>
> <tag3>DATA2</tag3>
> </tag2>
>
> I need to extract the DATA part of the xml structure
>
> Note : tag3 can be contained either within tag1 or tag2, but I need to
> extract data only from tag1. i.e. DATA1 should be extracted, but not
> DATA2
>
> If I want to get both DATA1 and DATA2 I can use a simple regex like :
>
> if (($_ =~ /<tag3>(\w+)<\/tag3>/g))
> {
> print $1
>
> }
>
> But if I try to get only DATA1 (embedded within tag1) I try using
> something like this, but am unable to get it to work
>
> if (($_ =~ /<tag1>[\n\s\S\w\W]*<tag2>(\w+)<\/tag2>[\n\s\S\w\W]*<\/tag1>/g))
> {
> print $1
>
> }
>
> In this second case, the match itself fails.
>
> Any help would be appreciated !


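$/ = "";    # paragraph mode: each read returns one blank-line-separated block

while (<>) {
    # /s lets . match newlines, so the pattern can span the lines of a block
    if ( m{<tag1>.*?<tag3>(\w+)</tag3>.*?</tag1>}gs )
    {
        print "$1\n";
    }
}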
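
One thing worth noting about the failing pattern in the question: it looks for <tag2>...</tag2> inside <tag1>, while the sample data has <tag3> there, so it cannot match that data. A self-contained sketch of a two-step variant of the regex approach follows; it first isolates each tag1 block and then pulls tag3 values out of it. The helper name extract_tag1_data is hypothetical, and this only works for flat, simple input like the sample, not for XML in general.

```perl
use strict;
use warnings;

# Sample data from the question, held in a string for the demo.
my $xml = <<'END';
<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>
END

# extract_tag1_data is a hypothetical helper name, not from the thread.
sub extract_tag1_data {
    my ($text) = @_;
    my @found;

    # Step 1: capture the body of each <tag1>...</tag1> block.
    # /s lets . match newlines; .*? stops at the first closing tag.
    while ( $text =~ m{<tag1>(.*?)</tag1>}gs ) {
        my $inner = $1;    # copy before the next match overwrites $1

        # Step 2: collect every tag3 value inside that block only.
        push @found, $inner =~ m{<tag3>(\w+)</tag3>}g;
    }
    return @found;
}

print "$_\n" for extract_tag1_data($xml);    # prints DATA1
```

Splitting the match in two keeps the tag3 search from running past </tag1> into a following tag2 block, which a single pattern with .*? can do when a tag1 block contains no tag3.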
 
Tad J McClellan
      12-28-2007
(E-Mail Removed) <(E-Mail Removed)> wrote:

> I have a set of XML files from which I need to extract some data. The
> format of the file is as follows :
>
><tag1>
> <tag3>DATA1</tag3>
></tag1>
>
><tag2>
> <tag3>DATA2</tag3>
></tag2>



I thought you said you had an XML file.

That is not a well-formed XML file: it has two top-level elements, and
XML requires exactly one root element.
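
The sample as posted has two top-level elements, whereas a well-formed XML document has exactly one root element. A rough way to check this, sketched with XML::LibXML (the helper name is_well_formed and the <root> wrapper are assumptions for the demo):

```perl
use strict;
use warnings;
use XML::LibXML;

# The data as posted: two sibling elements and no enclosing root.
my $fragment = <<'END';
<tag1>
<tag3>DATA1</tag3>
</tag1>
<tag2>
<tag3>DATA2</tag3>
</tag2>
END

# is_well_formed is a hypothetical helper name. load_xml dies on a
# parse error, so eval turns that into an undefined result.
sub is_well_formed {
    my ($text) = @_;
    my $doc = eval { XML::LibXML->load_xml( string => $text ) };
    return defined $doc ? 1 : 0;
}

printf "as posted:         %s\n",
    is_well_formed($fragment) ? "well-formed" : "not well-formed";
printf "wrapped in <root>: %s\n",
    is_well_formed("<root>$fragment</root>") ? "well-formed" : "not well-formed";
```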


> I need to extract the DATA part of the xml structure
>
> Note : tag3 can be contained either within tag1 or tag2, but I need to
> extract data only from tag1. i.e. DATA1 should be extracted, but not
> DATA2
>
> If I want to get both DATA1 and DATA2 I can use a simple regex like :



Using a regular expression to "parse" a non-regular language is
fraught with peril, and nearly always a Bad Idea.

Use a module that understands XML for processing XML data.


> Any help would be appreciated !



Assuming that you have actual valid XML in $xml, then:

use XML::Simple;

# ForceArray keeps tag1 an array ref even when only one <tag1> is present
my $ref = XMLin($xml, ForceArray => ['tag1']);
foreach my $child ( @{ $ref->{tag1} } ) {
    print "$child->{tag3}\n";
}


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
Michele Dondi
      12-28-2007
On Thu, 27 Dec 2007 12:59:12 -0800 (PST), (E-Mail Removed)
wrote:

>Subject: Regular Expression for XML Parsing


Nope. Perhaps a Regex for XML Parsing, in the Perl 6 acceptation of a
"Regex" which is not assumed to be a "Regular Expression" any more.
You will have to wait for quite a while, though...


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
 