Velocity Reviews

tushar.saxena@gmail.com 12-27-2007 08:59 PM

Regular Expression for XML Parsing
 
Hi,

I have a set of XML files from which I need to extract some data. The
format of the files is as follows:

<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>

I need to extract the DATA part of the XML structure.

Note: tag3 can be contained within either tag1 or tag2, but I need to
extract data only from tag1, i.e. DATA1 should be extracted, but not
DATA2.

If I want to get both DATA1 and DATA2 I can use a simple regex like :

if (($_ =~ /<tag3>(\w+)<\/tag3>/g))
{
print $1
}

But to get only DATA1 (embedded within tag1), I tried something like
this and cannot get it to work:

if (($_ =~ /<tag1>[\n\s\S\w\W]*<tag2>(\w+)<\/tag2>[\n\s\S\w\W]*<\/tag1>/g))
{
print $1
}

In this second case, the match itself fails.

Any help would be appreciated !

Jürgen Exner 12-27-2007 10:49 PM

Re: Regular Expression for XML Parsing
 
tushar.saxena@gmail.com wrote:
>I have a set of XML files
>I need to extract the DATA part of the xml structure
>If I want to get both DATA1 and DATA2 I can use a simple regex like :


It's a bad idea in the first place. XML is not a regular language, so why
would you use regular expressions to parse it?

>Any help would be appreciated !


Use a tool that is designed to parse XML, e.g. any of the XML parser
modules on CPAN.

jue

patriknym@hotmail.com 12-27-2007 11:06 PM

Re: Regular Expression for XML Parsing
 
On 27 Dec, 20:59, tushar.sax...@gmail.com wrote:
> Hi,
>
> I have a set of XML files from which I need to extract some data. The
> format of the file is as follows :
>
> <tag1>
> <tag3>DATA1</tag3>
> </tag1>
>
> <tag2>
> <tag3>DATA2</tag3>
> </tag2>
>
> I need to extract the DATA part of the xml structure
>
> Note : tag3 can be contained either within tag1 or tag2, but I need to
> extract data only from tag1. i.e. DATA1 should be extracted, but not
> DATA2
>
> If I want to get both DATA1 and DATA2 I can use a simple regex like :
>
> if (($_ =~ /<tag3>(\w+)<\/tag3>/g))
> {
> print $1
>
> }
>
> But if I try to get only DATA1 (embedded within tag1) I try using
> something like this, but am unable to get it to work
>
> if (($_ =~ /<tag1>[\n\s\S\w\W]*<tag2>(\w+)<\/tag2>[\n\s\S\w\W]*<\/tag1>/g))
> {
> print $1
>
> }
>
> In this second case, the match itself fails.
>
> Any help would be appreciated !


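$/ = "";    # paragraph mode: each read returns one blank-line-separated block

while (<>) {
    # /s lets . match newlines, so the pattern can span the whole <tag1> block
    if ( m{<tag1>.*?<tag3>(\w+)</tag3>.*?</tag1>}gs ) {
        print "$1\n";
    }
}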
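
For data already held in a string, the same idea can be split into two steps: first isolate each <tag1> block, then pull every <tag3> out of that block only. This is just a sketch under the assumption that <tag1> elements are flat (no nesting, attributes, comments, or CDATA); the sub name tag3_in_tag1 and the $sample variable are made up for illustration.

```perl
use strict;
use warnings;

# Return the text of every <tag3> that sits inside a <tag1> block.
# Assumes flat, non-nested <tag1> elements without attributes or comments.
sub tag3_in_tag1 {
    my ($text) = @_;
    my @data;
    # Step 1: isolate each <tag1>...</tag1> block (/s lets . match newlines).
    while ( $text =~ m{<tag1>(.*?)</tag1>}gs ) {
        my $block = $1;
        # Step 2: collect every <tag3> inside that block only.
        push @data, $block =~ m{<tag3>(\w+)</tag3>}g;
    }
    return @data;
}

my $sample = <<'XML';
<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>
XML

print "$_\n" for tag3_in_tag1($sample);    # prints "DATA1"
```

Unlike the single-pattern version, this also picks up several <tag3> elements inside one <tag1>.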

Tad J McClellan 12-28-2007 12:19 AM

Re: Regular Expression for XML Parsing
 
tushar.saxena@gmail.com <tushar.saxena@gmail.com> wrote:

> I have a set of XML files from which I need to extract some data. The
> format of the file is as follows :
>
><tag1>
> <tag3>DATA1</tag3>
></tag1>
>
><tag2>
> <tag3>DATA2</tag3>
></tag2>



I thought you said you had an XML file.

That is not a valid XML file: it has two top-level elements, and XML
requires a single root element.


> I need to extract the DATA part of the xml structure
>
> Note : tag3 can be contained either within tag1 or tag2, but I need to
> extract data only from tag1. i.e. DATA1 should be extracted, but not
> DATA2
>
> If I want to get both DATA1 and DATA2 I can use a simple regex like :



Using a regular expression to "parse" a non-regular language is
fraught with peril, and nearly always a Bad Idea.

Use a module that understands XML for processing XML data.
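
A small illustration of the peril, assuming a hypothetical input in which the only <tag1> element has been commented out (the <root> wrapper and the OLD value are invented for this example). The regex posted earlier in the thread has no notion of XML comments, so it still reports the data:

```perl
use strict;
use warnings;

# Hypothetical input: the only <tag1> element is inside an XML comment.
my $xml = <<'XML';
<root>
<!-- <tag1><tag3>OLD</tag3></tag1> -->
<tag2><tag3>DATA2</tag3></tag2>
</root>
XML

# In list context, /g returns every captured group; /s lets . match newlines.
my @found = $xml =~ m{<tag1>.*?<tag3>(\w+)</tag3>.*?</tag1>}gs;

print "regex found: @found\n";    # prints "regex found: OLD"
```

An XML parser would skip the comment and find no tag1 element at all.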


> Any help would be appreciated !



Assuming that you have actual valid XML in $xml, then:

use XML::Simple;

# ForceArray makes tag1 an array ref even when only one <tag1> is present
my $ref = XMLin($xml, ForceArray => ['tag1']);
foreach my $child ( @{ $ref->{tag1} } ) {
    print "$child->{tag3}\n";
}


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Michele Dondi 12-28-2007 12:35 PM

Re: Regular Expression for XML Parsing
 
On Thu, 27 Dec 2007 12:59:12 -0800 (PST), tushar.saxena@gmail.com
wrote:

>Subject: Regular Expression for XML Parsing


Nope. Perhaps a Regex for XML Parsing, in the Perl 6 sense of a
"Regex", which is no longer assumed to be a "Regular Expression".
You will have to wait for quite a while, though...


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

