Get XML content using XML::Twig

 
 
alwaysonnet
      04-21-2010
Hello all,
I'm trying to parse XML with the XML::Twig module, because my XML could be
too large to handle with XML::Simple. Please help me work out how to
print the values based on the following:

- get the values of Sender and Receiver
- get the FileType. In this case the possible values are
  InitTAP, FatalRAP and ReTxTAP

Here is the XML content:
<CODE>
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<ConnectionList>
<Connection>
<Sender>BRADD</Sender>
<Receiver>SHANE</Receiver>
<FileItemList>
<FileItem>
<FileID>378910</FileID>
<Tmstp>2009-01-16T16:59:07+01:00</Tmstp>
<FileType>
<InitTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</InitTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380582</FileID>
<Tmstp>2009-01-20T18:00:00+01:00</Tmstp>
<FileType>
<ReTxTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<RefRAPSeqNo>00044</RefRAPSeqNo>
<RefRAPID>380573</RefRAPID>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-20T18:00:00+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</ReTxTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380573</FileID>
<Tmstp>2009-01-16T20:34:45+01:00</Tmstp>
<FileType>
<FatalRAP>
<RAPSeqNo>00044</RAPSeqNo>
<RAPStatus>Exchanged</RAPStatus>
<RefTAPSeqNo>00083</RefTAPSeqNo>
<RefTAPID>378910</RefTAPID>
<RAPCreatTmstp>2009-01-16T20:21:30+01:00</RAPCreatTmstp>
<RAPAvailTmstp>2009-01-16T20:21:30+01:00</RAPAvailTmstp>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>-39</TotalNoOfCalls>
<TotalNetCharge>-11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</FatalRAP>
</FileType>
</FileItem>
</FileItemList>
</Connection>
</ConnectionList>
</Data>
</CODE>
 
John Bokma
      04-21-2010
alwaysonnet <(E-Mail Removed)> writes:

> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple. Please help me out of how to
> print the values based on the following...
> <B>get the values of Sender, Receiver</B>
> <B>get the FileType. In this case possible values are
> InitTAP,FatalRAP,ReTxTAP</B>


For very simple things like this I would (probably, based on what I just
read) use XML::SAX or (even) XML::Parser. Regarding the latter,
http://johnbokma.com/perl/ has some simple examples under "XML
Processing using Perl".
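
To illustrate the XML::Parser route, here is a minimal sketch (not taken from the page linked above). It assumes a shortened copy of the sample document and uses hypothetical variable names (%party, @rows); it records the sender, receiver and the name of the element directly below each FileType while the document streams through the parser.

```perl
use strict;
use warnings;
use XML::Parser;

# Shortened copy of the sample document (assumption: only the
# elements needed for this sketch are kept).
my $xml = <<'XML';
<Data><ConnectionList><Connection>
<Sender>BRADD</Sender><Receiver>SHANE</Receiver>
<FileItemList>
<FileItem><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo></InitTAP></FileType></FileItem>
<FileItem><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo></FatalRAP></FileType></FileItem>
</FileItemList>
</Connection></ConnectionList></Data>
XML

my (%party, @rows);
my $text = '';

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag) = @_;
            $text = '';    # collect character data afresh for each element
            # Inside a Start handler, current_element is the parent element.
            my $parent = $expat->current_element;
            if (defined $parent && $parent eq 'FileType') {
                push @rows, [ @party{qw(Sender Receiver)}, $tag ];
            }
        },
        # Char may be called several times for one run of text.
        Char => sub { $text .= $_[1] },
        End => sub {
            my ($expat, $tag) = @_;
            $party{$tag} = $text if $tag eq 'Sender' || $tag eq 'Receiver';
        },
    },
);
$parser->parse($xml);

printf "Sender: %s, Receiver: %s, Type: %s\n", @$_ for @rows;
```

For a large file, $parser->parsefile($filename) can be used in place of parse; only the handful of values kept in %party and @rows stay in memory.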

--
John Bokma j3b

Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
 
Klaus
      04-21-2010
On 21 Apr, 14:35, alwaysonnet <(E-Mail Removed)> wrote:
> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple. Please help me out of how to
> print the values based on the following...
> <B>get the values of Sender, Receiver</B>
> <B>get the FileType. In this case possible values are
> InitTAP,FatalRAP,ReTxTAP</B>
>
> <CODE>
> get the values of Sender, Receiver
> get the FileType. In this case possible values are
> InitTAP,FatalRAP,ReTxTAP
> </CODE>


What Tad McClellan and John Bokma suggested should be your first path
of investigation.

However, let me bring in a shameless plug:

You could also use my module XML::Reader
http://search.cpan.org/~keichner/XML.../XML/Reader.pm

This module is specifically designed to handle very big XML files. It
only uses the memory needed to hold one XML element at a time
(plus a small amount of additional memory for buffering, which is
independent of the size of the XML file).

Here is a sample program:

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
    { root => '/Data/ConnectionList/Connection/Sender',   branch => [ '/' ] },
    { root => '/Data/ConnectionList/Connection/Receiver', branch => [ '/' ] },
    { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
      branch => [
        '/InitTAP/TAPSeqNo',
        '/ReTxTAP/TAPSeqNo',
        '/FatalRAP/RAPSeqNo',
      ] },
);

my ($sender, $receiver);

while ($rdr->iterate) {
    if    ($rdr->rx == 0) { $sender   = $rdr->rvalue->[0]; }
    elsif ($rdr->rx == 1) { $receiver = $rdr->rvalue->[0]; }
    else {
        my ($InitTAP, $ReTxTAP, $FatalRAP) = @{$rdr->rvalue};
        my ($type, $seqno) = defined $InitTAP  ? ('InitTAP',  $InitTAP)
                           : defined $ReTxTAP  ? ('ReTxTAP',  $ReTxTAP)
                           : defined $FatalRAP ? ('FatalRAP', $FatalRAP)
                           :                     ('???',      '???');

        printf "Sender: %-5s, Receiver: %-5s, Type: %-8s, Seqno: %s\n",
            $sender, $receiver, $type, $seqno;
    }
}

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<ConnectionList>
<Connection>
<Sender>BRADD</Sender>
<Receiver>SHANE</Receiver>
<FileItemList>
<FileItem>
<FileID>378910</FileID>
<Tmstp>2009-01-16T16:59:07+01:00</Tmstp>
<FileType>
<InitTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</InitTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380582</FileID>
<Tmstp>2009-01-20T18:00:00+01:00</Tmstp>
<FileType>
<ReTxTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<RefRAPSeqNo>00044</RefRAPSeqNo>
<RefRAPID>380573</RefRAPID>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-20T18:00:00+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</ReTxTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380573</FileID>
<Tmstp>2009-01-16T20:34:45+01:00</Tmstp>
<FileType>
<FatalRAP>
<RAPSeqNo>00044</RAPSeqNo>
<RAPStatus>Exchanged</RAPStatus>
<RefTAPSeqNo>00083</RefTAPSeqNo>
<RefTAPID>378910</RefTAPID>
<RAPCreatTmstp>2009-01-16T20:21:30+01:00</RAPCreatTmstp>
<RAPAvailTmstp>2009-01-16T20:21:30+01:00</RAPAvailTmstp>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>-39</TotalNoOfCalls>
<TotalNetCharge>-11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</FatalRAP>
</FileType>
</FileItem>
</FileItemList>
</Connection>
</ConnectionList>
</Data>

=======
Here is the output:

Sender: BRADD, Receiver: SHANE, Type: InitTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: FatalRAP, Seqno: 00044
 
sln@netherlands.com
      04-21-2010
On Wed, 21 Apr 2010 10:06:14 -0700 (PDT), Klaus <(E-Mail Removed)> wrote:

>On 21 Apr, 14:35, alwaysonnet <(E-Mail Removed)> wrote:
>> Hello all,
>> I'm trying to parse the XML using XML::Twig Module as my XML could be
>> very large to handle using XML::Simple. Please help me out of how to
>> print the values based on the following...
>> <B>get the values of Sender, Receiver</B>
>> <B>get the FileType. In this case possible values are
>> InitTAP,FatalRAP,ReTxTAP</B>
>>
>> <CODE>
>> get the values of Sender, Receiver
>> get the FileType. In this case possible values are
>> InitTAP,FatalRAP,ReTxTAP
>> </CODE>

>
>What Tad McClellan and John Bokma suggested should be your first path
>of investigation.
>
>However, let me bring in a shameless plug:
>
>You could also use my module XML::Reader
>http://search.cpan.org/~keichner/XML.../XML/Reader.pm

Indeed shameless.
>
>This module is specifically designed to handle very big XML files, it
>only uses the memory it needs to have one XML element at a time in
>memory (plus a small additional memory for buffering, which is
>independent of the size of the XML file)

Is memory at a premium?
>
>Here is a sample program:
>
>use strict;
>use warnings;
>use XML::Reader;
>
>my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
>    { root => '/Data/ConnectionList/Connection/Sender', branch => [ '/' ] },
>    { root => '/Data/ConnectionList/Connection/Receiver', branch => [ '/' ] },
>    { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType', branch => [
>      '/InitTAP/TAPSeqNo',
>      '/ReTxTAP/TAPSeqNo',
>      '/FatalRAP/RAPSeqNo',

^^^^^^^^^^^^
What do these have to do with it?
> ] },
> );
>
>my ($sender, $receiver);
>
>while ($rdr->iterate) {
> if ($rdr->rx == 0) { $sender = $rdr->rvalue->[0]; }
> elsif ($rdr->rx == 1) { $receiver = $rdr->rvalue->[0]; }
> else {
> my ($InitTAP, $ReTxTAP, $FatalRAP) = @{$rdr->rvalue};

^^^^^^^^^^^^^^^^^^^^^^^^^^^
Again, what do these have to do with it?
[snip]
>=======
>Here is the output:
>
>Sender: BRADD, Receiver: SHANE, Type: InitTAP , Seqno: 00083
>Sender: BRADD, Receiver: SHANE, Type: ReTxTAP , Seqno: 00083
>Sender: BRADD, Receiver: SHANE, Type: FatalRAP, Seqno: 00044


That's nice. Let's say he generally said "in this case it's:"
InitTAP ReTxTAP FatalRAP
Why? Because it's the file type.
Maybe he wants all file types of the sender/receiver.
But it's hard to know what the OP wants, isn't it?

-sln
 
Klaus
      04-21-2010
On 21 Apr, 20:07, (E-Mail Removed) wrote:
> On Wed, 21 Apr 2010 10:06:14 -0700 (PDT), Klaus <(E-Mail Removed)> wrote:
> >On 21 Apr, 14:35, alwaysonnet <(E-Mail Removed)> wrote:
> >> Hello all,
> >> I'm trying to parse the XML using XML::Twig Module as my XML could be
> >> very large to handle using XML::Simple. Please help me out of how to
> >> print the values based on the following...
> >> <B>get the values of Sender, Receiver</B>
> >> <B>get the FileType. In this case possible values are
> >> InitTAP,FatalRAP,ReTxTAP</B>


> That's nice. Let's say he generally said "in this case it's:"
> InitTAP ReTxTAP FatalRAP
> Why? Because it's the file type.
> Maybe he wants all file types of the sender/receiver.


In that case you can use XML::Reader->newhd(... {filter => 2}):

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 2});

my ($sender, $receiver);

while ($rdr->iterate) {
    if ($rdr->path eq '/Data/ConnectionList/Connection/Sender') {
        $sender = $rdr->value;
    }
    elsif ($rdr->path eq '/Data/ConnectionList/Connection/Receiver') {
        $receiver = $rdr->value;
    }
    elsif ($rdr->is_start
      and $rdr->path =~ m{\A /Data/ConnectionList/Connection/FileItemList/FileItem/FileType/ (\w+) \z}xms) {
        printf "Sender: %-5s, Receiver: %-5s, Type: %s\n",
            $sender, $receiver, $1;
    }
}

Here is the output

Sender: BRADD, Receiver: SHANE, Type: InitTAP
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP
Sender: BRADD, Receiver: SHANE, Type: FatalRAP
 
sln@netherlands.com
      04-22-2010
On Wed, 21 Apr 2010 11:48:59 -0700 (PDT), Klaus <(E-Mail Removed)> wrote:

>On 21 Apr, 20:07, (E-Mail Removed) wrote:
>> Maybe he wants all file types of the sender/receiver.
>
>in that case you use XML::Reader->newhd(... {filter => 2});
>
>[snip code and output]


This is pretty good. I assume it handles attributes and their values as well.
It appears to be a lot of regex work, the more unknown the
elements become, but that's a tree stack.

It would be good, though, to have a capture mechanism where
XML capture can be switched on and off by the user, later to
be regurgitated to the user (on demand) and given to an
XML::Simple-style mechanism to turn it into filtered records.

It wouldn't change the simple, low-memory stream parsing at all;
the source would just be captured (appended) to a named buffer,
switched on and off on demand.

It's not as easy as it seems, though: CaptureON/OFF (bufname, before/after),
nested captures, a single data pool. I think I've done this before.

-sln
 
Klaus
      04-22-2010
On 22 Apr, 02:31, (E-Mail Removed) wrote:
> [snip]


> This is pretty good. I assume it does attribute/value as well.


Yes, it does: just put an '@' symbol in the path, for example
'/InitTAP/ChargeInfo/@attrib1'

> It appears to be a lot of regex work, the more unknown the
> elements become, but thats a tree stack.
>
> It would be good though to have a capture mechanism, where
> xml capture can be triggered on/off by the user, later to
> be regurgitated to the user (on demand), and given to an
> xml::simple style mechanism to turn it into filtered records.


For simple structures where you know exactly what you are looking for,
you can use {filter => 5} like so

use strict;
use warnings;
use XML::Reader;

use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
    { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
      branch => [
        '/InitTAP/TAPSeqNo',
        '/ReTxTAP/TAPSeqNo',
        '/FatalRAP/RAPSeqNo',
        '/InitTAP/ChargeInfo/@attrib1',
        '/InitTAP/ChargeInfo/TAPCurrency',
        '/ReTxTAP/ChargeInfo/TAPCurrency',
        '/FatalRAP/ChargeInfo/TAPCurrency',
      ] },
);

while ($rdr->iterate) {
    print Dumper($rdr->rvalue), "\n";
}

> It wouldn't change the simple, low memmory stream parsing at all,
> just the source would be captured (appended) on/off to a named buffer,
> on demand.
> Its not as easy as it seems though. CaptureON/OFF (bufname, before/after),
> nested capture's, single data pool. I think I've done this before.


For general capture into a buffer, you would use {filter => 3, using
=> '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'}

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
    using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

my $buffer = '';

while ($rdr->iterate) {
    my $indentation = ' ' x ($rdr->level - 1);

    if ($rdr->path eq '/') {
        if ($rdr->is_start) {
            $buffer = '';
        }
        elsif ($rdr->is_end) {
            print "\n\n buffer ==>\n", $buffer, "\n\n";
        }
        next;
    }

    if ($rdr->is_start) {
        $buffer .= $indentation.'<'.$rdr->tag.
          join('', map{" $_='".$rdr->att_hash->{$_}."'"} sort keys %{$rdr->att_hash}).
          '>'."\n";
    }

    if ($rdr->type eq 'T' and $rdr->value ne '') {
        $buffer .= $indentation.' '.$rdr->value."\n";
    }

    if ($rdr->is_end) {
        $buffer .= $indentation.'</'.$rdr->tag.'>'."\n";
    }
}
 
alwaysonnet
      04-22-2010
On Apr 22, 12:39 pm, Klaus <(E-Mail Removed)> wrote:
> [snip]


My intention is to:

- Get each sender and receiver
- Get the file type (could be InitTAP, FatalRAP, etc.)
- For each file type, get the TAPSeqNo, TotalNoOfCalls, etc.

Basically, I want all the information in place for processing the
data.

Also, apart from XML::Twig, is there any other module that can handle
larger XML files?

Any help or suggestions are appreciated.
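
The three requirements listed above can be sketched with XML::Twig, the module the original question asks about. This is a minimal sketch under stated assumptions: the sample document is trimmed, the record layout (sender, receiver, type, seqno, calls) and variable names are hypothetical, and Sender and Receiver are assumed to appear before FileItemList inside each Connection, as they do in the sample.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of the sample document (assumption: only the fields
# used below are kept).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<Data>
  <ConnectionList>
    <Connection>
      <Sender>BRADD</Sender>
      <Receiver>SHANE</Receiver>
      <FileItemList>
        <FileItem>
          <FileID>378910</FileID>
          <FileType>
            <InitTAP>
              <TAPSeqNo>00083</TAPSeqNo>
              <ChargeInfo><TotalNoOfCalls>39</TotalNoOfCalls></ChargeInfo>
            </InitTAP>
          </FileType>
        </FileItem>
        <FileItem>
          <FileID>380573</FileID>
          <FileType>
            <FatalRAP>
              <RAPSeqNo>00044</RAPSeqNo>
              <ChargeInfo><TotalNoOfCalls>-39</TotalNoOfCalls></ChargeInfo>
            </FatalRAP>
          </FileType>
        </FileItem>
      </FileItemList>
    </Connection>
  </ConnectionList>
</Data>
XML

my ($sender, $receiver);
my @records;

my $twig = XML::Twig->new(
    twig_handlers => {
        # These fire before the FileItem handler because Sender and
        # Receiver close before FileItemList opens.
        'Connection/Sender'   => sub { $sender   = $_[1]->text },
        'Connection/Receiver' => sub { $receiver = $_[1]->text },
        'FileItem' => sub {
            my ($t, $item) = @_;
            # The single element child of FileType names the file type.
            my $kind = $item->first_child('FileType')->first_child('#ELT');
            my $info = $kind->first_child('ChargeInfo');
            # TAP files carry TAPSeqNo, RAP files carry RAPSeqNo.
            my $seqno = $kind->first_child_text('TAPSeqNo');
            $seqno = $kind->first_child_text('RAPSeqNo') if $seqno eq '';
            push @records, {
                sender   => $sender,
                receiver => $receiver,
                type     => $kind->tag,
                seqno    => $seqno,
                calls    => $info ? $info->first_child_text('TotalNoOfCalls') : undef,
            };
            $t->purge;    # free everything parsed so far
        },
    },
);
$twig->parse($xml);

printf "Sender: %s, Receiver: %s, Type: %s, Seqno: %s, Calls: %s\n",
    @{$_}{qw(sender receiver type seqno calls)} for @records;
```

For a large file, $twig->parsefile($filename) can replace parse; because each FileItem is purged after its handler runs, memory use should stay roughly bounded by the size of one FileItem rather than the whole document.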


 
Klaus
      04-22-2010
On 21 Apr, 14:35, alwaysonnet <(E-Mail Removed)> wrote:
> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple.


Klaus <(E-Mail Removed)> wrote:
> However, let me bring in a shameless plug:
> You could also use my module XML::Reader
> http://search.cpan.org/~keichner/XML.../XML/Reader.pm


(E-Mail Removed) wrote:
> > Indeed shameless.
> >
> > [...]
> >
> > It would be good though to have a capture mechanism, where
> > xml capture can be triggered on/off by the user, later to
> > be regurgitated to the user (on demand), and given to an
> > xml::simple style mechanism to turn it into filtered records.


Here is an example of how to use XML::Reader to capture sub-trees from
a (potentially very big) XML file into a buffer and pass that buffer
to XML::Simple:

use strict;
use warnings;
use XML::Reader;
use XML::Simple;
use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
    using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

my $buffer = '';

while ($rdr->iterate) {

    if ($rdr->path eq '/') {
        if ($rdr->is_start) {
            # start a fresh document for each captured FileType sub-tree
            $buffer = qq{<?xml version="1.0" encoding="UTF-8"?><FileType>};
        }
        if ($rdr->is_end) {
            $buffer .= qq{</FileType>};

            # hand the completed sub-tree to XML::Simple
            my $ref = XMLin($buffer);
            print Dumper($ref), "\n\n";
        }
        next;
    }

    if ($rdr->is_start) {
        $buffer .= '<'.$rdr->tag.
          join('', map{" $_='".$rdr->att_hash->{$_}."'"} sort keys %{$rdr->att_hash}).
          '>';
    }

    if ($rdr->type eq 'T' and $rdr->value ne '') {
        $buffer .= $rdr->value;
    }

    if ($rdr->is_end) {
        $buffer .= '</'.$rdr->tag.'>';
    }
}
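
For comparison, the same capture-then-XML::Simple idea can be sketched with XML::Twig, the module named in the original question. This is only a sketch: the shortened sample document and the @captured array are assumptions for illustration. It relies on twig_roots to build just the FileType sub-trees and on sprint to serialise each one for XMLin.

```perl
use strict;
use warnings;
use XML::Twig;
use XML::Simple;

# Shortened copy of the sample document (assumption for illustration).
my $xml = <<'XML';
<Data><ConnectionList><Connection><FileItemList>
<FileItem><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo></InitTAP></FileType></FileItem>
<FileItem><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo></FatalRAP></FileType></FileItem>
</FileItemList></Connection></ConnectionList></Data>
XML

my @captured;

my $twig = XML::Twig->new(
    # Only FileType sub-trees are built in memory; the rest is skipped.
    twig_roots => {
        'FileType' => sub {
            my ($t, $elt) = @_;
            # sprint returns the element, tags included, as an XML string.
            push @captured, XMLin($elt->sprint);
            $t->purge;    # discard the sub-tree once it has been converted
        },
    },
);
$twig->parse($xml);

# Each captured record is keyed by its file type element.
print join(', ', sort keys %$_), "\n" for @captured;
```

Because sprint produces properly escaped XML, this avoids rebuilding tags and attribute values by hand.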
 
Klaus
      04-22-2010
On 21 Apr, 14:35, alwaysonnet <(E-Mail Removed)> wrote:
> Hello all,
> I'm trying to parse the XML using XML::Twig Module as my XML could be
> very large to handle using XML::Simple.


On Wed, 21 Apr 2010 10:06:14, Klaus <(E-Mail Removed)> wrote:
> What Tad McClellan and John Bokma suggested should be your first
> path of investigation.
> However, let me bring in a shameless plug:
> You could also use my module XML::Reader
> http://search.cpan.org/~keichner/XML.../XML/Reader.pm


On 21 Apr, 20:07, (E-Mail Removed) wrote:
> Indeed shameless.


On 22 Apr, 10:24, alwaysonnet <(E-Mail Removed)> wrote:
> My intention is to ~
> - Get each sender and receiver
> - Get the filetype ( could be InitTAP, FatalRAP etc )
> - For each of filetype get the TAPSeqNo, NoofCalls etc....
>
> Basically I want all the information in place for processing the
> data....
>
> Also, apart from XML::Twig, is there any module which can handle
> larger XML files..


As I said before, take the advice of Tad McClellan and John Bokma
first.

If, for whatever reason, you can't follow their advice (and, for
whatever reason, you can't use XML::Twig either), there is always my
"shameless plug" XML::Reader.

There are, in my opinion, two scenarios:

Scenario 1:
You already know how to parse your XML with XML::Simple, but the XML
file is too big to fit entirely into memory.
In that case, I suggest you follow the XML::Reader example that
I gave in this thread today (where I said: "...Here is an example of
how to use XML::Reader to capture sub-trees...").
See http://groups.google.com/group/comp....b3a769d96c1b2e

Scenario 2:
You know the general rules of your XML parsing, but you don't know
which XML module to use (and you can't follow the advice from Tad
McClellan and from John Bokma).
In that case, I suggest you follow the XML::Reader example that I
gave in this thread yesterday (where I said: "...use
XML::Reader->newhd(... {filter => 2})...").
See http://groups.google.com/group/comp....2534f342f939e6
 