How to import only part of a large XML file?

 
 
Dwight Army of Champions
11-11-2011
I have a very large XML file that I want to load, but I don't
necessarily want to load the entire document; that takes too long.
What I want to do instead is load only the key/value pairs that meet
certain criteria, for example only entries whose value for the key
date_of_entry falls within a certain date range. Can I just use
XML::Simple for this, or do I need a better module?
 
Dwight Army of Champions
11-11-2011
On Nov 11, 5:59 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth Dwight Army of Champions <dwightarmyofchampi...@hotmail.com>:
>
> > I have a very large XML file that I want to load, but I don't want to
> > necessarily load the entire document; that takes too long. What I want
> > to do instead is only key/value pairs that meet certain criteria, like
> > only grab entries whose value fall within a certain date for a key
> > date_of_entry. Can I just use XML::Simple for this or do I need a
> > better module?

>
> It sounds like you want either XML::Twig or one of the SAX modules.
> XML::Simple, at least in non-SAX mode, will load the entire document
> into a tree structure before letting you see any of it.
>
> Ben


I'm glancing at XML::Twig on search.cpan.org. What methods can I use
to accomplish these tasks? I don't see any kind of "filter" method...

 
Bjoern Hoehrmann
11-11-2011
* Dwight Army of Champions wrote in comp.lang.perl.misc:
>I have a very large XML file that I want to load, but I don't want to
>necessarily load the entire document; that takes too long. What I want
>to do instead is only key/value pairs that meet certain criteria, like
>only grab entries whose value fall within a certain date for a key
>date_of_entry. Can I just use XML::Simple for this or do I need a
>better module?


It depends on what you mean by "key/value pairs". If you want to filter
elements based on attributes, and don't particularly need to look at
child elements, then the SAX modules are likely a good fit: they report
events like "start of element plus attributes" and "end of element", and
you have to manage state between the events. Generally, this should help:
<http://perl-xml.sourceforge.net/faq/#parser_selection>. If you have
special needs, Perl-...@listserv.ActiveState.com is likely to give the
best advice.
--
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
 
Dwight Army of Champions
11-12-2011
On Nov 11, 6:26 pm, Bjoern Hoehrmann <bjo...@hoehrmann.de> wrote:
> It depends on what you mean by "key/value pairs". If you want to filter
> elements based on attributes, and don't particularly need to look at
> child elements, then the SAX modules are likely a good fit, they report
> events like "start of element plus attributes" and "end of element" and
> you have to manage state between the events. Generally, this should help
> <http://perl-xml.sourceforge.net/faq/#parser_selection>, and if you have
> special needs, Perl-...@listserv.ActiveState.com is likely to give the
> best advice.


For example, suppose I have the following XML input file:

<?xml version="1.0"?>
<library>
<book>
<title>Dreamcatcher</title>
<author>Stephen King</author>
<genre>Horror</genre>
<pages>899</pages>
<price>23.99</price>
<rating>5</rating>
<publication_date>11/27/2001</publication_date>
</book>
<book>
<title>Mystic River</title>
<author>Dennis Lehane</author>
<genre>Thriller</genre>
<pages>390</pages>
<price>17.49</price>
<rating>4</rating>
<publication_date>07/22/2003</publication_date>
</book>
<book>
<title>The Lord Of The Rings</title>
<author>J. R. R. Tolkien</author>
<genre>Fantasy</genre>
<pages>3489</pages>
<price>10.99</price>
<rating>5</rating>
<publication_date>10/12/2005</publication_date>
</book>
</library>


Suppose I only want to import books that were published on or after
January 1, 2002. If I apply such a filter when I do my initial import,
the result should look like this:

$VAR1 = {
          'book' => [
                      {
                        'publication_date' => '07/22/2003',
                        'price' => '17.49',
                        'author' => 'Dennis Lehane',
                        'title' => 'Mystic River',
                        'rating' => '4',
                        'pages' => '390',
                        'genre' => 'Thriller'
                      },
                      {
                        'publication_date' => '10/12/2005',
                        'price' => '10.99',
                        'author' => 'J. R. R. Tolkien',
                        'title' => 'The Lord Of The Rings',
                        'rating' => '5',
                        'pages' => '3489',
                        'genre' => 'Fantasy'
                      }
                    ]
        };

The import will completely ignore entries that don't meet the
specified criteria (in this case, publication_date >= '1/1/2002').
 
Bjoern Hoehrmann
11-12-2011
* Dwight Army of Champions wrote in comp.lang.perl.misc:
>For example, suppose I have the following XML input file:


>Suppose I only want to import books that were published after January
>1, 2002. If I apply such a filter when I do my initial import, the
>result should look like this:


One way to do this would be with a SAX filter: you look for "book"
elements, store all events until you can decide whether you are
interested in this branch, and then re-emit or discard the events.
You can then use some module that turns the SAX stream into some
more Perl-ish data structure. There are some libraries that allow
you to filter in this fashion automatically ("xpath filtering"),
but I am not sure which, if any, modules for Perl do this for you.

Note that size is quite important here: with 100 MB you might just
suffer "too long", but with 5 GB you might suffer "impossible" for
some possible solutions. Some "reader"-style APIs allow you to go
to a "book" element, read everything up to the end of the element
into some DOM-style representation, and then make it easy to check
if you are interested in this branch, as you have DOM-style access,
but only to the interesting part, so you save memory. This is similar
to the SAX filter solution, except that you trade some memory and
perhaps speed for ease of programming.
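
As a rough illustration of the "reader"-style approach, here is a minimal
sketch using XML::LibXML::Reader. That module is my choice for the example;
the text above does not name one. The sketch assumes the <book> layout and
MM/DD/YYYY dates from the sample earlier in the thread, and the inline $xml
string is only a stand-in for the real file.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Stand-in for the real input (assumption: same layout as the sample file).
my $xml = <<'XML';
<library>
  <book><title>Dreamcatcher</title><publication_date>11/27/2001</publication_date></book>
  <book><title>Mystic River</title><publication_date>07/22/2003</publication_date></book>
</library>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

my @selected;

# Jump from one <book> start tag to the next without building the whole tree.
while ($reader->nextElement('book')) {
    # Deep-copy only the current <book> subtree into a small DOM node.
    my $book = $reader->copyCurrentNode(1);

    # Dates are assumed to be MM/DD/YYYY; skip anything that does not match.
    my ($month, $day, $year) =
        $book->findvalue('publication_date') =~ m{\A (\d+) / (\d+) / (\d+) \z}xms
        or next;

    my $date = sprintf '%04d-%02d-%02d', $year, $month, $day;
    next unless $date ge '2002-01-01';

    # Turn the child elements into a plain hash (tag name => text).
    my %record = map { ($_->nodeName, $_->textContent) } $book->findnodes('*');
    push @selected, \%record;
}

print "$_->{title}\n" for @selected;
```

For a file on disk, `location => 'huge.xml'` would replace `string => $xml`.
Only one <book> subtree is held as a DOM node at any time, which is the
memory-for-convenience trade described above.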
--
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
 
Klaus
11-12-2011
On 12 nov, 01:11, Dwight Army of Champions
<dwightarmyofchampi...@hotmail.com> wrote:
> Suppose I only want to import books that were published after January
> 1, 2002. If I apply such a filter when I do my initial import, the
> result should look like this:
> [...]


That's a perfect job for XML::Reader:

use strict;
use warnings;

use XML::Reader;
use XML::Simple;
use Data::Dumper;

my $huge_xml =
q{<?xml version="1.0"?>
<library>
<book>
<title>Dreamcatcher</title>
<author>Stephen King</author>
<genre>Horror</genre>
<pages>899</pages>
<price>23.99</price>
<rating>5</rating>
<publication_date>11/27/2001</publication_date>
</book>
<book>
<title>Mystic River</title>
<author>Dennis Lehane</author>
<genre>Thriller</genre>
<pages>390</pages>
<price>17.49</price>
<rating>4</rating>
<publication_date>07/22/2003</publication_date>
</book>
<book>
<title>The Lord Of The Rings</title>
<author>J. R. R. Tolkien</author>
<genre>Fantasy</genre>
<pages>3489</pages>
<price>10.99</price>
<rating>5</rating>
<publication_date>10/12/2005</publication_date>
</book>
</library>
};

my $selected = { book => [] };

my $rdr = XML::Reader->new(\$huge_xml, {mode => 'branches'},
  { root => '/library/book', branch => '*' });

while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);

    # The dates in the input are written as MM/DD/YYYY.
    my ($month, $day, $year) =
      $small_ref->{'publication_date'} =~
      m{\A (\d+) / (\d+) / (\d+) \z}xms;

    unless (defined $day)   { $day   = 0; }
    unless (defined $month) { $month = 0; }
    unless (defined $year)  { $year  = 0; }

    # Build a YYYY-MM-DD string so that a plain string comparison works.
    my $date = sprintf('%04d-%02d-%02d', $year, $month, $day);

    if ($date ge '2002-01-01') {
        push @{$selected->{book}}, $small_ref;
    }
}

print Dumper($selected);

> The import will completely ignore entries that don't meet the
> specified criteria (in this case, publication_date >= '1/1/2002').


Yes, the way it works is that XML::Reader reads only small chunks from
the huge XML (via $rdr->rvalue), a small chunk being the
'<book>...</book>' part. This small chunk is then fed into
XML::Simple::XMLin() to generate a small structure in memory, which can
then be used to extract the date. If the date is >= 1/1/2002, then that
small structure in memory is pushed to a selected structure.
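
The date check is the part that is easiest to get wrong, so here is a small
sketch that isolates it. The helper name iso_date is hypothetical (it is not
part of XML::Reader or XML::Simple), and it assumes the dates are always
written as MM/DD/YYYY, as in the sample file.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a US-style MM/DD/YYYY string into a sortable
# YYYY-MM-DD string. Returns undef when the input does not look like a date.
sub iso_date {
    my ($us_date) = @_;
    return unless defined $us_date;

    my ($month, $day, $year) =
        $us_date =~ m{\A (\d{1,2}) / (\d{1,2}) / (\d{4}) \z}xms
        or return;

    # Coarse range check only; it does not know about month lengths.
    return if $month < 1 || $month > 12 || $day < 1 || $day > 31;

    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

# Strings in YYYY-MM-DD form sort chronologically, so 'ge' is enough.
for my $raw ('11/27/2001', '07/22/2003', 'unknown') {
    my $iso  = iso_date($raw);
    my $keep = defined $iso && $iso ge '2002-01-01';
    printf "%-10s => %s\n", $raw, $keep ? 'keep' : 'skip';
}
```

Returning undef for unparseable input, rather than defaulting the parts to 0,
makes it possible to tell "too old" apart from "not a date at all".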
 
Klaus
11-12-2011
On 12 nov, 11:28, Klaus <klau...@gmail.com> wrote:
> That's a perfect Job for XML::Reader
> [...]
> my $huge_xml =
> q{<?xml version="1.0"?>
> <library>
> [...]
> </library>
> };
> [...]
> my $rdr = XML::Reader->new(\$huge_xml, {mode => 'branches'},
>   { root => '/library/book', branch => '*' });


That's, of course, better written with an external file ('huge.xml'):

open my $fh, '>', 'huge.xml' or die $!;
print {$fh}
q{<?xml version="1.0"?>
<library>
[...]
</library>
};
close $fh;
[...]
my $rdr = XML::Reader->new('huge.xml', {mode => 'branches'},
{ root => '/library/book', branch => '*' });
[...]

The rest stays exactly the same:

> while ($rdr->iterate) {
>     my $small_ref = XMLin($rdr->rvalue);
> [...]
> }
>
> print Dumper($selected);

 
Dwight Army of Champions
11-13-2011
On Nov 12, 5:28 am, Klaus <klau...@gmail.com> wrote:
> That's a perfect Job for XML::Reader
> [...]
> Yes, the way it works is that XML::Reader reads from a huge XML only
> small chunks (via $rdr->rvalue) (a small chunk being the
> '<book>...</book>' part). This small chunk is then fed into
> XML::Simple::XMLin() to generate a small structure in memory which can
> then be used to extract the date. If the date is >= 1/1/2002, then that
> small structure in memory is pushed to a selected structure.

Yes, that is exactly what I need. Thank you!

Follow-up question: Suppose that the library contains more than just
books. Let's say we expand the XML file to include music items, like
so:

<music>
<title>The Future Will Come</title>
<artist>The Juan Maclean</artist>
<release_date>04/21/2009</release_date>
<label>DFA</label>
</music>
<music>
<title>Laughing Stock</title>
<artist>Talk Talk</artist>
<release_date>09/16/1991</release_date>
<label>Verve</label>
</music>
<music>
<title>Hardcore Will Never Die, But You Will</title>
<artist>Mogwai</artist>
<release_date>02/14/2011</release_date>
<label>Rock Action Records</label>
</music>

Can we take the January 1, 2002 date and apply it to both
publication_date for books and release_date for music?

if ($item_is_a_book && $publication_date ge '2002-01-01') {
    push @{$selected->{book}}, $small_ref;
}
elsif ($item_is_a_music_item && $release_date ge '2002-01-01') {
    push @{$selected->{music}}, $small_ref;
}

I mean, I'm sure we could create an entirely separate XML::Reader
object and do another traversal of the input file in another while
loop (this time looking for music instead of books), but that would
double the execution time of the program. I was wondering if we could
look for both types of items in one go.
 
Peter J. Holzer
11-13-2011
On 2011-11-11 23:10, Dwight Army of Champions wrote:
> I'm glancing at XML::Twig on search.cpan.org, What methods can I use
> to accomplish these tasks? I don't see any kind of "filter" method...


You specify the "filter" in the constructor. The twig_handlers attribute
specifies which handler to call for each "twig" (i.e. an element and its
descendants) that matches an XPath expression. So, if you can express
your filter as an XPath, you just specify that and your handler will be
called for each matching twig. If your filter is more complicated, you
specify a more lenient XPath expression and then do additional filtering
in the handler.

For example, here is an excerpt from one of my scripts:

[...]
my $twig=XML::Twig->new(
    start_tag_handlers => {
        'table[@class="route"]' => sub {
            $in_route = 1;
        },
    },
    twig_handlers => {
        'title' => sub {
            my ($t, $title) = @_;
            my $stored_title = $title->children_trimmed_text();
            my $computed_title = "hjp: laufen: $date";
            unless ($stored_title eq $computed_title) {
                $title->set_inner_xml($computed_title);
                $modified = 1;
            }
        },
        'table[@class="route"]' => sub {
            $in_route = 0;
        },
        'tr' => sub {
            my ($t, $row) = @_;
            return unless $in_route;
            my @cells = $row->children;
            # print "# of cells: ", scalar(@cells), "\n";

            my @pl = $row->get_xpath('th');
            # declare first, then assign: "my ... if" is unreliable
            my $place;
            $place = $pl[0]->children_trimmed_text if (@pl);

            # there doesn't seem to be an XPath expression
            # equivalent to the CSS selector [att~=val], so
            # we have to do it the hard way.
            my @dt = $row->get_xpath('td');
            @dt = grep { ($_->att('class') // '') =~ /\bdt\b/ } @dt;
            my $stored_dt = "";
            my $stored_q = "";
            if (@dt) {
                $stored_dt = $dt[0]->children_trimmed_text;
                if ($dt[0]->att('class') =~ /\bq([0-9])\b/) {
                    $stored_q = $1;
                }
            }
[...]

I matched <title> and <table class="route"> elements directly, but I
couldn't figure out how to match all td elements belonging to class
"dt", so I matched on <tr> instead and then did a grep over the child
elements. (I now also see that matching a <tr> within a <table
class="route"> could be achieved in a much simpler way than I did it. I
obviously didn't understand XPath very well when I wrote that.)
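
Applied to the library example from earlier in the thread, the twig_handlers
approach might look like the following sketch. The handler path
'library/book', the inline $xml sample, and the MM/DD/YYYY date format are
assumptions taken from that example, not from the script excerpt above.

```perl
use strict;
use warnings;
use XML::Twig;

# Stand-in for the real input (assumption: same layout as the sample file).
my $xml = <<'XML';
<library>
  <book><title>Dreamcatcher</title><publication_date>11/27/2001</publication_date></book>
  <book><title>Mystic River</title><publication_date>07/22/2003</publication_date></book>
  <book><title>The Lord Of The Rings</title><publication_date>10/12/2005</publication_date></book>
</library>
XML

my @selected;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each complete <book> element under <library>.
        'library/book' => sub {
            my ($t, $book) = @_;

            my $raw = $book->first_child_text('publication_date');
            if (my ($month, $day, $year) =
                    $raw =~ m{\A (\d+) / (\d+) / (\d+) \z}xms) {
                my $date = sprintf '%04d-%02d-%02d', $year, $month, $day;
                if ($date ge '2002-01-01') {
                    # Copy the child elements into a plain hash.
                    my %record = map { ($_->tag, $_->text) }
                                 $book->children('#ELT');
                    push @selected, \%record;
                }
            }

            # Release everything parsed so far to keep memory use flat.
            $t->purge;
        },
    },
);

$twig->parse($xml);

print "$_->{title}\n" for @selected;
```

For a file on disk, `$twig->parsefile('huge.xml')` would replace
`$twig->parse($xml)`. The call to purge is what keeps the whole document from
accumulating in memory; without it the twig grows just like a full tree.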

hp
 
Klaus
11-13-2011
On 13 nov, 06:44, Dwight Army of Champions
<dwightarmyofchampi...@hotmail.com> wrote:
> On Nov 12, 5:28 am, Klaus <klau...@gmail.com> wrote:
> > That's a perfect Job for XML::Reader
> > [...]
> > my $rdr = XML::Reader->new(\$huge_xml, {mode => 'branches'},
> >   { root => '/library/book', branch => '*' });
> > while ($rdr->iterate) {
> >     my $small_ref = XMLin($rdr->rvalue);
>
> Yes that is exactly what I need. Thank you!
>
> Follow-up question: Suppose that the library contains more than just
> books. Let's say we expand the XML file to include music
> items [...]
>
> Can we take the January 1, 2002 date and apply it to both
> publication_date for books and release_date for music?
>
> if ($item_is_a_book && $publication_date ge '2002-01-01') {
>     push @{$selected->{book}}, $small_ref;
> }
> elsif ($item_is_a_music_item && $release_date ge '2002-01-01') {
>     push @{$selected->{music}}, $small_ref;
> }
>
> I mean, I'm sure we could create an entirely separate XML::Reader
> object and do another traversal of the input file in another while
> loop (this time looking for music instead of books), but that would
> double the execution time of the program. I was wondering if we could
> look for both types of items in one go.


Yes, that's in fact what XML::Reader is designed to do. You just need
to add another line { root => '/library/music', branch => '*' } and
then, inside your loop you just need to check $rdr->rx (which is 0 if
it found a <book> item or 1 if it found a <music> item). With that
logic, the file 'huge.xml' is parsed only once, while extracting
<book> and/or <music> items as it goes along.

*****************************************************

The important lines are:

[...]

my $selected = { book => [], music => [] };

my $rdr = XML::Reader->new('huge.xml', {mode => 'branches'},
  { root => '/library/book',  branch => '*' },
  { root => '/library/music', branch => '*' });


while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);
    my $topic = $rdr->rx == 0 ? 'book' : 'music';

[...]

*****************************************************

Here is a complete program:

use strict;
use warnings;

use XML::Reader;
use XML::Simple;
use Data::Dumper;

open my $fh, '>', 'huge.xml' or die $!;

print {$fh}
q{<?xml version="1.0"?>
<library>
<book>
<title>Dreamcatcher</title>
<author>Stephen King</author>
<genre>Horror</genre>
<pages>899</pages>
<price>23.99</price>
<rating>5</rating>
<publication_date>11/27/2001</publication_date>
</book>
<music>
<title>The Future Will Come</title>
<artist>The Juan Maclean</artist>
<release_date>04/21/2009</release_date>
<label>DFA</label>
</music>
<book>
<title>Mystic River</title>
<author>Dennis Lehane</author>
<genre>Thriller</genre>
<pages>390</pages>
<price>17.49</price>
<rating>4</rating>
<publication_date>07/22/2003</publication_date>
</book>
<music>
<title>Laughing Stock</title>
<artist>Talk Talk</artist>
<release_date>09/16/1991</release_date>
<label>Verve</label>
</music>
<book>
<title>The Lord Of The Rings</title>
<author>J. R. R. Tolkien</author>
<genre>Fantasy</genre>
<pages>3489</pages>
<price>10.99</price>
<rating>5</rating>
<publication_date>10/12/2005</publication_date>
</book>
<music>
<title>Hardcore Will Never Die, But You Will</title>
<artist>Mogwai</artist>
<release_date>02/14/2011</release_date>
<label>Rock Action Records</label>
</music>
</library>
};

close $fh;

my $selected = { book => [], music => [] };

my $rdr = XML::Reader->new('huge.xml', {mode => 'branches'},
  { root => '/library/book',  branch => '*' },
  { root => '/library/music', branch => '*' });

while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);

    # $rdr->rx is the index of the root that matched: 0 = book, 1 = music.
    my $topic = $rdr->rx == 0 ? 'book' : 'music';

    my $dat_ele = $topic eq 'book'
      ? $small_ref->{'publication_date'}
      : $small_ref->{'release_date'};

    # The dates in the input are written as MM/DD/YYYY.
    my ($month, $day, $year) = $dat_ele =~
      m{\A (\d+) / (\d+) / (\d+) \z}xms;

    unless (defined $day)   { $day   = 0; }
    unless (defined $month) { $month = 0; }
    unless (defined $year)  { $year  = 0; }

    # Build a YYYY-MM-DD string so that a plain string comparison works.
    my $date = sprintf('%04d-%02d-%02d', $year, $month, $day);

    if ($topic eq 'book') {
        if ($date ge '2002-01-01') {
            push @{$selected->{book}}, $small_ref;
        }
    }
    elsif ($topic eq 'music') {
        if ($date ge '2002-01-01') {
            push @{$selected->{music}}, $small_ref;
        }
    }
}

print Dumper($selected);
 