On Nov 13, 06:44, Dwight Army of Champions
<dwightarmyofchampi...@hotmail.com> wrote:
> On Nov 12, 5:28 am, Klaus <klau...@gmail.com> wrote:
> > That's a perfect Job for XML::Reader
> > [...]
> > my $rdr = XML::Reader->new(\$huge_xml, {mode => 'branches'},
> >   { root => '/library/book', branch => '*' });
> > while ($rdr->iterate) {
> >     my $small_ref = XMLin($rdr->rvalue);
> Yes that is exactly what I need. Thank you!
>
> Follow-up question: Suppose that the library contains more than just
> books. Let's say we expand the XML file to include music
> items [...]
>
> Can we take the January 1, 2002 date and apply it to both
> publication_date for books and release_date for music?
>
> if ($item_is_a_book && $publication_date ge '2002-01-01') {
>   push @{$selected->{book}}, $small_ref;
> }
> elsif ($item_is_a_music_item && $release_date ge '2002-01-01') {
>   push @{$selected->{music}}, $small_ref;
> }
>
> I mean, I'm sure we could create an entirely separate XML::Reader
> object and do another traversal of the input file in another while
> loop (this time looking for music instead of books), but that would
> double the execution time of the program. I was wondering if we could
> look for both types of items in one go.
Yes, that is in fact what XML::Reader is designed to do. Just add a
second line { root => '/library/music', branch => '*' } and then, inside
your loop, check $rdr->rx, which is 0 if it found a <book> item or 1 if
it found a <music> item. That way the file 'huge.xml' is parsed only
once, and <book> and <music> items are extracted as the parser goes
along.
*****************************************************
The important lines are:
[...]
my $selected = { book => [], music => [] };
my $rdr = XML::Reader->new('huge.xml', {mode => 'branches'},
  { root => '/library/book',  branch => '*' },
  { root => '/library/music', branch => '*' });
while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);
    my $topic = $rdr->rx == 0 ? 'book' : 'music';
[...]
*****************************************************
Here is a complete program:
use strict;
use warnings;
use XML::Reader;
use XML::Simple;
use Data::Dumper;
open my $fh, '>', 'huge.xml' or die $!;
print {$fh}
q{<?xml version="1.0"?>
<library>
<book>
<title>Dreamcatcher</title>
<author>Stephen King</author>
<genre>Horror</genre>
<pages>899</pages>
<price>23.99</price>
<rating>5</rating>
<publication_date>11/27/2001</publication_date>
</book>
<music>
<title>The Future Will Come</title>
<artist>The Juan Maclean</artist>
<release_date>04/21/2009</release_date>
<label>DFA</label>
</music>
<book>
<title>Mystic River</title>
<author>Dennis Lehane</author>
<genre>Thriller</genre>
<pages>390</pages>
<price>17.49</price>
<rating>4</rating>
<publication_date>07/22/2003</publication_date>
</book>
<music>
<title>Laughing Stock</title>
<artist>Talk Talk</artist>
<release_date>09/16/1991</release_date>
<label>Verve</label>
</music>
<book>
<title>The Lord Of The Rings</title>
<author>J. R. R. Tolkien</author>
<genre>Fantasy</genre>
<pages>3489</pages>
<price>10.99</price>
<rating>5</rating>
<publication_date>10/12/2005</publication_date>
</book>
<music>
<title>Hardcore Will Never Die, But You Will</title>
<artist>Mogwai</artist>
<release_date>02/14/2011</release_date>
<label>Rock Action Records</label>
</music>
</library>
};
close $fh;
my $selected = { book => [], music => [] };
my $rdr = XML::Reader->new('huge.xml', {mode => 'branches'},
  { root => '/library/book',  branch => '*' },    # $rdr->rx == 0
  { root => '/library/music', branch => '*' });   # $rdr->rx == 1
while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);
    my $topic = $rdr->rx == 0 ? 'book' : 'music';
    my $dat_ele = $topic eq 'book'
      ? $small_ref->{'publication_date'}
      : $small_ref->{'release_date'};
    # The dates in the file are written as MM/DD/YYYY.
    my ($month, $day, $year) = $dat_ele =~
      m{\A (\d+) / (\d+) / (\d+) \z}xms;
    unless (defined $day)   { $day   = 0; }
    unless (defined $month) { $month = 0; }
    unless (defined $year)  { $year  = 0; }
    # Zero-padded YYYY-MM-DD strings sort chronologically, so 'ge' works.
    my $date = sprintf('%04d-%02d-%02d', $year, $month, $day);
    if ($topic eq 'book') {
        if ($date ge '2002-01-01') {
            push @{$selected->{book}}, $small_ref;
        }
    }
    elsif ($topic eq 'music') {
        if ($date ge '2002-01-01') {
            push @{$selected->{music}}, $small_ref;
        }
    }
}
print Dumper($selected);
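
In case it helps, the date handling in the loop can also be pulled out
into a small helper. This is only a sketch: to_iso_date is a name I made
up (it is not part of XML::Reader or XML::Simple), and it assumes the
dates are always written MM/DD/YYYY, as in the sample file. Unlike the
program above, it returns undef for a missing or malformed date instead
of falling back to zeros.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'MM/DD/YYYY' into 'YYYY-MM-DD'.
# Returns undef if the input is missing or does not match.
sub to_iso_date {
    my ($text) = @_;
    return unless defined $text;
    my ($month, $day, $year) =
        $text =~ m{\A (\d{1,2}) / (\d{1,2}) / (\d{4}) \z}xms
        or return;
    # Zero-padded fields make plain string comparison chronological.
    return sprintf '%04d-%02d-%02d', $year, $month, $day;
}

for my $raw ('11/27/2001', '07/22/2003', 'n/a') {
    my $iso = to_iso_date($raw);
    my $verdict = !defined $iso        ? 'skipped (unparseable)'
                : $iso ge '2002-01-01' ? 'selected'
                :                        'too old';
    print "$raw => $verdict\n";
}
```

Inside the while loop you could then call to_iso_date($dat_ele) and skip
the item when it returns undef.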