Velocity Reviews - Computer Hardware Reviews



DOM sub trees whilst SAX'ing in perl?

 
 
bugbear
 
      02-17-2005
I need to process some XML files that are rather large.
However their structure may usefully be expressed
as
<!ELEMENT FILE (RECORD)+>
...

Each record is a few KB. The files are many tens of megabytes.

I would (dearly) like to use DOM to process each record,
since it's easier to get my head round than SAX events.

But I don't want to pull the whole file into
a DOM tree; it's too big.
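The per-record approach described above (stream the file with SAX, but hand each RECORD to your code as a small DOM tree) can be sketched with Python's standard library, whose xml.dom.pulldom module works this way. This is only an illustration of the pattern, not a Perl module. The function iter_records and the SAMPLE data are hypothetical.

```python
import io
from xml.dom import pulldom


def iter_records(source, tag="RECORD"):
    """Yield each <tag> element as a fully expanded DOM subtree.

    `source` may be a filename or a binary file-like object. pulldom drives
    a SAX parser underneath; only the nodes we call expandNode() on are
    built into a complete DOM subtree.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            events.expandNode(node)   # pull the rest of this record into DOM
            yield node


# Hypothetical sample standing in for a large FILE made of RECORDs.
SAMPLE = b"""<FILE>
  <RECORD id="1"><NAME>alpha</NAME></RECORD>
  <RECORD id="2"><NAME>beta</NAME></RECORD>
</FILE>"""

names = []
for record in iter_records(io.BytesIO(SAMPLE)):
    # Ordinary DOM calls work on each record.
    name = record.getElementsByTagName("NAME")[0].firstChild.data
    names.append((record.getAttribute("id"), name))

print(names)  # [('1', 'alpha'), ('2', 'beta')]
```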

These people have come up with a perfect (and obvious?)
solution:
http://www.devsphere.com/xml/saxdomix/

But I'm coding in a Perl environment.

Is there a similar Module, generating separate
DOM sub trees for Perl?

BugBear
 
 
 
 
 
Michel Rodriguez
 
      02-17-2005
bugbear wrote:
> I need to process some XML files that are rather large.
> However their structure may usefully be expressed
> as
> <!ELEMENT FILE (RECORD)+>
> .
> .
> .
>
> Each record is a few Kb. The files are many 10's of Megabytes.
>
> I would (dearly) like to use DOM to process each record,
> since it's easier to get my head round than SAX events.
>
> But I don't want to pull the whole file into
> a DOM tree; it's too big.
>
> These people have come up with a perfect (and obvious?)
> solution:
> http://www.devsphere.com/xml/saxdomix/
>
> But I'm coding in a Perl environment.
>
> Is there a similar Module, generating separate
> DOM sub trees for Perl?


It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
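For comparison, the XML::Twig style works differently. A handler is called once for each complete RECORD element, and already-processed elements are freed afterwards (XML::Twig's $t->purge). That style looks roughly like this with Python's xml.etree.ElementTree.iterparse. It is an analogue, not XML::Twig itself. The names process_records, seen and SAMPLE are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def process_records(source, handler, tag="RECORD"):
    """Call handler(elem) for each complete <tag> element, then free it.

    The element is fully built once its end tag is seen; clearing the root
    afterwards keeps memory bounded by roughly one record.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)          # the first event is the root's start
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            handler(elem)
            root.clear()             # drop the records processed so far


SAMPLE = (b"<FILE><RECORD id='1'><NAME>alpha</NAME></RECORD>"
          b"<RECORD id='2'><NAME>beta</NAME></RECORD></FILE>")

seen = []
process_records(io.BytesIO(SAMPLE),
                lambda rec: seen.append((rec.get("id"), rec.findtext("NAME"))))
print(seen)  # [('1', 'alpha'), ('2', 'beta')]
```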

--
mirod
 
 
 
 
 
bugbear
 
      02-17-2005
Michel Rodriguez wrote:
> bugbear wrote:


>> These people have come up with a perfect (and obvious?)
>> solution:
>> http://www.devsphere.com/xml/saxdomix/
>>
>> But I'm coding in a Perl environment.
>>
>> Is there a similar Module, generating separate
>> DOM sub trees for Perl?

>
>
> It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
>


OK. That does the right thing; I'd prefer to stay with standards
(i.e. SAX and DOM) if possible. I'll keep looking, and bear
XML::Twig in mind as a fallback position.

BugBear
 
 
SL
 
      02-17-2005
> >> Is there a similar Module, generating separate
> >> DOM sub trees for Perl?

> >
> >
> > It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
> >

>
> OK. That does the right thing; I'd prefer to stay with standards
> (i.e. SAX and DOM) if possible. I'll keep looking, and bear
> XML::Twig in mind as a fall back position.
>


I haven't used it for a while, but there is (or was) a package on CPAN that
does what you want: DocSplitter in XML::SAX::Machines. It lets you split a
SAX stream into several smaller documents by emitting startDocument() and
endDocument() events before and after a particular element. For instance,
you can split your stream on each RECORD element, so that every filter
further down the pipeline processes each RECORD element as the root element
of a distinct document. This is particularly useful with Matt Sergeant's
XML::Filter::XSLT filter. If you want to merge the results of the
transformation back into one big document, you can use a "Merger" from the
same package; it works with the splitter to remove the extra startDocument()
and endDocument() events. XML::SAX::Machines provides several facilities for
building SAX pipelines.
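The splitting idea can be sketched as a SAX filter with Python's xml.sax. RecordSplitter is a hypothetical class written for illustration, not the DocSplitter code. It swallows the real document start and end, emits a fresh startDocument()/endDocument() pair around each RECORD, and drops events that fall outside any RECORD. DocCounter and SAMPLE are likewise made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class RecordSplitter(XMLFilterBase):
    """Hypothetical filter: re-emit each <RECORD> as a separate document."""

    def __init__(self, parent, tag="RECORD"):
        super().__init__(parent)
        self.tag = tag
        self.depth = 0                    # > 0 while inside a RECORD

    def startDocument(self):
        pass                              # swallow the real document start

    def endDocument(self):
        pass                              # ... and the real document end

    def startElement(self, name, attrs):
        if self.depth == 0 and name != self.tag:
            return                        # outside any RECORD: drop it
        if self.depth == 0:
            super().startDocument()       # a new RECORD starts a new document
        self.depth += 1
        super().startElement(name, attrs)

    def endElement(self, name):
        if self.depth == 0:
            return                        # e.g. the closing </FILE>
        super().endElement(name)
        self.depth -= 1
        if self.depth == 0:
            super().endDocument()         # the RECORD just closed

    def characters(self, content):
        if self.depth:
            super().characters(content)


class DocCounter(ContentHandler):
    """Downstream handler: notes the root element of every document."""

    def __init__(self):
        super().__init__()
        self.roots, self.ended, self._expect_root = [], 0, False

    def startDocument(self):
        self._expect_root = True

    def startElement(self, name, attrs):
        if self._expect_root:
            self.roots.append((name, attrs.get("id")))
            self._expect_root = False

    def endDocument(self):
        self.ended += 1


SAMPLE = b"<FILE><RECORD id='1'><NAME>a</NAME></RECORD><RECORD id='2'/></FILE>"
counter = DocCounter()
splitter = RecordSplitter(xml.sax.make_parser())
splitter.setContentHandler(counter)
splitter.parse(io.BytesIO(SAMPLE))
print(counter.roots)  # [('RECORD', '1'), ('RECORD', '2')]
```

Any handler placed after such a filter sees one small, complete document per RECORD, which is what makes per-record XSLT or DOM building possible.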

HTH,
SL


 
 
bugbear
 
      02-17-2005
SL wrote:
>>>>Is there a similar Module, generating separate
>>>>DOM sub trees for Perl?
>>>
>>>
>>>It looks like what XML::Twig does, except XML::Twig is not SAX/DOM based.
>
>>OK. That does the right thing; I'd prefer to stay with standards
>>(i.e. SAX and DOM) if possible. I'll keep looking, and bear
>>XML::Twig in mind as a fall back position.
>>

>
>
> I haven't used it since a while, but there is (or was) a package doing what
> you want on CPAN: DocSplitter in XML::SAX::Machines. It allows you to split
> a SAX stream into several smaller documents by throwing a startDocument()
> and endDocument() event before and after a particular element. For instance,
> you may split your stream on each RECORD element, so that each filter below
> in the pipeline process RECORD element as the root element of distinct
> document. This is is useful in particular with the filtre XML::Filter::XSLT
> by Matt Sergeant. If you want to merge again the results of the
> transformation into a big document, you may use a "Merger" in the pipeline
> package; it works with the splitter for removing the extra startDocument()
> and endDocument() events. Machines provide several facilities for dealing
> with SAX pipeline.


So how do I get my DOM(s)?

BugBear
 
 
SL
 
      02-17-2005
> So how do I get my DOM(s)?

Look at the XML::Filter::XSLT::LibXSLT filter: it uses
XML::LibXML::SAX::Builder to build a DOM from the SAX events it receives.
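A builder of this kind turns the SAX events it receives into DOM nodes. A rough Python analogue is sketched below. DOMBuilder is hypothetical and built on xml.dom.minidom; it is not XML::LibXML::SAX::Builder. It starts a new Document on every startDocument() event, so a splitter like the DocSplitter described above would produce one DOM per RECORD. Here the splitter's output events are simulated by hand.

```python
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class DOMBuilder(ContentHandler):
    """Build one minidom Document per startDocument/endDocument pair."""

    def __init__(self):
        super().__init__()
        self.documents = []          # finished Documents, in order
        self._doc = None
        self._stack = []             # open nodes; the Document is at the bottom

    def startDocument(self):
        self._doc = minidom.Document()
        self._stack = [self._doc]

    def endDocument(self):
        self._doc.normalize()        # merge adjacent text chunks
        self.documents.append(self._doc)
        self._doc, self._stack = None, []

    def startElement(self, name, attrs):
        elem = self._doc.createElement(name)
        for key, value in attrs.items():
            elem.setAttribute(key, value)
        self._stack[-1].appendChild(elem)
        self._stack.append(elem)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        parent = self._stack[-1]
        if parent is not self._doc:  # ignore text outside the root element
            parent.appendChild(self._doc.createTextNode(content))


builder = DOMBuilder()
for rid, name in (("1", "alpha"), ("2", "beta")):
    # Events as a splitter would emit them for a single RECORD.
    builder.startDocument()
    builder.startElement("RECORD", AttributesImpl({"id": rid}))
    builder.startElement("NAME", AttributesImpl({}))
    builder.characters(name)
    builder.endElement("NAME")
    builder.endElement("RECORD")
    builder.endDocument()

for doc in builder.documents:
    rec = doc.documentElement
    print(rec.getAttribute("id"), rec.getElementsByTagName("NAME")[0].firstChild.data)
```

The same builder can also be passed straight to a parser, e.g. xml.sax.parseString(data, DOMBuilder()), in which case it produces a single Document.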

SL


 
 
 
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
Death To Sub-Sub-Sub-Directories! Lawrence D'Oliveiro Java 92 05-20-2011 06:50 AM
Binary search trees (AVL trees) jacob navia C Programming 34 01-08-2010 07:27 PM
Recognising Sub-Items and sub-sub items using xslt Ben XML 2 09-19-2007 09:35 AM
Xerces C++ and DOM trees Peter Saffrey XML 1 01-06-2005 03:21 PM
how do make a pop-up in sub ASP.net sub ? THY ASP .Net 1 08-18-2003 11:30 PM


