Velocity Reviews - Computer Hardware Reviews


Parsing for Performance

 
 
Paul
      04-22-2005
I have users who want to search 6 different large, flat XML documents.

I can only fit 3 of these documents into memory at one time,

so I continually have to swap XML documents in and out of memory.

Is it best to use DOM or SAX? Or maybe something else?

SAX seems like the technology of choice for large XML files
because there is no need to hold the XML in memory. But under load,
would there not be a hard disk bottleneck from numerous concurrent
searches on a big XML file?

DOM would give really quick search times, but since the
different XML files need to keep swapping in and out of memory, surely
constantly parsing the files into memory hammers the hard disk just as
much as SAX?

So presumably SAX is the best of a bad lot?

Or is there some other technique that would be better? (Discount normal
databases and native XML databases. I know these would be faster, but
we need a quick fix.)
 
 
 
 
 
William Park
      04-22-2005


If you want to extract some data and throw away the rest, then a top-down
(streaming) XML parser is a good choice. E.g., practically every scripting
language has an interface to the Expat XML parser (www.libexpat.org). Heck,
even Awk and the Bash shell have one.
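
As a rough illustration of the extract-and-discard approach, here is a minimal sketch using Python's standard-library binding to Expat (xml.parsers.expat). The element name "item", the attribute "colour", and the helper find_records are made up for the example; nothing in the thread says what the real documents look like.

```python
import xml.parsers.expat

def find_records(xml_bytes, tag, attr, wanted):
    """Return the attribute dicts of every <tag> whose attr equals wanted."""
    matches = []

    def start_element(name, attrs):
        # Expat calls this once per opening tag; nothing else is retained.
        if name == tag and attrs.get(attr) == wanted:
            matches.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return matches

# Hypothetical flat document standing in for one of the six files.
sample = b"""<items>
  <item id="1" colour="red"/>
  <item id="2" colour="blue"/>
  <item id="3" colour="red"/>
</items>"""

hits = find_records(sample, "item", "colour", "red")
print([h["id"] for h in hits])  # ['1', '3']
```

For a file on disk, parser.ParseFile(open(path, "rb")) reads the input in chunks instead of requiring the whole document as one bytes object, so only the matches stay in memory.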

--
William Park, Toronto, Canada
Slackware Linux -- because it works.
 
 
 
 
 
ajm
      04-25-2005
t'ja ....

As far as DOM vs. SAX is concerned, the former has a large
(sometimes very, very large) memory footprint, which might be a
problem for you. SAX, on the other hand, generally does
not (and concurrency might not matter, depending on your
implementation; e.g., a sensible SAX-based implementation might
perform deep searches only when necessary).
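
One way to read "only when necessary" is to stop parsing as soon as the answer is known. The sketch below does that with Python's xml.sax by raising an exception from the handler; the Found exception, the FirstMatch handler, and the sample document are all assumptions for illustration, not something described in the thread.

```python
import io
import xml.sax

class Found(Exception):
    """Raised to abandon the parse as soon as the target is seen."""

class FirstMatch(xml.sax.ContentHandler):
    def __init__(self, tag, attr, wanted):
        super().__init__()
        self.tag, self.attr, self.wanted = tag, attr, wanted
        self.seen = 0       # number of opening tags visited so far
        self.result = None  # attributes of the first matching element

    def startElement(self, name, attrs):
        self.seen += 1
        if name == self.tag and attrs.get(self.attr) == self.wanted:
            self.result = dict(attrs.items())
            raise Found  # skip the rest of the document

def search(source, tag, attr, wanted):
    handler = FirstMatch(tag, attr, wanted)
    try:
        xml.sax.parse(source, handler)  # accepts a filename or a file object
    except Found:
        pass
    return handler.result, handler.seen

# Hypothetical flat document; a real one would be read from disk.
sample = b"""<items>
  <item id="1" colour="red"/>
  <item id="2" colour="blue"/>
  <item id="3" colour="red"/>
  <item id="4" colour="blue"/>
</items>"""

result, seen = search(io.BytesIO(sample), "item", "colour", "blue")
print(result["id"], seen)  # 2 3
```

Memory use stays flat regardless of file size, and a search that matches early never reads the tail of the file. A search with no match still scans the whole document, so the worst case is unchanged.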

The rest, as they say, is implementation detail (and
likely depends on your choice of language, etc.). I
recommend you profile your results and take your
time: your "quick fix" might be nothing of the sort once
you have figured out the total cost of your solution.
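
For the profiling step, a small harness along these lines could compare a DOM parse with a streaming pass over the same data. It is only a sketch: the generated document, the function names, and the choice of tracemalloc for peak memory are assumptions, and real numbers will depend on your files, language, and parser.

```python
import io
import time
import tracemalloc
import xml.dom.minidom
import xml.etree.ElementTree as ET

def make_doc(n):
    """Build a synthetic flat document with n <item> elements."""
    rows = "".join(f'<item id="{i}" colour="red"/>' for i in range(n))
    return f"<items>{rows}</items>".encode()

def dom_count(data):
    # Builds the whole tree in memory, then queries it.
    doc = xml.dom.minidom.parseString(data)
    return len(doc.getElementsByTagName("item"))

def stream_count(data):
    # Visits elements one at a time and empties each after use.
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "item":
            count += 1
        elem.clear()
    return count

def measure(fn, data):
    """Return (result, elapsed seconds, peak traced bytes) for fn(data)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(data)
    elapsed = time.perf_counter() - t0
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

data = make_doc(2000)
for fn in (dom_count, stream_count):
    count, elapsed, peak = measure(fn, data)
    print(f"{fn.__name__}: {count} items, {elapsed:.3f}s, peak {peak / 1024:.0f} KiB")
```

tracemalloc slows execution noticeably, so treat the timings as relative only, and rerun with documents of realistic size before drawing conclusions.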

hth,
ajm.



 