Slowness of SAX

 
 
Arne Vajhøj
 
      11-13-2008
Lew wrote:
> Arne Vajhøj wrote:
>> Roedy Green wrote:
>>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <(E-Mail Removed)>
>>> wrote, quoted or indirectly quoted someone who said :
>>>> Do you know faster XML API ?
>>>
>>> XML/SAX is inherently a high-overhead format, best used for small
>>> files.

>>
>> No - SAX is the XML parser for huge files.
>>
>> For small files DOM and XPath is much easier.

>
> Quite so. The advantage of SAX over DOM is that it is quite fast, very
> easy on memory requirements and suitable for single-pass processing of
> XML documents. Its disadvantage is that it does not keep an in-memory
> representation of the XML document for repeated processing.


Plus compared to XPath you need to write a lot of code to do some
advanced searching.

Arne
 
Mike Schilling
 
      11-13-2008
Lew wrote:
> Arne Vajhøj wrote:
>> Roedy Green wrote:
>>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried
>>> <(E-Mail Removed)>
>>> wrote, quoted or indirectly quoted someone who said :
>>>> Do you know faster XML API ?
>>>
>>> XML/SAX is inherently a high-overhead format, best used for small
>>> files.

>>
>> No - SAX is the XML parser for huge files.
>>
>> For small files DOM and XPath is much easier.

>
> Quite so. The advantage of SAX over DOM is that it is quite fast,
> very easy on memory requirements and suitable for single-pass
> processing of XML documents. Its disadvantage is that it does not
> keep an in-memory representation of the XML document for repeated
> processing.


However, if you want to create an in-memory representation of a subset
of a huge document, SAX is the way to build it. In fact, making SAX
callbacks create a DOM (optionally filtering out part of the
document's content) is a pretty trivial exercise.
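
A minimal sketch of that idea, not anyone's production code: SAX callbacks append nodes to an initially empty DOM Document and skip every subtree whose element name matches a filter. The class name SaxToDom, the skipName parameter and the sample XML are made up for illustration, and namespaces, comments and processing instructions are ignored.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: builds a DOM from SAX events, leaving out filtered subtrees.
class SaxToDom {
    static Document parse(String xml, String skipName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Deque<Node> open = new ArrayDeque<>(); // currently open nodes, innermost on top
        open.push(doc);

        DefaultHandler handler = new DefaultHandler() {
            private int skipDepth = 0; // > 0 while inside a skipped subtree

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (skipDepth > 0 || qName.equals(skipName)) {
                    skipDepth++;
                    return;
                }
                Element e = doc.createElement(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    e.setAttribute(atts.getQName(i), atts.getValue(i));
                }
                open.peek().appendChild(e);
                open.push(e);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (skipDepth > 0) {
                    skipDepth--;
                    return;
                }
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Only attach text below an element, never directly to the Document.
                if (skipDepth == 0 && open.size() > 1) {
                    open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document d = parse("<a><keep x='1'>hi</keep><skip><keep>no</keep></skip></a>", "skip");
        System.out.println(d.getElementsByTagName("keep").getLength());
    }
}
```

Only the nodes that survive the filter are ever allocated, which is the reason for going through SAX rather than parsing the whole document into a DOM and pruning it afterwards.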


 
Lew
 
      11-13-2008
Arne Vajhøj wrote:
> Lew wrote:
>> Arne Vajhøj wrote:
>>> Roedy Green wrote:
>>>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <(E-Mail Removed)>
>>>> wrote, quoted or indirectly quoted someone who said :
>>>>> Do you know faster XML API ?
>>>>
>>>> XML/SAX is inherently a high-overhead format, best used for small
>>>> files.
>>>
>>> No - SAX is the XML parser for huge files.
>>>
>>> For small files DOM and XPath is much easier.

>>
>> Quite so. The advantage of SAX over DOM is that it is quite fast,
>> very easy on memory requirements and suitable for single-pass
>> processing of XML documents. Its disadvantage is that it does not
>> keep an in-memory representation of the XML document for repeated
>> processing.

>
> Plus compared to XPath you need to write a lot of code to do some
> advanced searching.


That isn't the point of SAX. SAX lets you import XML-encoded information
directly into an in-memory structure - that being the "lot" of code you need
to write but not really necessarily all that much. Once you have your object
model built, there shouldn't be a need for "advanced searching", you just
directly use the objects that you built.
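
A rough sketch of what that looks like, under assumptions that are not from this thread: the XML is taken to be a flat list of point elements with x and y attributes, and PointLoader and Point are invented names. The handler fills a plain list while parsing, and afterwards the code works with the objects and never touches XML again.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical loader: reads <point x="..." y="..."/> elements straight into objects.
class PointLoader {
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static List<Point> load(String xml) throws Exception {
        List<Point> points = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                // Every other element is simply ignored; no tree is ever built.
                if (qName.equals("point")) {
                    points.add(new Point(Double.parseDouble(atts.getValue("x")),
                                         Double.parseDouble(atts.getValue("y"))));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return points;
    }

    public static void main(String[] args) throws Exception {
        List<Point> pts = load("<points><point x='1.5' y='2.0'/><point x='3.0' y='4.25'/></points>");
        System.out.println(pts.size() + " points, first x = " + pts.get(0).x);
    }
}
```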

If there is a need for advanced searching, then perhaps SAX is the wrong choice.

--
Lew
 
Arne Vajhøj
 
      11-13-2008
Lew wrote:
> Arne Vajhøj wrote:
>> Lew wrote:
>>> Arne Vajhøj wrote:
>>>> Roedy Green wrote:
>>>>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <(E-Mail Removed)>
>>>>> wrote, quoted or indirectly quoted someone who said :
>>>>>> Do you know faster XML API ?
>>>>>
>>>>> XML/SAX is inherently a high-overhead format, best used for small
>>>>> files.
>>>>
>>>> No - SAX is the XML parser for huge files.
>>>>
>>>> For small files DOM and XPath is much easier.
>>>
>>> Quite so. The advantage of SAX over DOM is that it is quite fast,
>>> very easy on memory requirements and suitable for single-pass
>>> processing of XML documents. Its disadvantage is that it does not
>>> keep an in-memory representation of the XML document for repeated
>>> processing.

>>
>> Plus compared to XPath you need to write a lot of code to do some
>> advanced searching.

>
> That isn't the point of SAX. SAX lets you import XML-encoded
> information directly into an in-memory structure - that being the "lot"
> of code you need to write but not really necessarily all that much.
> Once you have your object model built, there shouldn't be a need for
> "advanced searching", you just directly use the objects that you built.
>
> If there is a need for advanced searching, then perhaps SAX is the wrong
> choice.


The last is my point.

Doing //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
in SAX would require a lot more code than just a selectSingleNode
call.
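
For comparison, a sketch of the DOM plus XPath side using only the JDK. The standard javax.xml.xpath API has no selectSingleNode; XPath.evaluate is used here as the closest equivalent. The class name XPathLookup and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example: the whole search is one expression and one call.
class XPathLookup {
    static final String EXPR =
            "//sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()";

    static String find(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // With a String return type, evaluate yields the string value of the first match.
        return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><sometag>"
                + "<someothertag><athirdtag someattr='other'/><afourthtag>no</afourthtag></someothertag>"
                + "<someothertag><athirdtag someattr='foobar'/><afourthtag>yes</afourthtag></someothertag>"
                + "</sometag></root>";
        System.out.println(find(xml));
    }
}
```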

Arne
 
Lew
 
      11-13-2008
Lew wrote:
>> If there is a need for advanced searching, then perhaps SAX is the
>> wrong choice.


Arne Vajhøj wrote:
> The last is my point.
>
> Doing
> //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
> in SAX would require a lot more code than just a selectSingleNode
> call.


But that wouldn't even be SAX - it's an entirely different universe. I know
that's your point, but it leaves me confused. If you use SAX, there wouldn't
even be a need to search - everything would already be right where you could
find it. The whole question of searching would never even come up.

That is one of the advantages of SAX over DOM. With DOM, you have this huge
memory structure that you have to search with XPath expressions that are hard
to figure out and run really slowly. With SAX you read things right into an
object model where you don't have to look for things, and you can access them
directly. Searching is irrelevant.

--
Lew
 
Sigfried
 
      11-13-2008
Tom Anderson wrote:
> On Wed, 12 Nov 2008, Sigfried wrote:
>
>> Hi, using a Java profiler, I've realized that SAX is consuming too
>> much time:
>> - endElement + startElement 40 %
>> - *.read 7 %
>> - a few <= 1%
>>
>> So SAX takes about 50 % of the time!

>
> Which startElement and endElement methods are these? I assume not the
> ones in the ContentHandler, right?
>
>> Do you know faster XML API ?

>
> http://www.itu.int/rec/T-REC-X.891-200505-I/en
> http://java.sun.com/developer/techni...l/fastinfoset/
>
> Although that's probably not what you meant.


Your articles did convince me to use a binary format instead of a text
format. But Fast Infoset is still close to XML. Since my XML is mostly
Double.toString / parseDouble, I guess using Java serialization would be
a better (and bigger) step.
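
Roughly what that step could look like, as a sketch only: it assumes the data can be held as a plain double[] and uses the standard ObjectOutputStream / ObjectInputStream pair. DoubleStore is an invented name, and nothing here was measured, so whether it is actually faster for a given data set would need profiling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical helper: stores doubles in binary form, so no Double.toString / parseDouble.
class DoubleStore {
    static byte[] write(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values); // arrays of primitives are serializable as-is
        }
        return bytes.toByteArray();
    }

    static double[] read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (double[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] original = {1.5, 2.0, Math.PI};
        double[] copy = read(write(original));
        System.out.println(Arrays.equals(original, copy));
    }
}
```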


> But seriously, XML isn't fast. Never has been, never will be. If you
> need fast, don't use XML. Fast XML parsing is like semi racing: even if
> you win, you're still retarded.


lol, I knew that one as the version about arguing on the internet.
 
Arne Vajhøj
 
      11-16-2008
Lew wrote:
> Lew wrote:
>>> If there is a need for advanced searching, then perhaps SAX is the
>>> wrong choice.

>
> Arne Vajhøj wrote:
>> The last is my point.
>>
>> Doing
>> //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
>> in SAX would require a lot more code than just a selectSingleNode
>> call.

>
> But that wouldn't even be SAX - it's an entirely different universe. I
> know that's your point, but it leaves me confused. If you use SAX,
> there wouldn't even be a need to search - everything would already be
> right where you could find it. The whole question of searching would
> never even come up.
>
> That is one of the advantages of SAX over DOM. With DOM, you have this
> huge memory structure that you have to search with XPath expressions
> that are hard to figure out and run really slowly. With SAX you read
> things right into an object model where you don't have to look for
> things, and you can access them directly. Searching is irrelevant.


Not necessarily.

You can use SAX to just pick a small subset of the XML as well.

And have a need to code that "pick".

Arne

 
Robert Klemme
 
      11-17-2008
On 16.11.2008 04:55, Arne Vajhøj wrote:
> Lew wrote:
>> Lew wrote:
>>>> If there is a need for advanced searching, then perhaps SAX is the
>>>> wrong choice.

>>
>> Arne Vajhøj wrote:
>>> The last is my point.
>>>
>>> Doing
>>> //sometag/someothertag[athirdtag/@someattr='foobar']/afourthtag/text()
>>> in SAX would require a lot more code than just a selectSingleNode
>>> call.

>>
>> But that wouldn't even be SAX - it's an entirely different universe.
>> I know that's your point, but it leaves me confused. If you use SAX,
>> there wouldn't even be a need to search - everything would already be
>> right where you could find it. The whole question of searching would
>> never even come up.
>>
>> That is one of the advantages of SAX over DOM. With DOM, you have
>> this huge memory structure that you have to search with XPath
>> expressions that are hard to figure out and run really slowly. With
>> SAX you read things right into an object model where you don't have to
>> look for things, and you can access them directly. Searching is
>> irrelevant.

>
> Not necessarily.
>
> You can use SAX to just pick a small subset of the XML as well.
>
> And have a need to code that "pick".


I fully agree with Lew: if you have to do XPath-like searching on the
subset you picked, you chose completely the wrong data structure for
your SAX processing.

If you meant that the subset picking should be done with XPath then you
have a generic mechanism for which DOM is probably a better choice. If
your searching requirements are not as broad you can easily create your
own simplified searching with SAX - and it's still more efficient for
this than DOM.
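
One possible shape for such a simplified search, as a sketch: the handler keeps the current element path as a string and collects the text of every element whose path ends with a given suffix. It handles plain child paths only (no predicates, no attributes) and assumes matching elements are not nested inside each other. SaxPathSearch and textAt are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of elements whose path ends with pathSuffix,
// e.g. "someothertag/afourthtag".
class SaxPathSearch {
    static List<String> textAt(String xml, String pathSuffix) throws Exception {
        List<String> hits = new ArrayList<>();
        String wanted = "/" + pathSuffix;

        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder path = new StringBuilder(); // e.g. "/root/sometag"
            private StringBuilder text;                             // non-null inside a match

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                path.append('/').append(qName);
                if (path.toString().endsWith(wanted)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (text != null && path.toString().endsWith(wanted)) {
                    hits.add(text.toString());
                    text = null;
                }
                path.setLength(path.length() - qName.length() - 1); // drop "/qName"
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return hits;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><sometag><someothertag><afourthtag>yes</afourthtag></someothertag>"
                + "</sometag><afourthtag>no</afourthtag></root>";
        System.out.println(textAt(xml, "someothertag/afourthtag"));
    }
}
```

Memory use stays proportional to the nesting depth plus the matches found, not to the size of the document, which is the efficiency argument for doing it this way.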

robert

 
Arne Vajhøj
 
      11-19-2008
Robert Klemme wrote:
> If you meant that the subset picking should be done with XPath then you
> have a generic mechanism for which DOM is probably a better choice.


That was approx. my point.

Arne
 