Velocity Reviews

Sigfried 11-12-2008 08:16 AM

Slowness of SAX
 
Hi, using a java profiler, i've realized that SAX is consuming too much
time:
- endElement + startElement 40 %
- *.read 7 %
- a few <= 1%

So SAX take about 50 % of the time !!

Do you know faster XML API ?

Roedy Green 11-12-2008 08:36 AM

Re: Slowness of SAX
 
On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <sig.fried@hotmail.com>
wrote, quoted or indirectly quoted someone who said :

>Do you know faster XML API ?


XML/SAX is inherently a high-overhead format, best used for small
files. Consider converting your file to something else, e.g.
DataInputStream or a serialised stream, so you pay the overhead only
once.

See http://mindprod.com/jgloss/xml.html
for alternative processing techniques.
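
A minimal sketch of the "pay the overhead only once" idea, in Java: the XML is parsed a single time with SAX and the values are written through a DataOutputStream; later runs read them back with DataInputStream and never touch the XML parser. The class name XmlToBinaryOnce, the <point x=".." y=".."/> element layout and the all-ints binary layout are assumptions made up for illustration; nothing in the thread says what the actual data looks like.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class XmlToBinaryOnce {

    /** Parses the (hypothetical) point elements once and writes x and y as raw ints. */
    static byte[] convert(String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local, String qName, Attributes atts) {
                        if ("point".equals(qName)) {
                            try {
                                out.writeInt(Integer.parseInt(atts.getValue("x")));
                                out.writeInt(Integer.parseInt(atts.getValue("y")));
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("conversion failed", e);
        }
        return bytes.toByteArray();
    }

    /** Reads the ints back; no XML parsing is involved on this path. */
    static List<int[]> load(byte[] data) {
        List<int[]> points = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                points.add(new int[] { in.readInt(), in.readInt() });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return points;
    }

    public static void main(String[] args) {
        byte[] data = convert("<points><point x='1' y='2'/><point x='3' y='4'/></points>");
        for (int[] p : load(data)) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

Whether this pays off depends on how often the same file is read: if each file is parsed only once, the conversion step is just extra work.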

--
Roedy Green Canadian Mind Products
http://mindprod.com
Your old road is
Rapidly agin'.
Please get out of the new one
If you can't lend your hand
For the times they are a-changin'.

Lew 11-12-2008 01:58 PM

Re: Slowness of SAX
 
bugbear wrote:
> Sigfried wrote:
>> Hi, using a java profiler, i've [sic] realized that SAX is consuming too
>> much time:
>> - endElement + startElement 40 %
>> - *.read 7 %
>> - a few <= 1%
>>
>> So SAX take about 50 % of the time !!

>
> If all you're doing is parsing, what would you expect?


Indeed.

> Give us more context.


I found SAX to be extremely fast, arguably the (possibly tied for) fastest XML
parsing in Java. Back in 1999 we were able to parse a million rather large
documents in about three hours over a 10MB/s Ethernet connection using Java
1.2 on the hardware extant in those days using SAX, and it was very
parsimonious of memory. Parsers and JVMs (and hardware) have improved
considerably since then.

As bugbear points out, 50% of the time parsing is quite reasonable if at least
50% of the work to do is parsing, and if three-quarters of the work is parsing
you're money ahead.

--
Lew

Sigfried 11-12-2008 03:07 PM

Re: Slowness of SAX
 
bugbear wrote:
> Sigfried wrote:
>> Hi, using a java profiler, i've realized that SAX is consuming too
>> much time:
>> - endElement + startElement 40 %
>> - *.read 7 %
>> - a few <= 1%
>>
>> So SAX take about 50 % of the time !!

>
> If all you're doing is parsing, what would you expect?
>
> Give us more context.


I've tried the JDK 1.6 StAX implementation, which is 10 % faster, but the
DTD is ignored... So I guess StAX speed is about the same as SAX. I was hoping
to get XML parsing down to 30 % of the time.
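
For reference, a pull-style StAX loop might look like the sketch below. It uses the standard javax.xml.stream API; the class name StaxElementCount and the sample document are made up for illustration. XMLInputFactory.SUPPORT_DTD is a standard factory property, but how much DTD processing (and whether any validation) actually happens depends on the implementation, so treat that line as something to experiment with rather than a guarantee.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxElementCount {

    /** Pulls events from the parser one at a time and counts the START_ELEMENT events. */
    static int countElements(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTD handling is a factory property; what it enables varies by implementation.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        int count = 0;
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                // The application drives the loop; the parser does no work until next() is called.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/><c>text</c></a>"));
    }
}
```

The tokenising work is the same as with SAX; the difference is mainly in who drives the loop, which is consistent with the small speed difference reported above.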

Tom Anderson 11-12-2008 10:26 PM

Re: Slowness of SAX
 
On Wed, 12 Nov 2008, Sigfried wrote:

> Hi, using a java profiler, i've realized that SAX is consuming too much time:
> - endElement + startElement 40 %
> - *.read 7 %
> - a few <= 1%
>
> So SAX take about 50 % of the time !!


Which startElement and endElement methods are these? I assume not the ones
in the ContentHandler, right?

> Do you know faster XML API ?


http://www.itu.int/rec/T-REC-X.891-200505-I/en
http://java.sun.com/developer/techni...l/fastinfoset/

Although that's probably not what you meant.

But seriously, XML isn't fast. Never has been, never will be. If you need
fast, don't use XML. Fast XML parsing is like semi racing: even if you
win, you're still retarded.

tom

--
Safety not guaranteed. I have only done this once before.

Tom Anderson 11-12-2008 10:35 PM

Re: Slowness of SAX
 
On Wed, 12 Nov 2008, Tom Anderson wrote:

> On Wed, 12 Nov 2008, Sigfried wrote:
>
>> Hi, using a java profiler, i've realized that SAX is consuming too much
>> time:
>> - endElement + startElement 40 %
>> - *.read 7 %
>> - a few <= 1%
>>
>> So SAX take about 50 % of the time !!
>>
>> Do you know faster XML API ?

>
> http://www.itu.int/rec/T-REC-X.891-200505-I/en
> http://java.sun.com/developer/techni...l/fastinfoset/
>
> Although that's probably not what you meant.
>
> But seriously, XML isn't fast. Never has been, never will be. If you need
> fast, don't use XML. Fast XML parsing is like semi racing: even if you win,
> you're still retarded.


Although you could try this:

http://piccolo.sourceforge.net/

tom

--
Safety not guaranteed. I have only done this once before.

Arne Vajhøj 11-13-2008 12:42 AM

Re: Slowness of SAX
 
Roedy Green wrote:
> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <sig.fried@hotmail.com>
> wrote, quoted or indirectly quoted someone who said :
>> Do you know faster XML API ?

>
> XML/SAX is inherently a high-overhead format, best used for small
> files.


No - SAX is the XML parser for huge files.

For small files DOM and XPath is much easier.
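
To illustrate the DOM and XPath route for a small document, here is a minimal sketch using the standard javax.xml.parsers and javax.xml.xpath APIs. The class name DomXPathLookup and the library/book/title layout are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomXPathLookup {

    /** Loads the whole document into memory, then selects the title elements with XPath. */
    static List<String> titles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The tree stays in memory, so further queries would not need a re-parse.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/library/book/title", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not query document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library><book><title>A</title></book><book><title>B</title></book></library>";
        System.out.println(titles(xml));
    }
}
```

The convenience comes at the cost of holding the entire tree in memory, which is why this approach suits small files rather than huge ones.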

Arne

Arne Vajhøj 11-13-2008 12:45 AM

Re: Slowness of SAX
 
Sigfried wrote:
> Hi, using a java profiler, i've realized that SAX is consuming too much
> time:
> - endElement + startElement 40 %
> - *.read 7 %
> - a few <= 1%
>
> So SAX take about 50 % of the time !!
>
> Do you know faster XML API ?


SAX is usually the fastest XML parser.

And I can not see why you are surprised that the XML parser
uses most of the CPU time when doing XML parsing.

Arne

Daniel Pitts 11-13-2008 01:14 AM

Re: Slowness of SAX
 
Sigfried wrote:
> Hi, using a java profiler, i've realized that SAX is consuming too much
> time:
> - endElement + startElement 40 %
> - *.read 7 %
> - a few <= 1%
>
> So SAX take about 50 % of the time !!
>
> Do you know faster XML API ?

SAX uses callbacks. startElement/endElement probably call some code
that processes the result. It is *that* code which is taking up the CPU
time; you should look at what is under that part of the call stack.
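
One rough way to check this without a profiler is to wrap the application's handler and add up the time spent inside its element callbacks, then compare that with the total parse time. The sketch below assumes a hypothetical wrapper class named TimingHandler; it only times startElement and endElement, and the System.nanoTime() calls add overhead of their own, so the numbers are indicative at best.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Wraps an application handler and accumulates the time spent inside its element callbacks. */
final class TimingHandler extends DefaultHandler {
    private final ContentHandler delegate;
    long callbackNanos;
    int callbacks;

    TimingHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        long t0 = System.nanoTime();
        try {
            delegate.startElement(uri, local, qName, atts); // application code runs here
        } finally {
            callbackNanos += System.nanoTime() - t0;
            callbacks++;
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        long t0 = System.nanoTime();
        try {
            delegate.endElement(uri, local, qName); // application code runs here
        } finally {
            callbackNanos += System.nanoTime() - t0;
            callbacks++;
        }
    }

    /** Parses the XML with the wrapped handler and returns the wrapper holding the totals. */
    static TimingHandler parse(String xml, ContentHandler app) {
        TimingHandler timing = new TimingHandler(app);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), timing);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return timing;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        TimingHandler t = parse("<a><b/><b/></a>", new DefaultHandler());
        long total = System.nanoTime() - start;
        System.out.println(t.callbacks + " callbacks took " + t.callbackNanos
                + " ns out of " + total + " ns total");
    }
}
```

If most of the total is spent inside the callbacks, the cost lies in the application's own processing rather than in the parser.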

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Lew 11-13-2008 01:23 AM

Re: Slowness of SAX
 
Arne Vajhøj wrote:
> Roedy Green wrote:
>> On Wed, 12 Nov 2008 09:16:44 +0100, Sigfried <sig.fried@hotmail.com>
>> wrote, quoted or indirectly quoted someone who said :
>>> Do you know faster XML API ?

>>
>> XML/SAX is inherently a high-overhead format, best used for small
>> files.

>
> No - SAX is the XML parser for huge files.
>
> For small files DOM and XPath is much easier.


Quite so. The advantage of SAX over DOM is that it is quite fast, very easy
on memory requirements and suitable for single-pass processing of XML
documents. Its disadvantage is that it does not keep an in-memory
representation of the XML document for repeated processing.

--
Lew

