Velocity Reviews - Computer Hardware Reviews

xerces advanced usage - progress, random access etc

 
 
Kza
      09-04-2006
Hi, I am currently using the Xerces SAX parser for C++ (I use DOM too, but
I think SAX is more relevant here) for processing and displaying fairly
large XML files. Usually I give Xerces a filename, it parses the file, and
that's all good. But the customer needs more features.

Feature 1: A progress display. I have tried a few times now to find a
way of asking Xerces how far through a file it is in bytes, but no
luck. (I did try a per-element check, but that involves a whole extra
parse at the start just to count the elements.) I have tried using the
LocalFileInputSource, getting its BinInputStream and calling its
curPos(), but it's always 0.

Any ideas?

Feature 2: Loading only a "screenful" of the file at a time. I also
would like some sort of random access functionality, so if the user
scrolls down to 75% of the file, the parser skips forward to that
position and starts reading there, and when they scroll back up it goes
up and reads just that little bit of the file.

I am pretty sure feature 1 is possible with normal Xerces SAX, but I
have no idea how. The documentation is very sparse: it names the
functions etc. but doesn't actually say what they do or how they should
be used.

Feature 2 might be more complicated. A colleague mentioned some
other "object models" like xparse and xalaron (not sure how that's
pronounced or spelt), some Apache project that parses XML in a random
access fashion.

Anyone got any ideas?

Thanks a lot.

 
 
Joe Kesselman
      09-05-2006
Kza wrote:
> Feature 1: A progress display.


The SAX APIs can be persuaded to give line/column information, though
unless you know how many lines there were in the file before you started
parsing it, that doesn't do you any good. Look at the Locator API.

The DOM assumes reading the file is a single operation, so the concept
of getting incremental details doesn't make much sense. You *could* plug
in a stream filter between wherever the file is being read from and the
parser, and set up that filter so it counts characters going by --
that's going to give you only a very rough progress indication, and
again it requires that you know the length before you start if you want
to report it as a percentage-complete number.

> Feature 2: Loading only a "screenful" of the file at a time.


"Screenful" is not defined in XML. Nor is starting a parse from the middle
of a file. You could try to do something with incremental processing,
via throttling of a SAX stream -- I've done that in the past -- but
keeping track of when enough has been read to fill a screen and when
more would have to be read to fill the next screen is very much an
application problem rather than a parser problem.

Random-access to an XML model isn't a problem -- the DOM can do that,
though again it isn't designed to operate on screenfuls -- but
random-order parsing really doesn't make sense. Namespaces are
context-dependent, to take one major point where that idea breaks down.


--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
 
 
 
 
Boris Kolpackov
      09-08-2006
"Kza" <(E-Mail Removed)> writes:

> Feature 1: A progress display. I have tried a few times now to find a
> way of asking xerces how far through a file it is in bytes, but no
> luck. (I did try a per element check, but that involves a whole extra
> parse at the start just to count the elements). I have tried using the
> LocalFileInputSource, getting its BinInputStream and calling its
> curPos(), but it's always 0.
>
> Any ideas?


You can implement your own input stream (in Xerces-C++ that means a
BinInputStream returned by your own InputSource) which will keep track
of how much data Xerces-C++ has consumed so far. Combine this with the
total length of the file and you can calculate the progress.
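
A minimal sketch of the byte-counting idea, in plain standard C++ so it
runs without Xerces-C++ installed. The class CountingReader is
hypothetical and does not derive from anything in Xerces; its readBytes
and curPos names only mirror the BinInputStream methods mentioned in this
thread, whose exact signatures depend on the Xerces-C++ version. The
total length is assumed to be known up front (for example from the file
size).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical standalone reader: counts the bytes a consumer pulls through it.
class CountingReader {
public:
    CountingReader(std::istream& in, std::uint64_t totalBytes)
        : in_(in), total_(totalBytes) {}

    // Fill toFill with up to maxToRead bytes; return how many were actually read.
    std::size_t readBytes(char* toFill, std::size_t maxToRead) {
        assert(toFill != nullptr);
        in_.read(toFill, static_cast<std::streamsize>(maxToRead));
        const std::size_t got = static_cast<std::size_t>(in_.gcount());
        pos_ += got;
        return got;
    }

    // Number of bytes handed to the consumer so far.
    std::uint64_t curPos() const { return pos_; }

    // Progress as a whole percentage in the range 0..100.
    int percent() const {
        if (total_ == 0) return 100;  // nothing to read counts as done
        const std::uint64_t capped = pos_ < total_ ? pos_ : total_;
        return static_cast<int>(capped * 100 / total_);
    }

private:
    std::istream& in_;
    std::uint64_t total_;
    std::uint64_t pos_ = 0;
};
```

In a real parser setup the same counting would presumably live in a
BinInputStream subclass handed out by a custom InputSource. Keep in mind
that a parser typically reads in blocks, so the percentage can run
somewhat ahead of what has actually been parsed.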


> Feature 2: Loading only a "screenful" of the file at a time. I also
> would like some sort of random access functionality, so if the user
> scrolls down to 75% of the file, the parser skips forward to that
> position and starts reading there, and when they scroll back up it goes
> up and reads just that little bit of the file.


This one would definitely be easier with an in-memory model (e.g., DOM).


hth,
-boris


--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
 
 
Kza
      09-08-2006
Just as an update here, and I hope top posting is de rigueur for this
newsgroup,

I solved feature one with Xerces' getSrcOffset() method, though I
had to wrap it with an exception catcher: the particular version we
are using at work at the moment throws an exception when parsing is
finished (but before the parse method returns), and there's no other way
to find out when it's finished.

Feature 2 I don't have a solution for at the moment. DOM is not an
option: the whole point is that a whole file uses up too much memory,
and DOM loads the whole thing at once. That's why we wanted to load in a
section at a time.

If it turns out to be really important to analyse large files, I will just
have to write a separate program that uses SAX and maybe only filters
for certain things, or perhaps reparses when people want to "scroll up",
which trades time for memory. It's up to the
customers really. I suspect the real solution is a non-XML indexed
binary format. But the memory issue isn't actually as big as the
customers think it is. I will work something out.
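
The "reparse when scrolling" trade-off can be sketched without any parser
at all. In the hypothetical ScreenWindow below, the application feeds one
row per parse event (say, from a SAX startElement callback), only the rows
inside the requested window are kept, and the caller stops feeding as soon
as the window is full. The vector of strings stands in for the parser's
event stream; none of these names come from Xerces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: keeps only the rows that fall inside one "screenful".
class ScreenWindow {
public:
    ScreenWindow(std::size_t firstRow, std::size_t rowCount)
        : first_(firstRow), count_(rowCount) {}

    // Offer one row per parse event. Returns false once the window is
    // full, which tells the caller it can stop parsing.
    bool offer(const std::string& row) {
        const std::size_t index = seen_++;
        if (index >= first_ && rows_.size() < count_) rows_.push_back(row);
        return !full();
    }

    bool full() const { return rows_.size() >= count_; }
    std::size_t seen() const { return seen_; }
    const std::vector<std::string>& rows() const { return rows_; }

private:
    std::size_t first_;
    std::size_t count_;
    std::size_t seen_ = 0;
    std::vector<std::string> rows_;
};

// Simulates one reparse from the start of the document: 'events' is an
// assumed stand-in for whatever the real parser would report, in order.
std::vector<std::string> loadScreen(const std::vector<std::string>& events,
                                    std::size_t firstRow,
                                    std::size_t rowCount) {
    ScreenWindow window(firstRow, rowCount);
    for (const std::string& e : events) {
        if (!window.offer(e)) break;  // window full: the rest is not needed
    }
    assert(window.rows().size() <= rowCount);
    return window.rows();
}
```

Memory stays bounded by one screenful, but every scroll costs a parse from
the start of the file up to the end of the window, which is the time
trade-off mentioned above.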


 