What tool to use for processing large documents

 
 
Luc Mercier
 
      10-23-2006
Jürgen Kahrs wrote:
> jay m wrote:
>
>> Well, there seem to be some open source XML databases
>> I don't know about your other qualifiers, but...
>>
>> http://exist.sourceforge.net/
>> http://xml.apache.org/xindice/
>>
>> are two.

>
> Interesting. Sounds like "exist" should be able
> to handle large files. But I am not convinced
> that the performance will be acceptable. This
> has to be tried to get an answer.


Well, as I mentioned, I got serious problems using eXist: when I tried to
run the very first example they give in the documentation of the xml:db
api, my screen would get all red, and then KDE logged me out and I got
to the login screen... Never had anything like that before, especially
running Java code!

I tried the two current releases (standard and 'new core'), and both
produced the same result. So this product does not seem mature enough...

I'm trying xindice right now.

Thanks everyone for your suggestions.
 
Luc Mercier
 
      10-23-2006
Luc Mercier wrote:
> Jürgen Kahrs wrote:
>> jay m wrote:
>>
>>> Well, there seem to be some open source XML databases
>>> I don't know about your other qualifiers, but...
>>>
>>> http://exist.sourceforge.net/
>>> http://xml.apache.org/xindice/
>>>
>>> are two.

>> Interesting. Sounds like "exist" should be able
>> to handle large files. But I am not convinced
>> that the performance will be acceptable. This
>> has to be tried to get an answer.

>
> Well as I mentioned, I got serious problems using eXist: when I tried to
> run the very first example they give in the documentation of the xml:db
> api, my screen would get all red, and then Kde logged me out and I got
> to the login screen... Never had anything like that before, especially
> running Java code !
>
> I tried the two current releases (standard and 'new core'), and both
> produced the same result. So this product does not seem mature enough...
>
> I'm trying xindice right now.
>
> Thanks everyone for your suggestions.


All right, after wasting some time with xindice, I read in the xindice FAQ:


--
10. My 5 megabyte file is crashing the command line, help?

See FAQ #2. Xindice wasn't designed for monster documents, rather, it
was designed for collections of small to medium sized documents. The
best thing to do in this case would be to look at your 5 megabyte file,
and determine whether or not it's a good candidate for being sliced into
a set of small documents. If so, you'll want to extract the separate
documents and add them to a Xindice collection individually. A good
example of this, would be a massive document of this form:
--

So it's not suitable for me. I certainly could slice up my documents,
but not easily into pieces of 5 MB or less.
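
A minimal sketch of the kind of slicing the FAQ suggests, assuming the big file is a single root element wrapping many repeated records. The function name `iter_records` and the tag name passed to it are made up for illustration; this uses Python's standard `xml.etree.ElementTree.iterparse` and has nothing to do with Xindice's own tools:

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield each direct child <record_tag> of the root as an XML string.

    iterparse streams the input, so finished records can be discarded
    instead of building the whole tree in memory.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    depth = 0  # nesting depth below the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 0 and elem.tag == record_tag:
            elem.tail = None  # drop whitespace that follows the record
            yield ET.tostring(elem, encoding="unicode")
            root.clear()  # release records that were already handed out


if __name__ == "__main__":
    sample = io.BytesIO(b"<docs><doc id='1'><t>a</t></doc><doc id='2'/></docs>")
    for piece in iter_records(sample, "doc"):
        print(piece)
```

Each yielded string could then be stored as its own small document. Whether that helps depends on the data: it only works when the file really is a flat sequence of independent records.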

Luc
 
Luc Mercier
 
      10-23-2006
OK, so I found a well-documented list of native XML databases:

http://www.rpbourret.com/xml/ProdsNative.htm

Three of them are explicitly said to be designed to handle large documents:
* 4Suite, 4Suite Server (free)
* Infonyte DB (commercial)
* Sonic XML Server (commercial)



The first one is in Python. I don't know how easy it is to call Python
code from Matlab. I'm going to check that.

Does anyone have any experience with either of the other two?

- Luc.



 
 
Jürgen Kahrs
 
      10-24-2006
Luc Mercier wrote:

> Well as I mentioned, I got serious problems using eXist: when I tried to
> run the very first example they give in the documentation of the xml:db
> api, my screen would get all red, and then Kde logged me out and I got
> to the login screen... Never had anything like that before, especially
> running Java code !


That's funny. I just remembered this one:

http://vtd-xml.sourceforge.net/
VTD-XML is the next generation XML parser
that goes beyond DOM and SAX in terms of
performance, memory and ease of use.
 
 
Joseph Kesselman
 
      10-24-2006
Jürgen Kahrs wrote:
>>api, my screen would get all red, and then Kde logged me out and I got
>>to the login screen... Never had anything like that before, especially
>>running Java code !


Congratulations; you found a JVM bug. See if there was a logfile of some
sort, and if so report it to the folks maintaining that version of Java...

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
 
 
Jürgen Kahrs
 
      10-24-2006
Joseph Kesselman wrote:
> Jürgen Kahrs wrote:
>>> api, my screen would get all red, and then Kde logged me out and I got
>>> to the login screen... Never had anything like that before, especially
>>> running Java code !

>
> Congratulations; you found a JVM bug. See if there was a logfile of some
> sort, and if so report it to the folks maintaining that version of Java...
>


It was Luc Mercier who found it.
 
 
jay m
 
      10-26-2006

Jürgen Kahrs wrote:
> That's funny. I just remembered this one:
>
> http://vtd-xml.sourceforge.net/
> VTD-XML is the next generation XML parser
> that goes beyond DOM and SAX in terms of
> performance, memory and ease of use.


From the website:

"Its memory usage is typically between 1.3x~1.5x the size of the XML
document, "
and
" VTD requires that XML document be maintained intact in memory."

For multi-GB documents, you will need a very well-equipped machine!
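
As a rough back-of-the-envelope check of those figures, assuming the quoted 1.3x~1.5x range means total memory use relative to the file size (the function name below is made up for illustration):

```python
def vtd_memory_range_gb(document_gb, low=1.3, high=1.5):
    """Return the (low, high) memory estimate in GB for a document of the given size."""
    return document_gb * low, document_gb * high


if __name__ == "__main__":
    for size_gb in (1, 4, 10):
        low_gb, high_gb = vtd_memory_range_gb(size_gb)
        print(f"{size_gb} GB document: roughly {low_gb:.1f} to {high_gb:.1f} GB of RAM")
```

So a 4 GB file would already call for something like 5 to 6 GB of RAM for the parser alone.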

As an associate once told me: "yes, that's a very nice problem".
Regards
Jay

 
 
Luc Mercier
 
      11-04-2006
So, finally, after many experiments, I chose Infonyte DB, which was
clearly the best of everything I tried. It's commercial software, but
not very expensive; it handles documents up to 1 TB, I think; performance
is OK; and setting everything up and getting started takes 5 minutes.

Thanks again to the people who gave me advice.

- Luc.
 