
Help to Process two very big xml files....

 
 
fuel
 
      06-11-2008
Hello,
I have two big XML files (around 50-60 MB each), and I need to
process the data within each of them. The problem is that I need to
compare each node in one file with the nodes in the other XML
file. After iterating through all the nodes, I need to find the
nodes that have changed or that have been newly introduced.
Assume the following XML structure:

<?xml version="1.0"?>
<root>
<nodeToProcess>

</nodeToProcess>
.....
</root>

I have two such XML files. I keep one file as the reference and
compare the other against it. To solve this problem, I thought I could
use XPath. However, for now, only DOM-based XPath processors are
available. Since the files are very large, I don't think I can afford DOM
(memory constraint).

How can I approach this problem? What would be the right way to start?

P.S. I am trying to access these elements through Java.


 
 
 
 
 
Manuel Collado
 
      06-11-2008
fuel wrote:
> Hello,
> I have two big XML files (around 50-60 MB each), and I need to
> process the data within each of them. The problem is that I need to
> compare each node in one file with the nodes in the other XML
> file. After iterating through all the nodes, I need to find the
> nodes that have changed or that have been newly introduced.
> Assume the following XML structure:
>
> <?xml version="1.0"?>
> <root>
> <nodeToProcess>
>
> </nodeToProcess>
> .....
> </root>
>
> I have two such XML files. I keep one file as the reference and
> compare the other against it. To solve this problem, I thought I could
> use XPath. However, for now, only DOM-based XPath processors are
> available. Since the files are very large, I don't think I can afford DOM
> (memory constraint).
>
> How can I approach this problem? What would be the right way to start?


There are ready-to-run tools for differencing XML files. Please google
for xml-diff.

>
> P.S. I am trying to access these elements through Java.


Some of the tools are written in Java, and some of them are open source.

I don't know how these tools perform with big files.
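
If a ready-made tool turns out to struggle with files of this size, a streaming comparison is another way to stay clear of DOM. The following is only a minimal sketch using the StAX API that ships with Java 6 and later. It rests on assumptions that are not in the original post: each nodeToProcess is taken to carry a unique `id` attribute, and only its text content is compared. The class and method names are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingXmlCompare {

    // Streams through the document once and maps each node's id (assumed
    // attribute) to a hash of its trimmed text content. Nodes without an id
    // are skipped. Child element names and attributes are not hashed.
    static Map<String, Integer> fingerprint(Reader in) {
        Map<String, Integer> result = new HashMap<String, Integer>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            String currentId = null;
            StringBuilder text = new StringBuilder();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "nodeToProcess".equals(r.getLocalName())) {
                    currentId = r.getAttributeValue(null, "id");
                    text.setLength(0);
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && currentId != null) {
                    text.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "nodeToProcess".equals(r.getLocalName())) {
                    if (currentId != null) {
                        result.put(currentId, text.toString().trim().hashCode());
                    }
                    currentId = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    // Lists the ids that are new in 'other' or whose content hash differs
    // from the reference, sorted so the output order is stable.
    static List<String> changedOrNew(Map<String, Integer> reference,
                                     Map<String, Integer> other) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : other.entrySet()) {
            Integer ref = reference.get(e.getKey());
            if (ref == null) {
                out.add("new: " + e.getKey());
            } else if (!ref.equals(e.getValue())) {
                out.add("changed: " + e.getKey());
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        String ref = "<root><nodeToProcess id=\"1\">a</nodeToProcess>"
                + "<nodeToProcess id=\"2\">b</nodeToProcess></root>";
        String cur = "<root><nodeToProcess id=\"1\">a</nodeToProcess>"
                + "<nodeToProcess id=\"2\">c</nodeToProcess>"
                + "<nodeToProcess id=\"3\">d</nodeToProcess></root>";
        System.out.println(changedOrNew(
                fingerprint(new StringReader(ref)),
                fingerprint(new StringReader(cur))));
    }
}
```

For real files, a `FileReader` or an `InputStream` can be passed to `createXMLStreamReader` instead of a `StringReader`. Only one small hash per node stays in memory, not the whole tree. Since `String.hashCode()` is only 32 bits wide, two different contents can occasionally produce the same hash, so a stronger digest would be safer where a missed change matters.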

Hope this helps.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
 
 
 
 
 
Peyo
 
      06-11-2008
fuel wrote:

> How can I approach this problem?


> P.S. I am trying to access these elements through Java.


How about using an XML database (a lightweight one if possible) through
its Java API, to get optimized access to the nodes you need to process?

Cheers,

p.
 
 
Martin Honnen
 
      06-11-2008
fuel wrote:

> How can I approach this problem? What would be the right way to start?


Considering that current desktop systems have 1, 2, or 3 GB of main
memory, I don't think you will run into problems performing XPath on 60 MB
files. Just make sure that the Java VM is allowed to allocate enough
memory: http://java.sun.com/javase/6/docs/te...dows/java.html
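
To illustrate, here is a minimal sketch of evaluating XPath over a DOM tree with the standard JAXP classes. The class name, the method name, and the expression `/root/nodeToProcess` are assumptions based on the structure shown in the question, not a tested solution for the real files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomXPathExample {

    // Parses the whole document into a DOM tree, then counts the nodes
    // selected by the given XPath expression.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><nodeToProcess/><nodeToProcess/></root>";
        System.out.println(countMatches(xml, "/root/nodeToProcess")); // prints 2
    }
}
```

The heap limit is raised on the command line with the `-Xmx` option, for example `java -Xmx512m DomXPathExample`. The value 512m is only a placeholder; a DOM tree usually takes several times the size of the file on disk, so the right figure has to be found by trying it on the real data.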


--

Martin Honnen
http://JavaScript.FAQTs.com/
 