Velocity Reviews - Computer Hardware Reviews

How to edit a large xml file (250MB)?

 
 
setar
Guest
Posts: n/a
 
      08-23-2006
How can I edit an XML file that is 250MB in size? I tried UltraEdit, Visual
Studio, Eclipse, Stylus Studio and XMLSpy, but these programs can't
read this file because it is too big. SmEdit reads only the first MB of the
file and doesn't support UTF-8 (I need a program which supports it). For now I
use XVI32, which is a hexadecimal editor, but it is useful only for editing
a small number of characters; deleting and inserting characters in large
files is very tiring.



I don't need an XML editor. Any text editor without XML validation etc. will
do. I don't know how such a program should work, but in my opinion there
should be such a program.



 
 
 
 
 
Andy Dingley
Guest
Posts: n/a
 
      08-23-2006
setar wrote:

> How can I edit an xml file which has 250MB?


Don't make XML files that are 250MB in size.

Editing is simple. So if you can't even edit it, how are you going to
process it? If you run XPath on it, what do you think performance will
be like?

There are (rare) times when XML works in these volumes, but in general
it doesn't. If you're looking for a stream-based format (easy to work
with in huge volumes) then XML's single root element constraint works
against you. If you're trying to build a database, then XML's lack of
efficient querying is a performance hit. If you want 250MB files as an
encapsulated data format (maybe ETL on a database) then it's workable,
but the document lifecycle is a fairly short
create-transfer-load-delete.

So if your application requires a 250MB data entity, then think
carefully about the tools you're using. Life might be simpler that way.

I also have lots of 250MB files around, but I don't edit them by hand.
I have computers to do that sort of thing for me instead.

 
 
 
 
 
Juergen Kahrs
Guest
Posts: n/a
 
      08-23-2006
setar wrote:

> I don't need xml editor. It can be any text editor without xml validation
> etc. I don't know how such a program should work, but in my opinion there
> should be such a program.


Use vim, the improved vi editor. I have edited such
large XML files with vi several times and you hardly
notice the difference between 10 MB and 200 MB files.
Current versions of vim (when configured properly)
can also edit any UTF-8 characters, for example Japanese.
 
 
Joe Kesselman
Guest
Posts: n/a
 
      08-23-2006
setar wrote:
> How can I edit an xml file which has 250MB?


Emacs also supports UTF-8, of course.

How much swap space have you got? That's what's going to control your
maximum buffer size, assuming you've got a reasonably intelligent editor
implementation.

Another alternative is a stream editor -- the Unix tool "sed" or
something equivalent. Downside of that is that it isn't interactive; you
have to essentially write a program that tells it how to find the points
you want changed and what you want done with them.
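
As a rough illustration of the non-interactive approach (a sketch, not a replacement for sed): the Python function below copies a file line by line and substitutes one literal string, so memory use stays small as long as the file contains line breaks. The name stream_replace and its parameters are made up for this example.

```python
def stream_replace(src_path, dst_path, old, new, encoding="utf-8"):
    """Copy src_path to dst_path line by line, replacing old with new.

    Returns the number of lines that were changed. Only one line is held
    in memory at a time, so a file without any line breaks would still be
    read in one piece.
    """
    changed = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            fixed = line.replace(old, new)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed
```

Writing to a second file rather than editing in place means the original stays intact if the substitution turns out to be wrong.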

If you'd rather stay in the XML world, you could find or write a stream
editor based on SAX streams; this is one of the classic situations where
SAX can have advantages over DOM-based processing.
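
A minimal sketch of such a SAX-based stream editor, using only Python's standard xml.sax module. It rewrites text inside one named element and passes everything else through. TextFixer and fix_text are hypothetical names for this example, and the output is re-serialized by XMLGenerator, so comments and the DOCTYPE are not carried over.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class TextFixer(XMLFilterBase):
    """Pass SAX events through, replacing text inside one element."""

    def __init__(self, parent, element, old, new):
        super().__init__(parent)
        self._element = element
        self._old = old
        self._new = new
        self._depth = 0      # > 0 while inside the target element
        self._buffer = []    # text pieces collected inside the target

    def _emit_buffer(self):
        # The parser may deliver text in several pieces, so the pieces
        # are joined before the replacement is applied.
        if self._buffer:
            text = "".join(self._buffer).replace(self._old, self._new)
            self._buffer = []
            super().characters(text)

    def startElement(self, name, attrs):
        self._emit_buffer()
        if name == self._element:
            self._depth += 1
        super().startElement(name, attrs)

    def endElement(self, name):
        self._emit_buffer()
        super().endElement(name)
        if name == self._element:
            self._depth -= 1

    def characters(self, content):
        if self._depth > 0:
            self._buffer.append(content)
        else:
            super().characters(content)


def fix_text(src, dst, element, old, new):
    """Stream src (path or file object) to dst, fixing text in element."""
    reader = TextFixer(xml.sax.make_parser(), element, old, new)
    reader.setContentHandler(XMLGenerator(dst, encoding="utf-8"))
    reader.parse(src)
```

Only the text of the element currently being read is buffered, which is why this kind of tool does not care much whether the document is 10 MB or 250 MB.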

Or find/write a tool that will handle your document in chunks, either
text-based or SAX-based. Again, that presumes that what you're doing
divides up nicely.

Which of these approaches/tools makes the most sense depends on exactly
what you're trying to do to the file.


--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
 
Tjerk Wolterink
Guest
Posts: n/a
 
      08-23-2006
setar schreef:
> How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
> Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
> read this file because it is too big. SmEdit reads only the first MB of the
> file and doesn't support UTF-8 (I need program which supports it). Now I use
> XVI32 which is hexadecimal editor, but it can be useful only is editing
> small number of characters - deleting and inserting characters to large
> files is very tiring.
>
>
>
> I don't need xml editor. It can be any text editor without xml validation
> etc. I don't know how such a program should work, but in my opinion there
> should be such a program.
>
>


Use a native XML database to store your XML data, and edit it using
XQuery. There are already databases that support XML file sizes in the
multi-GB range:

http://exist.sourceforge.net/
http://xml.apache.org/xindice/
 
 
Joe Kesselman
Guest
Posts: n/a
 
      08-23-2006
Tjerk Wolterink wrote:
> Use a native XML-Database to store your xml data, and edit it using XQuery,
> there already exists databases that supports xml file sizes into the
> multiple GB range:
>
> http://exist.sourceforge.net/
> http://xml.apache.org/xindice/


IBM's DB2 now has a native-XML data format, making it a world-class XML
database as well as a world-class relational database.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
 
acristata@yahoo.co.uk
Guest
Posts: n/a
 
      08-23-2006
In case you haven't got the hang of vim yet ...

If you're on Windows you could try TextPad (you can get a full-featured
evaluation version to test) or EmEditor (free standard version with
most features). Obviously your system's resources will determine
whether this works for you and how well, but I can open a 250MB text
file with those text editors and it looks as though I could edit.
Performance seems better on EmEditor, TextPad doesn't have full Unicode
display support but seems like it might cope... That said, I've never
opened such large files except out of curiosity...

Also check that you aren't using UTF-16 as a file encoding --
conversion to UTF-8 could save you some space.
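
If the file does turn out to be UTF-16, a conversion along these lines should work without loading it all into memory. This is only a sketch: utf16_to_utf8 and chunk_chars are names invented here, and it assumes the XML declaration (if present) sits at the very start of the file so that its encoding attribute can be updated to match.

```python
import re

# Matches the encoding value inside a leading XML declaration.
_DECL_ENCODING = re.compile(r"""^(<\?xml[^>]*encoding=["'])[^"']+""")


def utf16_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16 file as UTF-8, reading chunk_chars at a time."""
    first = True
    # The "utf-16" codec consumes the byte order mark when reading;
    # newline="" leaves line endings unchanged.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            if first:
                # Keep the declaration consistent with the new encoding.
                chunk = _DECL_ENCODING.sub(r"\g<1>UTF-8", chunk, count=1)
                first = False
            dst.write(chunk)
```

For text that is mostly ASCII markup the result is about half the size; for text that is mostly, say, Japanese, the saving is smaller or absent.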

XML editors will obviously have problems opening such large files
because they have to parse the file (some XML editors have an option
which you can set so that files aren't automatically parsed on
opening). One good open-source XML editor which aims at efficiency is
XML Copy Editor which you'll find on sourceforge. It won't manage files
of that size, though.

Tim

setar wrote:
> How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
> Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
> read this file because it is too big. SmEdit reads only the first MB of the
> file and doesn't support UTF-8 (I need program which supports it). Now I use
> XVI32 which is hexadecimal editor, but it can be useful only is editing
> small number of characters - deleting and inserting characters to large
> files is very tiring.
>
>
>
> I don't need xml editor. It can be any text editor without xml validation
> etc. I don't know how such a program should work, but in my opinion there
> should be such a program.


 
 
setar
Guest
Posts: n/a
 
      08-23-2006

User "Andy Dingley" wrote:
> Don't make XML files that are 250MB in size.


The file wasn't created by me. It contains about 100,000 records which I
import into my program. Everything is working. Unfortunately, several records
in the file have errors which I want to correct. I don't want to write
additional code to be able to correct imported data; I prefer to make some
changes in the source file. Of course I could write code for editing imported
data, but I don't need this functionality except for correcting these
errors. I also have no access to the editor which exported the XML file.

User "Juergen Kahrs" wrote:
> Use vim, the improved vi editor. I have edited such
> large XML files with vi several times ....


Thanks! I've checked it and it's a good solution for me.
With this configuration:
- set enc=utf-8 (UTF-8 encoding)
- set undolevels=-1 (no undo; maybe vim is faster with this ...)
the timings for the editing subtasks in gvim are:
- opening the 250MB XML file: 15 seconds
- searching for a word (case sensitive): up to 20 seconds (depending on its
  position in the file).
  In my opinion it could be better, because for example in Total
  Commander's default viewer it takes only 2 seconds!
  But it is acceptable, because I only want to make a few dozen changes.
- going to a specific line of the file by giving its line number or by
  dragging the vertical slider with the mouse: veeeery long, so don't do this!
- making small changes (for example inserting and deleting some lines of
  text; writing something): smooth
- writing the changes to the file (for example when all changes are done): 15
  seconds
I have an Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM
was free.

User "Juergen Kahrs" wrote:
> ... and you hardly
> notice the difference between 10 MB and 200 MB files.
> Current versions of vim (when configured properly)
> can also edit any UTF-8 characters, for example Japanese.


I can notice the difference between searches which take 2 seconds and 20
seconds. But you are right that "making small changes (for example
inserting and deleting some lines of text; writing something)" is very fast.

User "Joe Kesselman" wrote:
>Another alternative is a stream editor -- the Unix tool "sed" or
>something equivalent. Downside of that is that it isn't interactive; you
>have to essentially write a program that tells it how to find the points
>you want changed and what you want done with them.


I would prefer something interactive, because every change will be different
... I don't want to write a program every time ...

>Or find/write a tool that will handle your document in chunks, either
>text-based or SAX-based. Again, that presumes that what you're doing
>divides up nicely.


Unfortunately I can't find such a tool ...

User "acristata@yahoo.co.uk" wrote:
>If you're on Windows you could try TextPad (you can get a full-featured
>evaluation version to test) or EmEditor (free standard version with
>most features).


Here are the timings with the default configuration:
- opening the 250MB XML file: 70 seconds
- searching for a word at the end of the file: 45 seconds
- dragging the vertical slider with the mouse: smooth
- making small changes (for example inserting and deleting some lines of
  text; writing something): sometimes 0.5 seconds, sometimes 30 seconds.
  30 seconds is long, but maybe it will be acceptable for someone ...
- writing the changes to the file (for example when all changes are done): not
  tested

P.S. Sorry for errors, my English isn't good.



 
 
Jürgen Kahrs
Guest
Posts: n/a
 
      08-23-2006
setar wrote:

> efficiencies for subtasks of editing in gvim are:
> - opening 250MB xml file: 15 seconds


7 seconds on my AMD Sempron 2800+ (SuSE Linux 10.1).

> - searching word (case sensitive): to 20 seconds (depending on its place
> in file)


18 seconds on my PC for searching until end of file.

> - going to specified line of the file by specifying line number or by
> draging vertical slider by mouse: veeeery long, so don't do this!


You shouldn't use gvim; use the original vim on Linux instead.
Going to line number 5000000 works instantly on my PC.

> - writing changes to file (for example when we will do all changes): 15
> seconds


15 seconds also on my PC.

> I have Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM were
> free.


300 MB used by vim on my PC also.

> I can notice difference between searches which take 2 seconds and 20
> seconds But you are right that "making small changes (for example
> inserting and deleting some lines of text; writing something)" is very fast.


That's true, I also noticed a "slight" difference.

>> Or find/write a tool that will handle your document in chunks, either
>> text-based or SAX-based. Again, that presumes that what you're doing
>> divides up nicely.

>
> Unfortunatelly I can't find such a tool ...


Before you choose a tool you have to find out if you
can assume that the XML files are well-formed. If they _are_
well-formed, then you can choose among a large set of
tools on the market. Otherwise, you have to use an editor.
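
One way to find out, sketched here with Python's standard xml.sax module (check_well_formed is a name made up for this example): stream the file through a parser with a do-nothing handler and report where the first error occurs. Nothing is kept in memory, so the file size should not matter much.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_well_formed(source):
    """Return None if source parses cleanly, else a short error message.

    source may be a file name or an open file object. The default
    ContentHandler ignores all events, so the document is only checked
    and nothing is stored.
    """
    try:
        xml.sax.parse(source, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return "line {}, column {}: {}".format(
            exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage())
    return None
```

The reported line number is also a convenient place to jump to in the editor if the file turns out to be broken.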

I guess you are better off using vim.
But if you consider using a tool, have a look at this one:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/

Good luck.
 
 
Peter Flynn
Guest
Posts: n/a
 
      08-23-2006
setar wrote:
> How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
> Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
> read this file because it is too big. SmEdit reads only the first MB of the
> file and doesn't support UTF-8 (I need program which supports it). Now I use
> XVI32 which is hexadecimal editor, but it can be useful only is editing
> small number of characters - deleting and inserting characters to large
> files is very tiring.


Emacs. With psgml and xxml and onsgmls if you want DTD validation.

///Peter
 
 
 
 