Velocity Reviews - Computer Hardware Reviews

Sorting a large XML file

 
 
Rishi Dhupar
      04-19-2005
Hi,

I have 40-50 MB XML files consisting of thousands of nodes that look
like:
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>1.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/15/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>1342</FileSize>
</File>

I don't really care what it is sorted by, as long as the file is
sorted the same way each time.

Is there any way to do this? Loading the XML into memory and then
sorting is too memory intensive. My files could get upwards of 200 MB.

Thanks for any tips
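The usual way to sort data that does not fit in memory is an external merge sort: read a bounded number of records, sort them, spill each sorted run to a temporary file, then merge the runs. Below is a minimal Python sketch of that technique (Python's standard library is used only to illustrate it; the thread itself is about Perl). It assumes the <File> records are direct children of a single root element (not shown in the sample above), no XML namespaces, and a hypothetical sort key of FilePath plus FileName. The names `external_sort`, `record_key` and `run_size` are illustrative, not an existing API.

```python
import heapq
import json
import os
import tempfile
import xml.etree.ElementTree as ET


def record_key(elem):
    # Hypothetical sort key: (FilePath, FileName). Any fields would do,
    # as long as the choice stays fixed between runs.
    return (elem.findtext("FilePath", ""), elem.findtext("FileName", ""))


def _write_run(run, tmpdir):
    """Sort one in-memory run and spill it to a temp file, one JSON line per record."""
    run.sort()  # sorts by (key, serialized XML), so ties break deterministically
    fd, path = tempfile.mkstemp(suffix=".run", dir=tmpdir)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for key, xml_text in run:
            out.write(json.dumps([list(key), xml_text]) + "\n")
    return path


def _read_run(path):
    """Stream a spilled run back as (key, xml_text) tuples."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, xml_text = json.loads(line)
            yield tuple(key), xml_text


def external_sort(src, dst, run_size=10000):
    """Sort the <File> children of src's root element into dst,
    holding at most run_size records in memory while parsing."""
    with tempfile.TemporaryDirectory() as tmpdir:
        runs, run, root = [], [], None
        for event, elem in ET.iterparse(src, events=("start", "end")):
            if root is None:
                root = elem  # the first "start" event is the document root
                continue
            if event == "end" and elem.tag == "File":
                elem.tail = None  # drop inter-record whitespace before serializing
                run.append((record_key(elem), ET.tostring(elem, encoding="unicode")))
                root.clear()  # release records that have already been captured
                if len(run) >= run_size:
                    runs.append(_write_run(run, tmpdir))
                    run = []
        if run:
            runs.append(_write_run(run, tmpdir))

        # k-way merge of the sorted runs: one pending record per run in memory.
        with open(dst, "w", encoding="utf-8") as out:
            out.write("<%s>\n" % root.tag)
            for _, xml_text in heapq.merge(*(_read_run(p) for p in runs)):
                out.write(xml_text + "\n")
            out.write("</%s>\n" % root.tag)
```

Ties on the key are broken by the serialized record text, so the same set of records always comes out in the same order, whatever order they arrived in. Note that ElementTree writes empty elements in the short form, e.g. <FileOwner />.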

 
xhoster@gmail.com
      04-19-2005
"Rishi Dhupar" <(E-Mail Removed)> wrote:
> Hi,
>
> I have 40-50 MB XML files consisting of thousands of nodes that look
> like:
> <File>
> <FileOwner></FileOwner>
> <FilePath>C:\perl_files</FilePath>
> <FileName>1.xml</FileName>
> <FileAccessed>4/18/2005</FileAccessed>
> <FileModified>4/15/2005</FileModified>
> <FileCreated>4/18/2005</FileCreated>
> <FileSize>1342</FileSize>
> </File>
>
> I don't really care what it is sorted by, as long as the file is
> sorted the same way each time.


What was wrong with Ian Wilson's response the last time you asked a
very similar question?

> Is there any way to do this? Loading the XML into memory and then
> sorting is too memory intensive. My files could get upwards of 200 MB.
>
> Thanks for any tips


My tip would be to not use XML for something it is ill-suited for.

Xho

John Bokma
      04-19-2005
Rishi Dhupar wrote:

> Is there any way to do this? Loading the XML into memory and then
> sorting is too memory intensive. My files could get upwards of 200 MB.


Parse it using a fast parser and make the info very compact, e.g. glue path
and name together, drop the slashes from the dates, etc.
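John's compaction idea can be sketched in Python as a streaming pass that turns each <File> record into a single tab-separated line. Two assumptions go beyond his description: dates are rewritten as zero-padded YYYYMMDD instead of simply dropping the slashes (4/18/2005 without slashes becomes 4182005, which is ambiguous and does not sort chronologically), and the records sit under a single root element. `compact_date` and `compact_lines` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def compact_date(mdy):
    """'4/18/2005' -> '20050418'. Zero-padded YYYYMMDD is unambiguous
    and sorts chronologically as plain text."""
    if not mdy:
        return ""
    month, day, year = mdy.split("/")
    return "%04d%02d%02d" % (int(year), int(month), int(day))


def compact_lines(src):
    """Stream src and yield one tab-separated line per <File> record."""
    root = None
    for event, elem in ET.iterparse(src, events=("start", "end")):
        if root is None:
            root = elem  # first "start" event is the document root
            continue
        if event != "end" or elem.tag != "File":
            continue
        # Glue path and name into one field, e.g. C:\perl_files\1.xml
        full_name = elem.findtext("FilePath", "").rstrip("\\") + "\\" + elem.findtext("FileName", "")
        fields = [
            full_name,
            elem.findtext("FileOwner", ""),
            compact_date(elem.findtext("FileAccessed", "")),
            compact_date(elem.findtext("FileModified", "")),
            compact_date(elem.findtext("FileCreated", "")),
            elem.findtext("FileSize", ""),
        ]
        yield "\t".join(fields)
        root.clear()  # drop records that have already been emitted
```

Written to a text file, one record per line, the output can be handed to a line-oriented external sort such as GNU sort (with LC_ALL=C for a locale-independent byte order), which spills to temporary files on its own when the input is large.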

If you want to pay me, drop me a line.

--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
rishid@gmail.com
      04-20-2005
Just found XML::Filter::Sort.

It is a godsend: it does everything I need and supports buffering with a
maximum memory limit for large files. Pretty amazing module, actually. I just
found a bug in it that is ticking me off; hopefully the author gets back to me.

If anyone has any experience with it, here is the bug.
My XML input file:
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>FSW_Output.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/18/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>0</FileSize>
</File>

This is what is outputted:
<File>
<FileOwner />
<FilePath>C:\perl_files</FilePath>
<FileName>FSW_Output.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/18/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize />0
</File>

In the output, FileOwner and FileSize get messed up. I cannot figure out
what is wrong with it.

 
Sherm Pendley
      04-20-2005
(E-Mail Removed) wrote:

> This is what is outputted:
> <File>
> <FileOwner />
> <FilePath>C:\perl_files</FilePath>
> <FileName>FSW_Output.xml</FileName>
> <FileAccessed>4/18/2005</FileAccessed>
> <FileModified>4/18/2005</FileModified>
> <FileCreated>4/18/2005</FileCreated>
> <FileSize />0
> </File>
>
> In the output, FileOwner and FileSize get messed up. I cannot figure out
> what is wrong with it.


Nothing wrong with FileOwner - that's a valid way to represent an empty
element in XML. Parsers will treat <FileOwner /> the same way they would treat
a pair of opening and closing tags with nothing between them.

Don't know what happened to FileSize though...
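Sherm's point is easy to check with any conforming parser. The short Python sketch below uses the standard xml.etree.ElementTree module (not the Perl module from the thread) to show that both spellings of an empty element yield the same tree, and that in the reported output the 0 has moved outside FileSize, where a parser reads it as trailing text of the parent rather than as FileSize's content.

```python
import xml.etree.ElementTree as ET

# Both spellings of an empty element parse to the same tree ...
long_form = ET.fromstring("<FileOwner></FileOwner>")
short_form = ET.fromstring("<FileOwner />")
print(ET.tostring(long_form) == ET.tostring(short_form))  # True
print(repr(long_form.text), repr(short_form.text))        # None None

# ... but in "<FileSize />0" the 0 is no longer FileSize's content:
# it has become text *after* the element (its "tail" in ElementTree terms).
file_elem = ET.fromstring("<File><FileSize />0</File>")
size = file_elem.find("FileSize")
print(repr(size.text), repr(size.tail))                   # None '0'
```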

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
 
John Bokma
      04-20-2005
Sherm Pendley wrote:

> (E-Mail Removed) wrote:
>
>> This is what is outputted:
>> <File>
>> <FileOwner />


[ snip ]

>> <FileSize />0
>> </File>
>>
>> In the output, FileOwner and FileSize get messed up. I cannot figure out
>> what is wrong with it.

>
> Nothing wrong with FileOwner - that's a valid way to represent an empty
> element in XML. Parsers will treat <FileOwner /> the same way they would treat
> a pair of opening and closing tags with nothing between them.
>
> Don't know what happened to FileSize though...


Best guess: a badly written test treats 0 as an empty string (in Perl, the
string "0" is false in a boolean context).
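This guess matches a well-known Perl pitfall: in a boolean test, the string "0" is false, just like undef and the empty string. The Python sketch below models that rule with a hypothetical `perl_truthy` helper and two hypothetical element writers. It is not XML::Filter::Sort's actual code; it only shows how a truth test, where an explicit emptiness test was needed, would produce exactly the <FileOwner /> and <FileSize />0 output reported above.

```python
from xml.sax.saxutils import escape


def perl_truthy(value):
    """Python model of Perl's truth test for scalars: undef, '' and '0'
    (and numeric 0) are false; everything else, including '0.0', is true."""
    if value is None:
        return False
    if isinstance(value, str):
        return value not in ("", "0")
    return value != 0


def write_element_buggy(name, text):
    # Hypothetical writer: chooses the empty-element form with a Perl-style
    # truth test, then flushes the buffered character data anyway.
    if not perl_truthy(text):
        return "<%s />%s" % (name, escape(text or ""))
    return "<%s>%s</%s>" % (name, escape(text), name)


def write_element_fixed(name, text):
    # Test for emptiness explicitly (in Perl: defined($text) && length($text)).
    if text is None or len(text) == 0:
        return "<%s />" % name
    return "<%s>%s</%s>" % (name, escape(text), name)


print(write_element_buggy("FileOwner", ""))   # <FileOwner />
print(write_element_buggy("FileSize", "0"))   # <FileSize />0
print(write_element_fixed("FileSize", "0"))   # <FileSize>0</FileSize>
```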



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
median of large data set (from large file) | friend.05@gmail.com | Perl Misc | 5 | 04-02-2009 04:06 AM
Sorting Large File (Code/Performance) | Ira.Kovac@gmail.com | Python | 23 | 02-02-2008 02:40 PM
inputing, paging, sorting, a large text file | JJ | ASP .Net | 13 | 06-08-2007 10:28 AM
sorting large file | bisuvious | C++ | 12 | 04-05-2007 11:34 PM
Different results parsing a XML file with XML::Simple (XML::Sax vs. XML::Parser) | Erik Wasser | Perl Misc | 5 | 03-05-2006 10:09 PM


