Memory errors with large zip files

 
 
Lorn
 
      05-20-2005
Is there a limitation in Python's zipfile module that limits the
size of a file that can be extracted? I'm currently trying to extract
125 MB zip files whose contents uncompress to more than 1 GB, and I am
receiving memory errors. Indeed, my RAM gets maxed out during extraction
and then the script quits. Is there a way to spool to disk on the fly, or
is it necessary for Python to read the entire file before writing? The code
below iterates through a directory of zip files and extracts them
(thanks John!); however, for testing I've just been using one file:

import glob
import zipfile
from os.path import isfile

zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
for zipname in zipnames:
    zf = zipfile.ZipFile(zipname, 'r')
    for zfilename in zf.namelist():
        newFile = open(zfilename, "wb")
        # zf.read() returns the whole uncompressed member as one string
        newFile.write(zf.read(zfilename))
        newFile.close()
    zf.close()


Any suggestions or comments on how I might be able to work with zip
files of this size would be very helpful.

Best regards,
Lorn

 
 
 
 
 
Lorn
 
      05-21-2005
Ok, I'm not sure if this helps any, but in debugging it a bit I see the
script stalls on:

newFile.write (zf.read (zfilename))

The memory error generated references line 357 of the zipfile.py
program at the point of decompression:

elif zinfo.compress_type == ZIP_DEFLATED:
    if not zlib:
        raise RuntimeError, \
              "De-compression requires the (missing) zlib module"
    # zlib compress/decompress code by Jeremy Hylton of CNRI
    dc = zlib.decompressobj(-15)
    bytes = dc.decompress(bytes) ### <------ right here

Is there any way to modify how my code is approaching this, or perhaps
how the zipfile code is handling it, or do I just need to invest in more
RAM? I currently have 512 MB and thought that would be plenty...
perhaps I was wrong. If anyone has any ideas it would truly be very
helpful.

Lorn

 
 
 
 
 
Do Re Mi chel La Si Do
 
      05-21-2005
Hi


I ran this test:

- create 12 text files of 100 MB each (exactly 102 400 000 bytes)
- create the file "tst.zip" containing these 12 files (the resulting file
is only 1 095 965 bytes)
- delete the 12 text files
- run your code

And... it works for me.

But: the compressed file is only about 1 MB; I have 1 GB of RAM; I use
Windows XP.
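
The test described above can be reproduced at a smaller scale with a short script. This is only a sketch, assuming Python 3; the function name build_test_zip, the member names, and the default sizes are made up for illustration. It shows why such an archive ends up tiny: highly repetitive text deflates extremely well, so it does not stress the compressed side the way a real 125 MB archive does.

```python
import os
import tempfile
import zipfile


def build_test_zip(zip_path, n_files=12, size_bytes=1_000_000):
    """Write n_files highly compressible members into zip_path.

    Returns (total_uncompressed_bytes, archive_bytes).
    """
    line = b"x" * 99 + b"\n"               # one 100-byte line, repeated
    n_lines = size_bytes // len(line)      # size is rounded down to whole lines
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(n_files):
            # Hypothetical member names, chosen only for this sketch.
            zf.writestr("file%02d.txt" % i, line * n_lines)
    return n_files * n_lines * len(line), os.path.getsize(zip_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        total, packed = build_test_zip(os.path.join(tmp, "tst.zip"))
        print("uncompressed:", total, "bytes; archive:", packed, "bytes")
```

Raising size_bytes toward 100 MB per member gets closer to the original experiment, at the cost of holding one member in memory while it is written.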

Sorry, because:
1) my English is bad
2) I did not find your problem


Michel Claveau






 
 
John Machin
 
      05-21-2005
On 20 May 2005 18:04:22 -0700, "Lorn" <> wrote:

>Ok, I'm not sure if this helps any, but in debugging it a bit I see the
>script stalls on:
>
>newFile.write (zf.read (zfilename))
>
>The memory error generated references line 357 of the zipfile.py
>program at the point of decompression:
>
>elif zinfo.compress_type == ZIP_DEFLATED:
>    if not zlib:
>        raise RuntimeError, \
>              "De-compression requires the (missing) zlib module"
>    # zlib compress/decompress code by Jeremy Hylton of CNRI
>    dc = zlib.decompressobj(-15)
>    bytes = dc.decompress(bytes) ### <------ right here
>



The basic problem is that the zipfile module is asking the "dc" object
to decompress the whole file at once -- so you would need (at least)
enough memory to hold both the compressed file (C) and the
uncompressed file (U). There is also a possibility that this could
rise to 2U instead of U+C -- read a few lines further on:

bytes = bytes + ex

>Is there anyway to modify how my code is approaching this


You're doing the best you can, as far as I can tell.

> or perhaps
>how the zipfile code is handling it


Read this:
http://docs.python.org/lib/module-zlib.html

If you think you can work out how to modify zipfile.py to feed
dc.decompress a chunk of data at a time, properly manipulating
dc.unconsumed_tail, and keeping memory usage to a minimum, then go for
it.
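
To make the chunk-at-a-time idea concrete, here is a rough sketch of the zlib side only. It is not a patch to zipfile.py: it assumes Python 3 (dc.eof needs 3.3 or later), and the function name inflate_stream and the 64 KB chunk size are arbitrary choices for illustration. Locating the member's compressed bytes inside the archive is left out entirely.

```python
import io
import zlib


def inflate_stream(src, dst, chunk_size=64 * 1024):
    """Decompress a raw deflate stream from src to dst in bounded pieces."""
    dc = zlib.decompressobj(-15)   # -15: raw deflate, as in the zipfile.py excerpt
    while not dc.eof:
        chunk = src.read(chunk_size)
        if not chunk:
            break                  # input exhausted (possibly truncated)
        # max_length caps each output piece; input that could not be
        # processed yet is kept in dc.unconsumed_tail.
        dst.write(dc.decompress(chunk, chunk_size))
        while dc.unconsumed_tail and not dc.eof:
            dst.write(dc.decompress(dc.unconsumed_tail, chunk_size))
    dst.write(dc.flush())          # emit anything still buffered


if __name__ == "__main__":
    raw = b"hello zip " * 200000
    co = zlib.compressobj(9, zlib.DEFLATED, -15)
    packed = co.compress(raw) + co.flush()
    out = io.BytesIO()
    inflate_stream(io.BytesIO(packed), out, chunk_size=1024)
    print(out.getvalue() == raw)
```

With this shape, memory use is bounded by roughly one input chunk plus one output chunk, rather than U+C.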

Reading the source of the Python zlib module, plus this page from the
zlib website could be helpful, perhaps even necessary:
http://www.gzip.org/zlib/zlib_how.html

See also the following post to this newsgroup:
From: John Goerzen <jgoer...@complete.org>
Newsgroups: comp.lang.python
Subject: Fixes to zipfile.py [PATCH]
Date: Fri, 07 Mar 2003 16:39:25 -0600

.... his patch obviously wasn't accepted
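
For readers on a later Python than the one discussed in this thread: ZipFile.open, added in Python 2.6, returns a file-like object for a member that decompresses as it is read, so a member can be copied to disk in pieces. A minimal sketch, assuming Python 3.6 or later; the function name extract_streaming, the dest_dir parameter, and the 1 MB chunk size are illustrative choices, not anything from the posts above.

```python
import os
import shutil
import zipfile


def extract_streaming(zip_path, dest_dir, chunk_size=1024 * 1024):
    """Extract every member of zip_path into dest_dir, one chunk at a time."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            # Skip members whose names would land outside dest_dir (e.g. "../x").
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # zf.open() yields a file-like object; copyfileobj reads and
            # writes chunk_size bytes at a time instead of the whole member.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk_size)
```

Calling extract_streaming("big.zip", "out") on a hypothetical archive should keep peak memory near the chunk size rather than the uncompressed size.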


> or do I need to just invest in more
>RAM? I currently have 512 MB and thought that would be plenty....
>perhaps I was wrong .


Before you do anything rash (hacking zipfile.py or buying more
memory), take a step back for a moment:

Is this a one-off exercise or a regular exercise? Does it *really*
need to be done programmatically? There will be at least one
command-line unzipper program for your platform.
One-off requirement: do it manually.
Regular: try using the unzipper manually; if all the available
unzippers on your platform die with a memory allocation problem then
you really have a problem. If it works, then instead of using the
zipfile module, use the unzipper program from your Python code via a
subprocess.
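
As a sketch of the "regular" route, something like the following could drive an external unzipper. It assumes Python 3 and that Info-ZIP's unzip is the program installed on the machine; the thread does not say which unzipper is available, and the helper names unzip_command and unzip_external are invented here. The flags used are unzip's -o (overwrite without prompting) and -d (target directory).

```python
import shutil
import subprocess


def unzip_command(zip_path, dest_dir, exe="unzip"):
    """Build the argument list for Info-ZIP's unzip (assumed to be installed)."""
    # -o: overwrite existing files without prompting; -d: extraction directory.
    return [exe, "-o", zip_path, "-d", dest_dir]


def unzip_external(zip_path, dest_dir, exe="unzip"):
    """Run the external unzipper; raises CalledProcessError on a non-zero exit."""
    if shutil.which(exe) is None:
        raise FileNotFoundError("no %r program found on PATH" % exe)
    subprocess.run(unzip_command(zip_path, dest_dir, exe), check=True)
```

The decompression then happens in the child process, so the Python script's own memory use stays small whatever the archive size.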

HTH,
John


 
 
Marcus Lowland
 
      05-23-2005
Thanks for the detailed reply John! I guess it turned out to be a bit
tougher than I originally thought...

Reading over your links, I think I'd better not attempt rewriting the
zipfile.py program... a little over my head. The best solution,
from everything I read, seems to be calling an unzipper program from a
subprocess. I assume you mean using execfile()? I can't think of
another way.

Anyway, thank you very much for your help, it's been very educational.

Best regards,
Lorn

 
 
John Machin
 
      05-23-2005
On 23 May 2005 09:28:15 -0700, "Marcus Lowland" <>
wrote:

>Thank for the detailed reply John! I guess it turned out to be a bit
>tougher than I originally thought ....
>
>Reading over your links, I think I better not attempt rewriting the
>zipfile.py program... a little over my head . The best solution,
>from everything I read seems to be calling an unzipper program from a
>subprocess. I assume you mean using execfile()? I can't think of
>another way.


Errrmmmm ... no, execfile runs a Python source file.

Check out the subprocess module:

"""
6.8 subprocess -- Subprocess management

New in version 2.4.

The subprocess module allows you to spawn new processes, connect to
their input/output/error pipes, and obtain their return codes. This
module intends to replace several other, older modules and functions,
such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*
"""
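
A small illustration of the "pipes and return codes" part of that description, written against a current Python 3 (subprocess.run and capture_output are later additions than the 2.4 module quoted above). The child here is just another Python interpreter standing in for any external program.

```python
import subprocess
import sys

# Run a child Python interpreter as a stand-in for any external program.
result = subprocess.run(
    [sys.executable, "-c", "print('extracted ok')"],
    capture_output=True,   # connect pipes to the child's stdout and stderr
    text=True,             # decode the pipes as str rather than bytes
)
print(result.returncode)
print(result.stdout.strip())
```

A non-zero returncode is how a command-line unzipper would normally signal failure.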


 