
Lorn 05-20-2005 05:10 PM

Memory errors with large zip files
 
Is there a limitation in Python's zipfile module that limits the
size of a file that can be extracted? I'm currently trying to extract
125 MB zip files whose contents uncompress to > 1 GB, and I am
getting memory errors. Indeed, my RAM gets maxed out during extraction
and then the script quits. Is there a way to spool to disk on the fly,
or is it necessary for Python to hold the entire file in memory before
writing? The code below iterates through a directory of zip files and
extracts them (thanks John!), though for testing I've just been using
one file:

import glob
import zipfile
from os.path import isfile

zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
for zipname in zipnames:
    zf = zipfile.ZipFile(zipname, 'r')
    for zfilename in zf.namelist():
        # zf.read() returns the entire decompressed member as one string
        newFile = open(zfilename, "wb")
        newFile.write(zf.read(zfilename))
        newFile.close()
    zf.close()


Any suggestions or comments on how I might be able to work with zip
files of this size would be very helpful.

Best regards,
Lorn


Lorn 05-21-2005 01:04 AM

Re: Memory errors with large zip files
 
Ok, I'm not sure if this helps any, but in debugging it a bit I see the
script stalls on:

newFile.write (zf.read (zfilename))

The memory error generated references line 357 of the zipfile.py
program at the point of decompression:

elif zinfo.compress_type == ZIP_DEFLATED:
    if not zlib:
        raise RuntimeError, \
              "De-compression requires the (missing) zlib module"
    # zlib compress/decompress code by Jeremy Hylton of CNRI
    dc = zlib.decompressobj(-15)
    bytes = dc.decompress(bytes)  ### <------ right here

Is there any way to modify how my code is approaching this, or perhaps
how the zipfile code is handling it, or do I need to just invest in
more RAM? I currently have 512 MB and thought that would be plenty....
perhaps I was wrong :-(. If anyone has any ideas it would truly be
very helpful.

Lorn


Do Re Mi chel La Si Do 05-21-2005 04:43 AM

Re: Memory errors with large zip files
 
Hi


I tried this test:

- created 12 text files of 100 MB each (exactly 102,400,000 bytes)
- created a file "tst.zip" containing those 12 files (the resulting
archive is only 1,095,965 bytes...)
- deleted the 12 text files
- ran your code

And... it works OK for me.

But: the compressed file is only 1 MB in size; I have 1 GB of RAM; I
use Windows XP.

Sorry, because:
1) my English is bad
2) I could not reproduce your problem


Michel Claveau


John Machin 05-21-2005 09:51 AM

Re: Memory errors with large zip files
 
On 20 May 2005 18:04:22 -0700, "Lorn" <efoda5446@yahoo.com> wrote:

>Ok, I'm not sure if this helps any, but in debugging it a bit I see the
>script stalls on:
>
>newFile.write (zf.read (zfilename))
>
>The memory error generated references line 357 of the zipfile.py
>program at the point of decompression:
>
>elif zinfo.compress_type == ZIP_DEFLATED:
>    if not zlib:
>        raise RuntimeError, \
>              "De-compression requires the (missing) zlib module"
>    # zlib compress/decompress code by Jeremy Hylton of CNRI
>    dc = zlib.decompressobj(-15)
>    bytes = dc.decompress(bytes)  ### <------ right here
>



The basic problem is that the zipfile module is asking the "dc" object
to decompress the whole file at once -- so you would need (at least)
enough memory to hold both the compressed file (C) and the
uncompressed file (U). There is also a possibility that this could
rise to 2U instead of U+C -- read a few lines further on:

bytes = bytes + ex

>Is there any way to modify how my code is approaching this,


You're doing the best you can, as far as I can tell.

> or perhaps
>how the zipfile code is handling it


Read this:
http://docs.python.org/lib/module-zlib.html

If you think you can work out how to modify zipfile.py so that it
feeds the decompress object a chunk of data at a time (properly
handling dc.unconsumed_tail) and keeps memory usage to a minimum, then
go for it :-)
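
For illustration, here is a minimal sketch of that chunking idea,
applied outside zipfile.py: read the compressed stream a piece at a
time and feed each piece to a zlib decompress object, writing the
output straight to disk. It assumes the caller has already positioned
src at the start of a member's raw deflate data (locating that offset
is the part zipfile.py does for you); inflate_to_disk, src, dst and
CHUNK are names made up for the sketch. Note that when decompress() is
called without a max_length argument it consumes each chunk
completely, so dc.unconsumed_tail never needs draining in this simple
form:

import zlib

CHUNK = 64 * 1024   # process the stream in 64 KB pieces

def inflate_to_disk(src, dst):
    # src: file object positioned at the start of raw deflate data
    # dst: writable binary file object for the uncompressed output
    dc = zlib.decompressobj(-15)   # -15 = raw deflate, as zipfile uses
    while True:
        data = src.read(CHUNK)
        if not data:
            break
        # only one compressed chunk plus its output is in memory at once
        dst.write(dc.decompress(data))
    dst.write(dc.flush())   # write out anything still buffered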

Reading the source of the Python zlib module, plus this page from the
zlib website could be helpful, perhaps even necessary:
http://www.gzip.org/zlib/zlib_how.html

See also the following post to this newsgroup:
From: John Goerzen <jgoer...@complete.org>
Newsgroups: comp.lang.python
Subject: Fixes to zipfile.py [PATCH]
Date: Fri, 07 Mar 2003 16:39:25 -0600

.... his patch obviously wasn't accepted :-(


> or do I need to just invest in more
>RAM? I currently have 512 MB and thought that would be plenty....
>perhaps I was wrong :-(.


Before you do anything rash (hacking zipfile.py or buying more
memory), take a step back for a moment:

Is this a one-off exercise or a regular one? Does it *really*
need to be done programmatically? There will be at least one
command-line unzipper program for your platform. If it's a one-off
requirement, do it manually. If it's a regular job, try using the
unzipper manually first; if all the available unzippers on your
platform die with a memory allocation problem, then you really have a
problem. If one works, then instead of using the zipfile module, run
that unzipper from your Python code via a subprocess.

HTH,
John



Marcus Lowland 05-23-2005 04:28 PM

Re: Memory errors with large zip files
 
Thanks for the detailed reply, John! I guess it turned out to be a bit
tougher than I originally thought :-)....

Reading over your links, I think I'd better not attempt rewriting the
zipfile.py module... a little over my head :-). The best solution,
from everything I read, seems to be calling an unzipper program from a
subprocess. I assume you mean using execfile()? I can't think of
another way.

Anyway, thank you very much for your help, it's been very educational.

Best regards,
Lorn


John Machin 05-23-2005 08:09 PM

Re: Memory errors with large zip files
 
On 23 May 2005 09:28:15 -0700, "Marcus Lowland" <mcdesigns@walla.com>
wrote:

>Thanks for the detailed reply, John! I guess it turned out to be a bit
>tougher than I originally thought :-)....
>
>Reading over your links, I think I'd better not attempt rewriting the
>zipfile.py module... a little over my head :-). The best solution,
>from everything I read, seems to be calling an unzipper program from a
>subprocess. I assume you mean using execfile()? I can't think of
>another way.


Errrmmmm ... no, execfile() executes a Python source file; it doesn't
run an external program.

Check out the subprocess module:

"""
6.8 subprocess -- Subprocess management

New in version 2.4.

The subprocess module allows you to spawn new processes, connect to
their input/output/error pipes, and obtain their return codes. This
module intends to replace several other, older modules and functions,
such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*
"""



