bz2 module doesn't work properly with all bz2 files

 
 
Magdoll
 
      06-04-2010
I'm not sure what's causing this, but depending on the compression
program used, the bz2 module sometimes stops reading early.

I used pbzip2 to compress my bz2 files and read through them with the
bz2 module. The file object always reports end-of-file well before the
actual EOF. If I use bzip2 instead of pbzip2 to compress the files,
everything is fine.

My files are generally big (several GB), so decompressing them to disk
first is not practical, and it is a little unfortunate that I can't use
pbzip2, because it's usually much faster than bzip2.



 
Cameron Simpson
 
      06-04-2010
On 04Jun2010 12:53, Magdoll wrote:
| I'm not sure what's causing this, but depending on the compression
| program used, the bz2 module sometimes exits earlier.
|
| I used pbzip2 to compress my bz2 files and read through the file using
| the bz2 module. The file descriptor always exits much earlier than
| where the actual EOF is. If I use bzip2 instead of pbzip2 to compress
| the files, then everything is fine.
|
| My files are generally big (several GBs) so decompressing them is not
| a wise choice, and it is a little unfortunate that I can't use pbzip2
| because it's usually much faster than bz2.

Have you tested the decompression of the problematic files with the
bunzip2 command, just to confirm that the bug is in the Python bz2
module and not in the pbzip2 utility?
--
Cameron Simpson DoD#743
http://www.cskk.ezoshosting.com/cs/

A lot of people don't know the difference between a violin and a viola, so
I'll tell you. A viola burns longer. - Victor Borge
 
Magdoll
 
      06-04-2010
On Jun 4, 3:05 pm, Cameron Simpson <c...@zip.com.au> wrote:
> On 04Jun2010 12:53, Magdoll <magd...@gmail.com> wrote:
> | I'm not sure what's causing this, but depending on the compression
> | program used, the bz2 module sometimes exits earlier.
> |
> | I used pbzip2 to compress my bz2 files and read through the file using
> | the bz2 module. The file descriptor always exits much earlier than
> | where the actual EOF is. If I use bzip2 instead of pbzip2 to compress
> | the files, then everything is fine.
> |
> | My files are generally big (several GBs) so decompressing them is not
> | a wise choice, and it is a little unfortunate that I can't use pbzip2
> | because it's usually much faster than bz2.
>
> Have you tested the decompression of the problematic files with the
> bunzip2 command? Just to ensure the bug is with the python bz2 module
> and not with the pbzip2 utility?
> --
> Cameron Simpson <c...@zip.com.au> DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> A lot of people don't know the difference between a violin and a viola, so
> I'll tell you. A viola burns longer. - Victor Borge


Yes. Decompressing them with either pbzip2 or bunzip2 works fine,
so the problem is not with pbzip2.
 
Steven D'Aprano
 
      06-05-2010
On Fri, 04 Jun 2010 12:53:26 -0700, Magdoll wrote:

> I'm not sure what's causing this, but depending on the compression
> program used, the bz2 module sometimes exits earlier.

[...]

The current bz2 module only supports files written as a single stream,
and not multiple stream files. This is why the BZ2File class has no
"append" mode. See this bug report:

http://bugs.python.org/issue1625

Here's an example:

>>> import bz2
>>> bz2.BZ2File('a.bz2', 'w').write('this is the first chunk of text')
>>> bz2.BZ2File('b.bz2', 'w').write('this is the second chunk of text')
>>> bz2.BZ2File('c.bz2', 'w').write('this is the third chunk of text')
>>> # concatenate the files
... d = open('concate.bz2', 'wb')
>>> for name in "abc":
...     f = open('%c.bz2' % name, 'rb')
...     d.write(f.read())
...
>>> d.close()
>>>
>>> bz2.BZ2File('concate.bz2', 'r').read()
'this is the first chunk of text'

Sure enough, BZ2File only sees the first chunk of text, but if I open it
in (e.g.) KDE's Ark application, I see all the text.

So this is a known bug, sorry.
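
As far as I know, pbzip2 compresses blocks in parallel and writes each one out as its own bz2 stream, so its output looks like the concatenated file above. One possible workaround, offered only as a rough sketch and not as anything the bz2 module does for you: feed the raw file through bz2.BZ2Decompressor and start a fresh decompressor whenever one stream ends. The names iter_bz2_chunks and chunk_size are made up for this example, and it assumes there is no trailing garbage after the last stream.

```python
import bz2


def iter_bz2_chunks(path, chunk_size=1 << 20):
    """Yield decompressed byte strings from a possibly multi-stream .bz2 file.

    Hypothetical helper: reads the file in chunk_size pieces so that a
    multi-GB archive never has to fit in memory or be unpacked to disk.
    """
    decomp = bz2.BZ2Decompressor()
    with open(path, 'rb') as f:
        for raw in iter(lambda: f.read(chunk_size), b''):
            while raw:
                try:
                    out = decomp.decompress(raw)
                except EOFError:
                    # The previous stream ended exactly at a chunk boundary,
                    # so this chunk starts a new stream: retry with a fresh
                    # decompressor.
                    decomp = bz2.BZ2Decompressor()
                    continue
                if out:
                    yield out
                # Bytes left over after the end of a stream belong to the
                # next stream; empty if the current stream is still going.
                raw = decomp.unused_data
                if raw:
                    decomp = bz2.BZ2Decompressor()
```

Something like b''.join(iter_bz2_chunks('concate.bz2')) should then return all three chunks of text. The pieces it yields are not split on line boundaries, so to iterate over lines you would have to buffer and split them yourself. Later Python releases (3.3 and up) document multi-stream support in BZ2File itself, so this is only relevant for older versions.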


--
Steven
 