Re: CRC-checksum failed in gzip

 
 
andrea crotti
 
      08-01-2012
Full traceback:

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/user/sim/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 67, in run
    self.processJobData(jobData, logger)
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 204, in processJobData
    self.run_simulator(area, jobData[1] ,log)
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 142, in run_simulator
    report_file, percentage, body_text = SimResults.copy_test_batch(log, area)
  File "/user/sim/tests/llif/AutoTester/src/SimResults.py", line 274, in copy_test_batch
    out2_lines = out2.read()
  File "/user/sim/python/lib/python2.7/gzip.py", line 245, in read
    self._read(readsize)
  File "/user/sim/python/lib/python2.7/gzip.py", line 316, in _read
    self._read_eof()
  File "/user/sim/python/lib/python2.7/gzip.py", line 338, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x4f675fba != 0xa9e45aL


- The file is written with the Linux gzip program.
- No, I can't reproduce the error with the exact same file that
failed, and that's what is really puzzling:
there seems to be no clear pattern, it just fails randomly. The file
is also only ever opened for reading by this program,
so in theory there is no way it can be corrupted.

I also checked with lsof whether any other process had it open, but
nothing shows up.

- I can't really try on the local disk, it might take ages unfortunately
(we are rewriting this system from scratch anyway).
 
 
 
 
 
Steven D'Aprano
 
      08-01-2012
On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:

> Full traceback:
>
> Exception in thread Thread-8:


"DANGER DANGER DANGER WILL ROBINSON!!!"

Why didn't you say that there were threads involved? That puts a
completely different perspective on the problem.

I *was* going to write back and say that you probably had either file
system corruption, or network errors. But now that I can see that you
have threads, I will revise that and say that you probably have a bug in
your thread handling code.

I must say, Andrea, your initial post asking for help was EXTREMELY
misleading. You over-simplified the problem to the point that it no
longer has any connection to the reality of the code you are running.
Please don't send us on wild goose chases after bugs in code that you
aren't actually running.


> there seems to be no clear pattern and just randomly fails.


When you start using threads, you have to expect these sorts of
intermittent bugs unless you are very careful.

My guess is that you have a bug where two threads read from the same file
at the same time. Since each read shares state (the position of the file
pointer), you're going to get corruption. Because it depends on timing
details of which threads do what at exactly which microsecond, the effect
might as well be random.

Example: suppose the file contains three blocks A B and C, and a
checksum. Thread 8 starts reading the file, and gets block A and B. Then
thread 2 starts reading it as well, and gets half of block C. Thread 8
gets the rest of block C, calculates the checksum, and it doesn't match.

I recommend that you run a file system check on the remote disk. If it
passes, you can eliminate file system corruption. Also, run some network
diagnostics, to eliminate corruption introduced in the network layer. But
I expect that you won't find anything there, and the problem is a simple
thread bug. Simple, but really, really hard to find.

Good luck.


--
Steven
 
 
 
 
 
andrea crotti
 
      08-01-2012
2012/8/1 Steven D'Aprano <(E-Mail Removed)>:
> On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:
>
>> Full traceback:
>>
>> Exception in thread Thread-8:

>
> "DANGER DANGER DANGER WILL ROBINSON!!!"
>
> Why didn't you say that there were threads involved? That puts a
> completely different perspective on the problem.
>
> I *was* going to write back and say that you probably had either file
> system corruption, or network errors. But now that I can see that you
> have threads, I will revise that and say that you probably have a bug in
> your thread handling code.
>
> I must say, Andrea, your initial post asking for help was EXTREMELY
> misleading. You over-simplified the problem to the point that it no
> longer has any connection to the reality of the code you are running.
> Please don't send us on wild goose chases after bugs in code that you
> aren't actually running.
>
>
>> there seems to be no clear pattern and just randomly fails.

>
> When you start using threads, you have to expect these sorts of
> intermittent bugs unless you are very careful.
>
> My guess is that you have a bug where two threads read from the same file
> at the same time. Since each read shares state (the position of the file
> pointer), you're going to get corruption. Because it depends on timing
> details of which threads do what at exactly which microsecond, the effect
> might as well be random.
>
> Example: suppose the file contains three blocks A B and C, and a
> checksum. Thread 8 starts reading the file, and gets block A and B. Then
> thread 2 starts reading it as well, and gets half of block C. Thread 8
> gets the rest of block C, calculates the checksum, and it doesn't match.
>
> I recommend that you run a file system check on the remote disk. If it
> passes, you can eliminate file system corruption. Also, run some network
> diagnostics, to eliminate corruption introduced in the network layer. But
> I expect that you won't find anything there, and the problem is a simple
> thread bug. Simple, but really, really hard to find.
>
> Good luck.
>


Thanks a lot, that makes a lot of sense. I didn't mention this detail
before because I didn't write this code and I had completely forgotten
that threads were involved; I'm just trying to help fix this bug.

Your explanation makes a lot of sense, but it's still surprising that
just reading files, without ever writing them, can cause trouble
when using threads :/
 
 
Laszlo Nagy
 
      08-01-2012

> Thanks a lot, that makes a lot of sense.. I haven't given this detail
> before because I didn't write this code, and I forgot that there were
> threads involved completely, I'm just trying to help to fix this bug.
>
> Your explanation makes a lot of sense, but it's still surprising that
> even just reading files without ever writing them can cause troubles
> using threads :/

Make sure that file objects are not shared between threads, if that is
possible. It will probably solve the problem (if it is related to
threads).
 
 
andrea crotti
 
      08-01-2012
2012/8/1 Laszlo Nagy <(E-Mail Removed)>:
>
>> Thanks a lot, that makes a lot of sense.. I haven't given this detail
>> before because I didn't write this code, and I forgot that there were
>> threads involved completely, I'm just trying to help to fix this bug.
>>
>> Your explanation makes a lot of sense, but it's still surprising that
>> even just reading files without ever writing them can cause troubles
>> using threads :/

>
> Make sure that file objects are not shared between threads. If that is
> possible. It will probably solve the problem (if that is related to
> threads).



Well I just have to create a lock I guess right?
with lock:
    # open file
    # read content
 
 
Laszlo Nagy
 
      08-01-2012

>> Make sure that file objects are not shared between threads. If that is
>> possible. It will probably solve the problem (if that is related to
>> threads).

>
> Well I just have to create a lock I guess right?

That is also a solution. You need to call file.read() inside an acquired
lock.
> with lock:
>     # open file
>     # read content
>

But not that way! Your example will keep the lock acquired for the
lifetime of the file, so it cannot be shared between threads.

More likely:

## Open file
lock = threading.Lock()
fin = gzip.open(file_path...)
# Now you can share the file object between threads.

# and do this inside any thread:
## data needed. block until the file object becomes usable.
with lock:
    data = fin.read(....) # other threads are blocked while I'm reading
## use your data here, meanwhile other threads can read
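
A runnable version of the outline above might look like the following. This is only a sketch under assumptions: it targets current Python 3, it builds its own throwaway gzip file (sample.gz in a temporary directory) instead of using the real data, and worker and chunks are illustrative names that do not come from the original program.

```python
import gzip
import os
import tempfile
import threading

# Build a throwaway gzip file so that the sketch is self-contained.
original = os.urandom(256 * 1024)
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as out:
    out.write(original)

lock = threading.Lock()
fin = gzip.open(path, "rb")  # one file object shared by all workers
chunks = []                  # filled in read order, always under the lock


def worker():
    while True:
        with lock:
            # Only one thread at a time touches the shared file object.
            data = fin.read(4096)
            if data:
                chunks.append(data)
        if not data:
            break
        # ... use `data` here; other threads can read in the meantime ...


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
fin.close()

print("reassembled data matches:", b"".join(chunks) == original)
```

Because the chunk is appended while the lock is still held, the list order equals the read order, so joining the chunks gives back the uncompressed content.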


 
 
Ulrich Eckhardt
 
      08-02-2012
On 01.08.2012 19:57, Laszlo Nagy wrote:
> ## Open file
> lock = threading.Lock()
> fin = gzip.open(file_path...)
> # Now you can share the file object between threads.
>
> # and do this inside any thread:
> ## data needed. block until the file object becomes usable.
> with lock:
>     data = fin.read(....) # other threads are blocked while I'm reading
> ## use your data here, meanwhile other threads can read


Technically, that is correct, but IMHO it's complete nonsense to share
the file object between threads in the first place. If you need the data
in two threads, just read the file once and then share the read-only,
immutable content. If the file is small or too large to be held in
memory at once, just open and read it on demand. This also saves you
from having to rewind the file every time you read it.
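
The "read once, share the content" approach could be sketched roughly like this. It assumes current Python 3 and a throwaway file created on the spot; the names content, results and worker are hypothetical and only serve the illustration.

```python
import gzip
import os
import tempfile
import threading
import zlib

# Throwaway input file, standing in for the real simulator output.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as out:
    out.write(b"simulation output line\n" * 10000)

# Read the whole file exactly once, in a single thread.
with gzip.open(path, "rb") as fin:
    content = fin.read()  # bytes are immutable, so sharing them is safe

results = {}


def worker(name):
    # Threads only look at the shared bytes; there is no file position
    # to fight over and nothing to rewind.
    results[name] = zlib.crc32(content) & 0xFFFFFFFF


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distinct checksums seen:", len(set(results.values())))  # 1
```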

Am I missing something?

Uli
 
 
andrea crotti
 
      08-02-2012
2012/8/1 Steven D'Aprano <(E-Mail Removed)>:
>
> When you start using threads, you have to expect these sorts of
> intermittent bugs unless you are very careful.
>
> My guess is that you have a bug where two threads read from the same file
> at the same time. Since each read shares state (the position of the file
> pointer), you're going to get corruption. Because it depends on timing
> details of which threads do what at exactly which microsecond, the effect
> might as well be random.
>
> Example: suppose the file contains three blocks A B and C, and a
> checksum. Thread 8 starts reading the file, and gets block A and B. Then
> thread 2 starts reading it as well, and gets half of block C. Thread 8
> gets the rest of block C, calculates the checksum, and it doesn't match.
>
> I recommend that you run a file system check on the remote disk. If it
> passes, you can eliminate file system corruption. Also, run some network
> diagnostics, to eliminate corruption introduced in the network layer. But
> I expect that you won't find anything there, and the problem is a simple
> thread bug. Simple, but really, really hard to find.
>
> Good luck.


One last thing I would like to do before I add this fix is to actually
be able to reproduce this behaviour, and I thought I could just do the
following:

import gzip
import threading


class OpenAndRead(threading.Thread):
    def run(self):
        fz = gzip.open('out2.txt.gz')
        fz.read()
        fz.close()


if __name__ == '__main__':
    for i in range(100):
        OpenAndRead().start()


But no matter how many threads I start, I can't reproduce the CRC
error. Any idea how I can provoke it?

The code in run should be shared by all the threads since there are no
locks, right?
 
 
Laszlo Nagy
 
      08-02-2012

> Technically, that is correct, but IMHO it's complete nonsense to share
> the file object between threads in the first place. If you need the
> data in two threads, just read the file once and then share the
> read-only, immutable content. If the file is small or too large to be
> held in memory at once, just open and read it on demand. This also
> saves you from having to rewind the file every time you read it.
>
> Am I missing something?

We suspect that his program reads the same file object from different
threads. At least this would explain his problem. I agree with you:
usually it is not a good idea to share a file object between threads.
This is what I told him the first time. But it is not in our hands; he
already has a program that needs to be fixed. It might be easier for him
to protect the read() calls with a lock, because that can be done
automatically, without thinking too much.
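
One way to apply the lock "automatically" is a small wrapper around the file object, so that existing read() calls do not need to change. The class below is a hypothetical helper, not part of gzip or threading; LockedReader, raw, pieces and worker are invented names, and the demo assumes current Python 3 (gzip.compress) with an in-memory stream instead of a real file.

```python
import gzip
import io
import threading


class LockedReader(object):
    """Hypothetical helper: lets several threads call read() on one file object."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._lock = threading.Lock()

    def read(self, size=-1):
        # Every read on the wrapped object happens with the lock held.
        with self._lock:
            return self._fileobj.read(size)

    def close(self):
        with self._lock:
            self._fileobj.close()


# Demo on an in-memory gzip stream so that no real file is needed.
raw = b"0123456789" * 1000
compressed = io.BytesIO(gzip.compress(raw))
reader = LockedReader(gzip.GzipFile(fileobj=compressed, mode="rb"))

pieces = []


def worker():
    while True:
        piece = reader.read(512)
        if not piece:
            break
        pieces.append(piece)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
reader.close()

total = sum(len(p) for p in pieces)
print("every byte delivered exactly once:", total == len(raw))
```

The wrapper only makes each individual read() call safe. A thread that needs the whole file still has to fetch it in a single call, or hold a higher-level lock across its sequence of reads.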
 
 
Laszlo Nagy
 
      08-02-2012

> One last thing I would like to do before I add this fix is to actually
> be able to reproduce this behaviour, and I thought I could just do the
> following:
>
> import gzip
> import threading
>
>
> class OpenAndRead(threading.Thread):
>     def run(self):
>         fz = gzip.open('out2.txt.gz')
>         fz.read()
>         fz.close()
>
>
> if __name__ == '__main__':
>     for i in range(100):
>         OpenAndRead().start()
>
>
> But no matter how many threads I start, I can't reproduce the CRC
> error, any idea how I can try to help it happening?

Your example did not share the file object between threads. Here is an
example that does:

import gzip
import threading

class OpenAndRead(threading.Thread):
    def run(self):
        # fz is the single module-level file object shared by every thread
        global fz
        fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    for i in range(10):
        OpenAndRead().start()

Try this with a huge file. And here is one that should never throw a
CRC error, because the file object is protected by a lock:

import gzip
import threading

class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        global fl
        # only one thread at a time may read from the shared file object
        with fl:
            fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    fl = threading.Lock()
    for i in range(2):
        OpenAndRead().start()

>
> The code in run should be shared by all the threads since there are no
> locks, right?

The code is shared but the file object is not. In your example, a new
file object is created, every time a thread is started.
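
To make that point concrete, here is a self-contained variant of the per-thread version. It is a sketch under assumptions: current Python 3, a throwaway file created in a temporary directory instead of out2.txt.gz, and an outcomes list added only to collect the results.

```python
import gzip
import os
import tempfile
import threading

# Throwaway gzip file standing in for the real one.
original = b"result line\n" * 50000
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as out:
    out.write(original)

outcomes = []


class OpenAndRead(threading.Thread):
    def run(self):
        # A fresh file object per thread: its read position and its
        # running CRC are private to this thread.
        with gzip.open(path, "rb") as fz:
            outcomes.append(fz.read() == original)


threads = [OpenAndRead() for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(outcomes), len(outcomes))  # True 20
```

Every thread runs the same run() code, but each call opens its own file object, so nothing is shared and there is nothing for the threads to corrupt.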

 
 
 
 