Re: CRC-checksum failed in gzip
- The file is written with the linux gzip program.
- No, I can't reproduce the error with the exact same file that
failed; that's what is really puzzling.
How do you make sure that no process is reading the file before it is
fully flushed to disk?
Possible way of testing for this kind of error: before you open a file,
use os.stat to determine its size, and write out the size and the file
path into a log file. Whenever an error occurs, compare the actual size
of the file with the logged value. If they are different, then you have
tried to read from a file that was growing at that time.
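A minimal sketch of that logging idea (the helper names and the log path are my own, not anything from the original program):

```python
import os

LOG_PATH = "read_sizes.log"  # hypothetical location for the size log

def log_size_before_read(path):
    """Record the file's size just before opening it, so a later
    CRC failure can be compared against the size seen at open time."""
    size = os.stat(path).st_size
    with open(LOG_PATH, "a") as log:
        log.write("%s\t%d\n" % (path, size))
    return size

def size_changed_since(path, logged_size):
    """True if the file's size differs from the logged value, i.e.
    we probably read it while another process was still writing."""
    return os.stat(path).st_size != logged_size
```

On a CRC error you would look up the logged size for the failing path and call `size_changed_since`; a mismatch means the reader raced the writer.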
Suggestion: from the other process, write the data into a different file
(for example, "file.gz.tmp"). Once the file is flushed and closed, use
os.rename() to give it its final name. On POSIX systems, the rename()
operation is atomic.
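For the writing side, that could look like the following sketch (the function name is mine; the key point is that the real filename only ever refers to a complete, flushed file):

```python
import gzip
import os

def write_gz_atomically(path, data):
    """Write gzipped data to path + '.tmp', flush it to disk, then
    rename. On POSIX, os.rename() is atomic, so readers see either
    the old file or the complete new one, never a partial write."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            gz.write(data)
        # closing the GzipFile wrote the CRC trailer into raw's
        # buffer; flush + fsync push it to disk before the rename
        raw.flush()
        os.fsync(raw.fileno())
    os.rename(tmp_path, path)
```

Note that over NFS the rename is only atomic with respect to a single server directory; writer and final destination should be on the same filesystem.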
> there seems to be no clear pattern and it just randomly fails. The file
> is also just open for read from this program,
> so in theory no way that it can be corrupted.
Yes, there is. Gzip stores a CRC-32 of the uncompressed data in the
trailer at the end of each compressed stream. So if the file is not
fully flushed to the disk, you can only read a fragment of the stream,
and the CRC check fails.
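You can reproduce both failure modes in a few lines; this is a self-contained demonstration, not the poster's code. Truncating the stream (what an unflushed file looks like) and corrupting the stored CRC give two different errors from Python's gzip module:

```python
import gzip

def demo_truncation(payload=b"hello world" * 100):
    """Decompress a gzip stream missing its last 4 bytes, as a reader
    sees when the writer has not flushed the trailer yet."""
    good = gzip.compress(payload)
    try:
        gzip.decompress(good[:-4])  # trailer incomplete
    except EOFError as exc:
        return str(exc)  # "Compressed file ended before ..."
    return None

def demo_crc_failure(payload=b"hello world" * 100):
    """Flip one bit in the stored CRC-32 (first 4 of the last 8
    trailer bytes) to trigger the 'CRC check failed' error."""
    bad = bytearray(gzip.compress(payload))
    bad[-8] ^= 0x01
    try:
        gzip.decompress(bytes(bad))
    except OSError as exc:
        return str(exc)  # "CRC check failed 0x... != 0x..."
    return None
```

A partially flushed file typically produces the first error (premature end of stream); a fragment that happens to end on a complete trailer read from a growing file produces the second.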
> I also checked with lsof if there are processes that opened it but
> nothing appears..
lsof doesn't work very well over NFS. You can have processes on
different computers (!) writing the file; lsof only lists the processes
on the system it is executed on.
> - can't really try on the local disk, might take ages unfortunately
> (we are rewriting this system from scratch anyway)