Velocity Reviews - Computer Hardware Reviews



question on using tarfile to read a *.tar.gzip file

 
 
m_ahlenius
      02-07-2010
Hi,

I have a relatively large number of *.tar.gzip files to process.
With the Python module tarfile, I see that I can access them and
extract them, one at a time, to a temporary dir, but that of course
takes time.

All I need to do is read the first and last lines of each file and
then move on to the next one. I am not changing anything in these
files, just reading. The lines are not fixed-length either, which
makes it a bit more fun.

Is there a way to do this without decompressing each file to a temp
dir? For example, is there a method in the tarfile interface that
reads a compressed file directly? Otherwise I'll just access each
file, extract it, grab the first and last lines and then delete the
temp file.

thx

'mark
 
Tim Chase
      02-07-2010
> Is there a way to do this, without decompressing each file to a temp
> dir? Like is there a method using some tarfile interface adapter to
> read a compressed file? Otherwise I'll just access each file, extract
> it, grab the 1st and last lines and then delete the temp file.


I think you're looking for the extractfile() method of the
TarFile object:

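    from glob import glob
    from tarfile import TarFile

    for fname in glob('*.tgz'):
        print(fname)
        tf = TarFile.gzopen(fname)
        for ti in tf:
            print('  %s' % ti.name)
            f = tf.extractfile(ti)
            if f is None:  # directories and special members have no data
                continue
            fi = iter(f)  # read lines in memory; nothing is written to disk
            first_line = next(fi, None)  # None for an empty member
            last_line = first_line  # a one-line member's last line is its first
            for last_line in fi:  # walk to the end, keeping the final line
                pass
            f.close()
            print('  First line: %r' % first_line)
            print('  Last line: %r' % last_line)
        tf.close()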
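For anyone on Python 3, here is a minimal sketch of the same loop packaged as a generator. It assumes gzip-compressed tar archives and uses tarfile.open(path, "r:gz") in place of TarFile.gzopen(), context managers to close everything, and collections.deque(maxlen=1) to drain the line iterator while keeping only the final line. The name first_and_last_lines and the "*.tar.gzip" glob pattern are illustrative assumptions, not part of the tarfile API. Like the loop above, it still reads each member from start to end; it just never touches the disk.

```python
import tarfile
from collections import deque


def first_and_last_lines(path):
    """Yield (member name, first line, last line) for each member of a
    gzip-compressed tar archive that has file data, without extracting
    anything to disk. Lines are bytes; both are None for an empty member."""
    with tarfile.open(path, "r:gz") as tf:
        for member in tf:
            f = tf.extractfile(member)
            if f is None:  # directories and other members without data
                continue
            with f:
                lines = iter(f)
                first = next(lines, None)
                # deque(maxlen=1) consumes the iterator but keeps only the final line
                tail = deque(lines, maxlen=1)
                last = tail[0] if tail else first
            yield member.name, first, last


if __name__ == "__main__":
    import glob

    for path in glob.glob("*.tar.gzip"):  # hypothetical pattern taken from the question
        for name, first, last in first_and_last_lines(path):
            print(path, name, first, last)
```

Because extractfile() returns a binary file object, the lines come back as bytes; decode them if you need text.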

If you want just the first and last lines without scanning the
entire file (as my for-loop does), it's a little more complex, but
the file-like object returned by extractfile() is documented as
supporting seek(), so you can skip to the end and then read
backwards until you have enough lines. I wrote a function that gets
the last line of a large file by seeking back from EOF; you can find
it at [1]. It should handle the odd edge cases: $BUFFER_SIZE
containing more or less than a full line, reading backwards in
chunks (if needed) until you have one full line, a one-line file,
and other odd/annoying edge cases. Hope it helps.

-tkc

[1] http://mail.python.org/pipermail/python-list/2009-January/1186176.html


 
m_ahlenius
      02-08-2010
On Feb 7, 5:01 pm, Tim Chase <(E-Mail Removed)> wrote:

Thanks Tim - this was very helpful. Just learning about tarfile.

'mark
 

Similar Threads
Thread (starter; forum; replies; last post)
Create TarFile using python (itzel; Python; 6; 09-08-2009 11:15 AM)
Create TarFile using string buffers (aurora00@gmail.com; Python; 7; 03-20-2007 10:18 AM)
using tarfile with an open file object (Matthew Thorley; Python; 1; 05-04-2005 08:55 PM)
tarfile's tar.extractfile() file-like object incompatible with pickle.load()? (Matt Doucleff; Python; 5; 08-27-2004 08:53 PM)
TarFile using binary strings (Marc Poinot (Onera); Python; 0; 08-12-2003 11:12 AM)


