Velocity Reviews - Computer Hardware Reviews

reading a specific column from file

 
 
cesco
 
      01-11-2008
Hi,

I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

I found the linecache module quite interesting, but unfortunately it
(to my knowledge) only works on lines, not columns.

Any suggestion?

Thanks and regards
Francesco
 
 
 
 
 
A.T.Hofkamp
 
      01-11-2008
On 2008-01-11, cesco <(E-Mail Removed)> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?
>
> I've found quite interesting the linecache module but unfortunately
> that is (to my knowledge) only working on lines, not columns.
>
> Any suggestion?


the csv module may do what you want.
 
 
 
 
 
Fredrik Lundh
 
      01-11-2008
cesco wrote:

> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?


use the "split" method and plain old indexing:

for line in open("file.txt"):
    columns = line.split("\t")
    print(columns[2])  # indexing starts at zero
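The same split-and-index idea can be wrapped in a small generator, sketched below under a few assumptions: the helper name `read_column` is hypothetical, the file is opened with a `with` block so it gets closed, and the trailing newline is stripped so the last column would come out clean too. The sample data in the usage part is made up.

```python
import os
import tempfile


def read_column(path, index, sep="\t"):
    """Yield field `index` (0-based) from every line of a delimited text file."""
    with open(path) as f:  # the with-block closes the file when we are done
        for line in f:
            # drop the newline so a request for the last column is not polluted
            yield line.rstrip("\n").split(sep)[index]


# Usage with a throwaway sample file (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\tb\tc\td\n1\t2\t3\t4\n")

third = list(read_column(tmp.name, 2))
print(third)  # ['c', '3']
os.remove(tmp.name)
```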

also see the "csv" module, which can read all sorts of
comma/semicolon/tab-separated spreadsheet-style files.
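One concrete reason to prefer csv over a bare split: csv understands quoting, so a quoted field that itself contains a tab does not shift the columns. A small sketch with made-up in-memory data (`io.StringIO` stands in for a real file):

```python
import csv
import io

# Hypothetical sample: the second row has a quoted field containing a tab.
sample = 'a\tb\tc\td\n1\t"x\ty"\t3\t4\n'

# Plain split breaks the quoted field apart, so column 2 is misaligned.
split_third = [line.rstrip("\n").split("\t")[2] for line in io.StringIO(sample)]

# csv.reader honours the default '"' quotechar, so column 2 stays aligned.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
csv_third = [row[2] for row in reader]

print(split_third)  # ['c', 'y"']
print(csv_third)    # ['c', '3']
```

If the data never contains quotes or embedded tabs, both approaches give the same result.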

> I've found quite interesting the linecache module


the "linecache" module seems to be quite popular on comp.lang.python
these days, but it's designed for a very specific purpose (displaying
Python code in tracebacks), and is a really lousy way to read text files
in the general case. please unlearn.

</F>

 
 
Chris
 
      01-11-2008
On Jan 11, 2:15 pm, cesco <(E-Mail Removed)> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?
>
> I've found quite interesting the linecache module but unfortunately
> that is (to my knowledge) only working on lines, not columns.
>
> Any suggestion?
>
> Thanks and regards
> Francesco


for (i, each_line) in enumerate(open('input_file.txt', 'r')):
    try:
        column_3 = each_line.split('\t')[2].strip()
    except IndexError:
        print('Not enough columns on line %i of file.' % (i+1))
        continue

    do_something_with_column_3(column_3)  # placeholder for your own processing
 
 
Peter Otten
 
      01-11-2008
A.T.Hofkamp wrote:

> On 2008-01-11, cesco <(E-Mail Removed)> wrote:
>> Hi,
>>
>> I have a file containing four columns of data separated by tabs (\t)
>> and I'd like to read a specific column from it (say the third). Is
>> there any simple way to do this in Python?
>>
>> I've found quite interesting the linecache module but unfortunately
>> that is (to my knowledge) only working on lines, not columns.
>>
>> Any suggestion?

>
> the csv module may do what you want.


Here's an example:

>>> import csv
>>> print(open("tmp.csv").read())
alpha	beta	gamma	delta
one	two	three	for

>>> records = csv.reader(open("tmp.csv"), delimiter="\t")
>>> [record[2] for record in records]
['gamma', 'three']
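If the first line of the file is a header row, csv.DictReader lets you pick the column by name instead of by position. A sketch assuming Peter's sample is such a file, with its contents inlined via `io.StringIO`:

```python
import csv
import io

# In-memory copy of the sample file; DictReader treats the first row as headers.
sample = "alpha\tbeta\tgamma\tdelta\none\ttwo\tthree\tfor\n"

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
gammas = [row["gamma"] for row in reader]
print(gammas)  # ['three']
```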

Peter
 
 
Ivan Novick
 
      01-11-2008
On Jan 11, 4:15 am, cesco <(E-Mail Removed)> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?


You say you would like to "read" a specific column. I wonder whether you
mean reading all the data and then separating out the 3rd column, or
really doing disk I/O only for the 3rd column of data, thereby making
your read faster. The second seems more interesting but much harder; I
wonder if anyone has any ideas. As for just filtering out the third
column, you have been given many suggestions already.

Regards,
Ivan Novick
http://www.0x4849.net
 
 
Reedick, Andrew
 
      01-11-2008
> -----Original Message-----
> From: python-list-bounces+jr9445=(E-Mail Removed) [mailto:python-list-bounces+jr9445=(E-Mail Removed)] On Behalf Of Ivan Novick
> Sent: Friday, January 11, 2008 12:46 PM
> To: (E-Mail Removed)
> Subject: Re: reading a specific column from file
>
>
> You say you would like to "read" a specific column. I wonder if you
> meant read all the data and then just separate out the 3rd column or
> if you really mean only do disk IO for the 3rd column of data and
> thereby making your read faster. The second seems more interesting
> but much harder and I wonder if any one has any ideas.


Do what databases do. If the columns are stored with a fixed size on
disk, then you can simply compute the offset and seek to it. If the
columns are of variable size, then you need to store (and maintain) the
offsets in some kind of index.
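Both cases can be sketched in a few lines. Everything here is an illustrative assumption: the fixed-width layout (`WIDTH`, `NCOLS`, space padding, one newline per record), the helper names, and the in-memory `io.BytesIO` files. Note that the OS still reads whole disk blocks, so how much real I/O this saves depends on record size and layout; the index also costs one full pass to build.

```python
import io

# Case 1 (assumed layout): NCOLS space-padded fields of WIDTH bytes, then b"\n".
WIDTH = 8
NCOLS = 4
RECORD = WIDTH * NCOLS + 1  # +1 for the newline


def read_fixed_column(f, col):
    """Seek to field `col` of every record and read only those bytes."""
    f.seek(0, io.SEEK_END)
    nrecords = f.tell() // RECORD
    values = []
    for r in range(nrecords):
        f.seek(r * RECORD + col * WIDTH)  # offset is computed, nothing is parsed
        values.append(f.read(WIDTH).rstrip(b" ").decode("ascii"))
    return values


# Case 2: variable-width tab-separated lines. One full pass records the
# (offset, length) of field `col` on each line; later reads seek straight there.
def build_column_index(f, col, sep=b"\t"):
    index = []
    offset = 0  # tracked by hand instead of calling tell() while iterating
    f.seek(0)
    for line in f:
        fields = line.rstrip(b"\r\n").split(sep)
        start = offset + sum(len(x) + len(sep) for x in fields[:col])
        index.append((start, len(fields[col])))
        offset += len(line)
    return index


def read_indexed_column(f, index):
    values = []
    for start, length in index:
        f.seek(start)
        values.append(f.read(length).decode("utf-8"))
    return values


rows = [[b"alpha", b"beta", b"gamma", b"delta"], [b"one", b"two", b"three", b"four"]]
fixed = b"".join(b"".join(s.ljust(WIDTH) for s in row) + b"\n" for row in rows)
fixed_col = read_fixed_column(io.BytesIO(fixed), 2)
print(fixed_col)  # ['gamma', 'three']

tsv = io.BytesIO(b"alpha\tbeta\tgamma\tdelta\none\ttwo\tthree\tfour\n")
idx = build_column_index(tsv, 2)
indexed_col = read_indexed_column(tsv, idx)
print(indexed_col)  # ['gamma', 'three']
```

The index only stays valid while the file is unchanged, which is the "maintain" part of the advice above.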





 
 
Hai Vu
 
      01-17-2008
Here is another suggestion:

col = 2 # third column
filename = '4columns.txt'
third_column = [line[:-1].split('\t')[col] for line in open(filename,
'r')]

third_column now contains a list of items in the third column.

This solution is great for small files (up to a couple of thousand
lines). For larger files, performance could be a problem, so you might
need a different solution.
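One possible "different solution" for big files, offered as a sketch rather than what Hai Vu had in mind: the list comprehension keeps every value in memory at once, while a generator hands them out one at a time. The name `iter_column` is hypothetical. It also uses `rstrip("\n")` instead of `[:-1]`, which avoids chopping a real character off a last line that has no trailing newline.

```python
def iter_column(lines, col=2, sep="\t"):
    """Lazily yield field `col` from each line; nothing is accumulated."""
    for line in lines:
        yield line.rstrip("\n").split(sep)[col]


# With a real file you would write, e.g.:
#     with open('4columns.txt') as f:
#         for value in iter_column(f):
#             ...
sample = ["a\tb\tc\td\n", "1\t2\t3\t4"]  # last line has no trailing newline
values = list(iter_column(sample))
print(values)  # ['c', '3']
```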
 
 
John Machin
 
      01-17-2008
On Jan 17, 8:47 pm, Hai Vu <(E-Mail Removed)> wrote:
> Here is another suggestion:
>
> col = 2 # third column
> filename = '4columns.txt'
> third_column = [line[:-1].split('\t')[col] for line in open(filename,
> 'r')]
>
> third_column now contains a list of items in the third column.
>
> This solution is great for small files (up to a couple of thousand of
> lines). For larger file, performance could be a problem, so you might
> need a different solution.


Using the maxsplit arg could speed it up a little:

line[:-1].split('\t', col+1)[col]
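What maxsplit changes: with `maxsplit = col + 1` the split stops once column `col` is isolated and leaves the rest of the line in one piece. With exactly four columns and `col = 2`, that still means splitting all three tabs, so any saving only shows up on lines with more than `col + 2` fields. A small illustration with a made-up six-field line:

```python
col = 2
wide = "a\tb\tc\td\te\tf\n"

full = wide[:-1].split("\t")              # splits at every tab
limited = wide[:-1].split("\t", col + 1)  # stops after col + 1 splits

print(full)     # ['a', 'b', 'c', 'd', 'e', 'f']
print(limited)  # ['a', 'b', 'c', 'd\te\tf']
assert full[col] == limited[col] == "c"
```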

 
 
 
 

Similar Threads
Thread Thread Starter Forum Replies Last Post
Splitting a file from specific column content Yigit Turgut Python 14 01-22-2012 08:55 PM
Reading only a specific portion of XML file. Savvoulidis Iordanis ASP .Net 3 12-15-2009 09:18 PM
Efficiently reading a string from a specific point in a file random guy C++ 7 05-12-2007 08:59 AM
reading specific lines of a file Yi Xing Python 12 07-16-2006 12:43 PM
Re-reading a portion of a source file from a specific point Berger, Daniel Ruby 8 08-12-2005 03:48 PM


