Velocity Reviews


Lee Sander 05-31-2007 12:26 PM

file reading by record separator (not line by line)
 
Dear all,
I would like to read a really huge file that looks like this:

> name1....

line_11
line_12
line_13
....
>name2 ...

line_21
line_22
....
etc

where line_ij is just free-form text on that line.

How can I read the file so that every time I do a "read()" I get exactly
one record, up to the next ">"?

many thanks
Lee


Lee Sander 05-31-2007 12:39 PM

Re: file reading by record separator (not line by line)
 
I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee





aspineux 05-31-2007 12:58 PM

Re: file reading by record separator (not line by line)
 

something like

name = None
lines = []
for line in open('yourfilename.txt'):
    if line.startswith('>'):
        # A '>' line starts a new record, so report the previous one first.
        if name is not None:
            print 'Here is the record', name
            print lines
            print
        name = line.rstrip('\r\n')
        lines = []
    else:
        lines.append(line.rstrip('\n'))







Tijs 05-31-2007 01:14 PM

Re: file reading by record separator (not line by line)
 
Lee Sander wrote:

> I wanted to also say that this file is really huge, so I cannot
> just do a read() and then split on ">" to get a record
> thanks
> lee


Below is the easy solution. To get even better performance, or if '>' is not
always at the start of the line, you would have to implement the buffering
that is done by readline() yourself (see _fileobject in socket.py in the
standard lib for example).

def chunkreader(f):
    """Yield (name, lines) for each record that starts with a '>' line."""
    name = None
    lines = []
    while True:
        line = f.readline()
        if not line:  # readline() returns '' at end of file
            break
        if line[0] == '>':
            if name is not None:
                yield name, lines
            name = line[1:].strip()
            lines = []
        else:
            lines.append(line)
    # Emit the final record, which has no following '>' line.
    if name is not None:
        yield name, lines

if __name__ == '__main__':
    from StringIO import StringIO
    s = \
"""> name1
line1
line2
line3
> name2
line 4
line 5
line 6"""
    f = StringIO(s)
    for name, lines in chunkreader(f):
        print '***', name
        print ''.join(lines)


$ python test.py
*** name1
line1
line2
line3

*** name2
line 4
line 5
line 6
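
If '>' can appear anywhere rather than only at the start of a line, one way to do the buffering by hand is to read fixed-size blocks and split on the separator, carrying the unfinished tail over to the next block. The following is only a sketch in Python 3 syntax; the name read_records and the blocksize default are assumptions made for illustration and are not taken from socket.py.

```python
import io


def read_records(f, sep=">", blocksize=64 * 1024):
    """Yield the text between separators, reading f in fixed-size blocks.

    Hypothetical helper: name and defaults are illustrative assumptions.
    """
    buf = ""
    seen_sep = False  # becomes True once the first separator is consumed
    while True:
        block = f.read(blocksize)
        if not block:  # read() returns '' at end of file
            break
        buf += block
        pieces = buf.split(sep)
        # The last piece may be incomplete; keep it until more data arrives.
        buf = pieces.pop()
        for piece in pieces:
            if seen_sep:
                yield piece
            # Whatever precedes the first separator is preamble, not a record.
            seen_sep = True
    if seen_sep:
        yield buf  # final record, which has no trailing separator


if __name__ == "__main__":
    sample = io.StringIO("> name1\nline1\nline2\n> name2\nline 4\n")
    for record in read_records(sample, blocksize=8):
        name, _, body = record.partition("\n")
        print("***", name.strip())
        print(body, end="")
```

Only the current record plus one block is held in memory, but a record much larger than blocksize gets re-split on every block, so the block size probably needs tuning for very long records.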

--

Regards,
Tijs

Tijs 05-31-2007 01:15 PM

Re: file reading by record separator (not line by line)
 
aspineux wrote:

>
> something like
>
> name = None
> lines = []
> for line in open('yourfilename.txt'):
>     if line.startswith('>'):
>         if name is not None:
>             print 'Here is the record', name
>             print lines
>             print
>         name = line.rstrip('\r\n')
>         lines = []
>     else:
>         lines.append(line.rstrip('\n'))
>


That would miss the last chunk.

--

Regards,
Tijs

Marc 'BlackJack' Rintsch 05-31-2007 01:41 PM

Re: file reading by record separator (not line by line)
 
There was just recently a thread with an `itertools.groupby()` solution.
Something like this:

from itertools import groupby, imap
from operator import itemgetter

def mark_records(lines):
    # Tag each line with the number of the record it belongs to.
    counter = 0
    for line in lines:
        if line.startswith('>'):
            counter += 1
        yield (counter, line)


def iter_records(lines):
    fst = itemgetter(0)
    snd = itemgetter(1)
    # Consecutive lines that share a record number form one record.
    for dummy, record_lines in groupby(mark_records(lines), fst):
        yield imap(snd, record_lines)


def main():
    source = """\
> name1....

line_11
line_12
line_13
....
> name2 ...

line_21
line_22
....""".splitlines()

    for record in iter_records(source):
        print 'Start of record...'
        for line in record:
            print ':', line

if __name__ == '__main__':
    main()
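
On Python 3, where itertools.imap no longer exists, the same groupby idea can be sketched as follows. This is an adaptation, not the code from the earlier thread: it yields each record as a list instead of a lazy iterator, and io.StringIO stands in for a real file here. Since a file object iterates over its lines lazily, passing it directly should keep only about one record in memory at a time.

```python
import io
from itertools import groupby


def mark_records(lines):
    """Pair every line with the number of the record it belongs to."""
    counter = 0
    for line in lines:
        if line.startswith(">"):
            counter += 1
        yield counter, line


def iter_records(lines):
    """Yield each record as a list of its lines, header line included."""
    for _, group in groupby(mark_records(lines), key=lambda pair: pair[0]):
        yield [line for _, line in group]


if __name__ == "__main__":
    # io.StringIO is a stand-in; open('yourfilename.txt') would be used the same way.
    source = io.StringIO("> name1\nline_11\nline_12\n> name2\nline_21\n")
    for record in iter_records(source):
        print("Start of record...")
        for line in record:
            print(":", line.rstrip("\n"))
```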

Ciao,
Marc 'BlackJack' Rintsch

Hendrik van Rooyen 06-01-2007 05:57 AM

Re: file reading by record separator (not line by line)
 
"Lee Sander" <le..e@gmail.com>wrote:


> I wanted to also say that this file is really huge, so I cannot
> just do a read() and then split on ">" to get a record
> thanks
> lee

I would do something like this (not tested):

def get_a_record(f, sep):
    ret_rec = ''
    while True:
        char = f.read(1)
        # read() returns '' at end of file; stop there too, or the loop never ends.
        if not char or char == sep:
            break
        ret_rec += char
    return ret_rec
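
One limitation of returning a plain string is that an empty record and end of file both come back as ''. A generator avoids that ambiguity. The sketch below is a variant in Python 3 syntax, and the name iter_records_by_char is made up for this example. It still reads one character at a time, so it is likely to be slow on a really huge file.

```python
import io


def iter_records_by_char(f, sep=">"):
    """Yield the text between separators, reading one character at a time.

    Hypothetical helper written for illustration only.
    """
    chars = []
    started = False  # True once the first separator has been seen
    while True:
        char = f.read(1)
        if not char:  # read() returns '' at end of file
            break
        if char == sep:
            if started:
                yield "".join(chars)
            # Text before the first separator is dropped as preamble.
            started = True
            chars = []
        else:
            chars.append(char)
    if started:
        yield "".join(chars)  # last record has no separator after it


if __name__ == "__main__":
    for record in iter_records_by_char(io.StringIO("> name1\nline_11\n> name2\nline_21\n")):
        print(repr(record))
```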

- Hendrik


