file reading by record separator (not line by line)

 
 
Lee Sander
 
      05-31-2007
Dear all,
I would like to read a really huge file that looks like this:

> name1....

line_11
line_12
line_13
....
>name2 ...

line_21
line_22
....
etc

where line_ij is just free-form text on that line.

How can I read the file so that every time I do a "read()" I get exactly
one record, up to the next ">"?

many thanks
Lee

 
Lee Sander
 
      05-31-2007
I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

aspineux
 
      05-31-2007

Something like:

name = None
lines = []
for line in open('yourfilename.txt'):
    if line.startswith('>'):
        # A new header line: print the record collected so far.
        if name is not None:
            print 'Here is the record', name
            print lines
            print
        name = line.rstrip('\r\n')
        lines = []
    else:
        lines.append(line.rstrip('\r\n'))



Tijs
 
      05-31-2007
Lee Sander wrote:

> I wanted to also say that this file is really huge, so I cannot
> just do a read() and then split on ">" to get a record
> thanks
> lee


Below is the easy solution. To get even better performance, or if '>' is not
always at the start of the line, you would have to implement the buffering
that is done by readline() yourself (see _fileobject in socket.py in the
standard lib for example).

def chunkreader(f):
    """Yield (name, lines) for each record that starts with a '>' line."""
    name = None
    lines = []
    while True:
        line = f.readline()
        if not line:
            break
        if line[0] == '>':
            # Header line: emit the previous record, then start a new one.
            if name is not None:
                yield name, lines
            name = line[1:].strip()
            lines = []
        else:
            lines.append(line)
    # Emit the last record, which has no following '>' line.
    if name is not None:
        yield name, lines

if __name__ == '__main__':
    from StringIO import StringIO
    s = """> name1
line1
line2
line3
> name2
line 4
line 5
line 6"""
    f = StringIO(s)
    for name, lines in chunkreader(f):
        print '***', name
        print ''.join(lines)


$ python test.py
*** name1
line1
line2
line3

*** name2
line 4
line 5
line 6

--

Regards,
Tijs
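
For the case Tijs mentions, where '>' may appear anywhere rather than only at
the start of a line, the buffering can be done by reading fixed-size chunks and
splitting on the separator. The following is only a minimal sketch under
assumptions that are not in the thread: it uses Python 3 syntax, and the
function name read_records, the chunk_size parameter and the sample data are
made up for illustration. It also drops empty records, such as the one before
a leading '>'.

```python
import io


def read_records(f, sep=">", chunk_size=64 * 1024):
    """Yield the text between separators, reading f in fixed-size chunks.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # read() returns '' at end of file
            break
        buf += chunk
        # Every piece before the last separator is a complete record;
        # the last piece may be unfinished, so it stays in the buffer.
        *complete, buf = buf.split(sep)
        for record in complete:
            if record:  # skip empty records (assumption of this sketch)
                yield record
    if buf:
        yield buf  # the final record has no trailing separator


# Invented sample data; a tiny chunk size forces records to span chunks.
sample = io.StringIO(">name1\nline_11\nline_12\n>name2\nline_21\n")
for record in read_records(sample, chunk_size=8):
    name, _, body = record.partition("\n")
    print(repr(name), repr(body))
```

Only the current chunk and the unfinished record are held in memory, so the
whole file is never loaded at once.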
 
Tijs
 
      05-31-2007
aspineux wrote:

> Something like:
>
> name = None
> lines = []
> for line in open('yourfilename.txt'):
>     if line.startswith('>'):
>         # A new header line: print the record collected so far.
>         if name is not None:
>             print 'Here is the record', name
>             print lines
>             print
>         name = line.rstrip('\r\n')
>         lines = []
>     else:
>         lines.append(line.rstrip('\r\n'))


That would miss the last chunk.

--

Regards,
Tijs
 
Marc 'BlackJack' Rintsch
 
      05-31-2007
Lee Sander wrote:

> Dear all,
> I would like to read a really huge file that looks like this:
>
> > name1....
>
> line_11
> line_12
> line_13
> ...
> >name2 ...
>
> line_21
> line_22
> ...
> etc
>
> where line_ij is just free-form text on that line.
>
> How can I read the file so that every time I do a "read()" I get exactly
> one record, up to the next ">"?


There was a thread just recently with an `itertools.groupby()` solution.
Something like this:

from itertools import groupby, imap
from operator import itemgetter

def mark_records(lines):
    # Tag every line with the number of '>' header lines seen so far.
    counter = 0
    for line in lines:
        if line.startswith('>'):
            counter += 1
        yield (counter, line)


def iter_records(lines):
    fst = itemgetter(0)
    snd = itemgetter(1)
    # Consecutive lines with the same counter value form one record.
    for dummy, record_lines in groupby(mark_records(lines), fst):
        yield imap(snd, record_lines)


def main():
    source = """\
> name1....

line_11
line_12
line_13
....
> name2 ...

line_21
line_22
....""".splitlines()

    for record in iter_records(source):
        print 'Start of record...'
        for line in record:
            print ':', line


if __name__ == '__main__':
    main()

Ciao,
Marc 'BlackJack' Rintsch
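
The idea behind the groupby() approach is that every line is tagged with the
number of '>' lines seen so far, so all lines of one record share a key and
groupby() hands them out as one group. Below is a rough Python 3 rendering,
offered as a sketch rather than as Marc's exact code: imap no longer exists
there, so each group is collected into a list instead, and the sample data is
invented.

```python
from itertools import groupby
from operator import itemgetter


def mark_records(lines):
    """Pair each line with the number of '>' header lines seen so far."""
    counter = 0
    for line in lines:
        if line.startswith(">"):
            counter += 1
        yield counter, line


def iter_records(lines):
    """Yield one list of lines per record."""
    for _, group in groupby(mark_records(lines), key=itemgetter(0)):
        # Copy the group into a list: groupby() invalidates a group
        # as soon as the outer loop moves on to the next key.
        yield [line for _, line in group]


# Invented sample data for illustration.
source = ["> name1", "line_11", "line_12", "> name2", "line_21"]
for record in iter_records(source):
    print(record)
```

Collecting each group into a list keeps one whole record in memory, but it
lets the caller hold on to a record after the loop has moved on. Any lines
before the first '>' come out as a record of their own.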
 
Hendrik van Rooyen
 
      06-01-2007
"Lee Sander" wrote:

> I wanted to also say that this file is really huge, so I cannot
> just do a read() and then split on ">" to get a record
> thanks
> lee

I would do something like this (not tested):

def get_a_record(f, sep):
    ret_rec = ''
    while True:
        char = f.read(1)
        # Stop at the separator, or at end of file, where read() returns ''.
        if not char or char == sep:
            break
        ret_rec += char
    return ret_rec

- Hendrik
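
A function like this has to be called repeatedly, and the caller needs some
way to tell end of file from an empty record. One possible way to wrap the
same character-by-character idea in a generator is sketched below. It assumes
Python 3; the name records_by_char and the sample data are hypothetical, and
empty records are skipped.

```python
import io


def records_by_char(f, sep=">"):
    """Yield the text between separators, reading one character at a time.

    Hypothetical helper: the name and the skipping of empty records
    are assumptions of this sketch.
    """
    parts = []
    while True:
        char = f.read(1)
        if char == sep or not char:
            record = "".join(parts)
            if record:
                yield record
            parts = []
            if not char:  # read() returns '' at end of file
                return
        else:
            parts.append(char)


# Invented sample data for illustration.
sample = io.StringIO(">name1\nline_11\n>name2\nline_21\n")
for record in records_by_char(sample):
    print(repr(record))
```

Reading one character per call is simple but slow on a really huge file;
collecting the characters in a list and joining them once avoids rebuilding
the string on every step.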

 