Velocity Reviews > Newsgroups > Programming > Python

Possible File iteration bug

Billy Mays
Guest
Posts: n/a
 
      07-14-2011
I noticed that if a file is being continuously written to, the file
generator does not notice it:



def getLines(f):
    lines = []
    for line in f:
        lines.append(line)
    return lines

with open('/var/log/syslog', 'rb') as f:
    lines = getLines(f)
    # do some processing with lines
    # /var/log/syslog gets updated in the mean time

    # always returns an empty list, even though f has more data
    lines = getLines(f)




I found a workaround by adding f.seek(0,1) directly before the last
getLines() call, but is this the expected behavior? Calling f.tell()
right after the first getLines() call shows that it isn't reset back to
0. Is this correct or a bug?

--
Bill
 
 
 
 
 
Ian Kelly
Guest
Posts: n/a
 
      07-14-2011
On Thu, Jul 14, 2011 at 1:46 PM, Billy Mays <(E-Mail Removed)> wrote:
> def getLines(f):
>     lines = []
>     for line in f:
>         lines.append(line)
>     return lines
>
> with open('/var/log/syslog', 'rb') as f:
>     lines = getLines(f)
>     # do some processing with lines
>     # /var/log/syslog gets updated in the mean time
>
>     # always returns an empty list, even though f has more data
>     lines = getLines(f)
>
> I found a workaround by adding f.seek(0,1) directly before the last
> getLines() call, but is this the expected behavior? Calling f.tell() right
> after the first getLines() call shows that it isn't reset back to 0. Is
> this correct or a bug?


This is expected. Part of the iterator protocol is that once an
iterator raises StopIteration, it should continue to raise
StopIteration on subsequent next() calls.
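
The same rule can be seen with any exhausted iterator, not just files -- a minimal sketch:

```python
def gen():
    yield 1
    yield 2

g = gen()
print(list(g))   # [1, 2] -- the first pass consumes everything
print(list(g))   # []     -- exhausted: every further next() raises StopIteration
```

Once the first list() drains the generator, the iterator is permanently done; no amount of new data "behind" it changes that.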
 
 
 
 
 
Billy Mays
Guest
Posts: n/a
 
      07-14-2011
On 07/14/2011 04:00 PM, Ian Kelly wrote:
> On Thu, Jul 14, 2011 at 1:46 PM, Billy Mays<(E-Mail Removed)> wrote:
>> def getLines(f):
>>     lines = []
>>     for line in f:
>>         lines.append(line)
>>     return lines
>>
>> with open('/var/log/syslog', 'rb') as f:
>>     lines = getLines(f)
>>     # do some processing with lines
>>     # /var/log/syslog gets updated in the mean time
>>
>>     # always returns an empty list, even though f has more data
>>     lines = getLines(f)
>>
>> I found a workaround by adding f.seek(0,1) directly before the last
>> getLines() call, but is this the expected behavior? Calling f.tell() right
>> after the first getLines() call shows that it isn't reset back to 0. Is
>> this correct or a bug?

>
> This is expected. Part of the iterator protocol is that once an
> iterator raises StopIteration, it should continue to raise
> StopIteration on subsequent next() calls.


Is there any way to just create a new generator that clears its `closed`
status?

--
Bill
 
 
Hrvoje Niksic
Guest
Posts: n/a
 
      07-14-2011
Billy Mays <(E-Mail Removed)> writes:

> Is there any way to just create a new generator that clears its
> `closed` status?


You can define getLines in terms of the readline file method, which does
return new data when it is available.

def getLines(f):
    lines = []
    while True:
        line = f.readline()
        if line == '':
            break
        lines.append(line)
    return lines

or, more succinctly:

def getLines(f):
    return list(iter(f.readline, ''))
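
As a quick sketch of why this works (using io.StringIO to stand in for the growing log file), readline-based reads do pick up data appended after the first EOF:

```python
import io

def getLines(f):
    # readline() returns '' at (temporary) EOF, so iter() with a
    # sentinel stops there without permanently exhausting the file.
    return list(iter(f.readline, ''))

buf = io.StringIO("first\n")
print(getLines(buf))        # ['first\n']

# Simulate another process appending to the file:
end = buf.tell()
buf.write("second\n")
buf.seek(end)

print(getLines(buf))        # ['second\n'] -- the new data is seen
```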
 
 
Terry Reedy
Guest
Posts: n/a
 
      07-14-2011
On 7/14/2011 3:46 PM, Billy Mays wrote:
> I noticed that if a file is being continuously written to, the file
> generator does not notice it:


Because it does not look, as Ian explained.

> def getLines(f):
>     lines = []
>     for line in f:
>         lines.append(line)
>     return lines


This nearly duplicates .readlines, except for using f as an iterator.
Try the following (untested):

with open('/var/log/syslog', 'rb') as f:
    lines = f.readlines()
    # do some processing with lines
    # /var/log/syslog gets updated in the mean time
    lines = f.readlines()

People regularly do things like this with readline, so it is possible.
If above does not work, try (untested):

def getlines(f):
    lines = []
    while True:
        l = f.readline()
        if l: lines.append(l)
        else: return lines

--
Terry Jan Reedy

 
 
bruno.desthuilliers@gmail.com
Guest
Posts: n/a
 
      07-15-2011
On Jul 14, 9:46 pm, Billy Mays <(E-Mail Removed)> wrote:
> I noticed that if a file is being continuously written to, the file
> generator does not notice it:
>
> def getLines(f):
>     lines = []
>     for line in f:
>         lines.append(line)
>     return lines


what's wrong with file.readlines() ?
 
 
Billy Mays
Guest
Posts: n/a
 
      07-15-2011
On 07/15/2011 04:01 AM, (E-Mail Removed) wrote:
> On Jul 14, 9:46 pm, Billy Mays<(E-Mail Removed)> wrote:
>> I noticed that if a file is being continuously written to, the file
>> generator does not notice it:
>>
>> def getLines(f):
>>     lines = []
>>     for line in f:
>>         lines.append(line)
>>     return lines

>
> what's wrong with file.readlines() ?


Using that will read the entire file into memory which may not be
possible. In the library reference, it mentions that using the
generator (which calls file.next()) uses a read ahead buffer to
efficiently loop over the file. If I call .readline() myself, I forfeit
that performance gain.

I was thinking that a convenient solution to this problem would be to
introduce a new exception called PauseIteration, which would signal to the
caller that there is no more data for now, but not to close down the
generator entirely.
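
PauseIteration does not exist in Python; purely as a sketch of the idea, a class-based iterator can approximate it today by raising a custom exception instead of StopIteration (the names PauseIteration, PausingLines, and getLines here are all hypothetical):

```python
import io

class PauseIteration(Exception):
    """Hypothetical: 'no more data for now', but the iterator stays usable."""

class PausingLines:
    """Line iterator that raises PauseIteration at (temporary) EOF
    instead of StopIteration, so it is never permanently closed."""
    def __init__(self, f):
        self.f = f
    def __iter__(self):
        return self
    def __next__(self):
        line = self.f.readline()
        if not line:
            raise PauseIteration
        return line

def getLines(it):
    # Drain whatever is available right now.
    lines = []
    try:
        while True:
            lines.append(next(it))
    except PauseIteration:
        return lines

buf = io.StringIO("one\n")
it = PausingLines(buf)
print(getLines(it))     # ['one\n']

end = buf.tell()
buf.write("two\n")      # the file grows in the meantime
buf.seek(end)
print(getLines(it))     # ['two\n'] -- same iterator, still alive
```

Note this forfeits the for-loop syntax (a for loop would propagate PauseIteration), which is exactly the part a real language-level PauseIteration would have to address.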

--
Bill
 
 
Thomas Rachel
Guest
Posts: n/a
 
      07-15-2011
On 14.07.2011 21:46, Billy Mays wrote:
> I noticed that if a file is being continuously written to, the file
> generator does not notice it:


Yes. That's why there were alternative suggestions in your last thread
"How to write a file generator".

To repeat mine: an object which is not an iterator, but an iterable.

class Follower(object):
    def __init__(self, file):
        self.file = file
    def __iter__(self):
        while True:
            l = self.file.readline()
            if not l: return
            yield l

if __name__ == '__main__':
    import time
    f = Follower(open("/var/log/messages"))
    while True:
        for i in f: print i,
        print "all read, waiting..."
        time.sleep(4)

Here, you iterate over the object until it is exhausted, but you can
iterate again to get the next entries.

The difference to the file as iterator is, as you have noticed, that
once an iterator is exhausted, it will be so forever.

But if you have an iterable, like the Follower above, you can reuse it
as you want.
 
 
Billy Mays
Guest
Posts: n/a
 
      07-15-2011
On 07/15/2011 08:39 AM, Thomas Rachel wrote:
> On 14.07.2011 21:46, Billy Mays wrote:
>> I noticed that if a file is being continuously written to, the file
>> generator does not notice it:

>
> Yes. That's why there were alternative suggestions in your last thread
> "How to write a file generator".
>
> To repeat mine: an object which is not an iterator, but an iterable.
>
> class Follower(object):
>     def __init__(self, file):
>         self.file = file
>     def __iter__(self):
>         while True:
>             l = self.file.readline()
>             if not l: return
>             yield l
>
> if __name__ == '__main__':
>     import time
>     f = Follower(open("/var/log/messages"))
>     while True:
>         for i in f: print i,
>         print "all read, waiting..."
>         time.sleep(4)
>
> Here, you iterate over the object until it is exhausted, but you can
> iterate again to get the next entries.
>
> The difference to the file as iterator is, as you have noticed, that
> once an iterator is exhausted, it will be so forever.
>
> But if you have an iterable, like the Follower above, you can reuse it
> as you want.



I did see it, but it feels less pythonic than using a generator. I did
end up using an extra class to get more data from the file, but it seems
like overhead. Also, the Python docs for file.next() mention a
performance gain from using the file iterator over the readline
function.

Really what would be useful is some sort of PauseIteration Exception
which doesn't close the generator when raised, but indicates to the
looping header that there is no more data for now.

--
Bill


 
 
Chris Angelico
Guest
Posts: n/a
 
      07-15-2011
On Fri, Jul 15, 2011 at 10:52 PM, Billy Mays <(E-Mail Removed)> wrote:
> Really what would be useful is some sort of PauseIteration Exception which
> doesn't close the generator when raised, but indicates to the looping header
> that there is no more data for now.
>


All you need is a sentinel yielded value (eg None).
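
A minimal sketch of that approach (the follow() name is made up here, and io.StringIO stands in for the growing file):

```python
import io

def follow(f):
    # Yield lines forever; yield None as a sentinel when there is no
    # data *for now*.  The generator never raises StopIteration, so it
    # stays open while the file keeps growing.
    while True:
        line = f.readline()
        yield line if line else None

buf = io.StringIO("a\n")
g = follow(buf)
print(next(g))      # 'a\n'
print(next(g))      # None -- drained for now

end = buf.tell()
buf.write("b\n")
buf.seek(end)
print(next(g))      # 'b\n' -- the same generator picks up the new data
```

The caller treats None as "pause", achieving the effect of the proposed PauseIteration without any language change.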

ChrisA
 