Velocity Reviews - Computer Hardware Reviews


enumerate overflow

 
 
crwe@post.cz
Guest
Posts: n/a
 
      10-03-2007
Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?

Cheers.

 
 
 
 
 
Diez B. Roggisch
Guest
Posts: n/a
 
      10-03-2007
(E-Mail Removed) wrote:
> Hello all,
>
> in python2.4, i read lines from a file with
>
> for lineNum, line in enumerate(f): ...
>
> However, lineNum soon overflows and starts counting backwards. How do
> i force enumerate to return long integer?


Most probably you can't, because it is a function written in C, I presume.

But as Python 2.4 has generators, it's easy to create an enumerate yourself:


def lenumerate(f):
    i = 0
    for line in f:
        yield i, line
        i += 1
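A slightly more general sketch of the same idea, assuming a current Python 3 interpreter. The `start` parameter is an addition for illustration (it is not in Diez's version); it makes it possible to check the behaviour around 2**31 without first reading two billion lines. Because Python integers grow without bound, the counter never wraps.

```python
def lenumerate(iterable, start=0):
    """Yield (index, item) pairs like enumerate(), using an unbounded int counter."""
    i = start
    for item in iterable:
        yield i, item
        i += 1  # Python ints grow as needed, so this never wraps


# Start just below 2**31 - 1 (sys.maxint on a 32-bit Python 2 build).
pairs = list(lenumerate(["a", "b", "c"], start=2**31 - 1))
print(pairs)  # [(2147483647, 'a'), (2147483648, 'b'), (2147483649, 'c')]
```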

Diez
 
 
 
 
 
Tim Chase
Guest
Posts: n/a
 
      10-03-2007
>> in python2.4, i read lines from a file with
>>
>> for lineNum, line in enumerate(f): ...
>>
>> However, lineNum soon overflows and starts counting backwards. How do
>> i force enumerate to return long integer?

>
> Most probably you can't, because it is a C-written function I presume.
>
> But as Python 2.4 has generators, it's easy to create an enumerate yourself:
>
>
> def lenumerate(f):
>     i = 0
>     for line in f:
>         yield i, line
>         i += 1



I'd consider this a bug, either in the implementation of
enumerate() or in the documentation

http://docs.python.org/lib/built-in-funcs.html#l2h-24

which fails to mention such arbitrary limitations. The
documentation describes enumerate() as behaving exactly like
the lenumerate() function you wrote.

Most likely, if one doesn't want to change the implementation,
one should update the documentation for enumerate() to include a
caveat like the one xrange() has:

http://docs.python.org/lib/built-in-funcs.html#l2h-80

"""
Note: xrange() is intended to be simple and fast. Implementations
may impose restrictions to achieve this. The C implementation of
Python restricts all arguments to native C longs ("short" Python
integers), and also requires that the number of elements fit in a
native C long.
"""

While yes, it's easy enough to create the above lenumerate
generator (just as it's only slightly more work to create an
lxrange function), it would be good if the docs let you know that
you might need to create such a function.
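For comparison, a minimal sketch of the `lxrange` Tim mentions, written as a generator for a current Python interpreter. The name and the (start, stop, step) calling convention are assumptions modeled on xrange(); only integer arguments are handled. In Python 3 the built-in range() already accepts arbitrarily large integers, so this is mainly of historical interest.

```python
def lxrange(start, stop=None, step=1):
    """Lazy range over arbitrary-size ints (hypothetical 'lxrange' sketch).

    Called as lxrange(stop) or lxrange(start, stop[, step]), like xrange().
    """
    if stop is None:
        start, stop = 0, start
    if step == 0:
        raise ValueError("lxrange() step must not be zero")
    i = start
    if step > 0:
        while i < stop:
            yield i
            i += step
    else:
        while i > stop:
            yield i
            i += step


# Values far beyond a native C long are fine, because Python ints are unbounded.
big = 2**64
print(list(lxrange(big, big + 3)))  # [18446744073709551616, 18446744073709551617, 18446744073709551618]
print(list(lxrange(5, 0, -2)))      # [5, 3, 1]
```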

-tkc



 
 
Steve Holden
Guest
Posts: n/a
 
      10-03-2007
(E-Mail Removed) wrote:
> Hello all,
>
> in python2.4, i read lines from a file with
>
> for lineNum, line in enumerate(f): ...
>
> However, lineNum soon overflows and starts counting backwards. How do
> i force enumerate to return long integer?
>

Just how "soon" exactly do you read sys.maxint lines from a file? I
should have thought that it would take a significant amount of time to
read 2,147,483,647 lines ...

But it is true that Python 2.5 uses an enumobject representation that
limits the index to a (C) long:

typedef struct {
    PyObject_HEAD
    long en_index;         /* current index of enumeration */
    PyObject* en_sit;      /* secondary iterator of enumeration */
    PyObject* en_result;   /* result tuple */
} enumobject;
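Why does the OP see the count go negative rather than getting an error? The sketch below uses a hypothetical helper, `to_signed`, to mimic what happens to a fixed-width two's-complement counter like `en_index` when it is incremented past its maximum. Strictly speaking, signed overflow is undefined behaviour in C, but on common platforms it wraps in exactly this way. The 32-bit width is an assumption; on typical 64-bit Linux builds a C long is 64 bits, which moves the limit to 2**63 - 1.

```python
def to_signed(value, bits=32):
    """Reinterpret the low `bits` bits of value as a two's-complement signed int."""
    value &= (1 << bits) - 1          # keep only the bits a C counter would hold
    if value >= 1 << (bits - 1):      # top bit set means the value is negative
        value -= 1 << bits
    return value


LONG_MAX_32 = 2**31 - 1  # largest signed 32-bit value

# Step a simulated 32-bit counter across its maximum.
for n in range(LONG_MAX_32 - 1, LONG_MAX_32 + 3):
    print(n, "->", to_signed(n))
```

Around the boundary the output goes from `2147483647 -> 2147483647` to `2147483648 -> -2147483648`, i.e. the index jumps from the largest positive value to the most negative one.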

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline

 
 
Tim Chase
Guest
Posts: n/a
 
      10-03-2007
>> for lineNum, line in enumerate(f): ...
>>
>> However, lineNum soon overflows and starts counting backwards. How do
>> i force enumerate to return long integer?
>>

> Just how "soon" exactly do you read sys.maxint lines from a file? I
> should have thought that it would take a significant amount of time to
> read 2,147,483,647 lines ...


A modestly (but not overwhelmingly) long time:

(defining our own xrange-ish generator that can handle things
larger than longs)

>>> def xxrange(x):
...     i = 0
...     while i < x:
...         yield i
...         i += 1
...
>>> for i,j in enumerate(xxrange(2**33)): assert i==j
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError


It took me about 60-90 minutes to hit the assertion on a
dual-core 2.8 GHz machine under an otherwise light load. When
batch-processing lengthy log files or other large data such as
genetic data, it's entirely possible to hit this limit, as the OP
discovered.
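On a current Python 3 interpreter the same boundary can be probed in a fraction of a second, because enumerate() has accepted a start value since Python 2.6. A minimal sketch, assuming Python 3, where `sys.maxsize` (the largest Py_ssize_t) takes the place of the old sys.maxint:

```python
import sys

# Begin counting right at the largest native index instead of iterating up to it.
result = list(enumerate("abc", sys.maxsize))
print(result)
```

In Python 3 the indices simply continue to sys.maxsize + 1 and sys.maxsize + 2; there is no wraparound and no OverflowError.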

-tkc



 
 
Steve Holden
Guest
Posts: n/a
 
      10-03-2007
Tim Chase wrote:
>>> for lineNum, line in enumerate(f): ...
>>>
>>> However, lineNum soon overflows and starts counting backwards. How do
>>> i force enumerate to return long integer?
>>>

>> Just how "soon" exactly do you read sys.maxint lines from a file? I
>> should have thought that it would take a significant amount of time to
>> read 2,147,483,647 lines ...

>
> A modestly (but not overwhelmingly) long time:
>
> (defining our own xrange-ish generator that can handle things larger
> than longs)
>
> >>> def xxrange(x):
> ...     i = 0
> ...     while i < x:
> ...         yield i
> ...         i += 1
> ...
> >>> for i,j in enumerate(xxrange(2**33)): assert i==j
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AssertionError
>
>
> It took me about 60-90 minutes to hit the assertion on a dual-core
> 2.8 GHz machine under an otherwise light load. When batch-processing
> lengthy log files or other large data such as genetic data, it's
> entirely possible to hit this limit, as the OP discovered.
>

I wouldn't dream of suggesting it's impossible. I just regard "soon" as
less than an hour in commuter's terms, I suppose.

regards
Steve
 
 
Tim Golden
Guest
Posts: n/a
 
      10-03-2007
Steve Holden wrote:
> I wouldn't dream of suggesting it's impossible.
> I just regard "soon" as less than an hour in
> commuter's terms, I suppose.


Sadly, speaking as a Londoner, an hour is indeed
"soon" in commuter terms.

TJG

 
 
Paul Rubin
Guest
Posts: n/a
 
      10-03-2007
Tim Chase <(E-Mail Removed)> writes:
> I'd consider this a bug: either in the implementation of enumerate(),
> or in the documentation
>
> http://docs.python.org/lib/built-in-funcs.html#l2h-24


2.5 has a patch that causes enumerate() and count() to raise OverflowError
instead of letting the count wrap around, which is still bad but at least
beats having the number suddenly go negative. See:

http://bugs.python.org/issue1512504 and
http://mail.python.org/pipermail/pyt...ry/058486.html

also:

http://bugs.python.org/issue1326277
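The behaviour Paul describes for 2.5 (fail loudly rather than wrap) can be modeled in a few lines. This is a hypothetical pure-Python illustration, not CPython's code: `checked_enumerate` and its `limit` parameter are invented here, and a tiny limit stands in for the C LONG_MAX so the error shows up after three items instead of two billion.

```python
def checked_enumerate(iterable, limit):
    """Hypothetical enumerate() that raises instead of wrapping.

    `limit` stands in for the largest value a native C counter can hold.
    """
    index = 0
    for item in iterable:
        if index > limit:
            # A fixed-width counter could not represent this index.
            raise OverflowError("index would exceed %d" % limit)
        yield index, item
        index += 1


gen = checked_enumerate("abcd", limit=2)
first_three = [next(gen) for _ in range(3)]
print(first_three)  # [(0, 'a'), (1, 'b'), (2, 'c')]

try:
    next(gen)
    overflowed = False
except OverflowError as exc:
    overflowed = True
    print("OverflowError:", exc)  # OverflowError: index would exceed 2
```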

I hope in 3.0 there's a real fix, i.e. the count should promote to
long. The rationale for leaving the bug in the library is just silly.
2**32 is not that big a number if we're talking about a language and
runtime system supposedly good for writing servers that stay up
continuously for years.
 
 
Raymond Hettinger
Guest
Posts: n/a
 
      10-03-2007
[Paul Rubin]
> I hope in 3.0 there's a real fix, i.e. the count should promote to
> long.


In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").


Raymond



 
 
Paul Rubin
Guest
Posts: n/a
 
      10-03-2007
Raymond Hettinger <(E-Mail Removed)> writes:
> In Py2.6, I will most likely put in an automatic promotion to long
> for both enumerate() and count(). It took a while to figure out how
> to do this without killing the performance for normal cases (ones used
> in real programs, not examples contrived to say, "omg, see what
> *could* happen").


Great, this is good to hear. I think it's ok if the enumeration slows
down after fixnum overflow is reached. So it's just a matter of
replacing the overflow signal with consing up a long. The fixnum case
would be the same as it is now. To be fancy, the count could be
stored in two C ints (or a gcc long long) so it would go up to 64 bits
but I don't think it's worth it, especially for itertools.count which
should be able to take arbitrary (i.e. larger than 64 bits) initializers.
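Paul's suggestion (keep the fast fixed-width counter and only build a long once it would overflow) can be sketched in pure Python. The function name and the `limit` parameter are hypothetical; `fast` stands in for the C long and `slow` for a Python long object. Since both are ordinary Python ints here, the sketch shows only the control flow, not the speed difference.

```python
def promoting_enumerate(iterable, limit):
    """Hypothetical enumerate() with a fast path and an unbounded slow path.

    `fast` models a native C counter that may only hold values up to `limit`;
    `slow` models a Python long that is created only once `fast` is exhausted.
    """
    fast = 0
    slow = None
    for item in iterable:
        if slow is None and fast < limit:
            yield fast, item      # common case: cheap fixed-width counter
            fast += 1
        else:
            if slow is None:
                slow = fast       # one-time switch to the unbounded counter
            yield slow, item
            slow += 1


pairs = list(promoting_enumerate("abcde", limit=3))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
```

The important property is that the sequence is seamless: the first index produced on the slow path is exactly one more than the last one produced on the fast path.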

As for real programs, well, the Y2038 bug is slowly creeping up on us.
That's when Unix timestamps overflow a signed 32-bit counter. It's
already caused an actual system failure, in 2006:

http://worsethanfailure.com/Articles...he_Epoch_.aspx
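The date follows directly from the arithmetic: a signed 32-bit time_t tops out at 2**31 - 1 seconds after the Unix epoch (1970-01-01 00:00:00 UTC), and one tick later it wraps to -2**31. A minimal sketch using Python's standard datetime module:

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

last_ok = epoch + timedelta(seconds=2**31 - 1)  # largest signed 32-bit time_t
wrapped = epoch + timedelta(seconds=-2**31)     # where the counter lands after one more tick

print(last_ok)  # 2038-01-19 03:14:07+00:00
print(wrapped)  # 1901-12-13 20:45:52+00:00
```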

Really, the whole idea of int/long unification is so we can stop
worrying about "omg, that could happen". We want to write programs
without special consideration or "omg" about those possibilities, and
still have them keep working smoothly if that DOES happen. Just about
all of us these days have hundreds of GBs or more of disk space on our
systems, and files with over 2**32 bytes or lines are not even
slightly unreasonable. We shouldn't have to write special generators
to deal with them, the library should instead just do the right thing.
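In current Python 3 the library does behave this way: enumerate() and itertools.count() both carry on past any native integer limit. A minimal sketch of numbering the lines of a large file from an arbitrary offset; the in-memory `io.StringIO` object is an assumption standing in for a real multi-gigabyte file.

```python
import io
import itertools

# Stand-in for a huge log file; StringIO iterates line by line like a text file.
log = io.StringIO("first\nsecond\nthird\n")

# Resume numbering well past 2**32, as if earlier chunks had already been read.
numbered = list(zip(itertools.count(2**32), log))
for line_no, line in numbered:
    print(line_no, line.rstrip("\n"))
```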
 
 
 
 

Similar Threads
Thread Thread Starter Forum Replies Last Post
enumerate roles a user belongs to ? (seek vb.net sample) Mad Scientist Jr ASP .Net 1 08-31-2004 05:28 PM
How to enumerate sessions? Arsen Vladimirskiy ASP .Net 2 01-09-2004 03:31 PM
Enumerate Control Attributes? localhost ASP .Net 7 12-22-2003 01:05 PM
enumerate Users in Activedirectory group shiv ASP .Net 3 12-03-2003 08:55 PM
Enumerate Roles? poi ASP .Net 1 11-15-2003 02:48 AM


