select() call and filedescriptor out of range in select error

 
 
k3xji
 
      09-16-2010
Hi all,

We have a select-based server written in Python. Occasionally, maybe
twice a month, a weird problem occurs: select() fails with a
"filedescriptor out of range in select()" error. This is of course a
normal error and is handled gracefully. Our policy is to take down a few
users so that select() can handle the next cycle. However, once this
error occurs, this call fails too:

self.__Sockets.remove(socket)

self.__Sockets is the very basic list of sockets we use in our IO
loop. The call fails with:
remove(x): x not in list

First of all, in our entire application there is no line of code like
remove(x), meaning there is no x variable. Second, the exception shows
the line number of the code above. So self.__Sockets.remove(socket)
fails with "remove(x): x not in list".

I cannot understand the problem. It happens sporadically, and it
feels as if the ValueError from the select() call somehow corrupts the
list structure itself in Python. I am not sure whether something like
that is possible.

Thanks in advance,
 
 
 
 
 
Ned Deily
 
      09-16-2010
k3xji wrote:
> We have a select-based server written in Python. Occasionally, maybe
> twice a month there occurs a weird problem, select() returns with
> filedescriptor out of range in select() error. This is of course a
> normal error and handled gracefully. Our policy is to take down few
> users for select() to handle the next cycle. However, once this error
> occurs, this also fails too:
>
> self.__Sockets.remove(socket)
>
> self.__Sockets is the very basic list of sockets we use in our IO
> loop. The call fails with:
> remove(x): x not in list
>
> First of all, in our entire application there is no line of code like
> remove(x), meaning there is no x variable. Second, the Exception shows
> the line number containing above code. So
> self.__Sockets.remove(socket) this fails with remove(x): x not in
> list....
>
> I cannot understand the problem. It happens in sporadic manner and it
> feels that the ValueError of select() call somehow corrupts the List
> structure itself in Python? Not sure if something like that is
> possible.


That error message is a generic exception message. It just means the
object to be removed is not in the list. For example:

>>> a, b = 1, 2
>>> l = [a, b]
>>> l.remove(a)
>>> l.remove(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list

If the problem is that the socket object in question is no longer in
the list, you can protect your code there by enclosing the remove
operation in a try block, like:

try:
    self.__Sockets.remove(socket)
except ValueError:
    pass

--
Ned Deily

 
 
 
 
 
Steven D'Aprano
 
      09-16-2010
On Wed, 15 Sep 2010 21:05:49 -0700, k3xji wrote:

> Hi all,
>
> We have a select-based server written in Python. Occasionally, maybe
> twice a month there occurs a weird problem, select() returns with
> filedescriptor out of range in select() error. This is of course a
> normal error and handled gracefully. Our policy is to take down few
> users for select() to handle the next cycle. However, once this error
> occurs, this also fails too:
>
> self.__Sockets.remove(socket)
>
> self.__Sockets is the very basic list of sockets we use in our IO loop.
> The call fails with:
> remove(x): x not in list



Please show the *exact* error message, including the traceback, by
copying and pasting it. Do not retype it by hand, or summarize it, or put
it into your own words.



> First of all, in our entire application there is no line of code like
> remove(x), meaning there is no x variable.


Look at this example:

>>> sockets = []
>>> sockets.remove("Hello world")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list


"x" is just a placeholder. It doesn't refer to an actual variable x.


> Second, the Exception shows
> the line number containing above code. So self.__Sockets.remove(socket)
> this fails with remove(x): x not in list....


Exactly.



> I cannot understand the problem. It happens in sporadic manner and it
> feels that the ValueError of select() call somehow corrupts the List
> structure itself in Python? Not sure if something like that is possible.


Anything is possible, but it's not likely. What's far more likely is that
you have a bug in your code, and that somehow, under rare circumstances,
it tries to remove something from a list that was never inserted into the
list. Or it tries to remove it twice.

My guess is something like this:

try:
    socket = get_socket()
    self._sockets.append(socket)
except SomeError:
    pass
# later on
self._sockets.remove(socket)
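
A runnable version of that guess may make the failure mode easier to see. This is only a sketch: Registry, add, drop and failing_get_socket are hypothetical names, not taken from the original server, and OSError stands in for whatever SomeError would be. It is written for a current Python 3 rather than 2.4.

```python
class Registry:
    """Hypothetical stand-in for the server's socket bookkeeping."""

    def __init__(self):
        self._sockets = []

    def add(self, get_socket):
        sock = None
        try:
            sock = get_socket()           # may raise before anything is appended
            self._sockets.append(sock)
        except OSError:
            pass                          # error swallowed; sock was never added
        return sock

    def drop(self, sock):
        self._sockets.remove(sock)        # ValueError if sock is not in the list


def failing_get_socket():
    # Simulates a socket that could not be obtained.
    raise OSError("accept failed")


registry = Registry()
sock = registry.add(failing_get_socket)   # returns None, the list stays empty

try:
    registry.drop(sock)
except ValueError as exc:
    message = str(exc)                    # the generic "x not in list" message
```

The list is never corrupted here; the remove simply runs for an object that the swallowed error kept out of the list.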




--
Steven
 
 
James Mills
 
      09-16-2010
On Thu, Sep 16, 2010 at 2:49 PM, Ned Deily wrote:
> If the problem is that the socket object in question no longer exists,
> you can protect your code there by enclosing the remove operation in a
> try block, like:



The question that remains, however, is:

Why does your list contain dirty data? Your code has likely removed
the socket object from the list before, so why is it attempting to
remove it again?

I would suggest you re-examine your code's logic rather than patch
up the code with a "band-aid solution".

cheers
James


--
-- James Mills
--
-- "Problems are solved by method"
 
 
k3xji
 
      09-16-2010
> Please show the *exact* error message, including the traceback, by
> copying and pasting it. Do not retype it by hand, or summarize it, or put
> it into your own words.


Unfortunately this is not possible. The logging system I designed only
gives the following information; since we log millions of custom
exceptions per day, I did not include the full traceback. Here is all
I have:

144 15/09/10 20:02:08 -[*] ERROR: Physical max client limit reached. Please contact maintenance.filedescriptor out of range in select()[scSocketServer.py:215:][Port:515]
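
One way to get the missing traceback for only the rare failure, without logging it for every custom exception, is to format it inside the handler that catches the ValueError. The following is a sketch, not the poster's logging system: remove_socket, the module-level sockets list and the "server" logger name are hypothetical. traceback.format_exc() is in the standard library (it was added in Python 2.4); the rest is written for a current Python 3.

```python
import logging
import traceback

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("server")          # hypothetical logger name

sockets = []                               # stand-in for self.__Sockets


def remove_socket(sock):
    """Remove sock; on failure, log and return the complete traceback."""
    try:
        sockets.remove(sock)
    except ValueError:
        details = traceback.format_exc()   # full traceback as one string
        log.error("remove failed for %r\n%s", sock, details)
        return details
    return None


report = remove_socket("not-a-socket")     # not in the list, so this fails
```

Because the traceback is only built in the except branch, the normal path costs nothing extra.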

The code generating the error is:

try:
    self.__ReadersInCycle, self.__WritersInCycle, e = \
        select(self.__Sockets, self.__WritersInCycle, [],
               base.scOptions.scOPT_SELECT_TIMEOUT)
except ValueError, e:
    LogError('Physical max client limit reached.'
             ' Please contact maintenance.' + str(e))
    self.scSvr_OnClientPhysicalLimitReached()
    # define a policy here
    continue
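
The ValueError caught above can be reproduced in isolation. This sketch assumes CPython on Linux, where select() rejects any descriptor number at or above FD_SETSIZE (usually 1024) before it waits on anything. The number 100000, the 1024 default and the helper out_of_range are illustrative assumptions, and the syntax is Python 3, not 2.4.

```python
import select

# 100000 is a hypothetical descriptor number well above the usual limit.
# It does not have to be an open descriptor, because the range check
# runs while select() is still building its descriptor sets.
try:
    select.select([100000], [], [], 0)
except ValueError as exc:
    message = str(exc)
else:
    message = None


def out_of_range(fds, limit=1024):
    """Return the descriptor numbers select() would reject (assumed limit)."""
    return [fd for fd in fds if fd >= limit]


rejected = out_of_range([3, 1023, 1024, 5000])
```

A check like out_of_range could tell the limit-reached policy which connections actually have to go, since dropping arbitrary users does not help while a high-numbered descriptor stays in the list.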

> > First of all, in our entire application there is no line of code like
> > remove(x), meaning there is no x variable.
>
> Look at this example:
>
> >>> sockets = []
> >>> sockets.remove("Hello world")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: list.remove(x): x not in list


Ok. Thanks.

> Anything is possible, but it's not likely. What's far more likely is that
> you have a bug in your code, and that somehow, under rare circumstances,
> it tries to remove something from a list that was never inserted into the
> list. Or it tries to remove it twice.
>
> My guess is something like this:
>
> try:
>     socket = get_socket()
>     self._sockets.append(socket)
> except SomeError:
>     pass
> # later on
> self._sockets.remove(socket)


Hmm. That might be, but the self.__Sockets list also contains the
ListenSocket(), which is the real listening socket. Naturally, I am
using it in the read list of select() on every server cycle. The weird
thing is that the ListenSocket itself triggers the "not in list"
exception, too! And one thing I am sure of is that I have not written
any code that removes the listening socket from the list; that is
just impossible. Additionally, for optimization, there are very few
places where I traverse the __Sockets list. The only places where I
delete something from the __Sockets list are:

1) a user disconnects (normal disconnect, authentication or ping
timeout)
3) server is being stopped or restarted

Other than that, there is no access to that variable from outside
objects; as can be seen, it is also private. And please keep in mind
that this bug has been there for about a year, so many code reviews
have passed without noticing the type of error you are suggesting.

And more information on system: I am running Python 2.4 on CentOS.

By the way, after digging through the logs and the system, it turns out
that select() is hitting the per-process FD limit. Although the
system-wide ulimit is unlimited, I think Python's "selectmodule.c"
enforces a limit of 1024. I get the error after hitting that limit and
somehow, as I just explained, the __ListenSocket is removed from
the read list, so it is lost and the server instance is lost forever.
Wrapping that code in a try..except and re-initializing the server
port is a solution, but I guess a bad one, because I will not have
found the root cause.
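
If the 1024 ceiling itself is the obstacle, one option on Linux is select.poll(), which takes descriptors one at a time and is not bound by the fixed-size sets that select() uses. The following is a minimal sketch under that assumption, not a drop-in replacement for the server loop: wait_readable is a hypothetical helper, the socketpair only stands in for real client connections, and the code targets a current Python 3.

```python
import select
import socket


def wait_readable(socks, timeout_ms):
    """Return the sockets in socks that are readable, using poll()."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        by_fd[s.fileno()] = s              # map descriptor number back to socket
        poller.register(s, select.POLLIN)
    ready = []
    for fd, event in poller.poll(timeout_ms):
        if event & select.POLLIN:
            ready.append(by_fd[fd])
    return ready


a, b = socket.socketpair()                 # stand-in for a client connection
b.sendall(b"ping")                         # makes a readable; b has nothing to read
ready = wait_readable([a, b], 1000)
a.close()
b.close()
```

This does not explain why the listening socket disappears from the list, so it complements rather than replaces the search for the root cause.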

Thanks in advance,
 
 
Steven D'Aprano
 
      09-16-2010
On Thu, 16 Sep 2010 15:51:38 +1000, James Mills wrote:

> On Thu, Sep 16, 2010 at 2:49 PM, Ned Deily wrote:
>> If the problem is that the socket object in question no longer exists,
>> you can protect your code there by enclosing the remove operation in a
>> try block, like:

>
>
> The question that remains to be seen however is:
>
> Why does your list contain dirty data ? Your code has likely removed the
> socket object from the list before, why is it attempting to remove it
> again ?
>
> I would consider you re-look at your code's logic rather than patch up
> the code with a "band-aid-solution".


Well said.


--
Steven
 