Velocity Reviews

mk 10-29-2009 03:56 PM

Recommended number of threads? (in CPython)
 
Hello everyone,

I wrote a run-of-the-mill program for concurrent execution of an ssh
command over a large number of hosts. (Someone may ask why reinvent the
wheel when there's pssh and shmux around -- I'm not happy with the working
details and the lack of some options in either program.)

The program has a work queue of threads so that no more than
maxthreads threads are created and working at any particular time.

But this raises the question: what is the recommended number of threads
working concurrently? If it depends on the task, the task is: open an ssh
connection and execute a command (the main thread then loops over the
queue and, if a thread is finished, closes its ssh connection and calls
.join() on the thread).

I found that using more than several hundred threads causes weird
exceptions to be thrown *sometimes* (rarely actually, but it happens
from time to time). That might depend on the modules used in the
threads, though (I'm using paramiko, which is claimed to be thread safe).
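
For readers who want something concrete, here is a minimal sketch of the capped worker pool described above. It is not the poster's program: it assumes Python 3's concurrent.futures (which did not exist when this thread was written), and `run_on_hosts`, `task`, and the default of 32 workers are hypothetical names and values chosen for illustration. `task` stands in for whatever opens the ssh connection and runs the command.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_on_hosts(hosts, task, max_threads=32):
    """Call task(host) for every host, with at most max_threads running at once.

    Returns a dict mapping each host to the task's return value, or to the
    exception it raised, so one failing host does not abort the whole run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # Submitting never creates more than max_threads worker threads;
        # the remaining hosts wait in the executor's internal queue.
        futures = {pool.submit(task, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # keep the failure next to its host
                results[host] = exc
    return results
```

The pool reuses its worker threads, so the thread count stays fixed no matter how many hosts are queued, and the main thread does not need to poll and .join() each worker by hand.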



Falcolas 10-29-2009 04:03 PM

Re: Recommended number of threads? (in CPython)
 
On Oct 29, 9:56 am, mk <mrk...@gmail.com> wrote:
> Hello everyone,
>
> I wrote run-of-the-mill program for concurrent execution of ssh command
> over a large number of hosts. (someone may ask why reinvent the wheel
> when there's pssh and shmux around -- I'm not happy with working details
> and lack of some options in either program)
>
> The program has a working queue of threads so that no more than
> maxthreads number are created and working at particular time.
>
> But this begs the question: what is the recommended number of threads
> working concurrently? If it's dependent on task, the task is: open ssh
> connection, execute command (then the main thread loops over the queue
> and if the thread is finished, it closes ssh connection and does .join()
> on the thread)
>
> I found that when using more than several hundred threads causes weird
> exceptions to be thrown *sometimes* (rarely actually, but it happens
> from time to time). Although that might be dependent on modules used in
> threads (I'm using paramiko, which is claimed to be thread safe).


Since you're creating OS threads when doing this, your issue is
probably more related to your OS' implementation of threads than
Python. That said, several hundred threads, regardless of them being
blocked by the GIL, sounds like a recipe for trouble on most machines,
but as usual YMMV.

If you're running into problems with a large number of connections
(not related to a socket limit), you might look into doing it
asynchronously - loop over a list of connections and do non-blocking
reads to see if your command has completed. I've done this
successfully with pexpect, and didn't run into any issues with the
underlying OS.

Garrick
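
A rough sketch of the "loop over the connections and do non-blocking reads" idea, under some loud assumptions: it uses the standard selectors module rather than pexpect, it watches the stdout pipes of local child processes as a stand-in for ssh sessions, and it relies on select working on pipes, which holds on POSIX systems but not on Windows. `poll_commands` is a hypothetical name.

```python
import os
import selectors
import subprocess


def poll_commands(commands):
    """Start every command at once and collect stdout in a single thread.

    commands maps a label to an argv list, for example
    {"web1": ["ssh", "web1", "uptime"]}. Returns {label: (exit_code, output)}.
    """
    sel = selectors.DefaultSelector()
    procs = {}
    output = {}
    for label, argv in commands.items():
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
        procs[label] = proc
        output[label] = bytearray()
        sel.register(proc.stdout, selectors.EVENT_READ, data=label)

    remaining = len(procs)
    while remaining:
        # select() returns only the pipes that are ready, so each read
        # below returns immediately instead of blocking on a slow host.
        for key, _events in sel.select(timeout=1.0):
            chunk = os.read(key.fileobj.fileno(), 4096)
            if chunk:
                output[key.data].extend(chunk)
            else:
                # An empty read means the child closed its end of the pipe.
                sel.unregister(key.fileobj)
                key.fileobj.close()
                remaining -= 1
    sel.close()
    return {label: (procs[label].wait(), bytes(output[label]))
            for label in procs}
```

One thread services every pipe, so the per-thread stack cost discussed elsewhere in this thread does not apply; the limit becomes the number of open file descriptors and child processes the OS allows.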

Neil Hodgson 10-29-2009 09:48 PM

Re: Recommended number of threads? (in CPython)
 
mk:

> I found that when using more than several hundred threads causes weird
> exceptions to be thrown *sometimes* (rarely actually, but it happens
> from time to time).


If you are running in a 32-bit environment, it is common to run out
of address space with many threads. Each thread allocates a stack and
this allocation may be as large as 10 Megabytes on Linux. With a 4
Gigabyte 32-bit address space this means that the maximum number of
threads will be about 400. In practice, the operating system will further
subdivide the address space so only 200 to 300 threads will be possible.
On Windows, I think the normal stack allocation is 1 Megabyte.

The allocation is only of address space, not memory, since memory can
be mapped into this space when it is needed and many threads do not need
very much stack.

Neil
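
The arithmetic above, plus one way to shrink the per-thread reservation, can be sketched as follows. The helper names and the 256 KiB figure are assumptions for illustration; threading.stack_size() is the standard-library call, but it may raise ValueError or RuntimeError on platforms that reject the requested size or do not support changing it.

```python
import threading

GIB = 1024 ** 3
MIB = 1024 ** 2


def max_threads_by_address_space(address_space_bytes, stack_bytes):
    """Upper bound on thread count if stacks were the only thing mapped."""
    return address_space_bytes // stack_bytes


# 4 GiB of address space divided into 10 MiB stacks: roughly 400 threads.
ceiling = max_threads_by_address_space(4 * GIB, 10 * MIB)


def start_with_small_stack(target, stack_bytes=256 * 1024):
    """Start one thread with a smaller stack reservation (assumed value)."""
    # stack_size() applies to threads started afterwards and returns the
    # previous setting, which is restored once the thread has started.
    previous = threading.stack_size(stack_bytes)
    try:
        thread = threading.Thread(target=target)
        thread.start()
    finally:
        threading.stack_size(previous)
    return thread
```

A smaller stack lowers the address space each thread reserves, at the price of a lower recursion limit inside that thread, so the value needs testing against the real workload.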

Paul Rubin 10-29-2009 09:56 PM

Re: Recommended number of threads? (in CPython)
 
Neil Hodgson <nyamatongwe+thunder@gmail.com> writes:
> If you are running on a 32-bit environment, it is common to run out
> of address space with many threads. Each thread allocates a stack and
> this allocation may be as large as 10 Megabytes on Linux.


I'm sure it's smaller than that under most circumstances. I run
python programs with hundreds of threads all the time, and they don't
use gigabytes of memory.

Dave Angel 10-29-2009 11:26 PM

Re: Recommended number of threads? (in CPython)
 
Paul Rubin wrote:
> Neil Hodgson <nyamatongwe+thunder@gmail.com> writes:
>
>> If you are running on a 32-bit environment, it is common to run out
>> of address space with many threads. Each thread allocates a stack and
>> this allocation may be as large as 10 Megabytes on Linux.
>>

>
> I'm sure it's smaller than that under most circumstances. I run
> python programs with hundreds of threads all the time, and they don't
> use gigabytes of memory.
>
>

As Neil pointed out further on, in the same message you quoted, address
space is not the same as allocated memory. It's easy to run out of
allocatable address space long before you run out of virtual memory, or
swap space.

Any time a buffer is needed that will need to be contiguous (such as a
return stack), the address space for the max possible size must be
reserved, but the actual virtual memory allocations (which is what you
see when you're using the system utilities to display memory usage) are
done incrementally, as needed.

It's been several years, but I believe the two terms on Windows are
"reserve" and "commit." Reserve is done in multiples of 64k, and commit
in multiples of 4k.

DaveA
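
A tiny worked example of the granularity point, taking the 64k reserve and 4k commit figures from the post as given (they are the poster's recollection for Windows, not something checked here). `round_up` is a hypothetical helper.

```python
RESERVE_GRANULARITY = 64 * 1024  # figure quoted in the post
COMMIT_GRANULARITY = 4 * 1024    # figure quoted in the post


def round_up(size, granularity):
    """Round size up to the next multiple of granularity."""
    return -(-size // granularity) * granularity


request = 100_000  # bytes asked for
reserved = round_up(request, RESERVE_GRANULARITY)   # address space set aside
committed = round_up(request, COMMIT_GRANULARITY)   # memory actually backed
```

With these figures a 100,000-byte request reserves 131,072 bytes of address space but commits only 102,400 bytes, which is why memory-usage tools report far less than the address space consumed.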


Aahz 11-02-2009 09:21 PM

Re: Recommended number of threads? (in CPython)
 
In article <mailman.2256.1256831821.2807.python-list@python.org>,
mk <mrkafk@gmail.com> wrote:
>
>I wrote run-of-the-mill program for concurrent execution of ssh command
>over a large number of hosts. (someone may ask why reinvent the wheel
>when there's pssh and shmux around -- I'm not happy with working details
>and lack of some options in either program)
>
>The program has a working queue of threads so that no more than
>maxthreads number are created and working at particular time.
>
>But this begs the question: what is the recommended number of threads
>working concurrently? If it's dependent on task, the task is: open ssh
>connection, execute command (then the main thread loops over the queue
>and if the thread is finished, it closes ssh connection and does .join()
>on the thread)


Given that your code is not just I/O-bound but wait-bound, I suggest
following the suggestion to use asynchronous code -- then you could open
a connection to every single machine simultaneously. Assuming your system
setup can handle the load, that is.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

[on old computer technologies and programmers] "Fancy tail fins on a
brand new '59 Cadillac didn't mean throwing out a whole generation of
mechanics who started with model As." --Andrew Dalke
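
A sketch of the "connect to everything at once" approach, assuming Python 3's asyncio (which postdates this thread) and the system ssh client launched as a child process instead of paramiko. `run_all_async`, `run_one`, and the cap of 500 are hypothetical; the semaphore is there for the "assuming your system setup can handle the load" caveat.

```python
import asyncio
import sys


async def run_one(label, argv, limit):
    """Run one command and return (label, exit_code, stdout_bytes)."""
    async with limit:  # wait here if too many commands are already in flight
        proc = await asyncio.create_subprocess_exec(
            *argv, stdout=asyncio.subprocess.PIPE)
        out, _ = await proc.communicate()
        return label, proc.returncode, out


async def run_all_async(commands, max_in_flight=500):
    """Run every command concurrently on one thread.

    commands maps a label to an argv list such as ["ssh", host, "uptime"].
    Results come back in the same order as the commands were given.
    """
    limit = asyncio.Semaphore(max_in_flight)
    jobs = [run_one(label, argv, limit) for label, argv in commands.items()]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    # Local stand-in for an ssh command so the sketch runs anywhere.
    demo = {"local": [sys.executable, "-c", "print('ok')"]}
    for label, code, out in asyncio.run(run_all_async(demo)):
        print(label, code, out.decode().strip())
```

Because the waiting happens inside the event loop, hundreds of hosts cost hundreds of child processes and file descriptors rather than hundreds of thread stacks.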

