Re: Best processor (i386) for Python performance?

 
 
Brett C.
08-26-2004
In terms of multithreading, an I/O-intensive app that is threaded can
make use of dual procs. Otherwise threaded apps can't, for technical
reasons (the GIL and such, but we don't need to get into those details).
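
A minimal, untested sketch of that I/O-bound case (URLs made up for
illustration) - the threads spend most of their time blocked in the
socket read, and CPython releases the GIL while they wait, so the waits
overlap:

# Each thread blocks in urlopen()/read(); the GIL is released during the
# wait, so the downloads proceed in parallel.
import threading, urllib

def fetch(url, results, lock):
    data = urllib.urlopen(url).read()   # blocking I/O - GIL released while waiting
    lock.acquire()
    try:
        results[url] = len(data)
    finally:
        lock.release()

urls = ["http://www.example.com/a", "http://www.example.com/b"]
results = {}
lock = threading.Lock()
threads = [threading.Thread(target=fetch, args=(u, results, lock)) for u in urls]
for t in threads: t.start()
for t in threads: t.join()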

 
Grant Edwards
08-26-2004
On 2004-08-26, Brett C. <(E-Mail Removed)> wrote:

> In terms of multithreading, an I/O intensive app that is
> threaded can make use of dual procs. Otherwise threaded apps
> can't for technical reasons (GIL and such but don't need to
> get into those details).


That's rather disappointing. If I write a multi-threaded app
in C, it can utilize multiple processors, but the same app in
Python can't?

Not that I _have_ any SMP machines...

--
Grant Edwards <grante at visi.com>   Yow! I Know A Joke!!
 
Dave Brueck
08-26-2004
Grant Edwards wrote:
> On 2004-08-26, Brett C. <(E-Mail Removed)> wrote:
>
>
>>In terms of multithreading, an I/O intensive app that is
>>threaded can make use of dual procs. Otherwise threaded apps
>>can't for technical reasons (GIL and such but don't need to
>>get into those details).

>
>
> That's rather disappointing. If I write a multi-threaded app
> in C it can utilize multiple processors, but the same app in
> Python can't?


Depends on what the multithreaded app _does_. If multiple processors are
present then Python will use them, but how well they get used depends on
how much and for what reasons the GIL gets released.

I/O is the most common reason, so adding another processor to an I/O
bound program can give you a good performance boost (in our lab I've
seen easily 75% improvement over a single proc box for a program that
was very I/O bound, but I haven't measured it to see if it's closer to
75% or to 100% improvement).

Another easy boost comes if your app already calls out to a
GIL-releasing C function for CPU-intensive work; in that case adding a
CPU can give similar speed boosts. We have only one such case, and
although there was a noticeable speedup on dual vs. single processors,
I never attempted to quantify it. And the normal restrictions on
parallel computing apply - if whatever you're doing can't be done in
parallel anyway, then adding a CPU isn't helpful.
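
As a rough, untested sketch of that second case, using zlib.compress as
the stand-in for the GIL-releasing C call (as far as I know CPython's
zlib module releases the GIL around its compression work):

# Both threads spend nearly all their time inside zlib's C code with the
# GIL released, so on a dual-proc box the two compressions can run at once.
import threading, zlib

data = "x" * (16 * 1024 * 1024)   # large buffer to keep the C code busy

def work():
    for i in range(10):
        zlib.compress(data)

threads = [threading.Thread(target=work) for i in range(2)]
for t in threads: t.start()
for t in threads: t.join()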

FWIW I haven't noticed a case where adding a CPU improved performance by
*less* than ~25%, probably because the GIL gets released here and there
for various operations anyway, and having an existing multithreaded app
where multiple threads are CPU bound is somewhat uncommon.

But then again, very few of the projects I work on end up having CPU as
the most scarce resource, so the machines that do have multiple CPUs are
that way because they are running oodles of other processes as well.

-Dave
 
Grant Edwards
08-26-2004
On 2004-08-26, Dave Brueck <(E-Mail Removed)> wrote:

>>>In terms of multithreading, an I/O intensive app that is
>>>threaded can make use of dual procs. Otherwise threaded apps
>>>can't for technical reasons (GIL and such but don't need to get
>>>into those details).

>>
>> That's rather disappointing. If I write a multi-threaded app
>> in C it can utilize multiple processors, but the same app in
>> Python can't?

>
> Depends on what the multithreaded app _does_. If multiple
> processors are present then Python will use them, but how well
> they get used depends on how much and for what reasons the GIL
> gets released.
>
> I/O is the most common reason, so adding another processor to
> an I/O bound program can give you a good performance boost (in
> our lab I've seen easily 75% improvement over a single proc
> box for a program that was very I/O bound, but I haven't
> measured it to see if it's closer to 75% or to 100%
> improvement).


Now that I think about it, in my multi-threaded apps all the
threads almost always end up blocking on I/O. A couple years
back I even added a GIL release to some of the termios() calls
so that I could get more parallelism when threads are waiting
for serial ports to drain or flush.
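
Something like this sketch, roughly (untested, device names made up) -
with the GIL released inside tcdrain(), the other threads keep running
while each port drains:

# Each thread writes to its own serial port and then blocks in tcdrain()
# until the output has actually gone out; the waits overlap across threads.
import os, threading, termios

def send(path, data):
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    try:
        os.write(fd, data)
        termios.tcdrain(fd)    # blocks until all queued output is transmitted
    finally:
        os.close(fd)

ports = ["/dev/ttyS0", "/dev/ttyS1"]
threads = [threading.Thread(target=send, args=(p, "hello\r\n")) for p in ports]
for t in threads: t.start()
for t in threads: t.join()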

--
Grant Edwards <grante at visi.com>   Yow! I wonder if there's anything GOOD on tonight?
 
David Bolen
08-26-2004
Dave Brueck <(E-Mail Removed)> writes:

> Grant Edwards wrote:

(...)
> I/O is the most common reason, so adding another processor to an I/O
> bound program can give you a good performance boost (in our lab I've
> seen easily 75% improvement over a single proc box for a program that
> was very I/O bound, but I haven't measured it to see if it's closer to
> 75% or to 100% improvement).


I don't doubt the performance gains, but I'd argue that if you are
seeing that sort of improvement, then you clearly don't have an I/O
bound program at all, but a compute bound one. By definition, an I/O
bound program's performance is gated by the I/O operations, and not
CPU usage, so adding more CPU shouldn't really change anything. After
all, if your program is "very I/O bound" it means it is waiting on I/O
virtually all of the time (and thus not executing any Python code
using the CPU), so where would adding CPU time gain anything?

I do think it can be tricky to determine just what case an application
falls into (and many oscillate between I/O and CPU bound modes), and
indeed a purely CPU bound Python application (if in Python code and
not a well-behaving extension module) isn't going to be helped at all.

But to see benefit from additional CPUs for a Python application, I
believe you're really looking for a multithreaded application that is
technically compute bound - certainly on an instant-to-instant basis,
if not overall - but which performs a lot of (or at least regular) I/O
operations (or, as you note, other extension calls which release the
GIL). The good news is that I believe many applications do fall into
this category, even if from the outside they might be considered I/O
bound, if only because it doesn't take much Python-level processing of
the I/O responses to create a CPU bottleneck at a given instant.
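
A rough illustration of the shape I mean (untested, protocol made up) -
the recv() waits release the GIL, the Python-level parsing holds it, and
with a few of these workers the parsing alone can keep one CPU busy:

# Each worker alternates between blocking I/O (GIL released) and
# Python-level processing of the response (GIL held).
import threading, socket

def worker(host, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    while 1:
        chunk = s.recv(65536)        # I/O wait - GIL released
        if not chunk:
            break
        fields = chunk.split("|")    # CPU work in Python - GIL held
        # ... do something with fields ...
    s.close()

threads = [threading.Thread(target=worker, args=("localhost", 9000 + i))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()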

(...)
> But then again very few of the projects I work on end up having CPU as
> the most scarce resource so the machines that do have multiple CPUs
> are that way because they are running oodles of other processes as
> well.


This is an excellent point since even if the only advantage to the
extra CPUs was to free up more of a single CPU for a Python
application, you'd still see a net gain for that application when
running in its real world environment.

-- David
 
Dave Brueck
08-26-2004
David Bolen wrote:
> I don't doubt the performance gains, but I'd argue that if you are
> seeing that sort of improvement, then you clearly don't have an I/O
> bound program at all, but a compute bound one.


Ugh, yes. Thanks for the correction! The application in question was an
object layer in front of a database - it spent most of its time pickling
and unpickling objects, so the bulk of the performance gains probably
came from the database speeding up (it was on the same box).

>>But then again very few of the projects I work on end up having CPU as
>>the most scarce resource so the machines that do have multiple CPUs
>>are that way because they are running oodles of other processes as
>>well.

>
> This is an excellent point since even if the only advantage to the
> extra CPUs was to free up more of a single CPU for a Python
> application, you'd still see a net gain for that application when
> running in its real world environment.


Good call.

Thanks,
Dave
 
Ville Vainio
08-26-2004
>>>>> "David" == David Bolen <(E-Mail Removed)> writes:

David> I do think it can be tricky to determine just what case an
David> application falls into (and many oscillate between I/O and
David> CPU bound modes), and indeed a purely CPU bound Python
David> application (if in Python code and not a well-behaving
David> extension module) isn't going to be helped at all.

The sensible thing to do then is to use multiple processes, not just
multiple threads. Many Python apps use Queue.Queue anyway, and such an
approach is often easy to convert over to use processes instead of
threads.
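
Even without anything new in the standard library, plain os.fork and a
pipe give you the same shape - a rough, untested, Unix-only sketch, with
handle_job standing in for whatever the real work is:

import os, pickle, sys

def handle_job(job):
    # stand-in for the real per-job work
    sys.stdout.write("processed %r\n" % (job,))

# the same "put jobs on a queue, worker pulls them off" pattern, except the
# worker is a forked process with its own interpreter (and its own GIL)
rfd, wfd = os.pipe()
if os.fork() == 0:
    os.close(wfd)
    f = os.fdopen(rfd, "rb")
    while 1:
        try:
            job = pickle.load(f)     # EOFError once the parent closes its end
        except EOFError:
            break
        handle_job(job)
    os._exit(0)

os.close(rfd)
out = os.fdopen(wfd, "wb")
for job in [("easyjob", 34, 2.44), ("easyjob", 213, 2.44)]:
    pickle.dump(job, out)
out.close()
os.wait()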

In fact, it might be fun to have a trivial message queue
implementation in the standard library:

# server code

from mq import *

q, results = MQueue(), MQueue()

# the published file holds just a handle, like
# mq:123.12.12.54:67

q.publish(open("~/jobs", "w"))
results.publish(open("~/results", "w"))
spawn_server_if_needed()

while 1:
    job = q.get()
    res = my_handle_job(job)
    results.put(res)


# client code

....

req, results = MQueue(open("~/jobs")), MQueue(open("~/results"))

req.put(("easyjob", 34, 2.44))
req.put(("easyjob", 213, 2.44))

....


Obviously these trivial mqueues could still be wrapped with additional
rendezvous functionality:

job = mq.Job(("hello",2))

rv = mq.Rendezvous(q,resultqueue)

rv.put(job)

res = job.result() # blocks until result is ready

Though this might be more in the territory of external
libs/frameworks... but hey, we've already got xml-rpc and web server
functionality.

Inter-language systems should obviously use something like CORBA for this.

--
Ville Vainio http://tinyurl.com/2prnb
 
David Bolen
08-26-2004
Ville Vainio <(E-Mail Removed)> writes:

> >>>>> "David" == David Bolen <(E-Mail Removed)> writes:

>
> David> I do think it can be tricky to determine just what case an
> David> application falls into (and many oscillate between I/O and
> David> CPU bound modes), and indeed a purely CPU bound Python
> David> application (if in Python code and not a well-behaving
> David> extension module) isn't going to be helped at all.
>
> The sensible thing to do then is to use multiple processes, not just
> multiple threads. Many Python apps use Queue.Queue anyway, and such an
> approach is often easy to convert over to use processes instead of
> threads.


Well, "sensible" may depend on your needs and environment. I'm far
less a fan of multi-process situations under Windows than I am under
Unix systems for example. In Windows process creation is far less
efficient, and proper parent/child relationships don't always work
properly (particularly when it comes to killing processes off) and
such. Threading, on the other hand, just plain works extremely well,
at least on the WinNT/2K/XP variants. That's almost backwards to the
way I feel about things under Unix, where the various thread
implementations and support for them on different systems can make
separate processes more attractive.

But you're right that multi-process solutions are certainly something
to keep in the toolbox as available options.

-- David
 