In article <irm8kv$hgm$>,
Billy Mays <> wrote:
>
>On the GPU I am using, threads execute simultaneously as long as they
>all execute the exact same instructions (ie, they all take the branch).
> If one of the threads takes a different execution path, then each
>thread has to run in serial which means the parallel speed gain is lost.

Wow. I'm curious as to what GPU this is, and what compiler you're
using that takes advantage of this.
<woolgathering>
I worked on Sun's "Majik" architecture (I may have the spelling wrong)
back in the day. It effectively had four 32-bit cpus, running in tandem.
The first cpu was more general-purpose than the other three, and had a full
set of control and branch instructions. The other three could only do
arithmetic, and operated in lock-step with the primary cpu.
If the calculations of a, b, and c in the original code were all
identical, then the compiler would typically arrange for them to
be computed in parallel on the three secondary cpus and the results
combined at the end. I wouldn't really call these separate threads,
but they were definitely parallel computations.
</woolgathering>
Sounds like you're describing some sort of similar vector processor,
in which case, yes, I see why you would want the compiler to
compile (a && b && c) without short-circuiting.
Of course, a sufficiently smart compiler would recognize when b and c can
be computed without harm (no side effects, nothing that can fault) even
if their values turn out not to be needed after all.
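To make the distinction concrete, here is a rough sketch in C of the two forms. The three predicates are made up for illustration (they stand in for whatever a, b, and c are in your code), and I'm assuming they have no side effects, which is what makes the eager version safe. Whether either form actually maps to parallel hardware is up to your compiler.

```c
#include <stdbool.h>

/* Hypothetical side-effect-free tests standing in for a, b and c. */
static bool is_positive(int x) { return x > 0; }
static bool is_even(int x)     { return (x % 2) == 0; }
static bool is_small(int x)    { return x < 100; }

/* Short-circuit form: a later test is skipped once an earlier one fails,
 * so the language semantics imply a branch after each operand. */
bool all_short_circuit(int x)
{
    return is_positive(x) && is_even(x) && is_small(x);
}

/* Eager form: all three tests are evaluated unconditionally, then
 * combined with bitwise AND. Each bool is 0 or 1, so the result matches
 * the && version, but every caller runs the same instruction sequence. */
bool all_eager(int x)
{
    bool a = is_positive(x);
    bool b = is_even(x);
    bool c = is_small(x);
    return a & b & c;
}
```

The two functions return the same value for every input; the only difference is that all_eager() never skips work, so there is no data-dependent branch for lock-step units to diverge on.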
Have you actually compiled your code and determined that the compiler
is not using parallelism?
--
-Ed Falk,
http://thespamdiaries.blogspot.com/