Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Short-circuiting in C

Reply
Thread Tools

Short-circuiting in C

 
 
Edward A. Falk
Guest
Posts: n/a
 
      05-31-2011
In article <>,
Ben Pfaff <> wrote:
>Thomas Richter <> writes:
>
>>> x = a;
>>> y = b;
>>> z = c;
>>> if (x&& y&& z) {
>>> /* Do stuff */
>>> }

>>

>The code that I presented always evaluates all of 'a', 'b', and
>'c', which is what the OP said that he wanted.


A sufficiently smart compiler would realize that if y and z have no
side-effects, then their computation can be deferred until they're needed.
The above code would be optimized back to the original.

Now if y and z do have side-effects, then this would indeed generate
different code, forcing them to be evaluated when they might not have
been in the original.

Note that OP wanted to force evaluation of y and z in order to remove
some expensive branches. However, I think that any analysis would show
that the branches are cheaper than evaluating y and z in almost all cases.

--
-Ed Falk,
http://thespamdiaries.blogspot.com/
 
Reply With Quote
 
 
 
 
Edward A. Falk
Guest
Posts: n/a
 
      05-31-2011
In article <irm8kv$hgm$>,
Billy Mays <> wrote:
>
>On the GPU I am using, threads execute simultaneously as long as they
>all execute the exact same instructions (ie, they all take the branch).
> If one of the threads takes a different execution path, then each
>thread has to run in serial which means the parallel speed gain is lost.


Wow. I'm curious as to what GPU this is, and what compiler you're
using that takes advantage of this.

<woolgathering>
I worked on Sun's "Majik" architecture (I may have the spelling wrong)
back in the day. It effectively had four 32-bit cpus, running in tandem.
The first cpu was more general-purpose than the other three, and had a full
set of control and branch instructions. The other three could only do
arithmetic, and operated in lock-step with the primary cpu.

If the calculations of a, b, and c in the original code were all
identical, then the compiler would typically arrange for them to
be computed in parallel and the three secondary cpus and the results
combined at the end. I wouldn't really call these separate threads,
but they were definitely parallel computations.
</woolgathering>

Sounds like you're describing some sort of similar vector processor,
in which case, yes, I see why you would want the compiler to
compile (a && b && c) without short-circuiting.

Of course, a sufficiently smart compiler would recognize that b and c can
be computed without harm even if their values wouldn't be needed after
all.

Have you actually compiled your code and determined that the compiler
is not using parallelism?

--
-Ed Falk,
http://thespamdiaries.blogspot.com/
 
Reply With Quote
 
 
 
 
Edward A. Falk
Guest
Posts: n/a
 
      05-31-2011
In article <is3dtk$cp1$>,
Edward A. Falk <> wrote:
>
>Sounds like you're describing some sort of similar vector processor,
>in which case, yes, I see why you would want the compiler to
>compile (a && b && c) without short-circuiting.


Ahh, never mind. Saw your other post. I guessed right.

--
-Ed Falk,
http://thespamdiaries.blogspot.com/
 
Reply With Quote
 
Edward A. Falk
Guest
Posts: n/a
 
      05-31-2011
In article <irmuo5$u9t$>,
Eric Sosman <> wrote:
>
> C has no way to express the computation you desire. ...


See my previous post. OP is working with a cpu that doesn't
have multiple threads of execution, per se, but does have
parallel threads of computation.

Imagine:

a = x1 * x2 + x3;
b = y1 * y2 + y3;
c = z1 * z2 + z3;

I've worked on a CPU that could compute those three expressions
in perfect parallel. We made sure our compiler recognized cases
like this and optimized accordingly.

I think OP's problem is that the (a && b && c) short-circuiting
is preventing the compiler from computing those values in
parallel. That's really a failure on the compiler's part, but
OP is hoping there's a way to hint to the compiler that
short-circuiting is not necessary.

And yes, without extending the language, there's no way to
*force* the compiler to compute that stuff in parallel.

--
-Ed Falk,
http://thespamdiaries.blogspot.com/
 
Reply With Quote
 
Nizumzen
Guest
Posts: n/a
 
      05-31-2011
On 2011-05-31 19:45:55 +0100, Edward A. Falk said:
>
> Note that OP wanted to force evaluation of y and z in order to remove
> some expensive branches. However, I think that any analysis would show
> that the branches are cheaper than evaluating y and z in almost all cases.


Not on a GPU.

Branching on a GPU kills everything. Don't do it unless you really,
really, really need too.

Although this begs the question why he is using C as both OpenCL and
CUDA have their own C like languages for handling the code that
actually runs on the GPU neither of which follows C's rules exactly as
far as I am aware (most likely for this very reason).

 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      06-01-2011
On 5/31/2011 3:22 PM, Edward A. Falk wrote:
> In article<irmuo5$u9t$>,
> Eric Sosman<> wrote:
>>
>> C has no way to express the computation you desire. ...

>
> See my previous post. OP is working with a cpu that doesn't
> have multiple threads of execution, per se, but does have
> parallel threads of computation.


C has no way to express "parallel threads of computation"
(except in the restrictive senses of volatile variables and
asynchronous signals).

> Imagine:
>
> a = x1 * x2 + x3;
> b = y1 * y2 + y3;
> c = z1 * z2 + z3;
>
> I've worked on a CPU that could compute those three expressions
> in perfect parallel. We made sure our compiler recognized cases
> like this and optimized accordingly.


Fine: Your compiler made promises that are not made by C and
are not expressible in C. There's nothing preventing you from
making such promises, and in the right circumstances such promises
are useful and laudable. But they're *your* promises, not C's,
and compilers other than your own are under no obligation to pay
the slightest heed to them. C has no way to tell those other
compilers about a desire for parallelism, nor any means to detect
a promise broken.

> I think OP's problem is that the (a&& b&& c) short-circuiting
> is preventing the compiler from computing those values in
> parallel. That's really a failure on the compiler's part, but
> OP is hoping there's a way to hint to the compiler that
> short-circuiting is not necessary.
>
> And yes, without extending the language, there's no way to
> *force* the compiler to compute that stuff in parallel.


Exactly. The O.P. needs something other than C, or something
in addition to C.

--
Eric Sosman
d
 
Reply With Quote
 
Michael Press
Guest
Posts: n/a
 
      06-01-2011
In article <is20q1$sbr$>, BGB <>
wrote:

> On 5/30/2011 12:28 AM, Michael Press wrote:
> > In article<irpr4j$q20$>,
> > Eric Sosman<> wrote:


[...]

> >> 'course, it's possible I've misunderstood the O.P. or blundered
> >> in some other way, and will be summarily drummed out of the top ten.
> >> Again. (Anybody recall the Brain Trust of the TV series "Scrubs?"
> >> Them's my guys ...)

> >
> > I have two coins worth thirty cents total.
> > One of them is not a nickel.

>
> errm... "what is a quarter?...".
>
> but, yeah, the GPU is weird that way, as it tends to execute things in
> parallel (rather than the more usual serial execution).


The riddle I recounted was used in a memorable
episode of _Scrubs_ featuring the Brain Trust.

--
Michael Press
 
Reply With Quote
 
BGB
Guest
Posts: n/a
 
      06-02-2011
On 5/31/2011 6:30 PM, Michael Press wrote:
> In article<is20q1$sbr$>, BGB<>
> wrote:
>
>> On 5/30/2011 12:28 AM, Michael Press wrote:
>>> In article<irpr4j$q20$>,
>>> Eric Sosman<> wrote:

>
> [...]
>
>>>> 'course, it's possible I've misunderstood the O.P. or blundered
>>>> in some other way, and will be summarily drummed out of the top ten.
>>>> Again. (Anybody recall the Brain Trust of the TV series "Scrubs?"
>>>> Them's my guys ...)
>>>
>>> I have two coins worth thirty cents total.
>>> One of them is not a nickel.

>>
>> errm... "what is a quarter?...".
>>
>> but, yeah, the GPU is weird that way, as it tends to execute things in
>> parallel (rather than the more usual serial execution).

>
> The riddle I recounted was used in a memorable
> episode of _Scrubs_ featuring the Brain Trust.
>


ok. not seen that show...

would insert another random TV quote, except I can't think of much...
may as well then return to having nano-machines gather up Santa-knowledge...

results in picture of Jabba The Hut with a Santa hat:
"hooo... hooo... hooo... I be Santa Claus..."
little green thing: "te he he he he... Santa Claus...".

gives out a pizza...
"hooo... hooo... hooo... now I be Pizza Hut..."
....


or such...
 
Reply With Quote
 
BGB
Guest
Posts: n/a
 
      06-02-2011
On 5/31/2011 11:40 AM, Edward A. Falk wrote:
> In article<irm4et$66d$>,
> Billy Mays<> wrote:
>> Hey CLC,
>>
>> I was wondering if there is a way to avoid the C short circuiting for
>> boolean expressions such as the following:
>>
>> if(a&& b&& c) {
>> /* Do stuff */
>> }
>>
>> I am currently doing GPU programming and it would be better to evaluate
>> the entire condition, even if a is 0, since branches are slow.

>
> Unless a, b, and c are all simple register values, you'll have a hard
> time convincing me that simple short branches are slower than evaluating
> them.
>


actually, as far as performance is concerned, simple computations are
faster than a (mispredicted) jump over them often even on plain x86
(including with floating point, albeit with the partial exception of
slow/complex operations like "sqrt()" or "sin()", or large/bulky
calculations).

so, in my experience it is often a little faster just to do math which
works over the entire range, rather than having to add conditionals into
the mix.

granted, it depends some, as many algorithms are expressed better with
conditionals, or need them, so it is not usually worthwhile to try to
micro-optimize them out.


this situation also causes "switch()" when using a jump table to be a
surprisingly expensive operation, which can matter somewhat when using a
switch with a decent number of cases in a tight loop (for a small number
of cases, compilers will often optimize this by using a slightly faster
if-else chain internally, due mostly to branch-predictor magic...).


also, memory accesses are very fast when in-cache (almost as fast as
with plain registers), and IME it means that having values in registers
is not necessarily better performance-wise, since often trying to keep
values in registers may lead to a larger total number of loads and
stores (due to register spills, ...) so often it will make more sense to
leave variables on-stack or similar.

IME, there has also been little difference performance with between
simple and complex memory addresses, as apparently the processor
calculates complex memory addresses more or less for free.

actually, the above also results in a possible optimization of
calculating integer expressions like "x*4+y" using a "lea" instruction.


or such...
 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      06-03-2011
On 6/2/2011 12:48 PM, BGB wrote:
> On 5/31/2011 11:40 AM, Edward A. Falk wrote:
>> In article<irm4et$66d$>,
>> Billy Mays<> wrote:
>>> Hey CLC,
>>>
>>> I was wondering if there is a way to avoid the C short circuiting for
>>> boolean expressions such as the following:
>>>
>>> if(a&& b&& c) {
>>> /* Do stuff */
>>> }
>>>
>>> I am currently doing GPU programming and it would be better to evaluate
>>> the entire condition, even if a is 0, since branches are slow.

>>
>> Unless a, b, and c are all simple register values, you'll have a hard
>> time convincing me that simple short branches are slower than evaluating
>> them.

>
> actually, as far as performance is concerned, simple computations are
> faster than a (mispredicted) jump over them often even on plain x86
> (including with floating point, albeit with the partial exception of
> slow/complex operations like "sqrt()" or "sin()", or large/bulky
> calculations).


In a purely sequential model, only the accuracy of the branch
prediction will make any difference. Either way you write the code,
you will take exactly one branch if there's an `else', zero or one
branches if not. You'll take the same number of branches with `&&'
as with `&' or a variant; the only difference is in how many branches
you'll execute but not take. If they're usually not taken but usually
predicted to be taken, all I can say is your compiler doesn't predict
very well and should get a job doing weather forecasts.

--
Eric Sosman
d
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off




Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57