Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Order of function evaluation

Reply
Thread Tools

Order of function evaluation

 
 
Chris Torek
Guest
Posts: n/a
 
      10-26-2003
[Cross-posting reduced, now restricted to comp.lang.c only.]

[I used an example about hoisting strlen() calls out of a "for"
loop in C, which -- if done in the time-consuming part of a program
-- changes an algorithm from O(n^2) to O(n), where n is the length
of the strings.]

In article <0qnmb.9498$mZ5.55796@attbi_s54>,
Glen Herrmannsfeldt <(E-Mail Removed)> wrote:
>This one has sometimes bothered me. In many languages, loop parameters are
>evaluated once and the value used throughout.
>
>for(i=0;i<strlen(p1);i++)
>
>at least implies evaluation of strlen() each time. ...


Yes. This kind of optimization, in the right place, is a big win
(and even in the wrong place it is rarely a loss anyway). So C
compilers "ought to" work hard to do it, if such code is common.
It is difficult to optimize out, however, and such code is thus
not all that common in the first place.

On a more general note: languages that evaluate loop limits once
-- there are quite a few -- often have a more restricted loop
syntax. For instance, Pascal has:

"for" var ":=" start "to" end

with optional "step", and the "to" can be replaced by "downto" to
count down instead of up, but no equivalent to C's:

for (p = head; p != NULL; p = p->next)

(which, in the code I deal with, is just about as common, if not
more common, than the counting loop form). Fortran's DO loop is
similar to Pascal's.

But while these loop forms are rigid and thus too often useless,
they still bypass a real problem in C. Suppose you want to run an
"unsigned int" over all possible values. You might try writing:

unsigned int ui;
...
for (ui = 0; ui <= UINT_MAX; ui++)

but this turns out to be an infinite loop: when ui == UINT_MAX,
which should be the last trip through the loop, the end-of-loop
increment changes ui from UINT_MAX to 0, which is still less than
or equal to UINT_MAX. The same problem occurs with ordinary signed
ints, except that the behavior on incrementing from INT_MAX is
undefined. Actual machines mostly give you INT_MIN, which will
test "<= INT_MAX", likewise leading to an infinite loop. A "nicer"
(in some sense) system might point out the overflow instead.

In Pascal, the loop limits are evaluated once at the top of the
loop, then the number of trips is determined, and the loop runs
that many times, no matter what. This solves the INT_MAX problem
at the expense of inflexibility. For these corner cases in C,
you must write something like:

if (first_value <= last_value) {
unsigned int n_trips_minus_1 = last_value - first_value;
i = first_value;
do {
... loop contents ...
} while (--n_trips_minus_one != 0 && (i++, 1));
}

(replace "unsigned int" with "unsigned long" if needed; note that
LAST_VALUE can be INT_MAX or UINT_MAX so we must use the loop's
final value, not one beyond it). The ugly tricky bit where the
"i++" is executed if and only if the loop will be continued is not
required if "i" has unsigned type (or if signed overflow leading
to undefined behavior does not bother you ) -- in this case:

do {
... loop contents ...
i++;
} while (--n_trips_minus_one);

suffices.

In some of the more common cases, a bit of thought will do the
trick without all this machinery -- for instance, the "loop over
all possible unsigned int values" loop does not need a trip
counter, just the test moved to the bottom:

ui = 0;
do {
... loop contents ...
} while (++ui != 0);

but it still might be nice to have, in C, a special syntax that
"does the right thing" for counting loops.

The danger in going down this road is that one eventually winds up
with a language like Common Lisp, with 13571278165125 forms of loop
construct.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
 
Reply With Quote
 
 
 
 
CBFalconer
Guest
Posts: n/a
 
      10-27-2003
Chris Torek wrote:
>

.... snip ...
> >
> >at least implies evaluation of strlen() each time. ...

>
> Yes. This kind of optimization, in the right place, is a big win
> (and even in the wrong place it is rarely a loss anyway). So C
> compilers "ought to" work hard to do it, if such code is common.
> It is difficult to optimize out, however, and such code is thus
> not all that common in the first place.
>
> On a more general note: languages that evaluate loop limits once
> -- there are quite a few -- often have a more restricted loop
> syntax. For instance, Pascal has:
>
> "for" var ":=" start "to" end
>
> with optional "step", and the "to" can be replaced by "downto" to
> count down instead of up, but no equivalent to C's:
>
> for (p = head; p != NULL; p = p->next)
>

.... snip ...
>
> In Pascal, the loop limits are evaluated once at the top of the
> loop, then the number of trips is determined, and the loop runs
> that many times, no matter what. This solves the INT_MAX problem
> at the expense of inflexibility. For these corner cases in C,
> you must write something like:


Not quite. The Pascal for has a termination point, not
condition. There is no need to evaluate the number of repetitions
in advance, although that may be done. Many Pascal systems have
had a bug in "FOR i := 1 TO maxint DO ...". However the critical
points are that the initial and terminal conditions are evaluated
before the loop is entered, and that the value of i may not be
altered within the loop, nor meaningfully referenced after loop
termination. Having stamped these bugs out years ago in my own
code, I am quite aware of them.

As another example:

i := 5;
FOR i := 1 TO i + 5 DO ...;
writeln(i); (* is indeterminate *)

executes for i values of 1 through 10.

--
Chuck F ((E-Mail Removed)) ((E-Mail Removed))
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

 
Reply With Quote
 
 
 
 
Glen Herrmannsfeldt
Guest
Posts: n/a
 
      10-27-2003

"Chris Torek" <(E-Mail Removed)> wrote in message
news:bnhme6$7oq$(E-Mail Removed)...
> [Cross-posting reduced, now restricted to comp.lang.c only.]
>
> [I used an example about hoisting strlen() calls out of a "for"
> loop in C, which -- if done in the time-consuming part of a program
> -- changes an algorithm from O(n^2) to O(n), where n is the length
> of the strings.]
>
> In article <0qnmb.9498$mZ5.55796@attbi_s54>,
> Glen Herrmannsfeldt <(E-Mail Removed)> wrote:
> >This one has sometimes bothered me. In many languages, loop parameters

are
> >evaluated once and the value used throughout.
> >
> >for(i=0;i<strlen(p1);i++)
> >
> >at least implies evaluation of strlen() each time. ...

>
> Yes. This kind of optimization, in the right place, is a big win
> (and even in the wrong place it is rarely a loss anyway). So C
> compilers "ought to" work hard to do it, if such code is common.
> It is difficult to optimize out, however, and such code is thus
> not all that common in the first place.
>
> On a more general note: languages that evaluate loop limits once
> -- there are quite a few -- often have a more restricted loop
> syntax. For instance, Pascal has:
>
> "for" var ":=" start "to" end
>
> with optional "step", and the "to" can be replaced by "downto" to
> count down instead of up, but no equivalent to C's:


I looked up what PL/I does for this. The limits are evaluated once and used
for the loop. The BY (increment) can be positive or negative, (or zero) and
the loop works appropriately. One I hadn't though of before, the address
of the loop variable is also computed once and used throughout. This is
significant because of the way PL/I does pointers, and also CONTROLLED
variables. PL/I DO loops with TO and/or BY can also have a WHILE test at
the same time. If both TO and BY are omited the loop is only done once. A
list of values, or TO/BY sets can also be supplied.

> for (p = head; p != NULL; p = p->next)


Well, it could be done with a WHILE loop in languages that have one.
Putting the initialization and updating in the same statement is a little
convenient, though.

> (which, in the code I deal with, is just about as common, if not
> more common, than the counting loop form). Fortran's DO loop is
> similar to Pascal's.


> But while these loop forms are rigid and thus too often useless,
> they still bypass a real problem in C. Suppose you want to run an
> "unsigned int" over all possible values. You might try writing:
>
> unsigned int ui;
> ...
> for (ui = 0; ui <= UINT_MAX; ui++)


I wouldn't be surprised if the other languages don't catch this one, either.
IBM declared 32 bit addressing on the 360/67 a bug because the loop
instructions would fail in cases like this. If you were looping over
addresses, and hadn't checked where you were in virtual address space, you
could unintentionally run into this problem.

> but this turns out to be an infinite loop: when ui == UINT_MAX,
> which should be the last trip through the loop, the end-of-loop
> increment changes ui from UINT_MAX to 0, which is still less than
> or equal to UINT_MAX. The same problem occurs with ordinary signed
> ints, except that the behavior on incrementing from INT_MAX is
> undefined. Actual machines mostly give you INT_MIN, which will
> test "<= INT_MAX", likewise leading to an infinite loop. A "nicer"
> (in some sense) system might point out the overflow instead.


> In Pascal, the loop limits are evaluated once at the top of the
> loop, then the number of trips is determined, and the loop runs
> that many times, no matter what. This solves the INT_MAX problem
> at the expense of inflexibility. For these corner cases in C,
> you must write something like:


There was some discussion not so long ago about Fortran's treatment of
floating point variables in DO loops. They were not allowed in Fortran-66,
but added later. In that case, also, the number of trip is determined in
advance, such that, in certain rounding cases, the terminating condition
might not occur at the end of the loop!

> if (first_value <= last_value) {
> unsigned int n_trips_minus_1 = last_value - first_value;
> i = first_value;
> do {
> ... loop contents ...
> } while (--n_trips_minus_one != 0 && (i++, 1));
> }
>


(snip of more loops going to UINT_MAX)

-- glen


 
Reply With Quote
 
Christian Bau
Guest
Posts: n/a
 
      10-27-2003
In article <bnhme6$7oq$(E-Mail Removed)>,
Chris Torek <(E-Mail Removed)> wrote:

> In some of the more common cases, a bit of thought will do the
> trick without all this machinery -- for instance, the "loop over
> all possible unsigned int values" loop does not need a trip
> counter, just the test moved to the bottom:
>
> ui = 0;
> do {
> ... loop contents ...
> } while (++ui != 0);
>
> but it still might be nice to have, in C, a special syntax that
> "does the right thing" for counting loops.


A suggestion (which will never make it into the C Standard, but I quite
like it):

Step 1: Change the syntax so that expressions of the form a <= b <= c
become illegal. That is no big loss, and writing them as (a <= b) <= c
would be much better anyway. (Has anybody ever used such an expression? )

Step 2: An alternative syntax for the for loop:

for (expr1 <= lvalue <= expr2)

Semantics: expr1, expr2 and the address of lvalue are evaluated once
without intervening sequence point; then expr1 and expr2 are converted
to the type of lvalue.

If lvalue is an integer, then the loop body is executed with lvalue
being set to all values x such that expr1 <= x <= expr2 in ascending
order. If lvalue is a non-void pointer then the loop is executed with
all pointer values (expr1 + x) such that expr1 <= expr1 + x <= expr2. If
lvalue is a floating point variable, then lvalue is set to all integer
values such that expr1 <= x <= expr2; if any of these integers cannot be
represented as a floating-point number then behavior is undefined.

Instead of <=, any combination of <= and < can be used with the obvious
changes. Instead of <= and <, any combination of >= and > can be used
with the obvious changes, and the values will be used in descending
order.

If lvalue is modified inside the loop other than by the loop semantics,
behavior is undefined. The value of (lvalue) after execution of the loop
is undefined.
 
Reply With Quote
 
Dan Pop
Guest
Posts: n/a
 
      10-27-2003
In <bncdbe$1020c4$(E-Mail Removed)> http://www.velocityreviews.com/forums/(E-Mail Removed)-berlin.de writes:

> I have a possibly rather stupid question about the order of evaluation
>in a statement like this:
>
>result = foo( x ) - bar( y );
>
>How can I make 100% sure that foo(x) is evaluated before bar(y), where
>foo() and bar() can be either functions or macros?


Certainly not by writing the expression this way.

>I got into problems
>with this when writing some hardware-related stuff where foo() and bar()
>are functions or macros (I want to make sure the solution is independent
>of this) that read certain hardware registers and that need to be read
>in a specific order. The compiler "optimized" the code by reversing the
>reads, which isn't a very good idea in this case. For the time being I
>found that rewritting the above as
>
>result = foo( x );
>result -= bar( y );
>
>gets rid of the problem, but I am not sure if this really guarantees
>that foo(x) is evaluated before bar(y) when I use a different compiler
>or a different version.


It's safe enough: the compiler can't assume that foo() and/or bar() are
pure functions and without such assumptions the order of evaluation cannot
be altered. Imagine that foo() contains printf("Hello, "); and bar
contains printf("world!\n");. In the abstract C machine, the output
is "Hello, world!\n" and no conforming C implementation is allowed to
generate a different output.

You can still evaluate the thing as a single expression:

result = foo(x), result -= bar(y);

but it's the same thing as your solution, for all intents and purposes.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Reply With Quote
 
Glen Herrmannsfeldt
Guest
Posts: n/a
 
      10-27-2003

"Christian Bau" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...

(snip regarding the termination of for loops at UINT_MAX or INT_MAX)

> A suggestion (which will never make it into the C Standard, but I quite
> like it):
>
> Step 1: Change the syntax so that expressions of the form a <= b <= c
> become illegal. That is no big loss, and writing them as (a <= b) <= c
> would be much better anyway. (Has anybody ever used such an expression? )
>
> Step 2: An alternative syntax for the for loop:
>
> for (expr1 <= lvalue <= expr2)
>
> Semantics: expr1, expr2 and the address of lvalue are evaluated once
> without intervening sequence point; then expr1 and expr2 are converted
> to the type of lvalue.
>
> If lvalue is an integer, then the loop body is executed with lvalue
> being set to all values x such that expr1 <= x <= expr2 in ascending
> order. If lvalue is a non-void pointer then the loop is executed with
> all pointer values (expr1 + x) such that expr1 <= expr1 + x <= expr2. If
> lvalue is a floating point variable, then lvalue is set to all integer
> values such that expr1 <= x <= expr2; if any of these integers cannot be
> represented as a floating-point number then behavior is undefined.
>
> Instead of <=, any combination of <= and < can be used with the obvious
> changes. Instead of <= and <, any combination of >= and > can be used
> with the obvious changes, and the values will be used in descending
> order.
>
> If lvalue is modified inside the loop other than by the loop semantics,
> behavior is undefined. The value of (lvalue) after execution of the loop
> is undefined.


I think I don't like it as overloading the for statement, and also the
relational operators. Note that expr1 <= expr2 <= expr3 already has meaning
in the C language, and it isn't what you want.

There are some extensions to C for parallel machines that use a forall
statement. My suggestion, instead, is to add a forall statement to C, with
reasonably syntax, and including the requirement that the bounds be
evaluated once and that the loop variable not be changed inside the loop.
In addition, it should allow the compiler to reorder the execution of the
loop body, such as executing it in parallel on machines that allow that.

It would slowly gain use, as people realized that evaluating the bounds once
can be an advantage, and the requirements on the loop variable can help
optimizing compilers even on serial machines.

-- glen


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
function argument evaluation order and asserts and macros jacek.dziedzic@gmail.com C++ 4 09-24-2008 02:28 PM
Order of evaluation of function arguments dragoncoder C Programming 21 12-23-2005 07:08 PM
Evaluation order of function parameters haroon C Programming 25 12-05-2005 05:22 AM
Order of function parameters evaluation subnet C Programming 13 03-09-2005 10:22 AM
Evaluation order for nested function calls cheeser C++ 3 10-05-2003 08:10 AM



Advertisments