Velocity Reviews > unsigned short addition/subtraction overflow

Kevin Goodsell

 12-21-2003
pete wrote:
>
> No.
> 65536 is an allowable value for INT_MAX.

I don't think so. INT_MAX pretty much has to be an odd number, I think.
In fact, to satisfy the requirement that the non-negative integer values
have the same representation as the same values for the corresponding
unsigned type, and the requirement that unsigned types use a pure binary
representation, I think it's safe to say that INT_MAX must be (2^N)-1
for some integer N, which must be 15 or greater.

-Kevin
--
My email address is valid, but changes periodically.

Kevin Goodsell

 12-21-2003
Andy wrote:

>
> Actually what I really meant is for unsigned operations.
> This is what I want to know. Are the following defined in
> C and always guaranteed wrapped around values?
>
> unsigned short i;
> unsigned long li; /* 32-bit wide */
>
> 1. i = (unsigned short)65535 + (unsigned short)3;

You could write these easier this way:

i = 65535u + 3u;

> 2. i = (unsigned short)1 - (unsigned short)3;
> 3. li = (unsigned long)0xFFFFFFFF + (unsigned long)3;
> 4. li = (unsigned long)1 - (unsigned long)3;
>

It's difficult to produce undefined behavior with unsigned values.
(Unless you divide by zero, maybe. I don't actually know what the
standard says about that, which surprises me.) Overflow doesn't occur
with unsigned types, but it's possible for unsigned types that are
narrower than int to be promoted to (signed) int, which may allow
overflow (and undefined behavior) to occur.

-Kevin

James Hu

 12-21-2003
On 2003-12-21, pete <(E-Mail Removed)> wrote:
> James Hu wrote:
>>
>> On 2003-12-21, Andy <(E-Mail Removed)> wrote:
>> > Are 1 through 4 defined behaviors in C?
>> >
>> > unsigned short i;
>> > unsigned long li; /* 32-bit wide */
>> >
>> > 1. i = 65535 + 3;
>> > 2. i = 1 - 3;
>> > 3. li = (unsigned long)0xFFFFFFFF + 3;
>> > 4. li = 1 - 3;

>>
>> Yes.

>
> No.
> 65536 is an allowable value for INT_MAX.
> (65535 + 3) would be integer overflow
> and undefined behavior in that case.

Good catch. I did consider overflow, but I assumed that (65535+3) was
identical to writing (65538) because of computation at translation
time versus computation at run time.

-- James

Chris Torek

 12-21-2003
>Andy wrote:
>> Actually what I really meant is for unsigned operations.
>> This is what I want to know. Are the following defined in
>> C and always guaranteed wrapped around values?
>>
>> unsigned short i;
>> unsigned long li; /* 32-bit wide */

All that C guarantees here is "at least" 32 bits, but that does
not really matter here.

>> 1. i = (unsigned short)65535 + (unsigned short)3;

In article <vCmFb.340$(E-Mail Removed) t>
Kevin Goodsell <(E-Mail Removed)> writes:
>You could write these easer this way:
>
>i = 65535u + 3u;

Actually, this is potentially quite different.

If ANSI/ISO C used the *correct* rules (according to me)
it would be precisely the same, but we are stuck with quite
bogus widening rules due to a mistaken decision in the 1980s:
"when a narrow unsigned integer type widens, the resulting
type is signed if all the unsigned values fit, otherwise
it is unsigned".

In this particular case, unsigned short widens to either
unsigned int or signed int. Which one we get depends on the
properties of the implementation. This is a really dumb idea,
made in an attempt to be "less surprising" than the "right"
way ("narrow unsigned widens to unsigned"), that actually
turns out to be *more* surprising. But again we are stuck
with the wrong decision -- so let me define it.

What you must do is look in <limits.h> (perhaps by writing a
small C program, since the header may not exist) and compare
the values of USHRT_MAX and INT_MAX. One of the following two
cases will necessarily hold:

a) USHRT_MAX > INT_MAX.

This occurs on, e.g., the 16-bit PDP-11 and old 16-bit
MS-DOS C compilers. Here USHRT_MAX is 65535 while INT_MAX
is 32767.

b) USHRT_MAX <= INT_MAX.

This occurs on, e.g., today's 16-bit-short 32-bit-int C
compilers. Here USHRT_MAX is 65535 while INT_MAX is
2147483647.

In case (a), an unsigned short expression -- no matter what its
actual value is -- that appears in an arithmetic expression is
widened to unsigned int. Thus (unsigned short)65535 is
identical to (unsigned int)65535 or 65535U.

In case (b), however, an unsigned short -- no matter what its actual
value is -- is widened to a *signed* int. Thus (unsigned short)65535
is identical to (int)65535 or 65535.

If we have two "unsigned short"s, values 65535 and 3 respectively,
and go to add them, we continue to have "case a" and "case b".
In case (a), the sum is 65535U + 3U, which has type unsigned int
and value 2. In case (b), the sum is 65535 + 3, which has type
signed int and value 65538.

In either case, when storing the final values back into an unsigned
short, it is reduced mod (USHRT_MAX+1), so that i becomes 2. The
place where this becomes a problem is not when we stuff the result
back into an unsigned variable, but rather when we compare it in
what the original 1989 C rationale called a "questionably signed"
expression.

Suppose we have the following code fragment:

    unsigned short us = 65535;
    int i = -1;

    if (us > i)
        printf("65535 > -1\n");
    else
        printf("65535 <= -1\n");

According to ANSI C's ridiculous rules (which we must obey anyway),
we decide whether this comparison uses "unsigned int" or "signed
int" based on whether USHRT_MAX exceeds INT_MAX. Once again, we
have the two cases:

case (a), USHRT_MAX > INT_MAX (PDP-11): "us" widens to an
unsigned int, value 65535U; i widens to an unsigned int,
value 65535U. 65535U > 65535U is false and we print
"65535 <= -1". This is, supposedly, "surprising" -- but
it happens!

case (b), USHRT_MAX <= INT_MAX (VAX, etc.): "us" widens to
a signed int, value 65535; i remains signed int, value
-1. 65535 > -1 is true and we print "65535 > -1". This
is supposedly "not surprising" (which is probably true),
but in fact it is only SOMETIMES true.

As far as I am concerned, it is *much* better to be "predictably
surprising" than "unpredictably surprising based on the relative
values of USHRT_MAX and INT_MAX". The reason is that, while C
programmers do get surprised, they get surprised *once*, the *first*
time they mix signed and unsigned this way. This gives them the
opportunity to learn that the results are surprising; from then
on, they have no excuse to be surprised. Moreover, the logic
is trivial to follow: "unsigned widens to unsigned" means "put
an unsigned into an expression and it takes over."

Instead, we have a language where the code "works as expected" --
until it is moved to a machine where case (a) holds instead of case
(b). Programmers learn that mixing signed and unsigned is harmless
and "never surprises", only to find someday that, no, the language
is considerably more perverse than that. The logic is difficult
as well: "unsigned takes over except when it doesn't, based on the
relative values of the corresponding MAXes."

>> 2. i = (unsigned short)1 - (unsigned short)3;
>> 3. li = (unsigned long)0xFFFFFFFF + (unsigned long)3;
>> 4. li = (unsigned long)1 - (unsigned long)3;

>It's difficult to produce undefined behavior with unsigned values.

As long as you stick with unsigned int or unsigned long, anyway,
so that the broken widening rules do not trick you into accidentally
using signed values.

>(Unless you divide by zero, maybe. I don't actually know what the
>standard says about that, which surprises me.)

Division by zero produces undefined behavior, even for 1U / 0U and
the like.

>Overflow doesn't occur
>with unsigned types, but it's possible for unsigned types that are
>narrower than int to be promoted to (signed) int, which may allow
>overflow (and undefined behavior) to occur.

Yes. I claim that this rule is a terrible one; but I note that we
are stuck with it. The best approach is to avoid it -- make sure
you explicitly widen your narrow unsigned types to wider unsigned
types if the result (overflow or result of "questionably signed"
comparison) can matter. This kind of code is undeniably ugly, but
then, working around broken portions of any language (not just C)
is usually ugly.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
Reading email is like searching for food in the garbage, thanks to spammers.

Kevin Goodsell

 12-22-2003
Chris Torek wrote:

> In article <vCmFb.340$(E-Mail Removed) t>
> Kevin Goodsell <(E-Mail Removed)> writes:
>
>>You could write these easer this way:
>>
>>i = 65535u + 3u;

>
>
> Actually, this is potentially quite different.

Yes, obviously. Not sure what I was thinking there. I think I suffer
from "short blindness" - I either miss the word 'short' or
subconsciously translate it to 'int'. This wasn't the first time.

Thanks for pointing out the error.

-Kevin

pete

 12-22-2003
Kevin Goodsell wrote:
>
> pete wrote:
> >
> > No.
> > 65536 is an allowable value for INT_MAX.

>
> I don't think so.

65535
Thank you.

--
pete

Peter Nilsson

 12-22-2003
pete <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> pete wrote:

....
> > If INT_MAX equals 65535,

>
> I meant 65536.

Why? Neither is likely, but 65536 is considerably less so. Some would
argue (myself included) that 65536 is impossible on a conforming
implementation (be that C90 or C99).

--
Peter

pete

 12-22-2003
Peter Nilsson wrote:
>
> pete <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> > pete wrote:

> ...
> > > If INT_MAX equals 65535,

> >
> > I meant 65536.

>
> Why? Neither is likely, but 65536 is considerably less so. Some would
> argue (myself included) that 65536 is impossible on a conforming
> implementation (be that C90 or C99).

You would be right.

--
pete

Christopher Benson-Manica

 12-22-2003
Chris Torek <(E-Mail Removed)> spoke thus:

> Actually, this is potentially quite different.

> If ANSI/ISO C used the *correct* rules (according to me)
> it would be precisely the same, but we are stuck with quite
> bogus widening rules due to a mistaken decision in the 1980s:
> "when a narrow unsigned integer type widens, the resulting
> type is signed if all the unsigned values fit, otherwise
> it is unsigned".

> etc.

Wow, what a great article! The only thing I'm unclear on now is why
such a seemingly obvious point escaped the C89 people, and why you

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Chris Torek

 12-23-2003
>Chris Torek <(E-Mail Removed)> spoke thus:
>> If ANSI/ISO C used the *correct* rules (according to me) ...

In article <news:bs76vt$er9$(E-Mail Removed)>
Christopher Benson-Manica <(E-Mail Removed)> wrote:
>Wow, what a great article! The only thing I'm unclear on now is why
>such a seemingly obvious point escaped the C89 people, and why you

I was but a poor student at the time (making about four bucks an
hour, with a limit of 20 hrs/week, as "student staff") and could
not afford exotic vacation trips to ANSI C committee meetings.
I did, however, hear from someone who did go to them that this was
actually something of a "hotly debated" topic.

The VAX PCC did it "my" way, and apparently Plauger's C compiler(s)
did it the other way. The "base document" -- i.e., K&R-1 -- did
not even allow for the possibility of "unsigned short" and "unsigned
char", and if you have "narrow unsigned always widens to unsigned"
as a rule, you need an exception for plain char if/when plain char
is unsigned (as on the IBM 370), so that EOF can be negative.

The results of the rules differ only in "questionably signed" cases,
which are rare enough. But the ANSI rules are so ugly to work with
that I would prefer a special exception for "plain char is unsigned
on this implementation, yet nonetheless widens to signed int". Note
that this exception would force the constraint that CHAR_MAX < INT_MAX,
even when char is unsigned, which would have the happy side effect
of making stdio "work right".