Velocity Reviews > FAQ Related - why cast?

# FAQ Related - why cast?

Martin

 01-04-2005
Two questions relating to FAQ answer 12.42.

(1) In the statement

s.i16 |= (unsigned)(getc(fp) << 8);

i16 is declared int. The reason for casting to (unsigned) is explained as
guarding against sign extension. But left-shifting will always fill vacated
bits with zero (assuming the right operand is nonnegative and less than the
number of bits in the left expression's type). So how is the cast useful?

(2) I am puzzled by the cast to unsigned in the following statement:

putc((unsigned)((s.i32 >> 24) & 0xff), fp);

i32 is declared long int.

As I understand it the usual arithmetic conversions will ensure the type of
the expression (s.i32 >> 24) & 0xff will be long int. That long int will be
cast to unsigned int, but what is the point? putc() expects its first
argument to be of type int. So at the moment it's going through

long int -> unsigned int -> int

whereas without the cast it would be

long int -> int
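Here is a minimal sketch of Martin's point. The helper names byte_direct and byte_via_unsigned are hypothetical. Both assume v is nonnegative, so the right shift is fully defined. After masking with 0xff the value lies in 0..255, which fits in both int and unsigned int. The detour through unsigned therefore cannot change what putc() receives. The explicit (int) casts stand in for the implicit conversion to putc()'s int parameter.

```c
/* Hypothetical helpers for illustration.  Both assume v >= 0 (so the
   right shift is fully defined) and 0 <= shift < 32. */
int byte_direct(long v, int shift)
{
    /* long -> int: the masked value is in 0..255, which int can hold. */
    return (int)((v >> shift) & 0xff);
}

int byte_via_unsigned(long v, int shift)
{
    /* long -> unsigned int -> int: 0..255 also fits in unsigned int,
       so the extra conversion cannot change the value. */
    return (int)(unsigned)((v >> shift) & 0xff);
}
```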

---
Martin

Jack Klein

 01-05-2005
On Tue, 4 Jan 2005 16:39:52 -0000, "Martin"
<martin.o_brien@[no-spam]which.net> wrote in comp.lang.c:

> Two questions relating to FAQ answer 12.42.

You must have the book version of the FAQ, since 12.42 is not in the
online version.

> (1) In the statement
>
> s.i16 |= (unsigned)(getc(fp) << 8);
>
> i16 is declared int. The reason for casting to (unsigned) is explained as
> guarding against sign extension. But left-shifting will always fill vacated
> bits with zero (assuming the right operand is nonnegative and less than the
> number of bits in the left expression's type). So how is the cast useful?

If the int returned by getc() is negative, left shifting it produces
undefined behavior. If the int returned by getc() has a value greater
than 255, left shifting it produces undefined behavior. Converting
either of these out-of-range values to unsigned int avoids the
undefined behavior.

> (2) I am puzzled by the cast to unsigned in the following statement:
>
> putc((unsigned)((s.i32 >> 24) & 0xff), fp);
>
> i32 is declared long int.
>
> As I understand it the usual arithmetic conversions will ensure the type of
> the expression (s.i32 >> 24) & 0xff will be long int. That long int will be
> cast to unsigned int, but what is the point? putc() expects its first
> argument to be of type int. So at the moment it's going through
>
> long int -> unsigned int -> int
>
> whereas without the cast it would be
>
> long int -> int

This is somewhat sloppy coding. Generally, bit shifts should not be
used on signed integer types. There are too many potential surprises
(read defects, when the program does not do what the programmer
expected). If s.i32 is negative, the result of the shift is
implementation defined. It would actually make more sense to cast
s.i32 to unsigned long before the shift.
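Jack's suggestion might look like the sketch below. It assumes 8-bit bytes, and put_be32 is a hypothetical name, not something from the FAQ. The long is converted to unsigned long once, at the call site, and every shift and mask then works on the unsigned value. Converting a negative long to unsigned long is well defined, because the value wraps modulo ULONG_MAX + 1. For values in the 32-bit range, the bytes written are therefore the 32-bit two's-complement encoding, whatever representation the machine uses internally.

```c
#include <stdio.h>

/* Hypothetical helper: write the low 32 bits of v to fp, most significant
   byte first.  All arithmetic is unsigned, so no shift touches a sign bit.
   Returns 0 on success, EOF if any putc() fails. */
int put_be32(unsigned long v, FILE *fp)
{
    int shift;

    for (shift = 24; shift >= 0; shift -= 8) {
        if (putc((int)((v >> shift) & 0xFFUL), fp) == EOF)
            return EOF;
    }
    return 0;
}
```

The FAQ's write sequence for i32 would then reduce to `put_be32((unsigned long)s.i32, fp);`.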

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html

Derrick Coetzee

 01-05-2005
Jack Klein wrote:
>> s.i16 |= (unsigned)(getc(fp) << 8);

>
> If the int returned by getc() is negative, left shifting it produces
> undefined behavior. If the int returned by getc() has a value greater
> than 255, left shifting it produces undefined behavior.

The shift is done before the cast, though. To avoid undefined behaviour
you would want to do:

s.i16 |= ((unsigned)getc(fp)) << 8;

Also, getc cannot possibly return a value exceeding 255, because it
always returns either an unsigned char value (sizeof(unsigned char) is
1) or a negative value (EOF is negative).
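Derrick's corrected placement of the cast can be sketched as a standalone function. It assumes both arguments are byte values in 0..255 that have already been checked against EOF. The name combine_be16 is hypothetical.

```c
/* Hypothetical helper: hi and lo are assumed to be byte values in 0..255
   (getc() results already checked against EOF). */
unsigned combine_be16(int hi, int lo)
{
    /* Convert first, then shift: the shift happens in unsigned int, so
       bit 15 can be set without involving a sign bit, even if int is
       only 16 bits wide. */
    return ((unsigned)hi << 8) | (unsigned)lo;
}
```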
--
Derrick Coetzee
I grant this newsgroup posting into the public domain. I disclaim all
express or implied warranty and all liability. I am not a professional.

Micah Cowan

 01-05-2005
Derrick Coetzee wrote:

> Jack Klein wrote:
>
>>> s.i16 |= (unsigned)(getc(fp) << 8);

>>
>>
>> If the int returned by getc() is negative, left shifting it produces
>> undefined behavior. If the int returned by getc() has a value greater
>> than 255, left shifting it produces undefined behavior.

>
>
> The shift is done before the cast, though. To avoid undefined behaviour
> you would want to do:
>
> s.i16 |= ((unsigned)getc(fp)) << 8;
>
> Also, getc cannot possibly return a value exceeding 255, because it
> always returns either an unsigned char value (sizeof(unsigned char) is
> 1) or a negative value (EOF is negative).

I'm not sure why Jack thought that a value of greater than 255
could not be left-shifted, but be assured that it is entirely
possible for getc() to return a value exceeding 255, on systems
with more than 8 bits to a byte. There are people here who have
worked on such implementations.
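Where CHAR_BIT may exceed 8, a reader that expects 8-bit octets has to check the range itself. Here is a minimal sketch. get_octet is a hypothetical helper. It folds end-of-file, read errors, and oversize values into a single EOF result, which a real program might want to distinguish.

```c
#include <stdio.h>

/* Hypothetical helper: read one byte and insist that it fit in 8 bits.
   Returns the value (0..255), or EOF on end-of-file, on a read error, or
   on a value wider than 8 bits (possible only where CHAR_BIT > 8). */
int get_octet(FILE *fp)
{
    int c = getc(fp);

    if (c == EOF || c > 0xFF)
        return EOF;
    return c;
}
```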

Dietmar Schindler

 01-05-2005
Martin wrote:
> Two questions relating to FAQ answer 12.42.
>
> (1) In the statement
>
> s.i16 |= (unsigned)(getc(fp) << 8);
>
> i16 is declared int. The reason for casting to (unsigned) is explained as
> guarding against sign extension. ...

Provided that you stated the FAQ answer correctly, the explanation is
nonsense (the left hand side of the assignment expression is of type
int, and without the cast, the right hand side is also of type int; so
there is no extension).

CBFalconer

 01-05-2005
Jack Klein wrote:
> <martin.o_brien@[no-spam]which.net> wrote in comp.lang.c:
>
>> Two questions relating to FAQ answer 12.42.

>
> You must have the book version of the FAQ, since 12.42 is not in
> the online version.
>
>> (1) In the statement
>>
>> s.i16 |= (unsigned)(getc(fp) << 8);
>>
>> i16 is declared int. The reason for casting to (unsigned) is
>> explained as guarding against sign extension. But left-shifting
>> will always fill vacated bits with zero (assuming the right
>> operand is nonnegative and less than the number of bits in the
>> left expression's type). So how is the cast useful?

>
> If the int returned by getc() is negative, left shifting it
> produces undefined behavior. If the int returned by getc() has
> a value greater than 255, left shifting it produces undefined
> behavior. Converting either of these out-of-range values to
> unsigned int avoids the undefined behavior.

Disagree. getc returns the integer value of an unsigned char
(positive) or EOF. The code is faulty since it doesn't handle EOF
anyway. That integer needs to be coerced into an unsigned to allow
the left shift. So the statement should be:

s.i16 |= ((unsigned)getc(fp)) << 8;

which may still not fit into an int, if the int is 16 bits. i16
should have been declared as unsigned.
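Here is a sketch that combines Chuck's two points. It assumes 8-bit bytes and uses a hypothetical helper, read_be16. Each getc() result is tested against EOF before it is used. The result is stored in an unsigned int, which is guaranteed to hold 0..65535.

```c
#include <stdio.h>

/* Hypothetical helper: read a big-endian 16-bit value into *out.
   Returns 0 on success, or EOF if either getc() fails (in which case
   *out is left unchanged). */
int read_be16(FILE *fp, unsigned *out)
{
    int hi, lo;

    hi = getc(fp);
    if (hi == EOF)
        return EOF;
    lo = getc(fp);
    if (lo == EOF)
        return EOF;
    /* Convert before shifting, so the shift is done in unsigned int. */
    *out = ((unsigned)hi << 8) | (unsigned)lo;
    return 0;
}
```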

--
Chuck F () ()
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Martin

 01-05-2005
"Dietmar Schindler" <> wrote in message
news:...
> Provided that you stated the FAQ answer correctly, the explanation is
> nonsense (the left hand side of the assignment expression is of type
> int, and without the cast, the right hand side is also of type int; so
> there is no extension).

To ensure the partial quote I gave in my initial post was not misleading,
this is the question and answer from the book (c)1996 by Addison-Wesley
Publishing Company, Inc.

Question: How can I write code to conform to these old, binary data file
formats?

Answer: It's difficult because of word size and byte-order differences,
floating-point formats, and structure padding. To get the control you need
over these particulars, you may have to read and write things a byte at a
time, shuffling and rearranging as you go. (This isn't always as bad as it
sounds and gives you both code portability and complete
control.) For example, suppose that you want to read a data structure,
consisting of a character, a 32-bit integer, and a 16-bit integer, from the
stream fp into the C structure

struct mystruct {
char c;
long int i32;
int i16;
};

You might use code like this:

s.c = getc(fp);

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (unsigned)(getc(fp) << 8);
s.i32 |= getc(fp);

s.i16 = getc(fp) << 8;
s.i16 |= getc(fp);

This code assumes that getc reads 8-bit characters and that the data is
stored most significant byte first ("big endian"). The casts to (long)
ensure that the 16- and 24-bit shifts operate on long values (see question
3.14), and the cast to (unsigned) guards against sign extension. (In
general, it's safer to use all unsigned types when writing code like this,
but see question 3.19.)

The corresponding code to write the structure might look like:

putc(s.c, fp);
putc((unsigned)((s.i32 >> 24) & 0xff), fp);
putc((unsigned)((s.i32 >> 16) & 0xff), fp);
putc((unsigned)((s.i32 >> 8) & 0xff), fp);
putc((unsigned)(s.i32 & 0xff), fp);
putc(s.i16 >> 8) & 0xff, fp);
putc(s.i16 & 0xff, fp);

See also questions 2.12, 12.38, 16.7, and 20.5.

--
Martin
http://martinobrien.co.uk/

Eric Sosman

 01-05-2005
Martin wrote:
>
> To ensure the partial quote I gave in my initial post was not misleading,
> this is the question and answer from the book (c)1996 by Addison-Wesley
> Publishing Company, Inc.
>
> Question: How can I write code to conform to these old, binary data file
> formats?
>
> Answer: It's difficult because of word size and byte-order differences,
> floating-point formats, and structure padding. To get the control you need
> over these particulars, you may have to read and write things a byte at a
> time, shuffling and rearranging as you go. (This isn't always as bad as it
> sounds and gives you both code portability and complete
> control.) For example, suppose that you want to read a data structure,
> consisting of a character, a 32-bit integer, and a 16-bit integer, from the
> stream fp into the C structure
>
> struct mystruct {
> char c;
> long int i32;
> int i16;
> };
>
> You might use code like this:
>
> s.c = getc(fp);
>
> s.i32 = (long)getc(fp) << 24;
> s.i32 |= (long)getc(fp) << 16;
> s.i32 |= (unsigned)(getc(fp) << 8);
> s.i32 |= getc(fp);
>
> s.i16 = getc(fp) << 8;
> s.i16 |= getc(fp);
>
> This code assumes that getc reads 8-bit characters and that the data is
> stored most significant byte first ("big endian"). The casts to (long)
> ensure that the 16- and 24-bit shifts operate on long values (see question
> 3.14), and the cast to (unsigned) guards against sign extension. (In
> general, it's safer to use all unsigned types when writing code like this,
> but see question 3.19.)

This code seems to arise from an odd combination of
caution, carelessness, and micro-optimization. The design
considerations may have evolved along these lines:

Caution: Since an `int' could be as narrow as 16 bits,
use `long' to store the final value, safe in the knowledge
that `long' is at least 32 bits wide. For the same reason,
convert the first two getc() results from `int' to `long'
before shifting, since the shifts might be too wide for a
narrow `int'.

Optimization: The third getc() result is shifted only
8 bits, so it will fit in an `int' even if `int' is only
16 bits wide. Doing arithmetic on an `int' may be a hair
faster than on a `long', so shift first and convert later.

Carelessness: If `int' is only 16 bits wide, this
shift may slide a high-order 1-bit from the getc() result
into the sign position of the `int'. This will cause no
harm on most machines, but the C language doesn't actually
specify what will happen. (The same carelessness afflicts
the shifting of the first byte, too.)

Caution: If the shift did in fact slide a 1-bit into
the sign position of a 16-bit `int' and thereby make it
negative, converting this `int' to `long' will propagate
the sign bit leftward and the subsequent `|' will clobber
the two bytes already processed. Hence the `unsigned' cast:
if `int' is 16 bits wide it will be zero-extended instead of
sign-extended, and if `int' is wider it won't be negative
anyhow.
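The sign-extension hazard Eric describes can be shown with the C99 exact-width types int16_t and uint16_t. They stand in for a 16-bit int here, which is an assumption made for illustration; the original code uses plain int. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers contrasting the two ways a 16-bit quantity can be
   widened to long. */
long widen_signed16(int16_t x)
{
    return (long)x;   /* sign extension: -256 stays -256 */
}

long widen_unsigned16(uint16_t x)
{
    return (long)x;   /* zero extension: 0xFF00 stays 0xFF00 */
}
```

Suppose the first two bytes 0x12 0x34 are already in place. OR-ing in the sign-extended value wipes them out: (0x12340000L | widen_signed16(-256)) is -256 on a two's-complement machine. OR-ing in the zero-extended value leaves them alone: (0x12340000L | widen_unsigned16(0xFF00)) is 0x1234FF00.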

Optimization: Since the fourth getc() result is non-negative
and doesn't get shifted, its sign bit is zero and
conversion to `long' will not "smear" the first three bytes.
The conversion can go straight from `int' to `long' safely.

Carelessness: Of course, all these getc() calls can fail,
and the results should be checked against EOF before being
used. I assume Mr. Summit omitted the checks for brevity.
(Alternatively, the individual checks could be omitted if
tests of feof() and ferror() followed the whole sequence.)
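Eric's alternative of checking once at the end might be sketched as follows. read_be32 is hypothetical. It assumes 8-bit bytes, and it assumes the stream's end-of-file and error indicators were clear on entry, for example after clearerr().

```c
#include <stdio.h>

/* Hypothetical helper: read four bytes, most significant first, without
   checking each getc() individually, then consult feof()/ferror() once.
   Returns 0 on success, EOF on end-of-file or read error.  The & 0xFF keeps
   a (negative) EOF result from setting high bits; the partial value is
   discarded anyway when the final check fails. */
int read_be32(FILE *fp, unsigned long *out)
{
    unsigned long u = 0;
    int i;

    for (i = 0; i < 4; i++)
        u = (u << 8) | ((unsigned long)getc(fp) & 0xFFUL);
    if (feof(fp) || ferror(fp))
        return EOF;
    *out = u;
    return 0;
}
```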

The optimizations seem pointless to me. If there is any
speed advantage for shift-convert over convert-shift, that
advantage will be tiny compared to the I/O activity that
provides the incoming bytes. Suppose a disk read takes 10ms
to fetch 64KB of input: that's ~150ns per byte, or about 450
processing cycles on a 3GHz machine. If shift-then-convert
saves two cycles, say, you have saved a whopping four-tenths
of one percent -- it seems likely that almost any program you
can name presents more significant optimization opportunities
elsewhere. (The other way to think about this is to note that
64KB per 10ms means bytes arrive at a rate of 6.5MHz, which is
peanuts compared to even a 1GHz=1000MHz machine.)

If we throw out the pointless optimizations, we get
something like

s.i32 = (long)getc(fp) << 24;
s.i32 |= (long)getc(fp) << 16;
s.i32 |= (long)getc(fp) << 8;
s.i32 |= (long)getc(fp) << 0;

... which, I submit, makes up in clarity what little it gives
away in efficiency.

> The corresponding code to write the structure might look like:
>
> putc(s.c, fp);
> putc((unsigned)((s.i32 >> 24) & 0xff), fp);
> putc((unsigned)((s.i32 >> 16) & 0xff), fp);
> putc((unsigned)((s.i32 >> 8) & 0xff), fp);
> putc((unsigned)(s.i32 & 0xff), fp);
> putc(s.i16 >> 8) & 0xff, fp);
> putc(s.i16 & 0xff, fp);

I'm afraid this baffles me. I could understand, e.g.

putc( ((unsigned)(s.i32 >> 24)) & 0xFF, fp);

on the grounds of avoiding the need for a `long' version of
0xFF, but as written I simply don't get it. (Besides, the
next-to-last line is missing a parenthesis.) You'd better
address your question to Mr. Summit directly.

--

Martin

 01-05-2005
"Eric Sosman" wrote:
> (Besides, the next-to-last line is missing a parenthesis.)
> You'd better address your question to Mr. Summit directly.

Thanks for that response. The penultimate line should be

putc((s.i16 >> 8) & 0xff, fp);

as you point out.

Martin

CBFalconer

 01-05-2005
Martin wrote:
>

... snip ...
>
> For example, suppose that you want to read a data structure,
> consisting of a character, a 32-bit integer, and a 16-bit
> integer, from the stream fp into the C structure
>
> struct mystruct {
> char c;
> long int i32;
> int i16;
> };
>
> You might use code like this:
>
> s.c = getc(fp);
>
> s.i32 = (long)getc(fp) << 24;
> s.i32 |= (long)getc(fp) << 16;
> s.i32 |= (unsigned)(getc(fp) << 8);
> s.i32 |= getc(fp);

I hope not. What if CHAR_BIT is greater than 8? What about EOF?
What you might do (assuming hi byte first in the stream) is:

#include <limits.h>
unsigned long u;
int i;

for (i = 0, u = 0; i < 4; i++) {
    /* you may want to include error traps for getc
       returning anything larger than 255 or EOF */
    u = u * 256 + (getc(fp) & 0xff);
}
if (u <= LONG_MAX) s.i32 = u;
else {
    /* take corrective action on overflow */
    /* creating a neg. value is system dependent */
}

and if you really need the obfuscation you can use "<< 8" in place
of "* 256".
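Chuck's loop, with the suggested error traps filled in, might look like this. It is a sketch: read_be32_long and its return codes are hypothetical, and an octet is assumed to be 8 bits.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper.  Returns 0 on success, 1 if the stream ends, fails,
   or yields a value above 255 (possible where CHAR_BIT > 8), and 2 if the
   assembled value does not fit in a long (*out is then left unchanged). */
int read_be32_long(FILE *fp, long *out)
{
    unsigned long u = 0;
    int i, c;

    for (i = 0; i < 4; i++) {
        c = getc(fp);
        if (c == EOF || c > 255)
            return 1;
        u = u * 256 + (unsigned long)c;
    }
    if (u > (unsigned long)LONG_MAX)
        return 2;   /* caller decides how to handle "negative" input */
    *out = (long)u;
    return 0;
}
```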

Notice how the standard network assumption of hi byte first eases
the translation of an incoming stream, and does not hamper
generation of an output stream. You can also settle the possible
negations etc. on the initial input byte, and make any following
code bulletproof.
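Once the four bytes have been assembled into an unsigned long, the sign could be settled in a separate step, as in the sketch below. It assumes the stream holds a 32-bit two's-complement value. It also assumes that long can represent -2147483648, which is true on two's-complement machines. u32_to_long is a hypothetical name.

```c
/* Hypothetical helper: interpret the low 32 bits of u as a 32-bit
   two's-complement value, without relying on implementation-defined
   unsigned-to-signed conversion. */
long u32_to_long(unsigned long u)
{
    u &= 0xFFFFFFFFUL;              /* ignore bits above 32 if long is wider */
    if (u <= 0x7FFFFFFFUL)
        return (long)u;             /* sign bit clear: value unchanged */
    /* Sign bit set: the value is u - 2^32.  Compute it as
       -(2^32 - 1 - u) - 1 so that no intermediate result overflows. */
    return -(long)(0xFFFFFFFFUL - u) - 1;
}
```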

--
Chuck F () ()
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!