# List of undefined behaviour and other sneaky bugs

John Reye
 05-03-2012
1)
a[i] = i++;

http://c-faq.com/expr/evalorder1.html

So this attempt at optimisation is wrong:

int i;
int a[10];
for (i = 0; i < sizeof(a); )
a[i] = i++; /* set a to {1, 2, 3, ...} */

Correctly optimized, it should be:
for (i = 0; i < sizeof(a); )
i = a[i] = i+1; /* set a to {1, 2, 3, ...} */

2)

int i;
char b[10];
*(int *)b = i;

Alignment problem. There is no guarantee that b is aligned such that
its address satisfies the alignment requirements that an int pointer
would need (such as an even address, or maybe "address divisible by 4",
or whatever it happens to be).

3)

struct
{
char a;
int b;
} mystruct;

// set byte at lowest address within b to 0x12;
char *p = (char *)&mystruct;
p[1] = 0x12;

Bug: struct padding was forgotten. There will probably be some padding
between a and b, so that b is aligned for good memory-access.

Rather use:
*((char *)&mystruct.b) = 0x12;

Or:
struct
{
char a;
union {
int b;
char c;
};
} mystruct;
mystruct.c = 0x12;

Would this union work on every platform??

Which undefined behaviour (or other bugs) do you find interesting?
Thanks.

Jens Thoms Toerring
 05-03-2012
John Reye <> wrote:
> 1)
> a[i] = i++;

> http://c-faq.com/expr/evalorder1.html

> So this attempt at optimisation is wrong:

> int i;
> int a[10];
> for (i = 0; i < sizeof(a); )
> a[i] = i++; /* set a to {1, 2, 3, ...} */

> Correctly optimized, it should be:
> for (i = 0; i < sizeof(a); )
> i = a[i] = i+1; /* set a to {1, 2, 3, ...} */

I don't know what this has to do with "optimisation" but, yes, the
second version is more correct. But there's still a problem:
your use of sizeof(a) as the value for the end of the loop.
sizeof(a) is the number of bytes in that array, not the number
of elements (and these are typically, except for char arrays,
different). Use "sizeof a / sizeof *a" instead (and make sure
that 'a' is an array and not merely a pointer).

> 2)
> int i;
> char b[10];
> (int *)b = i;

> Alignment problem. There is not guarantee that b is so aligned, that
> it's address satisfies alignment-requirements, which an int-pointer
> would need (such as even address, or maby "address divisible by 4", or
> whatever it happens to be)

True. If, for some reason, you must do that, use memcpy().

> 3)
> struct
> {
> char a;
> int b;
> } mystruct;

> // set byte at lowest address within b to 0x12;
> char *p = &mystruct;
> p[1] = 0x12;

> Bug: struct padding was forgotten. There will probably be some padding
> between a and b, so that b is aligned for good memory-access.

> Rather use:
> *((char *)&mystruct.b) = 0x12;

Correct but also don't do that unless you have very good reasons
- why would you set just one of the bytes of an int? Depending
on the endianness etc. of your machine the result when the 'b'
member then is used as an int will be quite different.

> Or:
> struct
> {
> char a;
> union {
> int b;
> char c;
> };
> } mystruct;
> mystruct.c = 0x12

> Would this union work on every platform??

Yes. But then you won't be able to use the 'b' element of
the union since reading a different member than has been
set the last time round also invokes undefined behavior.

> Which undefined behaviour (or other bugs) do you think is interesting

All of them. There's a complete list at the end of the C
standard (at least in C89, see A.6.2, and C99, see Annex J.2).

Regards, Jens
--
\ Jens Thoms Toerring ___
\__________________________ http://toerring.de

John Reye
 05-03-2012
Thanks for the warning about sizeof!! I missed that.

On May 3, 7:38 pm, j...@toerring.de (Jens Thoms Toerring) wrote:
> > Or:
> > struct
> > {
> > char a;
> > union {
> > int b;
> > char c;
> > };
> > } mystruct;
> > mystruct.c = 0x12
> > Would this union work on every platform??

>
> Yes. But the you won't be able to use the 'b' element of
> the union since reading a different member than has been
> set the last time round also invokes undefined behavior.

Oh no, please no. Why on earth would that be undefined behaviour?

mystruct.c = 0x12;
int tmp = mystruct.b;
f(tmp);

Do you really mean that the code above does not guarantee that the
lowest byte-address of tmp is set to 0x12??
I simply cannot believe that. If it is so, then please suggest to me
how I can solve this.

Ben Pfaff
 05-03-2012
John Reye <> writes:

> Thanks for the warning about sizeof!! I missed that.
>
> On May 3, 7:38 pm, j...@toerring.de (Jens Thoms Toerring) wrote:
>> > Or:
>> > struct
>> > {
>> > char a;
>> > union {
>> > int b;
>> > char c;
>> > };
>> > } mystruct;
>> > mystruct.c = 0x12
>> > Would this union work on every platform??

>>
>> Yes. But the you won't be able to use the 'b' element of
>> the union since reading a different member than has been
>> set the last time round also invokes undefined behavior.

>
> Oh no, please no. Why on earth would that be undefined behaviour?

C99 6.2.6.1 "Language" says:

7 When a value is stored in a member of an object of union
type, the bytes of the object representation that do not
correspond to that member but do correspond to other members
take unspecified values, but the value of the union object
shall not thereby become a trap representation.

So you can't portably rely on the value of 'b' after assigning to
'c'.

Jens Thoms Toerring
 05-03-2012
John Reye <> wrote:
> Thanks for the warning about sizeof!! I missed that.

> On May 3, 7:38 pm, j...@toerring.de (Jens Thoms Toerring) wrote:
> > > Or:
> > > struct
> > > {
> > > char a;
> > > union {
> > > int b;
> > > char c;
> > > };
> > > } mystruct;
> > > mystruct.c = 0x12
> > > Would this union work on every platform??

> >
> > Yes. But the you won't be able to use the 'b' element of
> > the union since reading a different member than has been
> > set the last time round also invokes undefined behavior.

> Oh no, please no. Why on earth would that be undefined behaviour?

> mystruct.c = 0x12;
> int tmp = mystruct.b;
> f(tmp);

> Do you really mean, that the code above does not guarantee that the
> lowest byte-address of tmp is set to 0x12??

That's exactly what it does - it sets the byte at the lowest
memory address. The problem is only: what do you get when you
now read the whole int? On a little-endian machine this will
have modified the least significant byte while on a big-endian
machine the most significant byte. Thus the standard isn't able
to define what value you will get when you read 'b' after you
have written something to 'c'. Or a different example: when you
have

union {
double a;
int b;
} my_union;

my_union.b = 3;
printf( "%f\n", my_union.a );

What would you expect to get? Or how could doing something
like that be defined properly?
Regards, Jens

John Reye
 05-03-2012
On May 3, 8:09 pm, Ben Pfaff wrote:
> C99 6.2.6.1 "Language" says:
>
> 7 When a value is stored in a member of an object of union
> type, the bytes of the object representation that do not
> correspond to that member but do correspond to other members
> take unspecified values, but the value of the union object
> shall not thereby become a trap representation.
>
> So you can't portably rely on the value of 'b' after assigning to
> 'c'.

Thanks for the exact C-standard reference.
Still: this is very unsettling.

Would this solve the issue ?? ->

struct
{
char a;
union {
int b;
struct {
char byte0;
char byte1;
char byte2;
char byte3;
};
};
} mystruct;
mystruct.byte0 = 0x12;
int tmp = mystruct.b;
f(tmp);

If it does solve the issue, and the lowest-address byte of tmp is
0x12, then:
Ahhh, it is not portable. On other platforms an int has only 2 bytes.
Is there a portable way of fixing this??

Ben Pfaff
 05-03-2012
John Reye <> writes:

> On May 3, 8:09 pm, Ben Pfaff wrote:
> Would this solve the issue ?? ->
>
> struct
> {
> char a;
> union {
> int b;
> struct {
> char byte0;
> char byte1;
> char byte2;
> char byte3;
> };
> };
> } mystruct;
> mystruct.byte0 = 0x12;
> int tmp = mystruct.b;
> f(b);
>
> If it does solve the issue, and the lowest-address-byte of tmp is
> 0x12, then:
> Ahhh it is not portable. Other ints have only 2 bytes.
> Is there a portable way of fixing this??

your proposed solution? I can think of multiple solutions, but I

John Reye
 05-03-2012
On May 3, 8:11 pm, (Jens Thoms Toerring) wrote:
> > mystruct.c = 0x12;
> > int tmp = mystruct.b;
> > f(tmp);
> > Do you really mean, that the code above does not guarantee that the
> > lowest byte-address of tmp is set to 0x12??

>
> That's exactly what it does - it sets the byte at the lowest

OK, so then there is no problem with reading b, right?
I always did say: lowest byte-address.

What you write there is a different issue:
>The problem is only: what do you get when you
> now read the whole int? On a little-endian machine this will
> have modified the least significant byte while on a big-endian
> machine the most significant byte. Thus the standard isn't able
> to define what value you will get when you read 'b' after you
> have written something to 'c'.

Well yes, because there are machines with different endianness.

But I can get around that quite easily like this:

#include <stdio.h>

#define LITTLE_ENDIAN /* undef it, if big endian!! */

#ifdef LITTLE_ENDIAN

struct byte4 {
unsigned char byte0;
unsigned char byte1;
unsigned char byte2;
unsigned char byte3;
};
#else

struct byte4 {
unsigned char byte3;
unsigned char byte2;
unsigned char byte1;
unsigned char byte0;
};

#endif

int main(void)
{
struct
{
char a;
union {
int b;
struct byte4 b4;
};
} mystruct;
mystruct.b = 0; /* give every byte of b a defined value first */
mystruct.b4.byte0 = 0x12; // this sets least signif. byte!
printf("%x\n", (unsigned)mystruct.b);
return 0;
}

John Reye
 05-03-2012
> your proposed solution? I can think of multiple solutions, but I

Hang on. I think I just understood the following:

On May 3, 7:38 pm, (Jens Thoms Toerring) wrote:
> Yes. But the you won't be able to use the 'b' element of
> the union since reading a different member than has been
> set the last time round also invokes undefined behavior.

On May 3, 8:09 pm, Ben Pfaff wrote:
> So you can't portably rely on the value of 'b' after assigning to
> 'c'.

I thought this means that after writing c
mystruct.c = 0x12;
there is no guarantee that the lowest-byte address of mystruct.b is
0x12.

This is what got me all confused.
So it was a misunderstanding, from my side.

Ultimately: we cannot rely on the VALUE of mystruct.b, if the VALUE is
interpreted as integer.
However the lowest-byte-address within mystruct.b is guaranteed to be
0x12.

John Reye
 05-03-2012
To lessen the confusion.
Note: My last 2 posts are unrelated.

AAA)
In the post that has the code
#ifdef LITTLE_ENDIAN
I set the LEAST SIGNIFICANT BYTE of mystruct.b to 0x12.

BBB)
In all other posts above, I set the LOWEST-ADDRESSED-BYTE WITHIN
mystruct.b to 0x12.

AAA) and BBB) are not the same because of the endianness issue:
little endian is not big endian.