# Pointer Equality for Different Array Objects

Shao Miller

 02-03-2012
(More bounds-checking.)

Wait a minute... N1256 6.5.9p6:

"Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow the
first array object in the address space.94)"

So if we have:

union u_test {
int a[2][2];
int b[4];
};
union u_test test = { { { 0 } } };
int x = &test.a[1][1] == &test.b[3];

Is 'x' zero or one?

If we can claim, for the purposes of supporting bounds-checking, that
'&test.a[1][1]' points into an 'int[2]' and nothing more, and that
'&test.b[3]' points into an 'int[4]' and nothing more, then aren't these
pointers pointing to distinct objects?

If they're pointing to the same object, why might they not be for the
purposes of 6.5.6p8?

Just to illustrate an identical expression statement:

int x = (*(test.a + 1) + 1) == (test.b + 3);

--
"The stationery store has moved. Aaargh!"

Shao Miller

 02-03-2012
On 2/2/2012 19:34, Shao Miller wrote:
> Just to illustrate an identical expression statement:
>
> int x = (*(test.a + 1) + 1) == (test.b + 3);
>

Erm, declaration, I meant.

--
"The stationery store has moved. Aaargh!"

Kaz Kylheku

 02-03-2012
On 2012-02-03, Shao Miller <(E-Mail Removed)> wrote:
> (More bounds-checking.)
>
> Wait a minute... N1256 6.5.9p6:
>
> "Two pointers compare equal if and only if both are null pointers,
> both are pointers to the same object (including a pointer to an object
> and a subobject at its beginning) or function, both are pointers to one
> past the last element of the same array object, or one is a pointer to
> one past the end of one array object and the other is a pointer to the
> start of a different array object that happens to immediately follow the
> first array object in the address space.94)"

This can only happen in a correct program if the two array-like objects
are themselves members of a larger array.

>
> So if we have:
>
> union u_test {
> int a[2][2];
> int b[4];
> };
> union u_test test = { { { 0 } } };
> int x = &test.a[1][1] == &test.b[3];
>
> Is 'x' zero or one?

One. These are the same address because the elements of an array are
contiguous. A simulated two dimensional array has the storage layout
of a one-dimensional array, and so a[1][1] corresponds to b[3].
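
A minimal sketch of the layout claim above, using only the `union u_test` from the original post. The helper names `offset_in_a`, `offset_in_b` and `check_same_offset` are hypothetical. It compares byte offsets from the start of the union instead of applying `==` to the two `int *` values, because whether 6.5.9p6 covers that comparison is the very point under debate in this thread.

```c
#include <assert.h>
#include <stddef.h>

union u_test {
    int a[2][2];
    int b[4];
};

union u_test test = { { { 0 } } };

/* Byte offset of test.a[1][1] from the start of the union object. */
size_t offset_in_a(void)
{
    return (size_t)((char *)&test.a[1][1] - (char *)&test);
}

/* Byte offset of test.b[3] from the start of the union object. */
size_t offset_in_b(void)
{
    return (size_t)((char *)&test.b[3] - (char *)&test);
}

/* Both elements sit three ints past the start of the shared storage. */
void check_same_offset(void)
{
    assert(offset_in_a() == 3 * sizeof(int));
    assert(offset_in_a() == offset_in_b());
}
```

Calling `check_same_offset()` should pass without tripping an assertion, since array elements are contiguous and both members start at the beginning of the union.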

> If we can claim, for the purposes of supporting bounds-checking, that

For the purposes of bounds checking, you have two pointers here,
neither of which is out of bounds of the object from which they are derived.

A pointer comparison (for exact equality, at that) does not create
a bounds-checking issue.

> '&test.a[1][1]' points into an 'int[2]' and nothing more, and that
> '&test.b[3]' points into an 'int[4]' and nothing more, then aren't these
> pointers pointing to distinct objects?

They are not distinct objects because they are overlapped in a union.

Even these have to compare equal:

union { int x; double y; } u;

&u.y == (double *) &u.x;

The members of a union all start at the same address. They are not
distinct objects.
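
A small sketch of that claim, assuming a named union type `int_or_double` and a helper `members_share_address` (both names are hypothetical). It converts every address to `void *` before comparing, which sidesteps any question about comparing an `int *` with a `double *`.

```c
#include <assert.h>

union int_or_double {
    int x;
    double y;
};

/* Returns 1 when both member addresses equal the address of the union. */
int members_share_address(void)
{
    union int_or_double u = { 0 };
    void *whole = &u;
    void *px = &u.x;
    void *py = &u.y;

    /* A pointer to a union, suitably converted, points to each member. */
    assert(px == py);
    return whole == px && whole == py;
}
```

This relies on the rule in 6.7.2.1 that a pointer to a union object, suitably converted, points to each of its members and vice versa.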

Shao Miller

 02-03-2012
On 2/2/2012 19:47, Kaz Kylheku wrote:
> On 2012-02-03, Shao Miller <(E-Mail Removed)> wrote:
>> (More bounds-checking.)
>>
>> Wait a minute... N1256 6.5.9p6:
>>
>> "Two pointers compare equal if and only if both are null pointers,
>> both are pointers to the same object (including a pointer to an object
>> and a subobject at its beginning) or function, both are pointers to one
>> past the last element of the same array object, or one is a pointer to
>> one past the end of one array object and the other is a pointer to the
>> start of a different array object that happens to immediately follow the
>> first array object in the address space.94)"

>
> This can only happen in a correct program if the two array-like objects
> are themselves members of a larger array.
>
>>
>> So if we have:
>>
>> union u_test {
>> int a[2][2];
>> int b[4];
>> };
>> union u_test test = { { { 0 } } };
>> int x = &test.a[1][1] == &test.b[3];
>>
>> Is 'x' zero or one?

>
> One. These are the same address because the elements of an array are
> contiguous. A simulated two dimensional array has the storage layout
> of a one-dimensional array, and so a[1][1] corresponds to b[3].
>

Thank you for your response. I agree that they point to the same
location. But 6.5.9p6 doesn't state that pointing to the same location
is sufficient to yield an equality.

>> If we can claim, for the purposes of supporting bounds-checking, that

>
> For the purposes of bounds checking, you have two pointers here,
> neither of which is out of bounds of the object from which they are derived.
>
> A pointer comparison (for exact equality, at that) does not create
> a bounds-checking issue.
>

Agreed. But does the result create an issue for the subject of
"bounds-checking?"

>> '&test.a[1][1]' points into an 'int[2]' and nothing more, and that
>> '&test.b[3]' points into an 'int[4]' and nothing more, then aren't these
>> pointers pointing to distinct objects?

>
> They are not distinct objects because they are overlapped in a union.
>

If they are not distinct objects, then they are the same object, right?
Well what type is the object? Is it an 'int'? If so, is it treated
as an 'int[1]' for the purposes of pointer arithmetic? Or is it an
array object, instead? If so, how many elements in the array? Could
you do:

&test.a[1][1] - 2;

?

> Even these have to compare equal:
>
> union { int x; double y; } u;
>
> &u.y == (double *) &u.x;
>
> The members of a union all start at the same address. They are not
> distinct objects.

That example is different because "both are pointers to the same object
(including a pointer to an object and a subobject at its beginning)".
In the original example, whatever each pointer pointed to was not at the
beginning of any containing object... Was it? (Or were they?)

--
"The stationery store has moved. Aaargh!"

Shao Miller

 02-03-2012
On 2/2/2012 20:45, pete wrote:
> Shao Miller wrote:
>>
>> On 2/2/2012 19:47, Kaz Kylheku wrote:

>
>>> They are not distinct objects
>>> because they are overlapped in a union.
>>>

>>
>> If they are not distinct objects,
>> then they are the same object, right?
>> Well what type is the object?

>
> N1570
> 6.3.2.1 Lvalues, arrays, and function designators
> 1
> When an object is said to have a particular type,
> the type is specified by the lvalue used to designate the object.
>

Thanks! But does that help to determine the count of elements in some
containing array object?

--
"The stationery store has moved. Aaargh!"

Kaz Kylheku

 02-03-2012
On 2012-02-03, Shao Miller <(E-Mail Removed)> wrote:
> Thank you for your response. I agree that they point to the same
> location. But 6.5.9p6 doesn't state that pointing to the same location
> is sufficient to yield an equality.

It does if you interpret "same object" as being "same location".

What is an object? "A region of data storage in the execution environment,
the contents of which can represent values".

Pointers to the same object are pointers to the same region of data storage.

>>> If we can claim, for the purposes of supporting bounds-checking, that

>>
>> For the purposes of bounds checking, you have two pointers here,
>> neither of which is out of bounds of the object from which they are derived.
>>
>> A pointer comparison (for exact equality, at that) does not create
>> a bounds-checking issue.
>>

>
> Agreed. But does the result create an issue for the subject of
> "bounds-checking?"

I don't think there can be any expectation of bounds checking between two views
of the same storage through a union. There is just no sense there of a pointer
from one straying out of bounds and into the other.

> If they are not distinct objects, then they are the same object, right?
> Well what type is the object? Is it an 'int'? If so, is it treated
> as an 'int[1]' for the purposes of pointer arithmetic? Or is it an
> array object, instead? If so, how many elements in the array? Could
> you do:

It is all these things simultaneously. You could ask these same questions
of a non-union object. Given int x[5], what is x[1]? Is it just an int?
Or is it a portion of the whole array? So what type is it?

ISO C uses the term "subobject" for embedded objects. x[1] is an int object
which is a subobject of the int[5] array.

Both arrays involved in the union contain subobjects of type int which
correspond to each other.

If two views of a different type are aliased through a union, then
the type is whichever one is the last through which a value is stored.
This is from the special rules about unions.

I think it should hold for this kind of array aliasing. (But you're not even
accessing anything here, only asking whether certain pointers are reliably equal.)

> &test.a[1][1] - 2;

Also consider a[0][3]. This kind of "wrong geometry" access is in a kind of
gray area, where the standard isn't of a lot of help.

Right or wrong, some C programs are going to do this and find it to be quite
portable. So if you're doing bounds-checking, it's not entirely realistic to
insist on diagnosing such things. A programmer who doesn't consider that to be a
bounds error will be irked by diagnostics perceived to be a nuisance.

On the other hand, the "wrong geometry" access to a[0][3] could be unintended,
and indicate a bug. Another programmer might be thankful to have that diagnosed
(maybe even the same programmer, in a different programming situation).

from many years ago. Some of the arguments hinged on whether such an access is
done blatantly with array indexing, or displacement of pointer directly obtained
from "array" decay. Under that kind of hair-splitting, &a[0][0] + 3 seems less
wrong than a[0][3] because the former has a pointer to just an int, which
is then being displaced, in a way that is disconnected from the geometry of the
array.

Kaz Kylheku

 02-03-2012
On 2012-02-03, pete <(E-Mail Removed)> wrote:
> Shao Miller wrote:
>>
>> On 2/2/2012 19:47, Kaz Kylheku wrote:

>
>> > They are not distinct objects
>> > because they are overlapped in a union.
>> >

>>
>> If they are not distinct objects,
>> then they are the same object, right?
>> Well what type is the object?

>
> N1570
> 6.3.2.1 Lvalues, arrays, and function designators
> 1
> When an object is said to have a particular type,
> the type is specified by the lvalue used to designate the object.

This is a simplification if taken by itself. Some objects have a declared
type, and if you form some other lvalue to access them, you're in
undefined behavior land.

Shao Miller

 02-03-2012
On 2/2/2012 21:35, Kaz Kylheku wrote:
> On 2012-02-03, Shao Miller <(E-Mail Removed)> wrote:
>> Thank you for your response. I agree that they point to the same
>> location. But 6.5.9p6 doesn't state that pointing to the same location
>> is sufficient to yield an equality.

>
> It does if you interpret "same object" as being "same location".
>
> What is an object? "A region of data storage in the execution environment,
> the contents of which can represent values".
>
> Pointers to the same object are pointers to the same region of data storage.
>

A nice conclusion. There is a fellow in another C-devoted forum who
insists on using the word "into" when discussing pointers. Pointers
always point "into" something, and that something would make sense as
"region of data storage."

Surely in:

int i = 42;
char * p = &i;

the pointer 'p' doesn't point "to" the 'int'-typed object designated by
'i'. After all, a 'char *' points "to" a 'char'. But it certainly
points "into" the object designated by 'i'.
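
A minimal sketch of a `char *` pointing "into" an `int`, with a hypothetical helper name `walk_int_bytes`. Note that the initialization needs an explicit cast, since `int *` does not convert to `char *` implicitly; the cast below is an addition for the sake of a compilable example.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the bytes of `i` visited through a `char *`. */
size_t walk_int_bytes(void)
{
    int i = 42;
    char *p = (char *)&i;              /* explicit cast required */
    char *end = (char *)&i + sizeof i; /* one past the last byte of i */
    size_t count = 0;

    /* p points at the first byte of i, so both agree as void *. */
    assert((void *)p == (void *)&i);

    for (; p != end; ++p)
        ++count;
    return count;
}
```

The loop visits exactly `sizeof(int)` bytes, which is one way to read "into": the pointer ranges over the storage of `i` without ever having type `int *`.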

>>>> If we can claim, for the purposes of supporting bounds-checking, that
>>>
>>> For the purposes of bounds checking, you have two pointers here,
>>> neither of which is out of bounds of the object from which they are derived.
>>>
>>> A pointer comparison (for exact equality, at that) does not create
>>> a bounds-checking issue.
>>>

>>
>> Agreed. But does the result create an issue for the subject of
>> "bounds-checking?"

>
> I don't think there can be any expectation of bounds checking between two views
> of the same storage through a union. There is just no sense there of a pointer
> from one straying out of bounds and into the other.
>

I'm sorry that the demonstration failed. I intended to highlight that
if we are talking about what a pointer points to (or into), the notion
ought to be consistent throughout an interpretation of the definitions
of Standard C.

If we are going to say that the pointers compare as equal, it's because
they point to the same object (or subobject). But then with pointer
arithmetic, we have the vague "if the array is large enough". Well,
what array? All [non-bit-field] objects are an array of bytes. That
array? For single-dimensional arrays, that array? For
multi-dimensional arrays, which dimension do we pick for our notion of
"the array" in order to determine if it's large enough for the pointer
arithmetic to be defined?

Essentially, given:

int a[2][2] = { { 0 } };

if we say that 'a[1] + 1' yields a pointer value X and that pointer
arithmetic is only defined for { X - 1, X, X + 1 }, we can potentially
justify that conclusion by saying "the array object" is not the larger
containing array, but the second 'int[2]' of the larger containing
array, only.
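
The three pointer values under the narrow reading can be sketched as follows. The helper name `row_relative_pointers` is hypothetical, and the code only forms and compares pointers that stay within row `a[1]` or land one past its end, so it does not depend on how the gray area is resolved.

```c
#include <assert.h>

/* Returns 1 if X - 1, X and X + 1 relate to row a[1] as described. */
int row_relative_pointers(void)
{
    int a[2][2] = { { 0 } };
    int *x = a[1] + 1;        /* X: points at a[1][1] */
    int *before = x - 1;      /* X - 1: points at a[1][0] */
    int *one_past = x + 1;    /* X + 1: one past the end of row a[1] */

    assert(before == &a[1][0]);
    return x == &a[1][1]
        && one_past == a[1] + 2
        && one_past - before == 2;
}
```

Under the narrow reading, `x - 2` would already step outside "the array object", even though it would land on `a[0][1]` in the enclosing `int[2][2]`.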

But if _that's_ the object being pointed into, I think we ought to stick
to that for pointer equality. So then the two pointers in the original
code do _not_ point to the same object or a sub-object at its beginning.

On the other hand, if we justify the pointer equality by saying that the
objects occupy the same location or that the pointers point into the
same region of data storage, well then pointer arithmetic should be
defined across that region of data storage, not just a particular
partition of it.

That is, it seems odd if two pointers with the same type, and pointing
into the same region of data storage, and pointing at the same byte in
that region of data storage, and comparing as equal, have different
defined boundaries for "the array" when under consideration for pointer
arithmetic.

And the pointer equality definition is so specific with its "if and only
if."

Add to that the definition of all object representations being
accessible via 'unsigned char' type and it seems that any bounds are at
the beginning of the contiguous region of memory and at one byte past
the end.
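
A sketch of the `unsigned char` view, assuming a hypothetical helper `flatten_by_bytes`. It copies the object representation of an `int[2][2]` into an `int[4]` one byte at a time, so the only bounds involved are the first byte and one past the last byte of the whole region.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the bytes of a 2x2 array into a flat array of four ints. */
void flatten_by_bytes(int (*src)[2][2], int (*dst)[4])
{
    const unsigned char *from = (const unsigned char *)src;
    unsigned char *to = (unsigned char *)dst;
    size_t n;

    /* Arrays have no padding between elements, so the sizes match. */
    assert(sizeof *src == sizeof *dst);

    for (n = 0; n < sizeof *src; ++n)
        to[n] = from[n];
}
```

Given `int grid[2][2] = { { 1, 2 }, { 3, 4 } };` and `int flat[4];`, the call `flatten_by_bytes(&grid, &flat)` leaves `flat[3]` holding the value of `grid[1][1]`.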

>> If they are not distinct objects, then they are the same object, right?
>> Well what type is the object? Is it an 'int'? If so, is it treated
>> as an 'int[1]' for the purposes of pointer arithmetic? Or is it an
>> array object, instead? If so, how many elements in the array? Could
>> you do:

>
> It is all these things simultaneously. You could ask these same questions
> of a non-union object. Given int x[5], what is x[1]? Is it just an int?
> Or is it a portion of the whole array? So what type is it?
>

Agreed. I'm glad you find them analogous.

> ISO C uses the term "subobject" for embedded objects. x[1] is an int object
> which is a subobject of the int[5] array.
>
> Both arrays involved in the union contain subobjects of type int which
> correspond to each other.
>

Agreed. And it seems that an 'int[4][5]' array has 20 sub-objects of
type 'int' that correspond to each of the combinations of index that can
be used with the 'int[4][5]' and that point to 'int' objects (not one past).

> If two views of a different type are aliased through a union, then
> the type is whichever one is the last through which a value is stored.
> This is from the special rules about unions.
>

And of course there's "type punning." In your example below, is
'a[0][3]' not similarly type punning the 'int[2][2]' as an 'int[4]' and
designating/accessing the fourth element?

> I think it should hold for this kind of array aliasing. (But you're not even
> accessing anything here, only asking whether certain pointers are reliably equal.)
>
>> &test.a[1][1] - 2;

>
> Also consider a[0][3]. This kind of "wrong geometry" access is in a kind of
> gray area, where the standard isn't of a lot of help.
>

which is why this thread was "more bounds-checking." The gray area is
what I'm trying to explore... Definitions, consequences, consistency, etc.

> Right or wrong, some C programs are going to do this and find it to be quite
> portable. So if you're doing bounds-checking, it's not entirely realistic to
> insist on diagnosing such things. A programmer who doesn't consider that to be a
> bounds error will be irked by diagnostics perceived to be a nuisance.
>
> On the other hand, the "wrong geometry" access to a[0][3] could be unintended,
> and indicate a bug. Another programmer might be thankful to have that diagnosed
> (maybe even the same programmer, in a different programming situation).
>

The clearest case I can think of is where a 'for' loop can be known at
translation-time to allow for an array index to go out-of-bounds without
any fancy business happening to the index. This seems worth warning about!

A run-time check might set up traps at one byte before and one byte
after a range of data storage. That seems sensible, too!

But for any stricter run-time bounds-checking, such as catching
'a[i][j]' where the ranges for 'i' and 'j' aren't known at
translation-time and go out-of-bounds, there could be checks for each
dimension of a multi-dimensional array, but is that consistent with C?
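
A per-dimension run-time check of the kind described here might look like the sketch below. The names `ROWS`, `COLS` and `checked_get` are hypothetical, and this is one possible policy for a checker, not something the C standard prescribes.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 2
#define COLS 2

/* Stores a[i][j] in *out and returns 1 only when each index is within
   its own dimension; otherwise returns 0 without reading the array. */
int checked_get(int (*a)[COLS], size_t i, size_t j, int *out)
{
    assert(a != NULL && out != NULL);
    if (i >= ROWS || j >= COLS)
        return 0;
    *out = a[i][j];
    return 1;
}
```

With `int a[ROWS][COLS]`, `checked_get(a, 1, 1, &v)` succeeds, while `checked_get(a, 0, 3, &v)` is rejected even though the flat offset 3 lies inside the storage of `a`. That is exactly the "wrong geometry" case the thread is arguing about.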

> from many years ago. Some of the arguments hinged on whether such an access is
> done blatantly with array indexing, or displacement of pointer directly obtained
> from "array" decay. Under that kind of hair-splitting, &a[0][0] + 3 seems less
> wrong than a[0][3] because the former has a pointer to just an int, which
> is then being displaced, in a way that is disconnected from the geometry of the
> array.
>
>

Yeah, but I wouldn't call it hair-splitting... The array subscripting
operator is defined with an identity given in terms of the binary
addition operator and the unary indirection operator (and parentheses).
I think that's rather important and worth discussion if it's at all a
"gray area."

It would seem unfair to give the array subscripting notation 'a[0][3]'
some kind of bounds-preferential treatment versus the identical '*(*(a +
0) + 3)' notation.
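
The identity between the two notations can be checked for in-bounds indices with a short sketch. The helper name `subscript_matches_pointer_form` is hypothetical, and the loop deliberately stays within each dimension so that nothing here depends on the gray area.

```c
#include <assert.h>

/* Returns 1 if a[i][j] and *(*(a + i) + j) designate the same element
   for every in-bounds i and j of a 2x2 array. */
int subscript_matches_pointer_form(void)
{
    int a[2][2] = { { 1, 2 }, { 3, 4 } };
    int i, j;

    for (i = 0; i < 2; ++i) {
        for (j = 0; j < 2; ++j) {
            /* Same value through either notation. */
            assert(a[i][j] == *(*(a + i) + j));
            /* Same address through either notation. */
            if (&a[i][j] != *(a + i) + j)
                return 0;
        }
    }
    return 1;
}
```

Since `E1[E2]` is defined as `(*((E1)+(E2)))`, a checker that treats the two spellings differently would be distinguishing expressions the language defines as identical.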

But we do see references to "provenance" in at least one defect report,
and this seems related to the "provenance" of a pointer. If it "came
from" an array with certain boundaries, then pointer arithmetic is only
defined for its use with those boundaries, despite the fact that another
pointer with different boundaries is identical in every other way.

Of course "provenance" seems like a "gray area," since you can combine
things (such as via bit-wise operators). Then whence did they come? An
example would be combining two objects into a destination such that the
effective type of the destination cannot be determined.

a[0][3]
*( *( a + 0 ) + 3 )
*( *( 'int[2][2]' + 0 ) + 3 )
*( *( 'int (*)[2]' + 0 ) + 3 )
*( *( 'int (*)[2]' ) + 3)
*( 'int[2]' + 3 )
*( 'int *' + 3 )
*( 'int *' )
'int'

--
"The stationery store has moved. Aaargh!"

Shao Miller

 02-03-2012
On 2/3/2012 01:27, Shao Miller wrote:
>
> Surely in:
>
> int i = 42;
> char * p = &i;
>

Kaz Kylheku

 02-03-2012
On 2012-02-03, Shao Miller <(E-Mail Removed)> wrote:
> But for any stricter run-time bounds-checking, such as catching
> 'a[i][j]' where the ranges for 'i' and 'j' aren't known at
> translation-time and go out-of-bounds, there could be checks for each
> dimension of a multi-dimensional array, but is that consistent with C?

You know, who cares? All that matters is: is this check valuable to the user?
Checks can be stronger or weaker than the standard language. Suppose that the
C standard explicitly said that an array is just flat memory that can be
aliased with any multi-dimensional array geometry. Well, someone might still
want some of their code to pass array dimension bounds checks.

> It would seem unfair to give the array subscripting notation 'a[0][3]'
> some kind of bounds-preferential treatment versus the identical '*(*(a +
> 0) + 3)' notation.

No, but the preferential treatment could actually stem from the
"a + displacement" where a is not a pointer, but an array (that converts to a
pointer on evaluation) and not from the choice of notation.

In such an expression, there is enough info to know that the displacement is in
bounds with respect to the static type of a, regardless of the larger
container in which that array finds itself.
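
One way to picture a check based on the static type of the array operand is the sketch below. The macros `ELEMS` and `DISPLACEMENT_OK` and the function `row_displacement_demo` are hypothetical names, and the macros are only meaningful when the first argument is an actual array, not a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in an array; `arr` must be an array, not a pointer. */
#define ELEMS(arr) (sizeof (arr) / sizeof (arr)[0])

/* 1 if arr + d stays within arr or lands exactly one past its end.
   A negative d converts to a huge size_t and is rejected. */
#define DISPLACEMENT_OK(arr, d) ((size_t)(d) <= ELEMS(arr))

int row_displacement_demo(void)
{
    int a[2][2] = { { 0 } };

    assert(ELEMS(a) == 2 && ELEMS(a[0]) == 2);
    /* a[0] + 2 is one past row a[0]; a[0] + 3 exceeds the row's type. */
    return DISPLACEMENT_OK(a[0], 2) && !DISPLACEMENT_OK(a[0], 3);
}
```

The check uses only `sizeof`, so it is decided from the type `int[2]` of `a[0]` and ignores the enclosing `int[2][2]`.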

> But we do see references to "provenance" in at least one defect report,
> and this seems related to the "provenance" of a pointer. If it "came
> from" an array with certain boundaries, then pointer arithmetic is only
> defined for its use with those boundaries, despite the fact that another
> pointer with different boundaries is identical in every other way.
>
> Of course "provenance" seems like a "gray area," since you can combine
> things (such as via bit-wise operators). Then whence did they come?

Clearly, at some point provenance has to be severed. A good rule of thumb (if
provenance were to stop being a gray area) might be that a pointer value
that is derived from a conversion from array, plus any combination of
displacements, has provenance from that array. As soon as &..*.. is involved,
it should be lost: e.g. &a[0] or &*(a + 0) ought to drop provenance.