Velocity Reviews > contiguity of arrays

# contiguity of arrays

Steve Kobes
Guest
Posts: n/a

 09-25-2004
Is this legal? Must it print 4?

int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
printf("%d\n", *(b + 3));

--Steve

Al Bowers
Guest
Posts: n/a

 09-25-2004

Steve Kobes wrote:
> Is this legal? Must it print 4?
>
> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
> printf("%d\n", *(b + 3));
>
> --Steve

Yes it is legal and will print 4.
This array describes a contiguously allocated nonempty set of objects of
type int.
b = a[0] positions the int *b to the beginning of the array object as
would b = &a[0][0].

#include <stdio.h>

int main(void)
{
int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];

printf("a[2][2] = ((1,2},{3,4}};\nint *b;\n"
"Using b = a[0]. *(b + 3) = %d\n", *(b + 3));
b = &a[0][0];
printf("Using b = &a[0][0]. *(b + 3) = %d\n",*(b+3));
printf("The pointer values &a[0][0] and a[0] are%sequal\n",
(&a[0][0] == a[0])?" ":" not ");
return 0;
}

--
Al Bowers
Tampa, Fl USA
mailto: http://www.velocityreviews.com/forums/(E-Mail Removed) (remove the x to send email)
http://www.geocities.com/abowers822/

Chris Torek
Guest
Posts: n/a

 09-25-2004
>Steve Kobes wrote:
>> Is this legal? Must it print 4?
>>
>> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
>> printf("%d\n", *(b + 3));

In article <news:(E-Mail Removed)>
Al Bowers <(E-Mail Removed)> wrote:
>Yes it is legal and will print 4.

I believe there are those in comp.std.c, at least, who will disagree
with you.

>This array describes a contiguously allocated nonempty set of objects of
>type int.
>b = a[0] positions the int *b to the beginning of the array object as
>would b = &a[0][0].

In particular, "b" points to the first of the two consecutive "int"s
{1, 2}. These two consecutive "int"s are indeed followed immediately
in memory by the two consecutive "int"s {3, 4} -- but a compiler
is allowed, through arbitrarily magical means, to insert bounds-checking
on array subscripts and pointer arithmetic. (Remember that these
two are quite similar: subscripting is simply address arithmetic
followed by pointer indirection, and pointer indirection is simply
subscripting -- *p and p[0] are identical.[%] Note that we are allowed
to compute &array[N] but not follow the resulting pointer, so there
is a slight hitch in using the same code for both -- you also need
a boolean flag "this address arithmetic will be followed by
indirection, or not", to handle this in any bounds-checking code.)

Suppose that the compiler does in fact do this magical bounds-checking,
so that it "knows" that b points to the first of only *two* ints stored
in the array a[0]. In this case, b[0] and b[1] are allowed, but b[2]
is rejected with an "out of bounds array subscript" error.

(Personally, I find it all quite unlikely, but it does seem to be
allowed. I also think that, if one writes a compiler with the
decidedly non-magical bounds-checking done via either "fat pointers"
or some kind of runtime table lookups, the bounds for b should run
from 0 to 3 inclusive, in this case, rather than 0 to 1. But these
are my opinions of "the way it ought to work" rather than "what
the letter of the C standard says".)

In practice, I do not believe anyone has yet found an implementation
where b[3] (or equivalently, *(b + 3)) is caught as an error. The
most likely place to find this kind of runtime[%%] checking is in
those in years.

[% As a result, you can always replace (*p) with p[0], including
in expressions like (*p).field, which is more usually written as
p->field. Hence the -> operator is replaceable with "[0].", e.g.,

if (a->b->c == 42)

you can write:

if (a[0].b[0].c == 42)

This example just goes to show that C has a plethora of operators. ]

[%% This particular case can be caught at compile time, but in
general one would want to test the subscript in b[i], or the offset
in (b+i), at runtime, both because "i" might be difficult or
impossible to predict, and also because "b" might also be difficult
or impossible to predict:

int a1[10], a2[2];
int *b;
...
if (user_input == 42)
b = a1;
else if (user_input == 6 * 9)
b = a2;
...
do_something(b[3]);

Here, a general-purpose bounds-checker might do something vaguely
like this:

struct bounds *bp;

bp = bounds_lookup(b);
if (bp == NULL)
__runtime_error("invalid pointer");
if (bp->upper_limit < 3)
__runtime_error("out of bounds array subscript");
tmp = b[3];
do_something(tmp);

to handle the call to do_something(). The bounds_lookup() function
compares the supplied pointer value against a table of all active
arrays -- all "global variables" that have array type, all outstanding
malloc()s, and all local variables that are still live on the current
stack frame, for instance -- and finds the corresponding array
bounds information. We can do this without concerning ourselves
with ordinary simple pointers (&localvar, where localvar is *not*
an array) because we have a nonzero subscript. If this were using
b[i] the code would have to do:

if (i != 0) {
struct bounds *bp;
... as before ...
} else
tmp = *b; /* cannot do bounds checking on zero subscripts */
do_something(tmp);

becaues each ordinary variable "acts like" an array of size 0.
Alternatively, bounds_lookup() might actually have bounds-table
of just all arrays.

The bounds-checking above needs to handle pointer arithmetic as
well, so that if we do "b++", the upper limit is reduced by one
and the lower limit goes from 0 to -1. (There are other methods
for handling this that are probably more practical in real
bounds-checkers; this version is just for illustration.)

Bounds-checking tends to slow down programs quite a bit, and is an
area in which good compiler-level optimization is important. If
a loop is guaranteed to run from 0 to N-1 inclusive, a single
bounds-check on N-1 before entering the loop at all replaces many
runtime tests. If the bounds-check can be done at compile time,
then omitted entirely from the runtime code, that is even better.]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
Reading email is like searching for food in the garbage, thanks to spammers.

pete
Guest
Posts: n/a

 09-25-2004
Chris Torek wrote:
>
> >Steve Kobes wrote:
> >> Is this legal? Must it print 4?
> >>
> >> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
> >> printf("%d\n", *(b + 3));

>
> In article <news:(E-Mail Removed)>
> Al Bowers <(E-Mail Removed)> wrote:
> >Yes it is legal and will print 4.

>
> I believe there are those in comp.std.c, at least, who will disagree
> with you.

If it was

int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)a;
^^^^^^^
then I believe you could access b[3].

--
pete

Barry Schwarz
Guest
Posts: n/a

 09-25-2004
On 25 Sep 2004 17:21:37 GMT, Chris Torek <(E-Mail Removed)> wrote:

>>Steve Kobes wrote:
>>> Is this legal? Must it print 4?
>>>
>>> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
>>> printf("%d\n", *(b + 3));

>
>In article <news:(E-Mail Removed)>
>Al Bowers <(E-Mail Removed)> wrote:
>>Yes it is legal and will print 4.

>
>I believe there are those in comp.std.c, at least, who will disagree
>with you.

I could not follow your discussion which followed so I don't know
exactly which point I'm about to disagree with. Instead I will try to
present an argument that I believe shows the original question about
legality must be answered in the affirmative.

For an array q (not declared as a function parameter),
sizeof q / sizeof *q must evaluate to the number of elements in b.

For a pointer p to type T, the expression p+1 must evaluate to the
same value as (T*)((char*)p + sizeof(T))

Therefore

sizeof a must evaluate to the same value as 2*sizeof a[0]

sizeof a[0] and sizeof a[1] both must evaluate to the same value
as 2*sizeof a[0][0].

sizeof a[0][0] must evaluate to the same value as sizeof(int)

sizeof a must evaluate to the same value as 4*sizeof(int)

Since a contains 4 int and is exactly large enough to contain 4 int,
these 4 int must be located at b, b+1, b+2, and b+3.

snip

<<Remove the del for email>>

Mark McIntyre
Guest
Posts: n/a

 09-25-2004
On 25 Sep 2004 22:48:34 GMT, in comp.lang.c , Barry Schwarz
<(E-Mail Removed)> wrote:

>On 25 Sep 2004 17:21:37 GMT, Chris Torek <(E-Mail Removed)> wrote:
>
>>>Steve Kobes wrote:
>>>> Is this legal? Must it print 4?
>>>>
>>>> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
>>>> printf("%d\n", *(b + 3));

>>
>>In article <news:(E-Mail Removed)>
>>Al Bowers <(E-Mail Removed)> wrote:
>>>Yes it is legal and will print 4.

>>
>>I believe there are those in comp.std.c, at least, who will disagree
>>with you.

>
>I could not follow your discussion which followed so I don't know
>exactly which point I'm about to disagree with. Instead I will try to
>present an argument that I believe shows the original question about
>legality must be answered in the affirmative.

(snip argument which is based on the size of the objects, and 1-d array
arithmetic. ) The size argument is spurious. The compiler is allowed to
tell you the object is size 4, even if all 4 members were in separate
memory arrays in different solar systems. Theres no limit to the internal
magic the compiler can perform.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>

----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---

pete
Guest
Posts: n/a

 09-25-2004
Barry Schwarz wrote:
>
> On 25 Sep 2004 17:21:37 GMT, Chris Torek <(E-Mail Removed)> wrote:
>
> >>Steve Kobes wrote:
> >>> Is this legal? Must it print 4?
> >>>
> >>> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
> >>> printf("%d\n", *(b + 3));

> >
> >In article <news:(E-Mail Removed)>
> >Al Bowers <(E-Mail Removed)> wrote:
> >>Yes it is legal and will print 4.

> >
> >I believe there are those in comp.std.c, at least, who will disagree
> >with you.

>
> I could not follow your discussion which followed so I don't know
> exactly which point I'm about to disagree with. Instead I will try to
> present an argument that I believe shows the original question about
> legality must be answered in the affirmative.
>
> For an array q (not declared as a function parameter),
> sizeof q / sizeof *q must evaluate to the number of elements in b.
>
> For a pointer p to type T, the expression p+1 must evaluate to the
> same value as (T*)((char*)p + sizeof(T))
>
> Therefore
>
> sizeof a must evaluate to the same value as 2*sizeof a[0]
>
> sizeof a[0] and sizeof a[1] both must
> evaluate to the same value as 2*sizeof a[0][0].
>
> sizeof a[0][0] must evaluate to the same value as sizeof(int)
>
> sizeof a must evaluate to the same value as 4*sizeof(int)
>
> Since a contains 4 int and is exactly large enough to contain 4 int,
> these 4 int must be located at b, b+1, b+2, and b+3.

The problem is that b, with (b = a[0])
is pointed at the first element of a two element array.

If you have

int a = 0, b = 0;

then, you can't do this

if (a + 1 == b) { /* The condition check is valid */
printf("%d\n", a[1]); /* Then printf call isn't valid */
}

Whether or not there is an int object at (a + 1) is not the point.
Overunning the bounds of the object, is the point.

--
pete

pete
Guest
Posts: n/a

 09-26-2004
pete wrote:
>
> Barry Schwarz wrote:
> >
> > On 25 Sep 2004 17:21:37 GMT, Chris Torek <(E-Mail Removed)> wrote:
> >
> > >>Steve Kobes wrote:
> > >>> Is this legal? Must it print 4?
> > >>>
> > >>> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
> > >>> printf("%d\n", *(b + 3));
> > >
> > >In article <news:(E-Mail Removed)>
> > >Al Bowers <(E-Mail Removed)> wrote:
> > >>Yes it is legal and will print 4.
> > >
> > >I believe there are those in comp.std.c, at least, who will disagree
> > >with you.

> >
> > I could not follow your discussion which followed so I don't know
> > exactly which point I'm about to disagree with. Instead I will try to
> > present an argument that I believe shows the original question about
> > legality must be answered in the affirmative.
> >
> > For an array q (not declared as a function parameter),
> > sizeof q / sizeof *q must evaluate to the number of elements in b.
> >
> > For a pointer p to type T, the expression p+1 must evaluate to the
> > same value as (T*)((char*)p + sizeof(T))
> >
> > Therefore
> >
> > sizeof a must evaluate to the same value as 2*sizeof a[0]
> >
> > sizeof a[0] and sizeof a[1] both must
> > evaluate to the same value as 2*sizeof a[0][0].
> >
> > sizeof a[0][0] must evaluate to the same value as sizeof(int)
> >
> > sizeof a must evaluate to the same value as 4*sizeof(int)
> >
> > Since a contains 4 int and is exactly large enough to contain 4 int,
> > these 4 int must be located at b, b+1, b+2, and b+3.

>
> The problem is that b, with (b = a[0])
> is pointed at the first element of a two element array.
>
> If you have
>
> int a = 0, b = 0;
>
> then, you can't do this
>
> if (a + 1 == b) { /* The condition check is valid */
> printf("%d\n", a[1]); /* Then printf call isn't valid */
> }

Major typos there.

Should be:

if (&a + 1 == &b) { /* The condition check is valid */
printf("%d\n", (&a)[1]); /* Then printf call isn't valid */
}

> Whether or not there is an int object at (a + 1) is not the point.
> Overunning the bounds of the object, is the point.

--
pete

E. Robert Tisdale
Guest
Posts: n/a

 09-26-2004
Steve Kobes wrote:

> Is this legal? Must it print 4?
>
> int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
> printf("%d\n", *(b + 3));

Despite the fact that this works everywhere,
neither C89 or C99 "guarantees" this.

You can reference *(b+0) and *(b+1) and
you can compute (b+2) but you can't reference *(b+2).
It isn't "legal" to compute (b+3) much less reference *(b+3).

If this "equivalence" is important to you,
I don't think that you need to be too concerned

Michael Mair
Guest
Posts: n/a

 09-26-2004
Hiho,

> (snip argument which is based on the size of the objects, and 1-d array
> arithmetic. ) The size argument is spurious. The compiler is allowed to
> tell you the object is size 4, even if all 4 members were in separate
> memory arrays in different solar systems. Theres no limit to the internal
> magic the compiler can perform.

True. But I have yet to see the compiler that will correctly wrap _all_
calls to memcpy() and all other functions getting size_t and pointer
arguments for this case.
It is much easier to build non-broken programs the other way.

Cheers
Michael