Velocity Reviews > Addressing multidimensional arrays

Baboon
Guest
Posts: n/a

 12-11-2008

Suppose I have a multidimensional array, like so

T arr[N][M];

and in one place of my code it makes sense to access all N*M elements
via one single index.

Now, if I understand the C standard correctly, even though all elements
are continuous in memory (they are, are they not?), indices can only ever
be 0..N-1 and 0..M-1, respectively.

But, if I have a pointer T *p pointing at arr[0][0], and I increment p
N*M times, and access arr along the way, would that invoke UB? I mean,
arr is still one single object, right?

What if I use p[i] with i=0..M*N-1 ?

vippstar@gmail.com
Guest
Posts: n/a

 12-11-2008
On Dec 11, 3:06 am, Baboon <(E-Mail Removed)> wrote:
> Suppose I have a multidimensional array, like so
>
> T arr[N][M];
>
> and in one place of my code it makes sense to access all N*M elements
> via one single index.
>
> Now, if I understand the C standard correctly, even though all elements
> are continuous in memory (they are, are they not?), indices can only ever
> be 0..N-1 and 0..M-1, respectively.
>
> But, if I have a pointer T *p pointing at arr[0][0], and I increment p
> N*M times, and access arr along the way, would that invoke UB? I mean,
> arr is still one single object, right?

It's UB. Here's a relevant discussion via google groups.
b413ac5f3d3214ed/584ff61646b8bab6>

> What if I use p[i] with i=0..M*N-1 ?

p is presumably T (*)[M]. p[i] is T [M]. You don't have 0 - M*N - 1
such objects. (ie p doesn't point to that many objects).
So it's plain wrong.

vippstar@gmail.com
Guest
Posts: n/a

 12-11-2008
On Dec 11, 6:46 am, Barry Schwarz <(E-Mail Removed)> wrote:
> On Wed, 10 Dec 2008 17:26:41 -0800 (PST), (E-Mail Removed) wrote:
> >On Dec 11, 3:06 am, Baboon <(E-Mail Removed)> wrote:
> >> Suppose I have a multidimensional array, like so

>
> >> T arr[N][M];

>
> >> and in one place of my code it makes sense to access all N*M elements
> >> via one single index.

>
> >> Now, if I understand the C standard correctly, even though all elements
> >> are continuous in memory (they are, are they not?), indices can only ever
> >> be 0..N-1 and 0..M-1, respectively.

>
> >> But, if I have a pointer T *p pointing at arr[0][0], and I increment p
> >> N*M times, and access arr along the way, would that invoke UB? I mean,
> >> arr is still one single object, right?

>
> >> What if I use p[i] with i=0..M*N-1 ?

>
> >p is presumably T (*)[M]. p[i] is T [M]. You don't have 0 - M*N - 1

>
> What part of "I have a pointer T *p" leads you to presume that p is
> anything other than a simple T*.

I can't recall what it was that I was thinking of at the time, most

Phil Carmody
Guest
Posts: n/a

 12-11-2008
"Jujitsu Lizard" <(E-Mail Removed)> writes:
> "Baboon" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>
>> Suppose I have a multidimensional array, like so
>>
>> T arr[N][M];
>>
>> and in one place of my code it makes sense to access all N*M elements
>> via one single index.
>>
>> Now, if I understand the C standard correctly, even though all elements
>> are continuous in memory (they are, are they not?), indices can only ever
>> be 0..N-1 and 0..M-1, respectively.
>>
>> But, if I have a pointer T *p pointing at arr[0][0], and I increment p
>> N*M times, and access arr along the way, would that invoke UB? I mean,
>> arr is still one single object, right?
>>
>> What if I use p[i] with i=0..M*N-1 ?

>
> The trick you are suggesting is done every day of the week with
> various arrays. If you have an array a[N][M], C is guaranteed to
> store the M*N elements contiguously in row-major order.
>
> The link that the other responder posted cited various obscure
> optimizations that might break things. This is true, but it rarely
> happens in practice. The reason is that when people are messing with
> arrays using a single index (one dimensional) when the array is
> defined with multiple dimensions, what normally happens is that one
> calls a function that accepts a T * pointer. It is rare that someone
> defines an array a[N][M] and then uses a[N-1][M] or something like
> this in that form. It normally doesn't occur that way in the code.
>
> This code should be fine:
>
> #define M 33
> #define N 41
>
> long arr[N][M];
>
> void f(long *p)
> {
> unsigned i, j;
>
> for (i=0; i<N; i++)
> {
> for (j=0; j<M; j++)
> {
> p[i * M + j] = 0;
> }
> }
> }
>
> int main(void)
> {
> f(arr);

arr isn't a long*

> return(0);
> }
>
> This is done every day of the week.

People write code that fails to compile without a diagnostic
that's telling them they're doing something they shouldn't
every day of the week?

Quite probably true. Doesn't make it right.

Phil
--
I tried the Vista speech recognition by running the tutorial. I was
amazed, it was awesome, recognised every word I said. Then I said the
wrong word ... and it typed the right one. It was actually just
detecting a sound and printing the expected word! -- pbhj on /.

jameskuyper
Guest
Posts: n/a

 12-11-2008

Jujitsu Lizard wrote:
> "Phil Carmody" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
> >
> >> This code should be fine:
> >>
> >> #define M 33
> >> #define N 41
> >>
> >> long arr[N][M];
> >>
> >> void f(long *p)
> >> {
> >> unsigned i, j;
> >>
> >> for (i=0; i<N; i++)
> >> {
> >> for (j=0; j<M; j++)
> >> {
> >> p[i * M + j] = 0;
> >> }
> >> }
> >> }
> >>
> >> int main(void)
> >> {
> >> f(arr);

> >
> > arr isn't a long*
> >
> >> return(0);
> >> }
> >>
> >> This is done every day of the week.

> >
> > People write code that fails to compile without a diagnostic
> > that's telling them they're doing something they shouldn't
> > every day of the week?
> >
> > Quite probably true. Doesn't make it right.

>
> Phil,
>
> In this particular case, I'd be grateful if you could be more specific. Are
> you saying it is wrong?

You're passing an argument of type long (*)[M] to a function where the
corresponding parameter is declared as 'long*'. This is not a
conversion that can be done implicitly. Since there's a prototype for
the function in scope, this is a constraint violation; if it was a non-
prototyped declaration it would be undefined behavior. Even if you
made the conversion explictly, the standard fails to specify where the
result of the conversion points.

You can avoid all of those problems by passing arr[0] rather than arr
to the function. Then the only problem that you have left is indexing
an array (arr[0]) beyond it's length (M). While a conforming
implementation of C is certainly permitted to compile such code in a
way that it formats your hard disk, this is a very popular bit of
undefined behavior, and as a result most compilers will handle it the
way you expect it to be handled.

Ben Bacarisse
Guest
Posts: n/a

 12-11-2008
"Jujitsu Lizard" <(E-Mail Removed)> writes:

> "Phil Carmody" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>
>>> This code should be fine:
>>>
>>> #define M 33
>>> #define N 41
>>>
>>> long arr[N][M];
>>>
>>> void f(long *p)
>>> {
>>> unsigned i, j;
>>>
>>> for (i=0; i<N; i++)
>>> {
>>> for (j=0; j<M; j++)
>>> {
>>> p[i * M + j] = 0;
>>> }
>>> }
>>> }
>>>
>>> int main(void)
>>> {
>>> f(arr);

>>
>> arr isn't a long*
>>
>>> return(0);
>>> }
>>>
>>> This is done every day of the week.

>>
>> People write code that fails to compile without a diagnostic
>> that's telling them they're doing something they shouldn't
>> every day of the week?
>>
>> Quite probably true. Doesn't make it right.

>
> In this particular case, I'd be grateful if you could be more specific. Are
> you saying it is wrong?

"Wrong" is a tricky word in C. Your code requires a diagnostic since
it contains a constraint violation. This is, in one sense at least,
as wrong as you can get, but the code might well work anyway. To be
clear about it, arr converts to long (*)[33] not long * and there is
no implicit conversion between these two types.

There are lots of alternatives. You could pass &arr[0][0], arr[0] or
even (void *)arr. All of these expressions will convert to a pointer
of the right type, but there is a subtle question -- how much can be
added to the resulting pointer inside f whilst staying within the
permissions granted by 6.5.6 p8?

When a pointer points to an array, you are permitted to use + and - to
construct pointers that lie within (and "one past the end") of that
array. An argument can be made that passing (void *)arr is the safest
option despite it looking odd. If you pass arr[0] assignments to
p[33] and higher will be undefined as far as standard C is concerned
although most implementations will use more relaxed semantics.
Because of the definition of [], * and & there is no real difference
between passing &arr[0][0] and passing arr[0].

--
Ben.

Keith Thompson
Guest
Posts: n/a

 12-11-2008
"Jujitsu Lizard" <(E-Mail Removed)> writes:
> "Phil Carmody" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>
>>> This code should be fine:
>>>
>>> #define M 33
>>> #define N 41
>>>
>>> long arr[N][M];
>>>
>>> void f(long *p)
>>> {
>>> unsigned i, j;
>>>
>>> for (i=0; i<N; i++)
>>> {
>>> for (j=0; j<M; j++)
>>> {
>>> p[i * M + j] = 0;
>>> }
>>> }
>>> }
>>>
>>> int main(void)
>>> {
>>> f(arr);

>>
>> arr isn't a long*
>>
>>> return(0);
>>> }
>>>
>>> This is done every day of the week.

>>
>> People write code that fails to compile without a diagnostic
>> that's telling them they're doing something they shouldn't
>> every day of the week?
>>
>> Quite probably true. Doesn't make it right.

>
> Phil,
>
> In this particular case, I'd be grateful if you could be more specific. Are
> you saying it is wrong?

I think he's saying it contains a constraint violation.

When I compile your code, I get the following diagnostic on
the call "f(arr);":

c.c: In function 'main':
c.c:21: warning: passing argument 1 of 'f' from incompatible pointer type

Your function f expects an argument of type long*. You pass it an
argument of type long(*)[N]. My compiler merely issued a warning, but
it would have been perfectly legal for it to reject the program.

You can avoid the constraint violation by changing the call to either
f(arr[0]);
or
f(&arr[0][0]);

But then you're invoking undefined behavior in f, though it's likely
to behave as you expect under most if not all implementations. Most
compilers will allow you to treat a two-dimensional array as a
one-dimensional array, but they're not required to do so. A compiler
that generates code that performs bounds checking, or whose
optimization phase is particularly aggressive, might result in
something other than the behavior you expect.

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Kuyper
Guest
Posts: n/a

 12-12-2008
Jujitsu Lizard wrote:
> "Keith Thompson" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>>
>> You can avoid the constraint violation by changing the call to either
>> f(arr[0]);
>> or
>> f(&arr[0][0]);

>
> The second form suggested would be my preferred one. Thanks for the
>
>> But then you're invoking undefined behavior in f, though it's likely
>> to behave as you expect under most if not all implementations. Most
>> compilers will allow you to treat a two-dimensional array as a
>> one-dimensional array, but they're not required to do so. A compiler
>> that generates code that performs bounds checking, or whose
>> optimization phase is particularly aggressive, might result in
>> something other than the behavior you expect.

....
> The "function boundary" is "long *". The function should work correctly
> with a pointer to a long. That implies that array operations on a
> pointer to a long should accept any unsigned integer and generate a
> correct memory reference.

You may desire such a feature; that's not how C works. Unsigned integers
and pointers are different types. While any integer can be converted to
a pointer, the conversion never occurs implicitly, it must always be
explicitly requested with a cast. Per 6.3.2.3p5, even when you do use a
cast to force the conversion, "Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might
not point to an entity of the referenced type, and might be a trap
representation."

> The compiler has to honor the function boundary. The function accepts a
> pointer to long. It should treat it as if it could be ANY pointer to long.

Not quite. For instance, consider the following example:

long arr[3][5];

{
for(int row=0; row<3; row++)
for(int col=0; col<3; col++)
arr[row][col] += p[12];
}

Now, in the general case, an implementation is required to generate code
for functions like add_arr that works correctly even if p points at a
location within "arr". "Correctly" means, in this case, that if one of
+= operations changes the value referred to by p[12], then the next pass
through the loop should use the updated value.

However, in this particular case, there's no legal way to pass a value
'p' to add_arr such that p[12] points to an element of arr. That's
because the largest amount that you can add to any pointer that points
to any element of 'arr' is 5; add more than that and you're in a
different sub-array of arr, violating

As a result, an implementation is free to perform an optimization that
reads the value of p[12] just once, and keeps that value in a register
for the entire loop.

That's a rather contrived example, but its the best I could com up with
right now.

> The key issue in my mind, which might be crackpot, is whether the
> compiler is allowed to optimize WITHIN a function based on what it
> believes it knows about all function invocations.

If it believes it correctly, yes. If the only way you can create a
situation which doesn't match that belief is by writing code that has
undefined behavior, it is permitted to make such optimizations - the
undefined behavior will take the form of having those optimizations fail.

jameskuyper
Guest
Posts: n/a

 12-12-2008
Jujitsu Lizard wrote:
> "James Kuyper" <(E-Mail Removed)> wrote in message
> news:XQl0l.1196\$(E-Mail Removed)...
> > Jujitsu Lizard wrote:
> >> "Keith Thompson" <(E-Mail Removed)> wrote in message
> >> news:(E-Mail Removed)...
> >>>
> >>> You can avoid the constraint violation by changing the call to either
> >>> f(arr[0]);
> >>> or
> >>> f(&arr[0][0]);
> >>
> >> The second form suggested would be my preferred one. Thanks for the
> >>
> >>> But then you're invoking undefined behavior in f, though it's likely
> >>> to behave as you expect under most if not all implementations. Most
> >>> compilers will allow you to treat a two-dimensional array as a
> >>> one-dimensional array, but they're not required to do so. A compiler
> >>> that generates code that performs bounds checking, or whose
> >>> optimization phase is particularly aggressive, might result in
> >>> something other than the behavior you expect.

....
> What I was trying to say is that if you have
>
> long arr[3][5];
>
> and you call a function defined as
>
> void myfunc(long *q)
>
> and you call it as
>
> myfunc(&(arr[0][0]));
>
> within that function it does something like:
>
> q[14] = 0;
>
> that should work per the definition of C.

"The definition of C" is the C standard, which specifies that this
code has undefined behavior. Therefore, "work per the definition of C"
doesn't tell you much about what such a program may do. Any behavior
"works per the definition of C", including aborting your program.

> ... In this case the "function
> boundary" consists a pointer to a long. I don't believe the compiler may
> make assumptions about _which_ pointer to long may be passed or about how
> large the index on q[] might be. But I have to look this up.

q[14] is defined as being equivalent to *(q+14). When an object is
part of an array, the rules of pointer arithmetic (6.5.6p define
what the range of integer values is that can be added to a pointer to
that object. Those rules are based upon the size of the array and the
position of the object within that array. An object that is not a part
of an array is treated, for the purpose of these rules, as if it were
the only element of a 1-element array.

The key point is, those rules cannot be interpreted, in this context,
as referring to the array 'arr'. If such an interpretation were
possible, then the part of those rules which tells you where q+14
points to, would mean that it points at arr[14], which is nonsense.
The array that those rules refer to must have an element type that is
the same as the type pointed at by the pointer. The only array
containing arr[0][0] with that property is arr[0], not arr itself. The
element type of arr is char[5], not char. arr[0] has a length of 5,
not a length of 15.

Therefore, in order to add an integer value 'i' to &arr[0][0], what
those rules require is that 0<=i && i<=5. Otherwise, the behavior is
undefined. Those rules also specify that, while you can add 5 to &arr
[0][0], any attempt to dereference that pointer has undefined
behavior.

> >> The compiler has to honor the function boundary. The function accepts a
> >> pointer to long. It should treat it as if it could be ANY pointer to
> >> long.

> >
> > Not quite. For instance, consider the following example:
> >
> > long arr[3][5];
> >
> > int add_arr(int *p, int *q)

The parameter "q was added at an intermediate stage when I was
thinking about demonstrating this issue in a different fashion. I
should have removed it when I changed my mind. Please ignore it.

> > {
> > for(int row=0; row<3; row++)
> > for(int col=0; col<3; col++)
> > arr[row][col] += p[12];
> > }
> >
> > Now, in the general case, an implementation is required to generate code
> > for functions like add_arr that works correctly even if p points at a
> > location within "arr". "Correctly" means, in this case, that if one of +=
> > operations changes the value referred to by p[12], then the next pass
> > through the loop should use the updated value.
> >
> > However, in this particular case, there's no legal way to pass a value 'p'
> > to add_arr such that p[12] points to an element of arr. That's because the
> > largest amount that you can add to any pointer that points to any element
> > of 'arr' is 5; add more than that and you're in a different sub-array of
> > arr, violating

Sorry: that should have ended with the citation of 6.5.6p8.

> > As a result, an implementation is free to perform an optimization that
> > reads the value of p[12] just once, and keeps that value in a register for
> > the entire loop.
> >
> > That's a rather contrived example, but its the best I could com up with
> > right now.

....
> I do like your example. I understand it.
>
> If what you say is correct, I have a lot of reading to do.
>
> Personally, I wouldn't think twice about using the function above in the way
> you indicated is a violation. If you are correct, that means I'm dangerous
> and I need to retrain myself. I'm not being sarcastic there--I'm serious.
>
> My mental model of arrays in C is that they are defined to be in row-major
> order JUST SO THAT YOU CAN MONKEY WITH THEM IN UNINTENDED WAYS. My
> understanding of C is that every array is really one-dimensional and the
> possibility of multidimensional arrays is just for programmer convenience.

That last part, at least, is correct. In this case, 'arr' is a one-
dimensional array whose element type is char[5]. arr[0] is a one-
dimensional array whose element type is char. The standard refers to
arr as a 'multidimensional' array, but the way C handles such arrays
means that they are really just arrays of arrays. What difference does
that distinction make? In languages with true multidimensional arrays,
it's just as easy (syntactically, at least) to refer to a column of an
array as to a row. There's no C syntax for referring to a column of
arr.

Tim Rentsch
Guest
Posts: n/a

 12-22-2008
Ben Bacarisse <(E-Mail Removed)> writes:

> "Jujitsu Lizard" <(E-Mail Removed)> writes:
>
> > "Phil Carmody" <(E-Mail Removed)> wrote in message
> > news:(E-Mail Removed)...
> >>
> >>> This code should be fine:
> >>>
> >>> #define M 33
> >>> #define N 41
> >>>
> >>> long arr[N][M];
> >>>
> >>> void f(long *p)
> >>> {
> >>> unsigned i, j;
> >>>
> >>> for (i=0; i<N; i++)
> >>> {
> >>> for (j=0; j<M; j++)
> >>> {
> >>> p[i * M + j] = 0;
> >>> }
> >>> }
> >>> }
> >>>
> >>> int main(void)
> >>> {
> >>> f(arr);
> >>
> >> arr isn't a long*
> >>
> >>> return(0);
> >>> }
> >>>
> >>> This is done every day of the week.
> >>
> >> People write code that fails to compile without a diagnostic
> >> that's telling them they're doing something they shouldn't
> >> every day of the week?
> >>
> >> Quite probably true. Doesn't make it right.

> >
> > In this particular case, I'd be grateful if you could be more specific. Are
> > you saying it is wrong?

>
> "Wrong" is a tricky word in C. Your code requires a diagnostic since
> it contains a constraint violation. This is, in one sense at least,
> as wrong as you can get, but the code might well work anyway. To be
> clear about it, arr converts to long (*)[33] not long * and there is
> no implicit conversion between these two types.
>
> There are lots of alternatives. You could pass &arr[0][0], arr[0] or
> even (void *)arr. All of these expressions will convert to a pointer
> of the right type, but there is a subtle question -- how much can be
> added to the resulting pointer inside f whilst staying within the
> permissions granted by 6.5.6 p8?
>
> When a pointer points to an array, you are permitted to use + and - to
> construct pointers that lie within (and "one past the end") of that
> array. An argument can be made that passing (void *)arr is the safest
> option despite it looking odd. [...]

Wouldn't you agree that there's a (marginally) better argument that
the safest option is to use (void*) &arr instead? There seems no
question that this pointer may be used to access any location within
the confines of the object arr (or to point to the first location
past it).

 Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is On HTML code is OffTrackbacks are On Pingbacks are On Refbacks are Off Forum Rules

 Similar Threads Thread Thread Starter Forum Replies Last Post Francesco C++ 2 11-06-2009 09:04 AM Vinodh C++ 2 03-03-2009 11:22 PM Philipp Java 21 01-20-2009 08:33 AM d[ - - ]b ASP .Net 2 05-18-2004 12:17 PM Jay Java 1 01-30-2004 04:27 PM