Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Re: Who owns the variable in my header file ?

Reply
Thread Tools

Re: Who owns the variable in my header file ?

 
 
James Kuyper
Guest
Posts: n/a
 
      10-08-2012
On 10/08/2012 06:13 AM, BartC wrote:
....
> OK, so it's just about workable. But I can see some issues:
>
> o It seems it needs to be all or nothing; *all* pointers in an
> implementation must be fat, including those you have no intention of doing
> any arithmetic on.


That's probably true, but if you don't intend to do any arithmetic on a
given pointer, and the compiler can figure out your intentions, it could
put the "fat" pointers on a diet, as an optimization.

> o It doesn't really address the issue of array bounds: it only stops a
> pointer from wandering outside an allocated block, but which contains
> multiple arrays or array rows. Not even in the static [5][4][3] array. Only
> in static 1D arrays are the bounds protected. So, under the scheme I
> outlined above, they can't be used for subscript checking as proposed by
> Richard Damon.


I don't see how you reached the conclusion that "... only in static 1D
arrays are the bounds protected.". Array bounds would apply to automatic
and allocated arrays, too. The clause in the standard which currently
specifies undefined behavior for violating the bounds of a array talks
only about a 1D array - it's applicability to multi-dimensional arrays
is derived solely from the fact that what C calls a multi-dimensional
array is just a 1D array whose element type is also a 1D array,
recursively all the way down to the point where the element type is no
longer an array. Therefore, protection of 1D arrays is sufficient to
protect multi-dimensional arrays.

The comment that started this sub-thread merely suggests that violation
of array bounds should trap, which is no better than the undefined
behavior allowed by the current standard. I've suggested that this
should be changing, for instance to the raising of a standard-defined
new signal.

Can you provide an example of code which, according to the current
standard, contains a array bounds violation, where a fat-pointer
implementation would not enable the raising of such a signal?

Here's an example of a fully dynamic 2-d array, where fat pointers would
have no trouble raising such a signal when the array bounds are violated:

int **ragarray = malloc(rows*sizeof *ragarray);
size_t *cols = malloc(rows* sizeof *cols);
if(ragarray && cols)
{
// TBD: Fill in cols

// The bounds for ragarray are ragarray, and
// ragarray+rows*sizeof *ragarray
int row;

for(row=0; row<rows; row++)
{
ragarray[row] = malloc(cols[row] * sizeof *ragarray[row]);
if(ragarray[row] == NULL)
{
// error handling TBD
}
else
{
// The bounds for ragarray[row] are ragarray[row],
// ragarray[row] + cols[row] * sizeof *ragarray[row]
}
}
if(no errors)
{
// Each of the following expressions would violate an array
// bound and therefore result in raising the specified
// signal.
ragarray-1; // Violates lower bound on ragarray
ragarray+(rows+1); // Violates upper bound on ragarray
ragarray[rows] = NULL; // Violates upper bound on ragarray
row = rows/2;
ragarray[row]-1; // Violates lower bound on ragarray[row]

ragarray[row]+cols[row]+1;
//violates upper bound on ragarray[row]

ragarray[row][cols[row]] = 5;
// violates upper bound on ragarray[row]
}
}
// Free all of the allocated memory.

> o In fact they don't seem designed for programmer use at all, only for
> internal protection, which means:
>
> o Can't be used to extract the length of an array (pointer) passed to a
> function
> o Can't be used to construct a slice or sub-range of a larger array or
> block


Fat pointers would enable such features; they just weren't part of the
suggestion.

> o Doubtless there are miscellaneous language issues to be sorted (converting
> to and from an int for example; bounds info will be lost)


Not necessarily - it just requires a larger int size. Of course, the
larger int size required might not be supported, but the same is true of
the smaller int size required by skinny pointers - that's why
*[u]intptr_t and [U]INTPTR_* are optional.
--
James Kuyper
 
Reply With Quote
 
 
 
 
BartC
Guest
Posts: n/a
 
      10-08-2012
"James Kuyper" <(E-Mail Removed)> wrote in message
news:k4ue3s$3te$(E-Mail Removed)...
> On 10/08/2012 06:13 AM, BartC wrote:


>> o It doesn't really address the issue of array bounds: it only stops a
>> pointer from wandering outside an allocated block, but which contains
>> multiple arrays or array rows. Not even in the static [5][4][3] array.
>> Only
>> in static 1D arrays are the bounds protected.


> I don't see how you reached the conclusion that "... only in static 1D
> arrays are the bounds protected.". Array bounds would apply to automatic
> and allocated arrays, too.


OK: my comment wasn't quite right: protection also extends to 1D allocated
arrays which *exclusively* inhabit an allocated block. Such as in
(presumably) your example code, and in my earlier post. (And by "static" I
simply meant not dynamic.)

This applies also to multi-dimensional allocated arrays built on top of such
1D arrays. So protection is better than I expected.

But what I had in mind were:

o Static (ie non-dynamic) N-dimensional arrays (unless the compiler adjusts
pointers so that they are constrained solely within the enclosing row)

o N-dimensional arrays allocated as one solid block (so a pointer could
cross from one row to another)

o Allocated blocks with mixed use (structs containing arrays, or containing
more than one array, or the array contains other data before and/or after,
or an array has been over-allocated but currently has smaller bounds, etc
etc)

(But what I also had in mind was being able to extract array dimensions from
the pointer, which is subject to the problems with fat pointers that are
concerned only lower and upper limits of some allocation.)

> The comment that started this sub-thread merely suggests that violation
> of array bounds should trap, which is no better than the undefined
> behavior allowed by the current standard. I've suggested that this
> should be changing, for instance to the raising of a standard-defined
> new signal.
>
> Can you provide an example of code which, according to the current
> standard, contains a array bounds violation, where a fat-pointer
> implementation would not enable the raising of such a signal?


typedef struct {
int a[4];
int x,y,z;
} S;

S *p = malloc(sizeof(S));

p->a[4]=8;

--
Bartc

 
Reply With Quote
 
 
 
 
James Kuyper
Guest
Posts: n/a
 
      10-08-2012
On 10/08/2012 08:13 AM, BartC wrote:
> "James Kuyper" <(E-Mail Removed)> wrote in message
> news:k4ue3s$3te$(E-Mail Removed)...
>> On 10/08/2012 06:13 AM, BartC wrote:

....
>>> Only
>>> in static 1D arrays are the bounds protected.

>
>> I don't see how you reached the conclusion that "... only in static 1D
>> arrays are the bounds protected.". Array bounds would apply to automatic
>> and allocated arrays, too.

>
> OK: my comment wasn't quite right: protection also extends to 1D allocated
> arrays which *exclusively* inhabit an allocated block. Such as in
> (presumably) your example code, and in my earlier post. (And by "static" I
> simply meant not dynamic.)


The standard defines four meanings for 'static'. I recommend reducing
the potential for confusion by using the word 'static' only when one of
those meanings applies. The standard doesn't define dynamic, and uses
the term primarily in section F, when referring to IEC 60559 dynamic
rounding, which doesn't seem relevant here.

> This applies also to multi-dimensional allocated arrays built on top of such
> 1D arrays. So protection is better than I expected.
>
> But what I had in mind were:
>
> o Static (ie non-dynamic) N-dimensional arrays (unless the compiler adjusts
> pointers so that they are constrained solely within the enclosing row)


Could you please convert your use of "static" or "non-dynamic" into
standard C terminology? I can't come up with a plausible meaning that
makes your statements here both true and relevant.

> o N-dimensional arrays allocated as one solid block (so a pointer could
> cross from one row to another)


Since moving such a pointer from one row to another currently has well
defined behavior, that doesn't constitute an array bounds violation as
far as the standard is concerned. I did propose a syntax, using a cast,
that could be defined as imposing an arbitrary array length on a
pointer, that was smaller than the actual array length. The use of that
construct to give the resulting pointer a longer length should, itself,
cause the specified signal to be raise()d, since it would constitute
disabling a safety.

> o Allocated blocks with mixed use (structs containing arrays, or containing
> more than one array, or the array contains other data before and/or after,
> or an array has been over-allocated but currently has smaller bounds, etc
> etc)


Those don't present any problems for the use of fat pointers to prevent
array bounds violations.

> (But what I also had in mind was being able to extract array dimensions from
> the pointer, which is subject to the problems with fat pointers that are
> concerned only lower and upper limits of some allocation.)


Well, yes. You need a pointer to an entire array to get the bounds on
that array. You could propose even fatter pointers, ones that contain a
pointer to an array-description-block of some kind.

>> The comment that started this sub-thread merely suggests that violation
>> of array bounds should trap, which is no better than the undefined
>> behavior allowed by the current standard. I've suggested that this
>> should be changed, for instance to the raising of a standard-defined
>> new signal.
>>
>> Can you provide an example of code which, according to the current
>> standard, contains a array bounds violation, where a fat-pointer
>> implementation would not enable the raising of such a signal?

>
> typedef struct {
> int a[4];
> int x,y,z;
> } S;
>
> S *p = malloc(sizeof(S));
>
> p->a[4]=8;


p->a is an lvalue of array type; in this context, as in most others, it
is automatically converted into a pointer to the first element of that
array. At the time when that conversion happens, the resulting fat
pointer would be given bounds of p->a and p->a+4. Except when preceded
by &, the expression p->a[4] violates that upper bound, and would
therefore cause the appropriate signal to be raised.

Since the subscript is a constant, and the declaration of S is
necessarily in scope where this expression is used, the compiler can
simply optimize "p->a[4]=8" at compile time into a direct call to
raise(). Only if the subscript were a variable would the compiler
actually need to construct a fat pointer. However, that optimization is
simply a side issue.

Why did you think that example would be problematic? Did you expect p->a
to inherit the same array bounds as p itself? That wouldn't be right;
and it's trivial to get it right.
--
James Kuyper
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      10-08-2012
"James Kuyper" <(E-Mail Removed)> wrote in message
news:k4umo9$n8s$(E-Mail Removed)...
> On 10/08/2012 08:13 AM, BartC wrote:
>> "James Kuyper" <(E-Mail Removed)> wrote in message


>>> Can you provide an example of code which, according to the current
>>> standard, contains a array bounds violation, where a fat-pointer
>>> implementation would not enable the raising of such a signal?

>>
>> typedef struct {
>> int a[4];
>> int x,y,z;
>> } S;
>>
>> S *p = malloc(sizeof(S));
>>
>> p->a[4]=8;


> Why did you think that example would be problematic? Did you expect p->a
> to inherit the same array bounds as p itself? That wouldn't be right;
> and it's trivial to get it right.


Well, yes. I got the impression a fat pointer described the extents of an
allocated block and that was it. Now maybe the compiler can create a
modified pointer when it knows it's going to point to an array and when it
knows the bounds (where they stop short of the size of the block).

But this was a bad example on my part because it uses a fixed array size; it
can check bounds without using any fat pointers!

I can probably create a more contrived example, but I concede that fat
pointers, might be workable in that they can trap the majority of array
indexing errors, when arrays are created and used sensibly (which probably
isn't the case with a lot of my code...).

It also sounds like quite a bit of extra work behind the scenes might be
needed, beyond just creating the fat pointer to start with, and doing a
per-access check of the bounds

--
Bartc

 
Reply With Quote
 
James Kuyper
Guest
Posts: n/a
 
      10-08-2012
On 10/08/2012 11:20 AM, BartC wrote:
> "James Kuyper" <(E-Mail Removed)> wrote in message
> news:k4umo9$n8s$(E-Mail Removed)...
>> On 10/08/2012 08:13 AM, BartC wrote:
>>> "James Kuyper" <(E-Mail Removed)> wrote in message

>
>>>> Can you provide an example of code which, according to the current
>>>> standard, contains a array bounds violation, where a fat-pointer
>>>> implementation would not enable the raising of such a signal?
>>>
>>> typedef struct {
>>> int a[4];
>>> int x,y,z;
>>> } S;
>>>
>>> S *p = malloc(sizeof(S));
>>>
>>> p->a[4]=8;

>
>> Why did you think that example would be problematic? Did you expect p->a
>> to inherit the same array bounds as p itself? That wouldn't be right;
>> and it's trivial to get it right.

>
> Well, yes. I got the impression a fat pointer described the extents of an
> allocated block and that was it.


No - the whole point of the fat pointer was to enforce array bounds
violations, so what it describes is limits on access to the array, not
the limits on the allocated block.

> I can probably create a more contrived example, but I concede that fat
> pointers, might be workable in that they can trap the majority of array
> indexing errors, when arrays are created and used sensibly (which probably
> isn't the case with a lot of my code...).
>
> It also sounds like quite a bit of extra work behind the scenes might be
> needed, beyond just creating the fat pointer to start with, and doing a
> per-access check of the bounds


Oh yes - the message which first mentioned fat pointers also mentioned
the significant amount of overhead costs associated with such pointers,
as the reason why the C standard does NOT require that the behavior be
well-defined.

 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      10-08-2012
"BartC" <(E-Mail Removed)> writes:
[...]
> "James Kuyper" <(E-Mail Removed)> wrote in message
> news:k4t503$s2b$(E-Mail Removed)...
>
>> The fat pointer would have to contain three pieces of information: the
>> location it currently points at, the lowest location in memory that can
>> be reached by pointer subtraction with defined behavior, and the highest
>> location in memory that can reached by pointer addition with defined
>> behavior.

>
> might look like this:
>
> int *p=malloc(100*sizeof(int));
>
> p might contain (A, A, A+400), since malloc() knows nothing about what kind
> of objects are to be stored in the block. (A is the address of some heap
> memory, and A+400 is the highest value p can have, but cannot be
> dereferenced). (All offsets in fat pointers are char offsets!)


That doesn't have to be the case. There are a number of ways the
offset information in a fat pointer could be stored. It could be
stored as two additional pointers, to the beginning and (one past)
the end of the relevant array object, or as byte offsets (which
might save space if the offset can be stored more compactly as a
pointer), or as *object* offsets, dependent on the pointed-to type.

In a previous life, I worked on an Ada compiler that had to deal with
this, since Ada requires array bounds checking. The implementation
I worked on didn't use fat pointers; instead, the bounds information
was associated with the array *type*. For an array whose bounds
are compile-time constants, the bounds were not stored at run time;
the compiler would just generate code to check against the constant
values. If the bounds were run-time values, bound checking code
would refer to implicitly created variables associated with the
array type. (Unlike C, Ada's array indexing is not defined in terms
of pointer arithmetic; I don't know whether this scheme would work
for a hypothetical C implementation.)

One interesting thing: A lot of bound checks could be eliminated
during optimization. For example, if you wrote the Ada equivalent
of this C code:

for (int i = 0; i < some_value; i ++) {
arr[i] = 42;
}

the bounds check could be done just once at the top of the loop.

It might be interesting to design a C-like language with *mandatory*
full bounds checking. I wonder how much of C's current semantic would
have to be sacrificed. (I don't intend to design such a language, and
certainly not in this newsgroup.)

[...]

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      10-08-2012
"Keith Thompson" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:


> In a previous life, I worked on an Ada compiler that had to deal with
> this, since Ada requires array bounds checking. The implementation
> I worked on didn't use fat pointers; instead, the bounds information
> was associated with the array *type*.


I don't know Ada but I'd imagine that there must be an array type whose
dimension is different for each instance of the type (a 'flex' bound). In
that case bounds checking can be nearly as easy as it is for fixed bounds!

>For an array whose bounds
> are compile-time constants, the bounds were not stored at run time;
> the compiler would just generate code to check against the constant
> values. If the bounds were run-time values, bound checking code
> would refer to implicitly created variables associated with the
> array type.


Yes, the bounds must be stored in or near the array. Exactly how, is the
compiler's problem.

> It might be interesting to design a C-like language with *mandatory*
> full bounds checking. I wonder how much of C's current semantic would
> have to be sacrificed. (I don't intend to design such a language, and
> certainly not in this newsgroup.)


(I don't blame you! I'm currently working on two languages, one used to
implement the other. The higher level one has bounds checking and everything
else thrown in.

The lower-level, C-like one, doesn't have bounds checking. But what is more
interesting than bounds checking, is also having bounds of arrays and
strings available to the programmer, for passing implicitly between
functions, and generally making life easier. I had an idea for a scheme that
is simpler than the fat pointers that have been discussed, and more
efficient. If I ever get round to making it work I'll post some details as
it could be applied to C too.)

--
Bartc

 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      10-08-2012
"BartC" <(E-Mail Removed)> writes:
> "Keith Thompson" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> "BartC" <(E-Mail Removed)> writes:

>
>> In a previous life, I worked on an Ada compiler that had to deal with
>> this, since Ada requires array bounds checking. The implementation
>> I worked on didn't use fat pointers; instead, the bounds information
>> was associated with the array *type*.

>
> I don't know Ada but I'd imagine that there must be an array type whose
> dimension is different for each instance of the type (a 'flex' bound). In
> that case bounds checking can be nearly as easy as it is for fixed bounds!


Now that I think about it, I was imprecise. Bounds are associated with
an array *subtype*. For example, String is an array type with
unspecified bounds:

type String is array(Positive range<>) of Character;

A given String variable has specific bounds, which may be computed at
compile time or run time:

S: String(1..N);

which is equivalent to:

subtype anon is String(1..N);
S: anon;

There's some compatibity between distinct subtypes of the same type;
"S1 := S2;" is legal, and raises an exception if the two subtypes
have incompatible bounds.

It requires the compiler to do a lot of work behind the scenes.
Ada does *not* tend to follow the "Spirit of C" (though you can write
similarly low-level code in either language).

>>For an array whose bounds
>> are compile-time constants, the bounds were not stored at run time;
>> the compiler would just generate code to check against the constant
>> values. If the bounds were run-time values, bound checking code
>> would refer to implicitly created variables associated with the
>> array type.

>
> Yes, the bounds must be stored in or near the array. Exactly how, is the
> compiler's problem.


I'm not sure what you mean by "in or near". The bounds have to be
*available*; they can be either stored or computed (the language
doesn't care as long as S'First, S'Last, and S'Length yield the
correct results). If they're stored, they don't have to be anywhere
near the memory location containing the array object.

And I think this is approaching the limits of any hypothetical relevance
to C.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
Tim Rentsch
Guest
Posts: n/a
 
      12-17-2012
Richard Damon <(E-Mail Removed)> writes:

> On 10/7/12 6:40 AM, BartC wrote:
>>
>>
>> "Richard Damon" <(E-Mail Removed)> wrote in message
>> news:k4qm0b$jr0$(E-Mail Removed)...
>>> On 10/6/12 5:30 AM, Nick Keighley wrote:
>>>
>>>> As someone remarked this business with "undefined behaviour" is true
>>>> of pretty much all programming languages (I'm not convinced Godel has
>>>> anything to contribute to this). To some extent C stresses it more,
>>>> this is partly because C runs nearly everywhere and has huge numbers
>>>> of implementations.

>>
>>> If we removed pointers into arrays (and passing
>>> arrays with unspecified bounds), then the compiler could easily add code
>>> to check the subscripts to the array and trap on error conditions. If we
>>> want to support pointers into arrays, then these pointers could also be
>>> made "fatter" to include the bounds of the object they point to (and for
>>> multidimensional arrays, the bounds for each of the larger arrays the
>>> array is part of).

>>
>> Arrays can have any numbers of dimensions, so would be highly impractical
>> for any of a thousand possible pointers into an array for each to duplicate
>> it's half-dozen or dozen dimensions. You would likely also need
>> different pointers for each of the sub-dimensions.
>>
>> And for an array whose dimensions are not realised until runtime, or for
>> 'ragged' arrays where the bounds vary through the array, how would
>> such a pointer be initialised? Other languages would tend to build the
>> bounds into the arrays themselves.
>>
>> In any case, C allows pointers into all sorts of objects, including
>> non-arrays, or a single element of that multi-dimensional array, or to
>> cast one type of pointer into another; you wouldn't then be able to step
>> or do arithmetic on such a pointer, without by-passing the bounds checking.
>>
>> So 'undefined behaviour', if it's as simple as having the wrong value in a
>> pointer, is built-in to the language!
>>
>> (For single-dimensional arrays, a 'fat' pointer containing exactly one
>> bound, could work, provided they are a new explicit type in addition to
>> regular pointers. Then an array allocator could return such a pointer,
>> which
>> can be passed to functions and would carry it's length for use by programs,
>> and could optionally be used for bounds checking by internal code. But for
>> multi-dimensions, it gets complicated...)
>>
>>> This add significant overhead to the pointer and the
>>> operations.

>>
>> Not if the alternative is to have to always pass the length of the array
>> together with a pointer to the array. Having bounds-checking code inserted
>> would be an extra overhead, but that can be optional.
>>

>
> If we have an array: int foo[5][6];
> there are 3 types that might be used with this.
>
> int (*p1)[5][6] = &foo;
> which is a pointer to the full array. Such a pointer could also be used
> to point into int bar[4][5][6];


Not wrong, but an assignment like p = &bar[2]; allows access to
all elements of bar.

> There is also int (*p2)[6] = &foo[n]; which points to one row of the
> array, and


Similarly, the pointer variable p2 allows access to all of foo.

> int *p3 = &foo[n][m]; which points to an element of foo;


Again, not wrong, but p3 allows access to all of foo[n].

> Note that for p1 set equal to &foo p1++, p1--, p1+1, or p1[1] are all
> invalid operations, as p1 doesn't point into an array, but a single
> object (that happens to be an array). [snip remainder]


And so is treated as an array of length one for pointer arithmetic,
which does leave p1-- and p1[1] as undefined behavior, but p1+1
or p1++ are both well-defined.
 
Reply With Quote
 
Shao Miller
Guest
Posts: n/a
 
      12-17-2012
On 12/17/2012 14:06, Tim Rentsch wrote:
> Richard Damon <(E-Mail Removed)> writes:
>>
>> If we have an array: int foo[5][6];
>> there are 3 types that might be used with this.
>>
>> int (*p1)[5][6] = &foo;
>> which is a pointer to the full array. Such a pointer could also be used
>> to point into int bar[4][5][6];

>
> Not wrong, but an assignment like p = &bar[2]; allows access to
> all elements of bar.
>


(Presumably that was 'p1' in that assignment.) Let's see... It should
be the same as:

p1 = &(*(bar + 2));

and so also:

p1 = bar + 2;

So because the pointer originating from the 'bar' expression has pointee
range 'bar[0]' through 'bar[sizeof bar / sizeof *bar]', then all
elements are accessible, with the last "pointee" actually being the "one
past" position.

The pointer arithmetic doesn't change the bounds, but additional levels
of indirection could.

- Shao Miller
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Who owns the variable in my header file ? Edward A. Falk C Programming 5 10-11-2012 08:30 PM
Re: Who owns the variable in my header file ? James Kuyper C Programming 0 10-04-2012 12:43 PM
Re: Who owns the variable in my header file ? Ike Naar C Programming 0 10-03-2012 07:52 PM
Re: Who owns the variable in my header file ? Kaz Kylheku C Programming 0 10-03-2012 07:40 PM



Advertisments