Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Changing the meaning of the array indexing brackets "[ ]" to do bounds checking and then ...

Reply
Thread Tools

Changing the meaning of the array indexing brackets "[ ]" to do bounds checking and then ...

 
 
Casey Hawthorne
Guest
Posts: n/a
 
      11-02-2009
Changing the meaning of the array indexing brackets "[ ]" to do bounds
checking and then using another notation for array indexing without
bounds checking, say "[| |]", might help minimize buffer overruns.

By default, previous programs would have bounds checking turned on,
unsless explicilty turned off.
--
Regards,
Casey
 
Reply With Quote
 
 
 
 
robertwessel2@yahoo.com
Guest
Posts: n/a
 
      11-02-2009
On Nov 2, 2:21*pm, Casey Hawthorne <(E-Mail Removed)> wrote:
> Changing the meaning of the array indexing brackets "[ ]" to do bounds
> checking and then using another notation for array indexing without
> bounds checking, say "[| |]", might help minimize buffer overruns.
>
> By default, previous programs would have bounds checking turned on,
> unsless explicilty turned off.



There's nothing to prevent someone from doing that now, nor is it a
new idea. It might break some programs which were relying on
undefined behavior, but I don't think too many people would find that
too objectionable (and assuming there was a way to turn off the
function, presumably with a compiler switch).

There are two problems. The minor one is overhead. Even if you know
the object being accessed, bounds checking is not free, and that will
cause issues in many places where C is a good choice of languages. To
some extent clever compilers can minimize the overhead, but they
cannot eliminate it altogether.

The major problem is that at the point where the array access happens,
whether that be the brackets or pointer arithmetic with deferencing,
the meta information about the "array," or whatever other object is
being accessed, is not really available to the C compiler. So you
don't really know the bounds of the array (or whatever) you're
accessing at that point. The only real solution is a fat pointer of
some sort that contains information about the object it's pointing to,
but that has very considerable overhead in both size and space. And
again, while compilers can mitigate that somewhat, there are limits,
especially considering the space overhead.
 
Reply With Quote
 
 
 
 
jacob navia
Guest
Posts: n/a
 
      11-02-2009
Casey Hawthorne a écrit :
> Changing the meaning of the array indexing brackets "[ ]" to do bounds
> checking and then using another notation for array indexing without
> bounds checking, say "[| |]", might help minimize buffer overruns.
>
> By default, previous programs would have bounds checking turned on,
> unsless explicilty turned off.
> --
> Regards,
> Casey


The lcc-win C compiler implements this. It is called "operator overloading".

You can define your own [ ] operators, and the compiler will call your function
that can do bounds checking in the array.

You define a structure with size information, and a pointer to the data stored
(and maybe other data). For this structure you redefine the operator [ ] and
you check the array bounds.

When you do not want to check anymore you do NOT define any operator overloading
and you replace the array structure by a normal array. The code that uses the
array doesn't need to be modified.

I have been arguing for this extension to C since 2002.

jacob
 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      11-02-2009
jacob navia <(E-Mail Removed)> writes:
> Casey Hawthorne a écrit :
>> Changing the meaning of the array indexing brackets "[ ]" to do bounds
>> checking and then using another notation for array indexing without
>> bounds checking, say "[| |]", might help minimize buffer overruns.
>>
>> By default, previous programs would have bounds checking turned on,
>> unsless explicilty turned off.

>
> The lcc-win C compiler implements this. It is called "operator
> overloading".
>
> You can define your own [ ] operators, and the compiler will call
> your function that can do bounds checking in the array.

[...]

Does lcc-win allow a [] operator to be defined for existing types that
already have a built-in [] operator? For example, can I do something
like this (I'm not sure of the syntax)?

int operator[](int *arr, size_t index)
{
return /* something */;
}

I suspect the answer is no.

I think the OP is suggesting changing the semantics of the existing
built-in [] operator, so that existing code gains bounds checking
with no modifications to the source.

Of course implementations are already allowed to do this, even
without operator overloading. All cases in which a bounds check
would fail invoke undefined behavior anyway, so the implementation
is free to generate code that detects the error.

The overhead would be substantial, since you'd need fat pointers to
retain bounds information for function parameters. And it would
break some existing code whose behavior is, strictly speaking,
undefined, but that works in practice. The most obvious examples
are the "struct hack" (which could be modified to use the C99 feature
that replaces it, but the point is to avoid changing the source), and
code that treats a multidimensional array as a one-dimensional array.

Code that avoids these odd corners of the language could certainly
benefit from a bounds checking implementation. And in fact I believe
such implementations exist, though I don't know if they go so far as
to implement fat pointers to enable checking on pointer parameters.
(At the very least, *testing* code under such an implementation could
be instructive.)

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
bartc
Guest
Posts: n/a
 
      11-03-2009

"Eric Sosman" <(E-Mail Removed)> wrote in message
news:1257204809.568894@news1nwk...
> (E-Mail Removed) wrote:
>> On Nov 2, 2:21 pm, Casey Hawthorne <(E-Mail Removed)> wrote:
>>> Changing the meaning of the array indexing brackets "[ ]" to do bounds
>>> checking and then using another notation for array indexing without
>>> bounds checking, say "[| |]", might help minimize buffer overruns.
>>>


> Structs and unions are splendid sources of complication here.
> For example,
>
> union u { int i; char c[10]; };
> union u *p = malloc(sizeof *p); /* assume success */
>
> Imagine a not uncommon platform where sizeof(int) == 4 and
> int requires 4-byte alignment; we'll probably wind up with a 12-
> byte union, so the final line would be equivalent to
>
> union u *p = malloc(12);

....
> But now it gets even more entertaining:
>
> int *ip = &p->i;
> char *cp = p->c;
>
> What bounds are carried with the fat pointers ip and cp? For

....
This must be the case even if the conversion is indirect, via
> umpty-seven layers of void* and separately compiled modules
> where union u has never been heard of:
>
> strcpy (p->c, "x");
> char *q = strchr(p->c, 'x');
> union u *p2 = (union u*)q;
>
> The pointer p2 must cover all twelve bytes, even though it is
> derived from a pointer that we'd prefer to have cover only ten.
>
> Seems to me that doing a thorough job might require not just
> a pointer-plus-bounds, but a pointer-with-history -- not just a
> fat pointer, but an obese pointer.


Your points all seem valid, but they remind me of when my boss wanted some
special feature or wanted something to work as he thought it should. I would
then proceed to explain a dozen ways why it didn't make sense and was
completely impossible.

But a day or so later I got the thing (hardware, software or both) working
just like he wanted! It wasn't quite so impossible after all.

The OP is only talking about arrays, and not pointers. But these are
inextricably linked so I guess we need 'proper' arrays, ie. distinct from
the sort of arrays/pointers currently in C. Then there wouldn't be all this
baggage to make the implementation of array bounds 'impossible'. That's
still a major language extension though.

--
Bartc


 
Reply With Quote
 
Stephen Sprunk
Guest
Posts: n/a
 
      11-03-2009
Keith Thompson wrote:
> The overhead would be substantial, since you'd need fat pointers to
> retain bounds information for function parameters. And it would
> break some existing code whose behavior is, strictly speaking,
> undefined, but that works in practice. The most obvious examples
> are the "struct hack" (which could be modified to use the C99 feature
> that replaces it, but the point is to avoid changing the source), and
> code that treats a multidimensional array as a one-dimensional array.


Why would it break the struct hack? The fat pointer would have the
bounds of the actual allocation, not the struct type, right?

Or are you proposing that converting a fat pointer from one type to
another changes the bounds? I can see some bugs that would catch, but
it would also break a lot of perfectly valid code, which IMHO is
unacceptable.

> Code that avoids these odd corners of the language could certainly
> benefit from a bounds checking implementation. And in fact I believe
> such implementations exist, though I don't know if they go so far as
> to implement fat pointers to enable checking on pointer parameters.


There was a GCC branch that did fat pointers and bounds checking, but it
didn't get very far since it changed the ABI, requiring a recompile of
every library on the system. Few people are willing to go through that.

Presumably they could have added another feature to disable bounds
checking on certain functions or arguments, but they apparently didn't
or that caused other problems...

> (At the very least, *testing* code under such an implementation could
> be instructive.)


It would be a good start, but presumably if you knew all the cases to
test, you could eliminate the bugs themselves and not need the bounds
checking in the first place. If you don't test every case, the bounds
violation will escape into the wild where there is no checking.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
 
Reply With Quote
 
Casey Hawthorne
Guest
Posts: n/a
 
      11-03-2009
On Mon, 02 Nov 2009 15:18:21 -0800, Keith Thompson <(E-Mail Removed)>
wrote:

>jacob navia <(E-Mail Removed)> writes:
>> Casey Hawthorne a ?crit :
>>> Changing the meaning of the array indexing brackets "[ ]" to do bounds
>>> checking and then using another notation for array indexing without
>>> bounds checking, say "[| |]", might help minimize buffer overruns.
>>>
>>> By default, previous programs would have bounds checking turned on,
>>> unsless explicilty turned off.

>>
>> The lcc-win C compiler implements this. It is called "operator
>> overloading".
>>
>> You can define your own [ ] operators, and the compiler will call
>> your function that can do bounds checking in the array.

>[...]
>
>Does lcc-win allow a [] operator to be defined for existing types that
>already have a built-in [] operator? For example, can I do something
>like this (I'm not sure of the syntax)?
>
> int operator[](int *arr, size_t index)
> {
> return /* something */;
> }
>
>I suspect the answer is no.
>
>I think the OP is suggesting changing the semantics of the existing
>built-in [] operator, so that existing code gains bounds checking
>with no modifications to the source.
>
>Of course implementations are already allowed to do this, even
>without operator overloading. All cases in which a bounds check
>would fail invoke undefined behavior anyway, so the implementation
>is free to generate code that detects the error.
>
>The overhead would be substantial, since you'd need fat pointers to
>retain bounds information for function parameters. And it would
>break some existing code whose behavior is, strictly speaking,
>undefined, but that works in practice. The most obvious examples
>are the "struct hack" (which could be modified to use the C99 feature
>that replaces it, but the point is to avoid changing the source), and
>code that treats a multidimensional array as a one-dimensional array.
>
>Code that avoids these odd corners of the language could certainly
>benefit from a bounds checking implementation. And in fact I believe
>such implementations exist, though I don't know if they go so far as
>to implement fat pointers to enable checking on pointer parameters.
>(At the very least, *testing* code under such an implementation could
>be instructive.)



I thought of this question, of buffer overruns, after one of the
people interviewed for the book "Coders at Work" said that C was great
for systems programming by well trained programmers, but that C had
leaked out into the applications area.
For systems programming you do need the access to the machine that C
provides, but for applications programming, you don't need/shouldn't
have such access.

--
Regards,
Casey
 
Reply With Quote
 
Nick Keighley
Guest
Posts: n/a
 
      11-03-2009
On 2 Nov, 20:27, user923005 <(E-Mail Removed)> wrote:
> On Nov 2, 12:21*pm, Casey Hawthorne <(E-Mail Removed)>


> > Changing the meaning of the array indexing brackets "[ ]" to do bounds
> > checking and then using another notation for array indexing without
> > bounds checking, say "[| |]", might help minimize buffer overruns.


note: C++'s vector class did this the other way round

> > By default, previous programs would have bounds checking turned on,
> > unsless explicilty turned off.

>
> Won't work in C, as C does not have operator overloading.


what? He's (She's?) talking about making a change to the language.


> However, you could use this, which is easier and works with atomic
> types like int:http://duma.sourceforge.net/



--
"Resistance is futile. Read the C-faq."
-- James Hu (c.l.c.)
 
Reply With Quote
 
Nick Keighley
Guest
Posts: n/a
 
      11-03-2009
On 2 Nov, 21:31, "(E-Mail Removed)" <(E-Mail Removed)>
wrote:
> On Nov 2, 2:21*pm, Casey Hawthorne <(E-Mail Removed)> wrote:
>
> > Changing the meaning of the array indexing brackets "[ ]" to do bounds
> > checking and then using another notation for array indexing without
> > bounds checking, say "[| |]", might help minimize buffer overruns.

>
> > By default, previous programs would have bounds checking turned on,
> > unsless explicilty turned off.

>
> There's nothing to prevent someone from doing that now, nor is it a
> new idea. *It might break some programs which were relying on
> undefined behavior, but I don't think too many people would find that
> too objectionable


the struct hack

typedef struct
{
Byte type;
Byte data[1];
} Message;

void process_msg (byte raw [64], size_t size)
{
Message *msg;
msg = (Message*)raw;

if (msg->data[1] == LOCAL)
....
}

the struct hack may be deprecated but it does occur quite often. (I
prefer it to C99;s variable array)


> (and assuming there was a way to turn off the
> function, presumably with a compiler switch).
>
> There are two problems. *The minor one is overhead. *Even if you know
> the object being accessed, bounds checking is not free, and that will
> cause issues in many places where C is a good choice of languages. *To
> some extent clever compilers can minimize the overhead, but they
> cannot eliminate it altogether.
>
> The major problem is that at the point where the array access happens,
> whether that be the brackets or pointer arithmetic with deferencing,
> the meta information about the "array," or whatever other object is
> being accessed, is not really available to the C compiler. *So you
> don't really know the bounds of the array (or whatever) you're
> accessing at that point. *The only real solution is a fat pointer of
> some sort that contains information about the object it's pointing to,
> but that has very considerable overhead in both size and space. *And
> again, while compilers can mitigate that somewhat, there are limits,
> especially considering the space overhead.


Yeah, exactly, much of the time the compiler cannot easily determine
what the bounds are.

void refrobnicate (char*);

void snafu (size_t n)
{
char *fonz = malloc (n);

if (fonz == 0) handle_memory_error();

if (fonz[42] == 0) /* is there a bounds error here? */
{
char *yolonde = fonz + 42;
refrobnicate (yalonde);
}
}

void refrobnicate (char *yalonde)
{
if (yalonde [23] == 0) /* or here? */
more_stuff();
}

so you have to fat pointerise the code (pass some hidden info around
with the memory block).



 
Reply With Quote
 
Nick Keighley
Guest
Posts: n/a
 
      11-03-2009
On 3 Nov, 00:22, "bartc" <(E-Mail Removed)> wrote:
> "Eric Sosman" <(E-Mail Removed)> wrote in message
>
> news:1257204809.568894@news1nwk...
>
>
>
>
>
> > (E-Mail Removed) wrote:
> >> On Nov 2, 2:21 pm, Casey Hawthorne <(E-Mail Removed)> wrote:
> >>> Changing the meaning of the array indexing brackets "[ ]" to do bounds
> >>> checking and then using another notation for array indexing without
> >>> bounds checking, say "[| |]", might help minimize buffer overruns.

>
> > * * Structs and unions are splendid sources of complication here.
> > For example,

>
> > union u { int i; char c[10]; };
> > union u *p = malloc(sizeof *p); */* assume success */

>
> > * * Imagine a not uncommon platform where sizeof(int) == 4 and
> > int requires 4-byte alignment; we'll probably wind up with a 12-
> > byte union, so the final line would be equivalent to

>
> > union u *p = malloc(12);

> ...
> > * * But now it gets even more entertaining:

>
> > int *ip = &p->i;
> > char *cp = p->c;

>
> > What bounds are carried with the fat pointers ip and cp? *For

>
> ...
> *This must be the case even if the conversion is indirect, via
>
> > umpty-seven layers of void* and separately compiled modules
> > where union u has never been heard of:

>
> > strcpy (p->c, "x");
> > char *q = strchr(p->c, 'x');
> > union u *p2 = (union u*)q;

>
> > The pointer p2 must cover all twelve bytes, even though it is
> > derived from a pointer that we'd prefer to have cover only ten.

>
> > * * Seems to me that doing a thorough job might require not just
> > a pointer-plus-bounds, but a pointer-with-history -- not just a
> > fat pointer, but an obese pointer.

>
> Your points all seem valid, but they remind me of when my boss wanted some
> special feature or wanted something to work as he thought it should. I would
> then proceed to explain a dozen ways why it didn't make sense and was
> completely impossible.
>
> But a day or so later I got the thing (hardware, software or both) working
> just like he wanted! It wasn't quite so impossible after all.
>
> The OP is only talking about arrays, and not pointers.


can you really do that?

void process_ptr (char*);
void process_arr (char[]);

Do they take the same type? They do in C but do they in Hawthorne-C?



> But these are
> inextricably linked so I guess we need 'proper' arrays, ie. distinct from
> the sort of arrays/pointers currently in C.


I don't think you have C anymore...


> Then there wouldn't be all this
> baggage to make the implementation of array bounds 'impossible'. That's
> still a major language extension though.


looks a language redesign to me. Why not just abolish pointers and
call it Java?


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Bounds checking and safety in C jacob navia C Programming 140 08-07-2007 04:20 PM
Indexing services under Windows XP SP2 - Can I disable MS Indexing Service to hasten Google's OR does Google Desktop uses this MS Indexing Service? ricardodefaria Computer Support 6 08-05-2007 04:14 AM
Help. SessionID is x then y then x then y BodiKlamph@gmail.com ASP General 0 09-03-2005 03:02 PM
Array bounds checking Chris Java 5 07-06-2005 07:23 AM
I thought that array bounds checking needed two comparisons; however, ... Casey Hawthorne Java 21 06-05-2004 08:04 PM



Advertisments