when is typecasting (unsigned char*) to (char*) dangerous?

 
 
Keith Thompson
 
      11-16-2011
Harald van Dijk <(E-Mail Removed)> writes:
> On Nov 16, 10:02 pm, James Kuyper <(E-Mail Removed)> wrote:
>> On 11/16/2011 03:41 PM, Harald van Dijk wrote:
>> > On Nov 16, 9:24 pm, James Kuyper <(E-Mail Removed)> wrote:
>> ...
>> >> I know of no reason why signed char (and therefore, char) cannot have
>> >> trap representations. However, every statement in 6.2.6.1p5 which says
>> >> that the behavior is undefined when a trap representation is involved,
>> >> explicitly excludes all character types, not just unsigned char. I'm not
>> >> quite sure what to make of that fact, but I'm sure that explicitly
>> >> excluding all character types was intentional; I'm not so sure whether
>> >> it was intentional to allow signed char to have trap representations.
>>
>> > 6.2.6.1p5 refers to the trap representations for the type of the
>> > object. In other words, if an object p of type void * holds a trap
>> > representation, 6.2.6.1p5 makes it explicit that reading that object
>> > as void * is not valid.
>>
>> So, in your opinion, what is the significance of the exclusion of
>> character types from those statements? What do those statements mean,
>> with those exclusions, that differs from what they would mean if those
>> exclusions were dropped? Please accompany your explanation with specific
>> examples of code that would have defined behavior under the existing
>> rules, but not with that modification, or vice-versa.
>
> If those exclusions were dropped, then using memcpy (or rather, a
> custom function written in standard C that behaves exactly like
> memcpy) to copy an object holding a trap representation would be
> invalid.
>
> /* the standard function memcpy, but implemented in 100% standard C */
> extern void *mymemcpy(void *dest, void *src, size_t n);
>
> struct S
> {
>   int ptrIsValid;
>   void *ptr;
> };
>
> {
>   struct S s1, s2;
>   s2.ptrIsValid = 0; /* ptr is left uninitialised */
>   mymemcpy(&s1, &s2, sizeof(s1));
> }
>
> Without the exclusion in 6.2.6.1p5, if pointer types can have trap
> representations, mymemcpy would potentially use a character type to
> read a trap representation. This should be allowed, and by excluding
> character types in that paragraph, this is allowed.


memcpy() can just use unsigned char, which is guaranteed not to have
padding bits or trap representations.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Harald van Dijk
 
      11-17-2011
On Nov 17, 12:32 am, Keith Thompson <(E-Mail Removed)> wrote:
> Harald van Dijk <(E-Mail Removed)> writes:
> > On Nov 16, 10:02 pm, James Kuyper <(E-Mail Removed)> wrote:
> >> On 11/16/2011 03:41 PM, Harald van Dijk wrote:
> >> > On Nov 16, 9:24 pm, James Kuyper <(E-Mail Removed)> wrote:
> >> ...
> >> >> I know of no reason why signed char (and therefore, char) cannot have
> >> >> trap representations. However, every statement in 6.2.6.1p5 which says
> >> >> that the behavior is undefined when a trap representation is involved,
> >> >> explicitly excludes all character types, not just unsigned char. I'm not
> >> >> quite sure what to make of that fact, but I'm sure that explicitly
> >> >> excluding all character types was intentional; I'm not so sure whether
> >> >> it was intentional to allow signed char to have trap representations.
> >>
> >> > 6.2.6.1p5 refers to the trap representations for the type of the
> >> > object. In other words, if an object p of type void * holds a trap
> >> > representation, 6.2.6.1p5 makes it explicit that reading that object
> >> > as void * is not valid.
> >>
> >> So, in your opinion, what is the significance of the exclusion of
> >> character types from those statements? What do those statements mean,
> >> with those exclusions, that differs from what they would mean if those
> >> exclusions were dropped? Please accompany your explanation with specific
> >> examples of code that would have defined behavior under the existing
> >> rules, but not with that modification, or vice-versa.
> >
> > If those exclusions were dropped, then using memcpy (or rather, a
> > custom function written in standard C that behaves exactly like
> > memcpy) to copy an object holding a trap representation would be
> > invalid.
> >
> > /* the standard function memcpy, but implemented in 100% standard C */
> > extern void *mymemcpy(void *dest, void *src, size_t n);
> >
> > struct S
> > {
> >   int ptrIsValid;
> >   void *ptr;
> > };
> >
> > {
> >   struct S s1, s2;
> >   s2.ptrIsValid = 0; /* ptr is left uninitialised */
> >   mymemcpy(&s1, &s2, sizeof(s1));
> > }
> >
> > Without the exclusion in 6.2.6.1p5, if pointer types can have trap
> > representations, mymemcpy would potentially use a character type to
> > read a trap representation. This should be allowed, and by excluding
> > character types in that paragraph, this is allowed.
>
> memcpy() can just use unsigned char, which is guaranteed not to have
> padding bits or trap representations.


Yes, but that was not the point I was trying to make. I'm reading
s2.ptr, which potentially holds a trap representation, and 6.2.6.1p5
disallows reading trap representations. It doesn't refer to
representations that do not represent a value in the type of the
lvalue expression, it refers to representations that do not represent
a value in the type of the object. The object has type void *, even if
it's accessed using unsigned char. That's why there needs to be a
specific exception for when the lvalue expression has character type.
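
For readers who want to see the scenario concretely, here is a minimal sketch of such a copy. The body of mymemcpy is an assumption (the thread only declares it), and the source parameter is const-qualified here to match the real memcpy; copy_struct_demo is a hypothetical helper name. Every byte of the struct, including the bytes of the uninitialised ptr member, is read through an lvalue of type unsigned char, which is the access the character-type exclusion in 6.2.6.1p5 permits.

```c
#include <stddef.h>

/* Hypothetical stand-in for memcpy written in portable C. The body is
   an illustration; the thread only gives a declaration. */
void *mymemcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Each byte is read through an lvalue of type unsigned char, so the
       read is allowed even if the bytes form a trap representation for
       the type of the object they belong to. */
    while (n-- > 0)
        *d++ = *s++;
    return dest;
}

struct S {
    int ptrIsValid;
    void *ptr;
};

/* Copies a struct whose ptr member was never initialised and returns
   the copied flag. ptr itself is never read as a void *. */
int copy_struct_demo(void)
{
    struct S s1, s2;

    s2.ptrIsValid = 0;  /* ptr is left uninitialised */
    mymemcpy(&s1, &s2, sizeof s1);
    return s1.ptrIsValid;
}
```

Note that nothing here ever evaluates s1.ptr or s2.ptr as a pointer; doing so would be the read that 6.2.6.1p5 leaves undefined.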
 
Keith Thompson
 
      11-17-2011
Harald van Dijk <(E-Mail Removed)> writes:
[...]
> Yes, but that was not the point I was trying to make. I'm reading
> s2.ptr, which potentially holds a trap representation, and 6.2.6.1p5
> disallows reading trap representations. It doesn't refer to
> representations that do not represent a value in the type of the
> lvalue expression, it refers to representations that do not represent
> a value in the type of the object. The object has type void *, even if
> it's accessed using unsigned char. That's why there needs to be a
> specific exception for when the lvalue expression has character type.


That's an interesting interpretation, but I'm still not quite convinced.

Here's the paragraph:

    Certain object representations need not represent a value of
    the object type. If the stored value of an object has such a
    representation and is read by an lvalue expression that does
    not have character type, the behavior is undefined. If such
    a representation is produced by a side effect that modifies
    all or any part of the object by an lvalue expression that
    does not have character type, the behavior is undefined.
    Such a representation is called a *trap representation*.

I still think that it refers to (or *should* refer to) a trap
representation for the type of the lvalue.

Consider:

    void *ptr = malloc(sizeof (int));
    assert(ptr != NULL);
    /* code to set the bytes pointed to by ptr to a
       trap representation for type int */
    int n = *(int*)ptr;

The object created by the malloc() call isn't an object of type int;
it's just raw storage. If 6.2.6.1p5 doesn't imply that accessing
it via an lvalue of type int has undefined behavior, then what does?

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Harald van Dijk
 
      11-17-2011
On Nov 17, 4:22 am, Keith Thompson <(E-Mail Removed)> wrote:
> Here's the paragraph:
>
>     Certain object representations need not represent a value of
>     the object type. If the stored value of an object has such a
>     representation and is read by an lvalue expression that does
>     not have character type, the behavior is undefined. If such
>     a representation is produced by a side effect that modifies
>     all or any part of the object by an lvalue expression that
>     does not have character type, the behavior is undefined.
>     Such a representation is called a *trap representation*.
>
> I still think that it refers to (or *should* refer to) a trap
> representation for the type of the lvalue.
>
> Consider:
>
>     void *ptr = malloc(sizeof (int));
>     assert(ptr != NULL);
>     /* code to set the bytes pointed to by ptr to a
>        trap representation for type int */
>     int n = *(int*)ptr;
>
> The object created by the malloc() call isn't an object of type int;
> it's just raw storage. If 6.2.6.1p5 doesn't imply that accessing
> it via an lvalue of type int has undefined behavior, then what does?


If *ptr doesn't hold an int, then reading it as an int is a violation
of the aliasing rules.
 
Keith Thompson
 
      11-17-2011
Harald van Dijk <(E-Mail Removed)> writes:
> On Nov 17, 4:22 am, Keith Thompson <(E-Mail Removed)> wrote:
>> Here's the paragraph:
>>
>>     Certain object representations need not represent a value of
>>     the object type. If the stored value of an object has such a
>>     representation and is read by an lvalue expression that does
>>     not have character type, the behavior is undefined. If such
>>     a representation is produced by a side effect that modifies
>>     all or any part of the object by an lvalue expression that
>>     does not have character type, the behavior is undefined.
>>     Such a representation is called a *trap representation*.
>>
>> I still think that it refers to (or *should* refer to) a trap
>> representation for the type of the lvalue.
>>
>> Consider:
>>
>>     void *ptr = malloc(sizeof (int));
>>     assert(ptr != NULL);
>>     /* code to set the bytes pointed to by ptr to a
>>        trap representation for type int */
>>     int n = *(int*)ptr;
>>
>> The object created by the malloc() call isn't an object of type int;
>> it's just raw storage. If 6.2.6.1p5 doesn't imply that accessing
>> it via an lvalue of type int has undefined behavior, then what does?
>
> If *ptr doesn't hold an int, then reading it as an int is a violation
> of the aliasing rules.


6.5p7:

    An object shall have its stored value accessed only by an lvalue
    expression that has one of the following types:

    -- a type compatible with the effective type of the object,
    [...]

Going back to paragraph 6:

    For all other accesses to an object having no declared type, the
    effective type of the object is simply the type of the lvalue used
    for the access.

So the effective type of the object is int.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Harald van Dijk
 
      11-17-2011
On Nov 17, 8:46 am, Keith Thompson <(E-Mail Removed)> wrote:
> Harald van Dijk <(E-Mail Removed)> writes:
> > On Nov 17, 4:22 am, Keith Thompson <(E-Mail Removed)> wrote:
> >> Here's the paragraph:
> >>
> >>     Certain object representations need not represent a value of
> >>     the object type. If the stored value of an object has such a
> >>     representation and is read by an lvalue expression that does
> >>     not have character type, the behavior is undefined. If such
> >>     a representation is produced by a side effect that modifies
> >>     all or any part of the object by an lvalue expression that
> >>     does not have character type, the behavior is undefined.
> >>     Such a representation is called a *trap representation*.
> >>
> >> I still think that it refers to (or *should* refer to) a trap
> >> representation for the type of the lvalue.
> >>
> >> Consider:
> >>
> >>     void *ptr = malloc(sizeof (int));
> >>     assert(ptr != NULL);
> >>     /* code to set the bytes pointed to by ptr to a
> >>        trap representation for type int */
> >>     int n = *(int*)ptr;
> >>
> >> The object created by the malloc() call isn't an object of type int;
> >> it's just raw storage. If 6.2.6.1p5 doesn't imply that accessing
> >> it via an lvalue of type int has undefined behavior, then what does?
> >
> > If *ptr doesn't hold an int, then reading it as an int is a violation
> > of the aliasing rules.
>
> 6.5p7:
>
>     An object shall have its stored value accessed only by an lvalue
>     expression that has one of the following types:
>
>     -- a type compatible with the effective type of the object,
>     [...]
>
> Going back to paragraph 6:
>
>     For all other accesses to an object having no declared type, the
>     effective type of the object is simply the type of the lvalue used
>     for the access.
>
> So the effective type of the object is int.


Depending on the omitted "code to set the bytes pointed to by ptr to a
trap representation for type int", there are three possibilities right
before n is initialised:
1) *ptr has no effective type
2) *ptr has an effective type that is int
3) *ptr has an effective type that is not int

You're right, 1) and 2) are effectively the same, the effective type
becomes int. In both of those cases, 6.2.6.1p5 says the behaviour is
undefined. For possibility 3), 6.5p7 says the behaviour is undefined.
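
A rough sketch of the three cases, for illustration only. It cannot portably manufacture a trap representation for int, so it uses well-defined values and leaves the undefined read as a comment; effective_type_demo is a hypothetical name. It relies on the guarantee (C99 as amended by TC2, and later) that all-bits-zero represents the value 0 in any integer type.

```c
#include <stdlib.h>
#include <string.h>

/* Walks through the three situations listed above using allocated
   storage. Returns 1 if every well-defined read produced the expected
   value, 0 otherwise (including allocation failure). */
int effective_type_demo(void)
{
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *ptr = malloc(size);
    float copy;
    int ok;

    if (ptr == NULL)
        return 0;

    /* Case 1: memset only writes bytes, so the storage still has no
       effective type. The read below uses an int lvalue, which makes
       int the effective type for that access. All bits zero is a valid
       representation of 0, so no trap representation is involved. */
    memset(ptr, 0, size);
    ok = (*(int *)ptr == 0);

    /* Case 2: a store through an int lvalue sets the effective type to
       int, and reading it back as int is fine. */
    *(int *)ptr = 42;
    ok = ok && (*(int *)ptr == 42);

    /* Case 3: a store through a float lvalue changes the effective
       type to float. */
    *(float *)ptr = 1.0f;
    /* int n = *(int *)ptr;    undefined by 6.5p7: the effective type is
                               float, and int is not compatible with it */

    /* Copying the bytes out with memcpy is always allowed. */
    memcpy(&copy, ptr, sizeof copy);
    ok = ok && (copy == 1.0f);

    free(ptr);
    return ok;
}
```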
 
Lauri Alanko
 
      11-17-2011
I actually stumbled into this problem just now.

I need to represent UTF-8 strings in C, but although the common wisdom
is that UTF-8 is nicely compatible with legacy C code, it seems that
this isn't strictly true: we cannot portably cast an (unsigned char*)
buffer containing a UTF-8-encoded, 0-terminated string to (char*) and
operate on it with standard C functions. A UTF-8 encoded buffer may
contain the byte value 0x80, which, when cast to char, might be a
trap representation on platforms where CHAR_MIN is -127.

This is an awful shame, since there are byte values that never occur in
well-formed UTF-8: 0xc0, 0xc1 and 0xf5-0xff. If one of those had been
0x80, to my understanding there wouldn't have been a problem.
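
To make the list of excluded bytes concrete, here is a small sketch of a predicate for it. The function name utf8_byte_can_occur is made up for this example, and it only answers whether a byte value can appear anywhere in well-formed UTF-8, not whether a whole sequence is valid.

```c
/* Returns 1 if the byte value can appear somewhere in well-formed
   UTF-8, and 0 for the values that never can: 0xC0, 0xC1 and
   0xF5 through 0xFF. This is a per-byte check, not a validator. */
int utf8_byte_can_occur(unsigned char b)
{
    if (b == 0xC0 || b == 0xC1)
        return 0;   /* could only start an overlong 2-byte sequence */
    if (b >= 0xF5)
        return 0;   /* would encode code points above U+10FFFF */
    return 1;
}
```

As the predicate shows, 0x80 is an ordinary continuation byte, which is why it cannot simply be avoided.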

But if char might not be able to represent all the possible bytes of
UTF-8, how can C1X have UTF-8 encoded string literals? Maybe I'll
write a separate post about this to comp.std.c.


Lauri
 
Harald van Dijk
 
      11-17-2011
On Nov 17, 8:09 pm, Lauri Alanko <(E-Mail Removed)> wrote:
> I actually stumbled into this problem just now.
>
> I need to represent UTF-8 strings in C, but although the common wisdom
> is that UTF-8 is nicely compatible with legacy C code, it seems that
> this isn't strictly true: we cannot portably cast an (unsigned char*)
> buffer containing a UTF-8-encoded, 0-terminated string to (char*) and
> operate on it with standard C functions. A UTF-8 encoded buffer may
> contain the byte value 0x80, which, when cast to char, might be a
> trap representation on platforms where CHAR_MIN is -127.


The standard C functions in <string.h> treat their arguments as
unsigned char * (7.21.1p3 for the interested), so strlen() etc. work
even on the system you describe, of course assuming you expect a
string length in bytes.
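
A short sketch of what that looks like in practice. The wrapper names utf8_byte_length and utf8_byte_compare are invented for this example; the only library calls are strlen and strcmp. Only the pointer is converted, and the library functions then interpret each byte as unsigned char.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes (not code points) of a 0-terminated UTF-8 buffer.
   Only the pointer is converted; strlen interprets each character as
   if it had type unsigned char. */
size_t utf8_byte_length(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Byte-wise comparison of two 0-terminated UTF-8 buffers. The sign of
   the result is determined by the first differing bytes interpreted as
   unsigned char, so bytes >= 0x80 sort after ASCII even where plain
   char is signed. */
int utf8_byte_compare(const unsigned char *a, const unsigned char *b)
{
    return strcmp((const char *)a, (const char *)b);
}
```

For example, the two-byte encoding of U+00E9 (0xC3 0xA9) has byte length 2, and a buffer starting with 0xC2 compares greater than one starting with 0x7F.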
 
Keith Thompson
 
      11-17-2011
Lauri Alanko <(E-Mail Removed)> writes:
> I actually stumbled into this problem just now.
>
> I need to represent UTF-8 strings in C, but although the common wisdom
> is that UTF-8 is nicely compatible with legacy C code, it seems that
> this isn't strictly true: we cannot portably cast an (unsigned char*)
> buffer containing a UTF-8-encoded, 0-terminated string to (char*) and
> operate on it with standard C functions. A UTF-8 encoded buffer may
> contain the byte value 0x80, which, when cast to char, might be a
> trap representation on platforms where CHAR_MIN is -127.
>
> This is an awful shame, since there are byte values that never occur in
> well-formed UTF-8: 0xc0, 0xc1 and 0xf5-0xff. If one of those had been
> 0x80, to my understanding there wouldn't have been a problem.
>
> But if char might not be able to represent all the possible bytes of
> UTF-8, how can C1X have UTF-8 encoded string literals? Maybe I'll
> write a separate post about this to comp.std.c.


C does seem to be a bit inconsistent about whether unsigned chars can
safely be aliased as plain chars. Plain char *can* have trap
representations, and if it does, a lot of common C idioms can break.

Practically speaking, implementations don't do this.

I think you can be reasonably safe if you do something like:

    #include <limits.h>
    #include <assert.h>
    ...
    assert(CHAR_BIT == 8 &&
           ((CHAR_MIN == -128 && CHAR_MAX == +127) ||
            (CHAR_MIN == 0 && CHAR_MAX == +255)));

There are tricks you can use to get the effect of a compile-time
assertion as well.
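
One such trick, sketched here as an illustration: declare an array type whose size is negative when the condition is false, so the translation unit fails to compile. The macro name COMPILE_TIME_ASSERT and the typedef name char_is_plain_8_bit are made up for this example.

```c
#include <limits.h>

/* Declares an array type of size 1 when cond is true and of size -1
   (a constraint violation, so a compile error) when it is false.
   cond must be an integer constant expression. */
#define COMPILE_TIME_ASSERT(name, cond) \
    typedef char name[(cond) ? 1 : -1]

/* Same condition as the run-time assert above, checked at compile time. */
COMPILE_TIME_ASSERT(char_is_plain_8_bit,
    CHAR_BIT == 8 &&
    ((CHAR_MIN == -128 && CHAR_MAX == +127) ||
     (CHAR_MIN == 0 && CHAR_MAX == +255)));
```

With a C11 compiler, `_Static_assert(cond, "message")` does the same job without the typedef.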

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Phil Carmody
 
      11-18-2011
Vincenzo Mercuri <(E-Mail Removed)> writes:
> Keith Thompson ha scritto:
> [...]
> > If you're going to cast (not "typecast") an unsigned char* to char*,
> > surely you have a reason for doing so, presumably to access the
> > pointed-to memory as char.

>
> Yes, I can't imagine a reason for making such a cast without
> accessing the pointed-to memory...


When it's an arg for a call-back function, which you will use
by first casting back to the right type?
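
A sketch of that situation, with every name invented for the example (char_callback, invoke_callback, count_high_bytes): an API that passes a char * through to a callback, where the callback converts it back to unsigned char * before touching the memory. The bytes are never read through a char lvalue.

```c
#include <stddef.h>

/* Hypothetical older-style API that hands a char * to a callback. */
typedef void (*char_callback)(char *arg);

void invoke_callback(char_callback cb, char *arg)
{
    cb(arg);
}

static size_t high_bytes_seen;

/* The callback converts the pointer back to its real type first, so
   the memory is only ever read as unsigned char. */
static void count_high_bytes(char *arg)
{
    const unsigned char *p = (const unsigned char *)arg;

    high_bytes_seen = 0;
    for (; *p != 0; p++)
        if (*p >= 0x80)
            high_bytes_seen++;
}

/* Passes an unsigned char buffer through the char * interface and
   returns the number of bytes with the high bit set. */
size_t count_high_bytes_via_callback(unsigned char *buf)
{
    invoke_callback(count_high_bytes, (char *)buf);
    return high_bytes_seen;
}
```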

Phil
--
Unix is simple. It just takes a genius to understand its simplicity
-- Dennis Ritchie (1941-2011), Unix Co-Creator
 