VLA question

 
 
Keith Thompson
 
      06-28-2013
James Kuyper <(E-Mail Removed)> writes:
> On 06/27/2013 06:28 PM, Keith Thompson wrote:
>> James Kuyper <(E-Mail Removed)> writes:

> ...
>>> As I indicated above, the problem I described arises only on
>>> implementations where CHAR_MAX > INT_MAX. If CHAR_BIT==8, then you can't
>>> have been testing on such a system.

>>
>> Sure I was using a system with CHAR_BIT==8.
>>
>> My point was that I think Stephen is mistaken in his statement that:
>>
>> The only example I can envision a problem with is a character
>> literal that today is negative. IIRC, the conversion to char is
>> well-defined in that case. However, if character literals were
>> char, it'd have a large positive value.

>
> He was just paraphrasing what I said - if he was wrong, I was wrong.
>
>> I don't think that changing character constants from int to char
>> would cause the values of any such constants to change from negative
>> to positive, assuming the signedness of char isn't changed at the
>> same time.

>
> 6.4.4.4p10: "If an integer character constant contains a single
> character or escape sequence, its value is the one that results when an
> object with type char whose value is that of the single character or
> escape sequence is converted to type int."
> If a char object contains the representation of a value greater than
> INT_MAX, when that value is converted to int, the result will be
> negative. Therefore, under the current rules, the corresponding
> character literals must have a negative value. If the rules were changed
> to give them the type char, they would have the actual value of the
> corresponding char objects, which would be greater than INT_MAX.


Got it, you're right.

Example:

CHAR_BIT == 16
sizeof(int) == 1
CHAR_MIN == 0
CHAR_MAX == 65535
INT_MIN == -32768
INT_MAX == +32767

'\xffff' is a character constant, which is of type int. Its value
is the result of converting (char)65535 to type int, which is likely
to be -1. If character constants were of type char, it would have
the positive value (char)65535 of type char.

Just to add to the frivolity, the result of the conversion
is implementation-defined. Throw one's-complement and
sign-and-magnitude into the mix, and things get fun.
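
A sketch of that example, using fixed-width types as stand-ins for the
hypothetical sizes (uint16_t for unsigned plain char, int16_t for the
one-"byte" int), and assuming the usual two's-complement wraparound for
the implementation-defined conversion:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t ch = 0xFFFF;            /* the char value of '\xffff' */
    int16_t  as_int = (int16_t)ch;   /* implementation-defined; typically -1 */

    printf("current rules (constant has type int): %d\n", (int)as_int);
    printf("if constants had type char:            %u\n", (unsigned)ch);
    return 0;
}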

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Malcolm McLean
 
      06-28-2013
On Friday, June 28, 2013 3:03:02 AM UTC+1, Öö Tiib wrote:
> On Thursday, 27 June 2013 17:17:28 UTC+3, Eric Sosman wrote:
>
> There may be other, even trickier cases. When a tool fails to
> understand a piece of code, that indicates a human might fail as
> well. So it may be better to simplify the code than to make the tools smarter.
>


If a tool doesn't always work, then it can be extremely irritating.
You might have thousands of C files to process. It fails on just one of them,
but that means you've got to get a programmer to fix up the code manually,
then document that, for that instance, the tool-chain fails. That adds a lot
of cost, and means that errors can much more easily slip in.
 
Öö Tiib
 
      06-28-2013
On Friday, 28 June 2013 19:29:44 UTC+3, Malcolm McLean wrote:
> On Friday, June 28, 2013 3:03:02 AM UTC+1, Öö Tiib wrote:
> > On Thursday, 27 June 2013 17:17:28 UTC+3, Eric Sosman wrote:
> >
> > There may be other, even trickier cases. When a tool fails to
> > understand a piece of code, that indicates a human might fail as
> > well. So it may be better to simplify the code than to make the tools smarter.

>
> If a tool doesn't always work, then it can be extremely irritating.


I have not seen any tools that always work. Some are more robust, some
less, but none is perfectly robust. That is because there are always
defects in the code, compilers, linkers, standard libraries, operating
systems, and the hardware on which all of that runs.

> You might have thousands of C files to process. It fails on just one of them,
> but that means you've got to get a programmer to fix up the code manually,
> then document that, for that instance, the tool-chain fails. That adds a lot
> of cost, and means that errors can much more easily slip in.


When we are talking about repositories of thousands of files, we are
likely talking about efforts of thousands of man-days, and so about
teams of tens of developers. A mere build may take several minutes.
Therefore the build (involving compilers, code generators, static
analyzers, running unit tests, etc.) is best done by a continuous
integration system (or farm) to save each developer the time of
building it.

A tool does not suddenly start to fail out of the blue. Either someone
modified the file on which it fails, or modified the tool, or modified
something on which one or the other depends. If integration is
continuous, it is very clear who committed the breaking change. Just
back out that change-set, notify the one who committed it, and let him
deal with it. If he can't, he will find someone who can. We are
software developers, so dodging defects is our everyday bread and butter.
 
Stephen Sprunk
 
      06-29-2013
On 27-Jun-13 12:44, Keith Thompson wrote:
> Stephen Sprunk <(E-Mail Removed)> writes:
>> On 27-Jun-13 07:28, James Kuyper wrote:
>>> The value of a character literal will be same, whether it has
>>> type 'int' or type 'char', so long as char is signed, or is
>>> unsigned with CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX
>>> could it matter. Character literals that currently have a
>>> negative value would instead have a positive value greater than
>>> INT_MAX.

>>
>> The only example I can envision a problem with is a character
>> literal that today is negative. IIRC, the conversion to char is
>> well-defined in that case. However, if character literals were
>> char, it'd have a large positive value. Storing in a char would
>> still be fine, but storing in an int would require a
>> possibly-problematic conversion.

>
> That doesn't seem right. A character constant that has a negative
> value today (because plain char is a signed type) would still have a
> negative value if character constants were of type char. It would
> just be a negative value of type char.


With unsigned plain char, a character literal with a negative int value
today would have a large positive value if its type changed to char,
assuming the implementation didn't change to signed plain char at the
same time.

CHAR_MAX > INT_MAX with signed plain char requires int to have padding
bits and less range than char, which AIUI isn't allowed.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
 
Malcolm McLean
 
      06-29-2013
On Friday, June 28, 2013 7:02:05 PM UTC+1, Öö Tiib wrote:
> On Friday, 28 June 2013 19:29:44 UTC+3, Malcolm McLean wrote:
>
> > If a tool doesn't always work, then it can be extremely irritating.

>
> I have not seen any tools that always work. Some are more robust, some
> less, but none is perfectly robust.
>

The computer can break.
But most C compilers will always compile valid C code, most text editors will
always show the real contents of files, most compressors will always
archive correctly. The bugs are elsewhere. If you use the Unix philosophy
of "each tool does one thing" then those tools tend to be stable and
bug free. If you use the alternative philosophy of the "integrated
system" then you're constantly adding features, and often things
break. (However, integrated systems are often easier to use; it's not
all one way.)
 
Keith Thompson
 
      06-29-2013
Stephen Sprunk <(E-Mail Removed)> writes:
> On 27-Jun-13 12:44, Keith Thompson wrote:
>> Stephen Sprunk <(E-Mail Removed)> writes:
>>> On 27-Jun-13 07:28, James Kuyper wrote:
>>>> The value of a character literal will be same, whether it has
>>>> type 'int' or type 'char', so long as char is signed, or is
>>>> unsigned with CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX
>>>> could it matter. Character literals that currently have a
>>>> negative value would instead have a positive value greater than
>>>> INT_MAX.
>>>
>>> The only example I can envision a problem with is a character
>>> literal that today is negative. IIRC, the conversion to char is
>>> well-defined in that case. However, if character literals were
>>> char, it'd have a large positive value. Storing in a char would
>>> still be fine, but storing in an int would require a
>>> possibly-problematic conversion.

>>
>> That doesn't seem right. A character constant that has a negative
>> value today (because plain char is a signed type) would still have a
>> negative value if character constants were of type char. It would
>> just be a negative value of type char.

>
> With unsigned plain char, a character literal with a negative int value
> today would have a large positive value if its type changed to char,
> assuming the implementation didn't change to signed plain char at the
> same time.


Right -- but that's only an issue when CHAR_BIT >= 16, which is the
context I missed in my previous response. As I also noted elsethread,
the conversion from char to int, where char is an unsigned type and the
value doesn't fit, is implementation-defined; the result is *probably*
negative, but it's not guaranteed.

> CHAR_MAX > INT_MAX with signed plain char requires int to have padding
> bits and less range than char, which AIUI isn't allowed.


I think that's right.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Öö Tiib
 
      06-29-2013
On Saturday, 29 June 2013 20:09:33 UTC+3, Malcolm McLean wrote:
> On Friday, June 28, 2013 7:02:05 PM UTC+1, Öö Tiib wrote:
> > On Friday, 28 June 2013 19:29:44 UTC+3, Malcolm McLean wrote:
> >
> > > If a tool doesn't always work, then it can be extremely irritating.

> >
> > I have not seen any tools that always work. Some are more robust, some
> > less, but none is perfectly robust.
> >

> The computer can break.
> But most C compilers will always compile valid C code, most text editors will
> always show the real contents of files, most compressors will always
> archive correctly. The bugs are elsewhere.


That is what I said. When things fail, let the author of the situation
fix it. In 99% of cases he messed something up; in 1% of cases he has
discovered a bug in the compiler or the like.

> If you use the Unix philosophy of "each tool does one thing" then those
> tools tend to be stable and bug free.


I like that philosophy. I described a simple tool that can preprocess
the source before the compiler runs (add casts to mallocs, maybe add
extern "C" to headers). As a result, the majority of good C code can be
compiled with both a C and a C++ compiler. There will be cases that
still can't, but then it is better to let a human adjust them instead
of making the tool smarter and more error-prone.
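
A minimal sketch of the kind of rewriting such a tool might produce
(the function name is invented for illustration; this is output the
tool could emit, not the tool itself):

/* In a header, an added extern "C" guard for C++ consumers: */
#include <stddef.h>
#ifdef __cplusplus
extern "C" {
#endif
int *make_ints(size_t n);
#ifdef __cplusplus
}
#endif

/* In a source file, a cast added to malloc(), which C++ requires: */
#include <stdlib.h>
int *make_ints(size_t n)
{
    return (int *)malloc(n * sizeof(int));
}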

> If you use the alternative philosophy of the "integrated
> system" then you're constantly adding features, and often things
> break. (However, integrated systems are often easier to use; it's not
> all one way.)


That is, AFAIK, still the Unix philosophy. We pipe those simple tools
together to get more sophisticated results. If we just used each of
those simple tools alone by hand, then Unix would be annoying to use.
The combined set is more likely to fail, since there are more tools and
details, but in each case it is usually simple to pinpoint the problem
in some simple step of that complex chain.
 
James Kuyper
 
      07-01-2013
On 06/27/2013 12:07 PM, Stephen Sprunk wrote:
> On 27-Jun-13 07:28, James Kuyper wrote:
>> On 06/26/2013 03:35 PM, Stephen Sprunk wrote:
>>> IMHO, any code that relies on character literals being of type int
>>> is _already_ broken, and the current mis-definition does nothing
>>> but hide such bugs from the programmer, which is not worth
>>> preserving.

....
>> The value of a character literal will be same, whether it has type
>> 'int' or type 'char', so long as char is signed, or is unsigned with
>> CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX could it matter.
>> Character literals that currently have a negative value would instead
>> have a positive value greater than INT_MAX.

>
> The only example I can envision a problem with is a character literal
> that today is negative. IIRC, the conversion to char is well-defined in
> that case. However, if character literals were char, it'd have a large
> positive value. Storing in a char would still be fine, but storing in


Up to this point, you're saying almost exactly what I just said, just
with slightly different wording.

> an int would require a possibly-problematic conversion.
>
> Is that the concern?


Almost. The conversion to 'int' would be guaranteed to produce exactly
the same value that the character literal would have had under the
current rules. In order to demonstrate the change, you have to convert
it to a signed type with a MAX value greater than INT_MAX. 'long int' is
likely to be such a type, but even intmax_t is not guaranteed to be such
a type.

>> Implementations where CHAR_MAX > INT_MAX are extremely rare,
>> probably non-existent, but if one does exist, it could be fully
>> conforming to the C standard. Such an implementation must have
>> CHAR_MIN == 0, CHAR_BIT >= 16, and unless there's a lot of padding
>> bits in an int, sizeof(int) == 1.

>
> If we also assume that int has no padding bits, that doesn't seem
> completely unreasonable, actually. There are probably DSPs like that.
>
> The problem could be solved by the implementation defining plain char to
> be signed, which is the only sane choice if a character literal can be
> negative (even as an int) in the first place.


You might consider it insane to have char be unsigned on such an
implementation, but such an implementation could be fully conforming. It
would violate some widely held expectations, but if it is fully
conforming, then those expectations were unjustified. Is there any
reason other than such expectations why you would consider such an
implementation insane?

> If all character literals are non-negative and plain char is unsigned,
> then there is no problem making them char on such a system. ...


For implementations where CHAR_MAX > INT_MAX, some character literals
must have a negative value, so that never applies.

> ... That is
> what a C++ implementation would have to do, ...


Why? What provision of the C++ standard would force them to do that?

>> Conclusion: Code which depends upon the assumption that character
>> literals have type 'int' (excluding cases involving sizeof or
>> _Generic()) must do so in one of two basic ways:
>>
>> 1. Assuming that character literals never have a value greater than
>> INT_MAX.
>> 2. Assuming that character literals are never promoted to 'unsigned
>> int'.
>>
>> Those assumptions could fail if character literals were changed to
>> 'char', but only on implementations where CHAR_MAX > INT_MAX. Are
>> you sure you want to claim that all such code is "broken"?

>
> I'm still wrapping my head around your (excellent, BTW) analysis, but my
> gut tells me that such code is indeed "broken". Perhaps if you could
> come up with a concrete example of code that you think is non-broken but
> would fail if character literals were char, rather than an abstract
> argument?


To me, the single strongest argument against considering such code to be
broken is the fact that the C standard guarantees that character
literals have 'int' type. You haven't explained why you consider such
code broken. My best guess is that you think that choosing 'int' rather
than 'char' was so obviously and so seriously wrong, that programmers
have an obligation to write their code so that it will continue to work
if the committee ever gets around to correcting that mistake. I agree
with you that the C++ rules are more reasonable, but I don't think it's
likely that the C committee will ever change that feature of C, and it's
even less likely that it will do so any time soon. Therefore, that
doesn't seem like a reasonable argument to me - so I'd appreciate
knowing what your actual argument against such code is.

Summarizing what I said earlier, as far as I have been able to figure
out, the behavior of C code can change as a result of character literals
changing from 'int' to 'char' in only a few ways:
1. sizeof('character_literal'), which is a highly implausible construct;
its only plausible use that isn't redundant with #ifdef __cplusplus is
by someone who incorrectly expects it to be equivalent to sizeof(char);
and if someone did expect that, they should also have incorrectly
expected it to be a synonym for '1'; so why not write '1' instead?
(See the small sizeof sketch after this list.)
2. _Generic() is too new and too poorly supported for code using it to
be a significant problem at this time.
3. Obscure, and possibly mythical, implementations where CHAR_MAX > INT_MAX.
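
A minimal sketch of that sizeof difference (nothing exotic needed; it
just prints the sizes a C compiler gives):

#include <stdio.h>

int main(void)
{
    /* In C, 'a' has type int, so sizeof('a') == sizeof(int).
       Under the C++ rule it would be 1, i.e. sizeof(char). */
    printf("sizeof('a')  = %zu\n", sizeof('a'));
    printf("sizeof(char) = %zu\n", sizeof(char));
    printf("sizeof(int)  = %zu\n", sizeof(int));
    return 0;
}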

I consider the third item to be overwhelmingly the most significant of
the three issues, even though the unlikelihood of such implementations
makes it an insignificant issue in absolute terms. Ignoring the other
two issues (and assuming that LONG_MAX > INT_MAX), consider the
following code:


char c = 'C';
long literal = 'C';
long variable = c;
int offset = -13;

Under the current rules, on an implementation where CHAR_MAX <= INT_MAX:
c+offset and 'C'+ offset both have the type 'int'. 'c', 'literal' and
'variable' are all guaranteed to be positive.

Under the current rules, on an implementation where CHAR_MAX > INT_MAX:
c+offset will have the type 'unsigned int', but 'C' + offset will have
the type 'int'. It is possible (though extremely implausible) that c >
INT_MAX. If it is, the same will be true of 'variable', but 'literal'
will be negative.

If character literals were changed to have the type 'char', on an
implementation where CHAR_MAX <= INT_MAX:
c+offset and 'C' + offset would both have the type 'int'. 'c',
'literal', and 'variable' would all be guaranteed to be positive.

If character literals were changed to have the type 'char', on an
implementation where CHAR_MAX > INT_MAX:
c + offset and 'C' + offset would both have the type 'unsigned int'. It
would be possible (though extremely implausible) that c > INT_MAX. If it
were, the same would be true for both 'literal' and 'variable'.

Therefore, the only implementations where code would have different
behavior if character literals were changed to 'char' are those where
CHAR_MAX > INT_MAX. And the only differences involve behavior that,
under the current rules, is different from the behavior for CHAR_MAX <=
INT_MAX. Therefore, the only code that will break if this rule is
changed is code that currently goes out of its way to correctly deal
with the possibility that CHAR_MAX > INT_MAX. I cannot see how you could
justify labeling code as 'broken', just because it correctly (in terms
of the current standard) deals with such an extremely obscure side issue.

On the other hand, the simplest way to deal with the possibility that
CHAR_MAX > INT_MAX is to insert casts:

if(c == (char)'C')

or
long literal = (char)'C';

Such code would not be affected by such a change. Only code that copes
with the possibility by other methods (such as #if CHAR_MAX > INT_MAX)
would be affected. I suppose you could call such code broken - but only
if you can justify insisting that programmers have an obligation to deal
with the possibility that the committee might change this rule.

--
James Kuyper
 
Stephen Sprunk
 
      07-01-2013
On 01-Jul-13 07:56, James Kuyper wrote:
> On 06/27/2013 12:07 PM, Stephen Sprunk wrote:
>> On 27-Jun-13 07:28, James Kuyper wrote:
>>> On 06/26/2013 03:35 PM, Stephen Sprunk wrote:
>>>> IMHO, any code that relies on character literals being of type
>>>> int is _already_ broken, and the current mis-definition does
>>>> nothing but hide such bugs from the programmer, which is not
>>>> worth preserving.

> ....
>>> The value of a character literal will be same, whether it has
>>> type 'int' or type 'char', so long as char is signed, or is
>>> unsigned with CHAR_MAX <= INT_MAX. Only if CHAR_MAX > INT_MAX
>>> could it matter. Character literals that currently have a
>>> negative value would instead have a positive value greater than
>>> INT_MAX.

>>
>> The only example I can envision a problem with is a character
>> literal that today is negative. IIRC, the conversion to char is
>> well-defined in that case. However, if character literals were
>> char, it'd have a large positive value. Storing in a char would
>> still be fine, but storing in

>
> Up to this point, you're saying almost exactly what I just said,
> just with slightly different wording.


When I get confused, I tend to dump my current state in hopes that
someone can point out an error that led to said confusion.

>> an int would require a possibly-problematic conversion.
>>
>> Is that the concern?

>
> Almost. The conversion to 'int' would be guaranteed to produce
> exactly the same value that the character literal would have had
> under the current rules.


Why? I thought that, while converting a negative value to unsigned was
well-defined, converting an out-of-range unsigned value to signed was not.

>>> Implementations where CHAR_MAX > INT_MAX are extremely rare,
>>> probably non-existent, but if one does exist, it could be fully
>>> conforming to the C standard. Such an implementation must have
>>> CHAR_MIN == 0, CHAR_BIT >= 16, and unless there's a lot of
>>> padding bits in an int, sizeof(int) == 1.

>>
>> If we also assume that int has no padding bits, that doesn't seem
>> completely unreasonable, actually. There are probably DSPs like
>> that.
>>
>> The problem could be solved by the implementation defining plain
>> char to be signed, which is the only sane choice if a character
>> literal can be negative (even as an int) in the first place.

>
> You might consider it insane to have char be unsigned on such an
> implementation, but such an implementation could be fully conforming.
> It would violate some widely held expectations, but if it is fully
> conforming, then those expectations were unjustified. Is there any
> reason other than such expectations why you would consider such an
> implementation insane?


I consider it insane to have an unsigned plain char when character
literals can be negative.

>> If all character literals are non-negative and plain char is
>> unsigned, then there is no problem making them char on such a
>> system. ...

>
> For implementations where CHAR_MAX > INT_MAX, some character
> literals must have a negative value, so that never applies.


Granted, one can create arbitrary character literals, but doing so
ventures into "contrived" territory. I only mean to include real
characters, which I think means ones in the source or execution
character sets.

>> ... That is what a C++ implementation would have to do, ...

>
> Why? What provision of the C++ standard would force them to do that?


In C++, character literals have type char, so if char is unsigned, then
by definition no character literal can be negative.

>>> Conclusion: Code which depends upon the assumption that
>>> character literals have type 'int' (excluding cases involving
>>> sizeof or _Generic()) must do so in one of two basic ways:
>>>
>>> 1. Assuming that character literals never have a value greater
>>> than INT_MAX. 2. Assuming that character literals are never
>>> promoted to 'unsigned int'.
>>>
>>> Those assumptions could fail if character literals were changed
>>> to 'char', but only on implementations where CHAR_MAX > INT_MAX.
>>> Are you sure you want to claim that all such code is "broken"?

>>
>> I'm still wrapping my head around your (excellent, BTW) analysis,
>> but my gut tells me that such code is indeed "broken". Perhaps if
>> you could come up with a concrete example of code that you think is
>> non-broken but would fail if character literals were char, rather
>> than an abstract argument?

>
> To me, the single strongest argument against considering such code to
> be broken is the fact that the C standard guarantees that character
> literals have 'int' type. You haven't explained why you consider
> such code broken. My best guess is that you think that choosing 'int'
> rather than 'char' was so obviously and so seriously wrong,


Well, I'm not sure how much of a "choice" that really was, rather than
an accident of C's evolution from an untyped language and everything
becoming an "int" by default.

> that programmers have an obligation to write their code so that it
> will continue to work if the committee ever gets around to
> correcting that mistake.


I cannot recall having seen any code that would break if that mistake
were corrected, and I'm reasonably certain none of mine would because I
thought character literals _were_ of type char until many years after
first learning C--and I still code as if it were true because I want my
code to still work if compiled as C++.

> ...
> 3. Obscure, and possibly mythical, implementations where CHAR_MAX >
> INT_MAX.
>
> I consider the third item to be overwhelmingly the most significant
> of the three issues, even though the unlikelihood of such
> implementations makes it an insignificant issue in absolute terms.


We know there are systems where sizeof(int)==1; can we really assume
that plain char is signed on all such implementations, which is the only
way for them to _avoid_ CHAR_MAX > INT_MAX?

> Therefore, the only implementations where code would have different
> behavior if character literals were changed to 'char' are those
> where CHAR_MAX > INT_MAX. And the only differences involve behavior
> that, under the current rules, is different from the behavior for
> CHAR_MAX <= INT_MAX. Therefore, the only code that will break if this
> rule is changed is code that currently goes out of its way to
> correctly deal with the possibility that CHAR_MAX > INT_MAX. I cannot
> see how you could justify labeling code as 'broken', just because it
> correctly (in terms of the current standard) deals with such an
> extremely obscure side issue.


My gut says more code would break on systems where CHAR_MAX > INT_MAX
than would break if character literals were chars; few programmers would
think about accommodating the former or even realize it could exist,
whereas most either mistakenly think the latter is true or are actually
coding for the C-like subset of C++ where it _is_ true.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
 
James Kuyper
 
      07-01-2013
On 07/01/2013 12:36 PM, Stephen Sprunk wrote:
> On 01-Jul-13 07:56, James Kuyper wrote:
>> On 06/27/2013 12:07 PM, Stephen Sprunk wrote:

....
>>> an int would require a possibly-problematic conversion.
>>>
>>> Is that the concern?

>>
>> Almost. The conversion to 'int' would be guaranteed to produce
>> exactly the same value that the character literal would have had
>> under the current rules.

>
> Why? I thought that, while converting a negative value to unsigned was
> well-defined, converting an out-of-range unsigned value to signed was not.


I mentioned my argument for that conclusion earlier in this thread -
both you and Keith seem to have skipped over it without either accepting
it or explaining why you had rejected it. Here it is again.

The standard defines the behavior of fputc() in terms of the conversion
of int to unsigned char (7.21.7.3p2). It defines the behavior of fgetc()
in terms of the conversion from unsigned char to int (7.21.7.1p2). All
other I/O is defined in terms of the behavior of those two functions -
the other I/O functions don't have to actually call those functions, but
they are required to behave as if they did. It also requires that "Data
read in from a binary stream shall compare equal to the data that were
earlier written out to that stream, under the same implementation."
(7.21.2p3). While, in general, conversion to signed type of a value that
is too big to be represented by that type produces an
implementation-defined result or raises an implementation-defined
signal, for this particular conversion, I think that 7.21.2p3 implicitly
prohibits the signal, and requires that if 'c' is an unsigned char, then

(unsigned char)(int)c == c

If CHAR_MAX > INT_MAX, then 'char' must behave the same as 'unsigned
char'. Also, on such an implementation, there cannot be more valid 'int'
values than there are 'char' values, and the inversion requirement
implies that there cannot be more char values than there are valid 'int'
values. This means that we must also have, if 'i' is an int object
containing a valid representation, that

(int)(char)i == i

In particular, this applies when i==EOF, which is why comparing fgetc()
values with EOF is not sufficient to determine whether or not the call
was successful. Negative zero and positive zero have to convert to the
same unsigned char, which would make it impossible to meet both
inversion requirements, so it also follows that 'int' must have a 2's
complement representation on such a platform.
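
A minimal sketch of the practical consequence (the helper name is
invented): on such an implementation a real character can compare equal
to EOF, so feof()/ferror() have to settle whether the stream really ended.

#include <stdio.h>

/* Returns 1 and stores the byte on success, 0 on end-of-file or error. */
static int read_byte(FILE *fp, unsigned char *out)
{
    int c = fgetc(fp);

    if (c == EOF && (feof(fp) || ferror(fp)))
        return 0;                /* genuine end-of-file or read error */

    *out = (unsigned char)c;     /* a real character, even if c == EOF */
    return 1;
}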

....
> I consider it insane to have an unsigned plain char when character
> literals can be negative.


You've already said that. What you haven't done so far is explained why.
I agree that there's a bit of conflict there, but 'insane' seems extreme.

....
>> For implementations where CHAR_MAX > INT_MAX, some character
>> literals must have a negative value, so that never applies.

>
> Granted, one can create arbitrary character literals, but doing so
> ventures into "contrived" territory. I only mean to include real
> characters, which I think means ones in the source or execution
> character sets.


There's no requirement that any member, not even of the basic execution
character set, have an encoding that is <= INT_MAX. It's pretty
unlikely for members of the basic execution character set, but it seems a very
likely thing for members of the extended character set that are
represented by UCNs for code points that are greater than INT_MAX. All
such characters must have a character literal that is negative if
CHAR_MAX > INT_MAX.

>>> ... That is what a C++ implementation would have to do, ...

>>
>> Why? What provision of the C++ standard would force them to do that?

>
> In C++, character literals have type char, so if char is unsigned, then
> by definition no character literal can be negative.


I'd forgotten that C++ had a different rule for the value of a character
literal than C does. The C rule is defined in terms of conversion of a
char object's value to type 'int', which obviously would be
inappropriate given that C++ gives character literals a type of 'char'.
Somehow I managed to miss that "obvious" conclusion, and I didn't bother
to check. Sorry.

....
>> To me, the single strongest argument against considering such code to
>> be broken is the fact that the C standard guarantees that character
>> literals have 'int' type. You haven't explained why you consider
>> such code broken. My best guess is that you think that choosing 'int'
>> rather than 'char' was so obviously and so seriously wrong,

....
>> that programmers have an obligation to write their code so that it
>> will continue to work if the committee ever gets around to
>> correcting that mistake.

>
> I cannot recall having seen any code that would break if that mistake
> were corrected, and I'm reasonably certain none of mine would because I


The essence of what I've been saying is that it's fairly difficult to
write such code, except by relying upon sizeof() or _Generic(), and
almost impossible to do so accidentally.

>> 3. Obscure, and possibly mythical, implementations where CHAR_MAX >
>> INT_MAX.
>>
>> I consider the third item to be overwhelmingly the most significant
>> of the three issues, even though the unlikelihood of such
>> implementations makes it an insignificant issue in absolute terms.

>
> We know there are systems where sizeof(int)==1; can we really assume
> that plain char is signed on all such implementations, which is the only
> way for them to _avoid_ CHAR_MAX > INT_MAX?


Every time I've brought up the odd behavior of implementations which
have UCHAR_MAX > INT_MAX, it's been argued that they either don't exist
or are so rare that we don't need to bother worrying about them.
Implementations where CHAR_MAX>INT_MAX must be even rarer (since they
are a subset of implementations where UCHAR_MAX > INT_MAX), so I'm
surprised (and a bit relieved) to see someone actually arguing for the
probable existence of such implementations. I'd feel happier about it if
someone could actually cite one, but I don't remember anyone ever doing so.

>> Therefore, the only implementations where code would have different
>> behavior if character literals were changed to 'char' are those
>> where CHAR_MAX > INT_MAX. And the only differences involve behavior
>> that, under the current rules, is different from the behavior for
>> CHAR_MAX <= INT_MAX. Therefore, the only code that will break if this
>> rule is changed is code that currently goes out of its way to
>> correctly deal with the possibility that CHAR_MAX > INT_MAX. I cannot
>> see how you could justify labeling code as 'broken', just because it
>> correctly (in terms of the current standard) deals with such an
>> extremely obscure side issue.

>
> My gut says more code would break on systems where CHAR_MAX > INT_MAX
> than would break if character literals were chars;


Well, that follows from what I said above. Almost all breakage that
would occur if character literals were changed to char would occur on
platforms where CHAR_MAX > INT_MAX, and would therefore count for both
categories. However, I'll go farther, and say that it's not only "more
code", but "a lot more code".
 