"osmium" <> writes:
> James Kanze wrote:
>
>> On May 29, 3:08 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
>>> James Kanze wrote:
>>>> (Given that ASCII is for all intents and purposes dead, it's
>>>> highly unlikely that they really want ASCII.)
>>
>>> I'm not sure, but I think in the USA there is quite a number
>>> of programmers who don't think beyond ASCII when thinking of
>>> text manipulation.
>>
>> In just about every country, there are quite a number of
>> programmers who don't think. The fact remains that the
>> default encoding used by the system, even when configured for
>> the US, is not ASCII. Even if you're not "thinking" beyond
>> ASCII, your program must be capable of reading non-ASCII
>> characters (if only to recognize them and signal the error).
>
> Is it your point that an ASCII compliant environment would have to signal an
> error if the topmost bit in a byte was something other than 0? Or do you
> have something else in mind? I don't have the *actual* ASCII standard
> available but I would be surprised if that was expressed as a *requirement*.
> After all, the people that wrote the standard were well aware that there was
> no such thing as a seven-bit machine.
ASCII is an encoding designed for Information Interchange, so the
availability of a seven-bit machine was irrelevant. There were,
however, transfer protocols based on 7-bit data plus 1 parity bit.
So when octets held ASCII codes, the most significant bit could be
always 0, always 1, or odd or even parity.
Even today, you can configure a terminal (such as xterm) to encode
the Meta key in the most significant bit of an octet, for use by
programs such as emacs, leaving only 7 bits for the ASCII code of the
key pressed. (But again, this is a transfer protocol matter, not
relevant to how emacs or any system or application encodes its
characters.)
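
To make the parity case concrete, here is a minimal sketch of how a
7-bit code can be packed into an octet with an even parity bit, and
how a receiver would recover the code. The function names
with_even_parity and strip_high_bit are made up for this post; they
are not taken from any real protocol implementation.

```cpp
#include <cstdint>

// Hypothetical helper: returns the octet that carries the 7-bit code
// in its low bits, with the most significant bit chosen so that the
// octet as a whole contains an even number of 1 bits.
std::uint8_t with_even_parity(std::uint8_t code7) {
    unsigned ones = 0;
    for (int bit = 0; bit < 7; ++bit) {
        ones += (code7 >> bit) & 1u;   // count the 1 bits of the 7-bit code
    }
    // If the count is odd, set bit 7 to make the total even.
    return static_cast<std::uint8_t>((code7 & 0x7Fu) | ((ones & 1u) << 7));
}

// Hypothetical helper: drops whatever the sender put in the top bit
// (parity, Meta flag, constant 0 or 1) and keeps the 7-bit code.
std::uint8_t strip_high_bit(std::uint8_t octet) {
    return static_cast<std::uint8_t>(octet & 0x7Fu);
}
```

For instance, 0x41 ('A' in ASCII) has two 1 bits and is sent
unchanged, while 0x43 ('C') has three and goes out as 0xC3. Either
way strip_high_bit gives back the original 7-bit code.
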
Now, concerning these applications, and relevant to C and C++, the
point is that char may be signed or unsigned, and often it's signed.
This means that non-ASCII octets are often interpreted as negative
values. Whether the application is able to handle such 'char' values
or not is a matter of good practice. Very few programs use
unsigned char consistently and comprehensively. Still, some code
reviews occurred when UTF-8 was introduced, so things have improved
slightly.
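
As an illustration of the signed char pitfall, here is a small sketch
of the usual defensive idiom: convert to unsigned char before
comparing against 0x7F or calling the <cctype> functions. The helper
names (is_non_ascii, count_non_ascii, is_alpha) are invented for this
example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the octet is outside the 7-bit ASCII
// range. A plain "c > 0x7F" is always false where char is a signed
// 8-bit type, because such octets then have negative values.
bool is_non_ascii(char c) {
    return static_cast<unsigned char>(c) > 0x7F;
}

// Hypothetical helper: counts the octets of s that are not ASCII.
std::size_t count_non_ascii(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (is_non_ascii(c)) ++n;
    }
    return n;
}

// Hypothetical wrapper: std::isalpha requires an argument that is
// representable as unsigned char (or EOF), so a negative char must
// be converted before the call.
bool is_alpha(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

With these, count_non_ascii("caf\xC3\xA9") counts the two octets of
the UTF-8 encoded e-acute whether or not char is signed.
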
--
__Pascal Bourguignon__