Velocity Reviews - Computer Hardware Reviews

Encoding of character literals

 
 
Lauri Alanko
 
      11-03-2011
Hello.

I find C99's language on internationalization features particularly
hard to decipher, so I'd appreciate some clarifications.

I'm particularly interested in the relationship of the execution
character set encoding of character literals and character string
literals, and its relationship with locale encodings and the wchar_t
encoding.

Firstly, is it possible for a locale to use a different encoding for
the basic execution character set than the compiler uses for the
literals? If not, doesn't this mean that the locale system (and the C
standard) are insufficient in an environment where both ASCII- and
EBCDIC-based encodings can be used?

If it is possible for a locale to use a completely different encoding,
then how can ordinary character literals and character string literals
be converted to the locale's encoding? It is of course possible to
convert between wide characters and the locale, but how do I convert
from the literal encoding into the wchar_t encoding?

Is it perhaps guaranteed that ((wchar_t) 'a' == L'a')? I haven't seen
any text to suggest this, and this would mean that an implementation
couldn't use EBCDIC for character literals and UCS-4 for wchar_t. But
maybe someone can give a definitive answer?

It is of course possible to define the mapping manually:

#include <stddef.h>  /* wchar_t */

wchar_t char_to_wchar[] = {
    ['a'] = L'a',
    ['b'] = L'b',
    // ... etc for all of the portable basic character set
};

But this seems like a horrible, redundant hack that I wouldn't like to
use except as a last resort. Is something like this really necessary
in order to print out character string literals correctly in all
locales?
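
For illustration, here is a minimal sketch of how such a table might be used to widen a whole string. The helper name widen_literal, the three-entry table and the error convention are assumptions made up for this example; a real table would need one entry for every member of the basic character set.

```c
#include <limits.h>
#include <stddef.h>

/* Illustrative table only: indexed by the value of the narrow literal,
   holding the matching wide literal. Unlisted entries are zero. */
static const wchar_t char_to_wchar[UCHAR_MAX + 1] = {
    ['a'] = L'a',
    ['b'] = L'b',
    ['c'] = L'c',
};

/* Hypothetical helper: widens src into dst (capacity cap, in wchar_t units).
   Returns the number of wide characters written, not counting the
   terminator, or (size_t)-1 if a character has no table entry or the
   buffer is too small. */
size_t widen_literal(const char *src, wchar_t *dst, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;
    for (; src[n] != '\0'; n++) {
        wchar_t wc = char_to_wchar[(unsigned char)src[n]];
        /* Keep one slot free for the terminating L'\0'. */
        if (wc == L'\0' || n + 1 >= cap)
            return (size_t)-1;
        dst[n] = wc;
    }
    dst[n] = L'\0';
    return n;
}
```

Because both the index and the value are written as literals, the compiler resolves each pair in its own encodings, so the lookup does not depend on the current locale.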


Lauri
 
 
James Kuyper
 
      11-03-2011
On 11/03/2011 04:41 PM, Lauri Alanko wrote:
....
> Is it perhaps guaranteed that ((wchar_t) 'a' == L'a')?


I'm no expert on internationalization - as a US programmer I've never
had any need to worry about it. However, that question at least I can
answer:

C99 says, in effect, that the above expression is guaranteed to be true
if the implementation does not pre-define __STDC_MB_MIGHT_NEQ_WC__ (7.17p2).
6.10.8p1 seems to indicate that definition of the macro with a value of
1 is mandatory - but that might be an example of poor wording or a
misinterpretation on my part. It seems inconsistent with the "if" in 7.17p2.
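
As a rough sketch, a program could test for the macro at compile time before relying on the cast. This assumes the macro is spelled __STDC_MB_MIGHT_NEQ_WC__, as in the C99 TC3 text; the function name cast_widening_is_guaranteed is made up for the example.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when the implementation promises that
   (wchar_t)'x' == L'x' for every member of the basic character set,
   and 0 when it explicitly declines to promise that. */
int cast_widening_is_guaranteed(void)
{
#ifdef __STDC_MB_MIGHT_NEQ_WC__
    return 0;   /* the values might differ; do not rely on the cast */
#else
    return 1;   /* 7.17p2 applies; the cast is safe for basic characters */
#endif
}
```

Code that wants to use the plain cast can check this once and fall back to some other conversion when it returns 0.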
 
 
Harald van Dijk
 
      11-03-2011
On Nov 3, 10:09 pm, James Kuyper <(E-Mail Removed)> wrote:
> C99 says, in effect, that the above expression is guaranteed to be true
> if the implementation does not pre-define __STDC_MB_MIGHT_NEQ_WC__ (7.17p2).
> 6.10.8p1 seems to indicate that definition of the macro with a value of
> 1 is mandatory - but that might be an example of poor wording or a
> misinterpretation on my part. It seems inconsistent with the "if" in 7.17p2.


It's supposed to be in 6.10.8p2, see DR #333, or a draft of C1x in
which this has been corrected.
 
 
Lauri Alanko
 
      11-10-2011
In article <(E-Mail Removed)>,
Harald van Dijk <(E-Mail Removed)> wrote:
> On Nov 3, 10:09 pm, James Kuyper <(E-Mail Removed)> wrote:
> > C99 says, in effect, that the above expression is guaranteed to be true
> > if the implementation does not pre-define __STDC_MB_MIGHT_NEQ_WC__ (7.17p2).
> > 6.10.8p1 seems to indicate that definition of the macro with a value of
> > 1 is mandatory - but that might be an example of poor wording or a
> > misinterpretation on my part. It seems inconsistent with the "if" in 7.17p2.

>
> It's supposed to be in 6.10.8p2, see DR #333, or a draft of C1x in
> which this has been corrected.


Thanks, that is useful. So C99 mandates that for the basic character
set, chars and the corresponding wchar_t's have the same integer
value, and C1x makes this guarantee conditional on the presence of the
macro.

But is btowc guaranteed to honor this equality in all locales? And, if
__STDC_MB_MIGHT_NEQ_WC__ is defined, and btowc is the only way to convert a
char to wchar_t, is it guaranteed to work correctly on integer
character constants (from the basic character set) in all locales?
That is, is (btowc('a') == L'a') going to be true in all
implementations in all legit locales? And if not, how

The corner case I'm thinking of is of course the situation where the
native encoding used by integer character literals is EBCDIC, but
wchar_t uses UCS-4, and the current locale is ASCII-based. So one
cannot cast from integer character literals to wchar_t, but one also
cannot use locale-dependent conversion functions. Is this a situation
that standard C is even able to support?
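
To make the question concrete, here is a small sketch that checks the equality empirically for whatever locale is currently set. It only observes what one implementation does; it says nothing about what the standard guarantees. The helper name count_btowc_mismatches is invented for this example.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: narrow and wide are the same text written once as
   an ordinary string literal and once as a wide string literal. Returns
   how many positions have btowc(narrow[i]) != wide[i] in the current
   locale. Comparison stops at the end of the shorter string. */
size_t count_btowc_mismatches(const char *narrow, const wchar_t *wide)
{
    size_t bad = 0;

    for (size_t i = 0; narrow[i] != '\0' && wide[i] != L'\0'; i++) {
        /* btowc expects an unsigned char value (or EOF) and returns
           WEOF when the byte is not a valid single-byte character. */
        if (btowc((unsigned char)narrow[i]) != (wint_t)wide[i])
            bad++;
    }
    return bad;
}
```

Calling it as count_btowc_mismatches("abc", L"abc") after setlocale(LC_ALL, "") shows whether the selected locale agrees with the compiler's literal encodings on that system; a nonzero result would be exactly the corner case described above.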


Lauri
 
 
lawrence.jones@siemens.com
 
      11-10-2011
Lauri Alanko <(E-Mail Removed)> wrote:
> In article <(E-Mail Removed)>,
> Harald van Dijk <(E-Mail Removed)> wrote:
> >
> > It's supposed to be in 6.10.8p2, see DR #333, or a draft of C1x in
> > which this has been corrected.

>
> Thanks, that is useful. So C99 mandates that for the basic character
> set, chars and the corresponding wchar_t's have the same integer
> value, and C1x makes this guarantee conditional on the presence of the
> macro.


No, there was a production error in N1256 which put the macro in the
wrong paragraph; it was always supposed to have been conditional.
--
Larry Jones

I hate being good. -- Calvin
 