French characters not recognised in C?

 
 
Ess355
 
      04-02-2004
Hi,

In the debugger at run time, characters like é are not recognised by
their normal ASCII number, but something like -8615722... . I've seen
this number before, it means "rubbish" right?

So how can I possibly modify my program so that French characters get
recognised?

Thanks in advance,
Ehsan.
 
Arthur J. O'Dwyer
 
      04-02-2004

On Thu, 1 Apr 2004, Ess355 wrote:
>
> In the debugger at run time, characters like are not recognised by
> their normal ASCII number, but something like -8615722... .


That doesn't make a whole lot of sense. What do you mean, "characters
...are not recognized by their normal ASCII number"? First of all,
é doesn't *have* an ASCII number. Second, assuming you've
picked an encoding somehow and you're expecting to see é displayed
correctly, what's going wrong?
Do you type é at the keyboard and your program doesn't recognize
it?
Do you type é in your source code and it doesn't display
correctly?
Do you type é in your source code and it refuses to compile at
all?

In general, the C programming language only deals with a very restricted
"basic character set," which doesn't contain things like é. If
you want to display or process that sort of input or output, you'll need
to either find a compiler with nice language support; find a library that
handles your national encoding(s) or Unicode; or roll your own library.
'wchar_t' and the wchar functions might be useful to you, too; read the
manpages for them or Google 'wchar_t manpage' for details.

> So how can I possible modify my program so that french characters get
> recognised?


Depending on what exactly your problem is, you might try:

* Posting to fr.comp.lang.c or another French-language group.
* Getting a better compiler.
* Using 'wchar_t' in place of 'char'.
* Using a translation library that can convert between French encodings
and a useful ASCII encoding of the same text, e.g.: é -> \'e

If you post a complete, compilable, minimal program that demonstrates
the problem, someone here might be able to help you more. But
fr.comp.lang.c sounds like a better bet to me.

HTH,
-Arthur

 
Michael B Allen
 
      04-02-2004
On Thu, 01 Apr 2004 21:21:06 -0500, Ess355 wrote:
> In the debugger at run time, characters like are not recognised by
> their normal ASCII number, but something like -8615722... . I've seen
> this number before, it means "rubbish" right?
>
> So how can I possible modify my program so that french characters get
> recognised?


By default, most platforms (all?) will execute programs in the "C"
locale which only supports ASCII. ASCII is a 7-bit encoding/charset that
does not support European characters. You might try adding a call to
setlocale like:

setlocale(LC_CTYPE, "");

This will check some environment variables to determine the locale
you're running in. You can force a specific locale with, for example,
setlocale(LC_ALL, "fr_FR"), but you may or may not want to do that
depending on the source of the characters.

Or you might need to run the debugger in a different locale. For example
on Unix systems a very simple way to run a program in a different locale
is by preceding the command with an environment variable like:

$ LC_CTYPE=fr_CA dbug ./myproggie

Mike
 
Dan Pop
 
      04-02-2004
Michael B Allen writes:

>On Thu, 01 Apr 2004 21:21:06 -0500, Ess355 wrote:
>> In the debugger at run time, characters like are not recognised by
>> their normal ASCII number, but something like -8615722... . I've seen
>> this number before, it means "rubbish" right?
>>
>> So how can I possible modify my program so that french characters get
>> recognised?

>
>By default, most platforms (all?) will execute programs in the "C"
>locale which only supports ASCII.


Nope. By default most platforms will use one 8-bit extension to ASCII or
another in the "C" locale. The others will use one EBCDIC flavour (code
page) or another. In principle, one could attach a KSR-33 to a serial
port (and figure out how to set the speed of that port to 110 bps), just
to prove me wrong

This can be easily tested with a trivial program like this:

#include <stdio.h>

int main(void)
{
    /* Octal escapes for character codes 254, 253 and 252. */
    printf("\376\375\374 Hello world\n");
    return 0;
}

>ASCII is a 7bit encoding/charset that
>does not support european characters. You might try adding a call to
>setlocale like:
>
> setlocale(LC_CTYPE, "");


You're really naive if you believe that this will change the character
set used by the implementation. It will merely change the behaviour of
certain functions that are affected by the current locale.

In practice, it is the user's job to select a character set suitable for
his locale and to set the default native locale accordingly.

>This will check some environment variables to determine the locale
>your running in. You can force a specific locale like setlocal(LC_ALL,
>"fr_FR") but you may or may not want to do that depending on the source
>of the characters.


1. Where did you get the idea that "fr_FR" is a valid locale name from?
May I have the chapter and verse?

2. If the user has a Russian terminal, selecting a French locale won't
make Latin-1 characters appear as intended.

>Or you might need to run the debugger in a different locale. For example
>on Unix systems a very simple way to run a program in a different locale
>is by preceeding the command with an environment variable like:
>
> $ LC_CTYPE=fr_CA dbug ./myproggie


Let's see:

fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
LC_CTYPE=fr_CA: Command not found.

Doesn't Linux count as a Unix system any more?

The issue is very simple in practice, but extremely difficult to describe
in terms of what the C standard actually says. Each new C programmer
should do a bit of experimenting, using programs like the one shown above,
to see what happens when values above 127 (and, for pragmatic reasons,
the range 128 - 159 should be avoided) are used as (unsigned) character
values.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
 
Michael B Allen
 
      04-02-2004
On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:
>>> In the debugger at run time, characters like are not recognised by
>>> their normal ASCII number, but something like -8615722... . I've seen
>>> this number before, it means "rubbish" right?
>>>
>>> So how can I possible modify my program so that french characters get
>>> recognised?

>>
>>By default, most platforms (all?) will execute programs in the "C"
>>locale which only supports ASCII.

>
> Nope. By default most platforms will use one 8-bit extension to ASCII
> or another in the "C" locale. The others will use one EBCDIC flavour
> (code page) or another. In principle, one could attach a KSR-33 to a
> serial port (and figure out how to set the speed of that port to 110
> bps), just to prove me wrong
>
> This can be easily tested with a trivial program like this:
>
> #include <stdio.h>
>
> int main()
> {
> printf("\376\375\374 Hello world\n");
> return 0;
> }


Why do you think this will give you the default behavior? If you run
this on a fancy machine with extravagant libraries and locales available
it will likely give you different results depending on what the default
locale is. On my system this will print Latin1.

>>ASCII is a 7bit encoding/charset that does not support european
>>characters. You might try adding a call to setlocale like:
>>
>> setlocale(LC_CTYPE, "");

>
> You're really naive if you believe that this will change the character
> set used by the implementation. It will merely change the behaviour of
> certain functions that are affected by the current locale.


What do you mean by "used by the implementation"? The OP said "at run
time". On my system if I do:

$ LANG=en_US.UTF-8 ./myproggie

it indeed changes the behavior of how characters are interpreted
at runtime. I said nothing about the charset or encoding used by the
compiler or how string literals are stored in binaries.

> Let's see:
>
> fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
> LC_CTYPE=fr_CA: Command not found.
>
> Doesn't Linux count as a Unix system any more?


Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
not get too pedantic about it. You've embarrassed yourself enough by
acknowledging you use C shell :->

Mike
 
Richard Bos
 
      04-05-2004
Michael B Allen wrote:

> On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:


[ Quoting was buggered up-stream; the next bit is by Michael B Allen. ]

> >>By default, most platforms (all?) will execute programs in the "C"
> >>locale which only supports ASCII.

> >
> > Nope. By default most platforms will use one 8-bit extension to ASCII
> > or another in the "C" locale. The others will use one EBCDIC flavour
> > (code page) or another. In principle, one could attach a KSR-33 to a
> > serial port (and figure out how to set the speed of that port to 110
> > bps), just to prove me wrong
> >
> > This can be easily tested with a trivial program like this:
> >
> > #include <stdio.h>
> >
> > int main()
> > {
> > printf("\376\375\374 Hello world\n");
> > return 0;
> > }

>
> Why do you think this will give you the default behavior?


It must, if compiled in ISO C mode. All programs start in the "C"
locale. Even so...

> If you run this on a fancy machine with extravagant libraries and
> locales available it will likely give you different results depending
> on what the default locale is. On my system this will print Latin1.


...even so, the char types must be at least 8-bit, which means that
plain ASCII, being 7-bit, is out of the race from the start. Your
default character set _must_ be either an (at least 8-bit) extension to
ASCII, or something else entirely (most usually EBCDIC, which itself is
rare enough, but not entirely unheard of).
IOW, Dan's '\376' et al. must specify a valid member of the character
set, even though they are not part of ASCII.

> > fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
> > LC_CTYPE=fr_CA: Command not found.
> >
> > Doesn't Linux count as a Unix system any more?

>
> Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
> not get too pedantic about it. You've embarrassed yourself enough by
> acknowledging you use C shell :->


And what other shell did you expect to see used in _this_ newsgroup,
then <g>?

Richard
 
Dan Pop
 
      04-05-2004
Michael B Allen writes:

>On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:
>>>> In the debugger at run time, characters like are not recognised by
>>>> their normal ASCII number, but something like -8615722... . I've seen
>>>> this number before, it means "rubbish" right?
>>>>
>>>> So how can I possible modify my program so that french characters get
>>>> recognised?
>>>
>>>By default, most platforms (all?) will execute programs in the "C"
>>>locale which only supports ASCII.

>>
>> Nope. By default most platforms will use one 8-bit extension to ASCII
>> or another in the "C" locale. The others will use one EBCDIC flavour
>> (code page) or another. In principle, one could attach a KSR-33 to a
>> serial port (and figure out how to set the speed of that port to 110
>> bps), just to prove me wrong
>>
>> This can be easily tested with a trivial program like this:
>>
>> #include <stdio.h>
>>
>> int main()
>> {
>> printf("\376\375\374 Hello world\n");
>> return 0;
>> }

>
>Why do you think this will give you the default behavior? If you run
>this on a fancy machine with extravagant libraries and locales available
>it will likely give you different results depending on what the default
>locale is.


Because this program runs in the "C" locale, regardless of what the
default locale is. It's the default font/character set that will
determine its output, not the default locale. I can set the default
locale to an English locale using Latin1, but if the font currently
used by the terminal where the program generates its output is Latin2,
I'm not going to see Latin1 output.

>On my system this will print Latin1.


More likely, it will simply output some character codes and let an entity
external to the implementation decide what character set to use.

On my system, I can switch between Latin1 and Latin2 fonts in an
xterm window with the mouse. Therefore, I can alter the program output
even *after* running the program, by selecting another font for that
window. The only invariant is the character codes output by the program.
This is *not* a locale issue at all.

>>>ASCII is a 7bit encoding/charset that does not support european
>>>characters. You might try adding a call to setlocale like:
>>>
>>> setlocale(LC_CTYPE, "");

>>
>> You're really naive if you believe that this will change the character
>> set used by the implementation. It will merely change the behaviour of
>> certain functions that are affected by the current locale.

>
>What do you mean by "used by the implementation"? The OP said "at run
>time". On my system if I do:
>
> $ LANG=en_US.UTF-8 ./myproggie
>
>it indeed changes the behavior of how characters are interpreted
>at runtime.


But does it have *any* effect on what appears on your screen?

>I said nothing about the charset or encoding used by the
>compiler or how string literal are stored in binaries.
>
>> Let's see:
>>
>> fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
>> LC_CTYPE=fr_CA: Command not found.
>>
>> Doesn't Linux count as a Unix system any more?

>
>Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
>not get too pedantic about it.


Confusing Unix features and shell features is quite embarrassing, for a
Unix user...

>You've embarrassed yourself enough by acknowledging you use C shell :->


I am NOT using C shell

Dan
--
Dan Pop
DESY Zeuthen, RZ group