Velocity Reviews - Computer Hardware Reviews


Using Unicode in C programs

 
 
Marco Iannaccone
      09-01-2005
I'd like to start using Unicode (especially UTF-8) in my C programs, and
would like some info on how to start.
Can you point me to some documents (possibly online) explaining Unicode
and UTF-8, and how I can use them in my programs (writing and reading
from files, from the console, processing Unicode strings and chars
inside the program, etc.)?

Thanx

 
Simon Biber
      09-01-2005
Marco Iannaccone wrote:
> I'd like to start using Unicode (especially UTF-8) in my C programs, and
> would like some info on how to start.
> Can you point me to some documents (possibly online) explaining Unicode
> and UTF-8, and how I can use them in my programs (writing and reading
> from files, from the console, processing Unicode strings and chars
> inside the program, etc.)?


C provides a concept of wide characters (arrays of wchar_t) and
multibyte characters (arrays of char where each character may take up
more than one byte). The C standard defines functions for converting
between wide and multibyte representations. The standard does not
specify what encoding these two representational forms take.

On at least one platform, depending on the current locale setting, the
wide characters built into C represent Unicode characters, and the
multibyte characters represent the UTF-8 form.

The following program attempts to set the locale to en_AU.UTF-8, which
means Australian English in UTF-8 encoding. The language portion doesn't
matter, just the encoding does. It then takes a UTF-8 string (which
happens to contain Simplified Chinese characters), and converts it to
the wide character representation, which on my platform is equivalent to
Unicode.

#include <locale.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    wchar_t ucs2[5];   /* 4 characters + terminating L'\0' */

    if(!setlocale(LC_ALL, "en_AU.UTF-8"))
    {
        printf("Unable to set locale to Australian English in UTF-8\n");
        return EXIT_FAILURE;
    }

    /* The UTF-8 representation of string "水调歌头"
       (four Chinese characters pronounced shui3 diao4 ge1 tou2) */
    const char *utf8 = "\xE6\xB0\xB4\xE8\xB0\x83\xE6\xAD\x8C\xE5\xA4\xB4";

    /* mbstowcs returns (size_t)-1 on an invalid sequence; ucs2 is then
       not guaranteed to be null-terminated, so stop here. */
    if(mbstowcs(ucs2, utf8, sizeof ucs2 / sizeof *ucs2) == (size_t)-1)
    {
        printf("Invalid multibyte sequence\n");
        return EXIT_FAILURE;
    }

    printf("UTF-8: ");
    for(const char *p = utf8; *p; p++)
        printf("%02X ", (unsigned)(unsigned char)*p);
    printf("\n");

    printf("Unicode: ");
    for(const wchar_t *p = ucs2; *p; p++)
        printf("U+%04lX ", (unsigned long) *p);
    printf("\n");

    return 0;
}

[sbiber@eagle c]$ c99 -Wall utf8ucs2.c -o utf8ucs2
[sbiber@eagle c]$ ./utf8ucs2
UTF-8: E6 B0 B4 E8 B0 83 E6 AD 8C E5 A4 B4
Unicode: U+6C34 U+8C03 U+6B4C U+5934

I'd be interested to know how widely this technique works. Is it
portable?

--
Simon.
 
Alexei A. Frounze
      09-01-2005
"Marco Iannaccone" wrote:
> I'd like to start using Unicode (especially UTF-8) in my C programs, and
> would like some info on how to start.
> Can you point me to some documents (possibly online) explaining Unicode
> and UTF-8, and how I can use them in my programs (writing and reading
> from files, from the console, processing Unicode strings and chars
> inside the program, etc.)?


The best and most authoritative source of information on all aspects of
Unicode is www.unicode.org.
At the very least, read the Unicode FAQ and the article on Unicode "To the
BMP and beyond!" by Eric Muller of Adobe Systems (it should be linked
somewhere on unicode.org, or just google for it). Read them carefully.
Unlike ASCII, Unicode isn't guaranteed to be supported by every compiler
on every system. But to the best of my knowledge, recent Linux distros
support UTF-8 in functions like printf() and fopen().
Once again, make use of www.unicode.org.

Alex


 
Michael B Allen
      09-02-2005
On Thu, 01 Sep 2005 02:53:57 -0700, Marco Iannaccone wrote:

> I'd like to start using Unicode (especially UTF-8) in my C programs, and
> would like some info on how to start.
> Can you point me to some documents (possibly online) explaining Unicode
> and UTF-8, and how I can use them in my programs (writing and reading
> from files, from the console, processing Unicode strings and chars
> inside the program, etc.)?


If you would like a quick intro to what you're up against, see this link:

http://www.ioplex.com/~miallen/libmba/dl/docs/ref/text_details.html

It describes an API intended to improve the portability of code across
different platforms, which you may or may not be concerned with, but it also describes
the basics of working with Unicode in C.

Mike

 
Marco Iannaccone
      09-02-2005
Thanx a lot! (and thanx to everyone for helping! I'll start
studying...! )

 