Velocity Reviews - Computer Hardware Reviews


Unicode Initialization.

 
 
Me
 
      06-08-2004
I am trying to compile some code I've gotten from someone else, and
I know I need a 16-bit Unicode string, because he passes the pointer to
functions that take a (uint16 *). However, there are initializations that
look like this:

typedef unsigned short int ucs2_char;

....
....
....

static const ucs2_char form_feed[] = L"\f";

The above line gives me the gcc compiler error 'invalid initializer'.

When I change it to the following, everything works fine.

static const ucs2_char *form_feed = L"\f";


What is up with this error?




--
Using Opera's revolutionary e-mail client: http://www.opera.com/m2/
 
 
Stephen Sprunk
 
      06-08-2004
"Me" <> wrote in message
news ia.com...
> I am trying to compile some code Ive gotten from another and
> I know I need a 16 bit unicode string, for he passes the pointer to
> functions
> that take a (uint16 *), however there are initializations that look like
> this.
>
> typedef unsigned short int ucs2_char;


The correct type for UCS2 characters is wchar_t. Fix the code to use the
correct type.

> static const ucs2_char form_feed[] = L"\f";
>
> The above like in gcc give me the compiler error: 'invalid initializer'
>
> When I change it to the following, everything works fine.
>
> static const ucs2_char *form_feed = L"\f";
>
> What is up with this error?


What's up is that you're using the wrong type; L"\f" is a wide string
literal, i.e. an array of wchar_t, not an array of unsigned short ints. The
pointer version should give you a warning as well, since you're doing an
implicit conversion between wchar_t[] and unsigned short *, but your
compiler may not be smart enough to catch that.


typedef unsigned short int ucs2_char;
static const ucs2_char *form_feed = L"\f";
foo.c:2: warning: initialization from incompatible pointer type

typedef unsigned short int ucs2_char;
static const ucs2_char form_feed[] = L"\f";
foo.c:2: invalid initializer

#include <wchar.h>
static const wchar_t *form_feed_ptr = L"\f";
static const wchar_t form_feed[] = L"\f";
[ no compile warnings or errors ]
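A minimal sketch of the point above, assuming the poster's ucs2_char typedef and a C99 compiler: a wide string literal always has wchar_t elements, so it can only initialize an array whose element type is compatible with wchar_t, whereas a brace-enclosed list of integer constants can initialize an array of any integer type. The names form_feed_w and ucs2_len are hypothetical, made up for the illustration.

```c
#include <assert.h> /* assert */
#include <stddef.h> /* size_t, NULL */
#include <wchar.h>  /* wchar_t */

/* The poster's 16-bit code unit type. */
typedef unsigned short int ucs2_char;

/* gcc rejects this wherever wchar_t is not unsigned short (e.g. Linux):
 *     static const ucs2_char form_feed[] = L"\f";
 * An array may be initialized from a wide string literal only if its
 * element type is compatible with wchar_t. Spelling out the code units
 * works no matter what wchar_t is: */
static const ucs2_char form_feed[] = { 0x000C, 0x0000 }; /* U+000C, NUL */

/* The wide literal itself is accepted once the element type matches. */
static const wchar_t form_feed_w[] = L"\f";

/* Hypothetical helper: number of code units before the terminating zero. */
size_t ucs2_len(const ucs2_char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

Brace lists get tedious for longer strings, which is presumably why the original author reached for L"" in the first place.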

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin

 
those who know me have no need of my name
 
      06-08-2004
in comp.lang.c i read:
>"Me" <> wrote in message
>news kia.com...


>> I am trying to compile some code Ive gotten from another and I know I
>> need a 16 bit unicode string, for he passes the pointer to functions
>> that take a (uint16 *), however there are initializations that look like
>> this.
>>
>> typedef unsigned short int ucs2_char;

>
>The correct type for UCS2 characters is wchar_t.


wchar_t is something -- perhaps ucs-2 or utf-16, or something else entirely.
i agree wchar_t should be used, but if each character must be a ucs-2 code-
point then wchar_t is not appropriate, and neither should L"" be used for a
literal string.
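What wchar_t actually is can at least be queried per implementation. A sketch assuming a C99 compiler: sizeof(wchar_t) and WCHAR_MAX give its width and range, and the optional predefined macro __STDC_ISO_10646__ is how an implementation promises that wchar_t values are ISO 10646 (Unicode) code points. The function names are hypothetical.

```c
#include <limits.h> /* CHAR_BIT */
#include <wchar.h>  /* wchar_t, WCHAR_MAX */

/* 1 if the implementation promises that wchar_t values are ISO 10646
 * code points (C99 predefined macro __STDC_ISO_10646__), else 0.
 * When it is 0, nothing is promised about what L"" produces. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Width of wchar_t in bits. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* 1 if every value 0x0000..0xFFFF fits in wchar_t. This is about range
 * only; it says nothing about which encoding the compiler uses. */
int wchar_covers_bmp(void)
{
    return WCHAR_MAX >= 0xFFFF;
}
```

With gcc and glibc on Linux one would expect 32 bits and __STDC_ISO_10646__ to be defined; neither result says that wchar_t matches the 16-bit units a particular protocol expects.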

--
a signature
 
Stephen Sprunk
 
      06-08-2004
"those who know me have no need of my name" <not-a-real->
wrote in message news:...
> in comp.lang.c i read:
> >"Me" <> wrote in message
> >news kia.com...
> >> I am trying to compile some code Ive gotten from another and I know I
> >> need a 16 bit unicode string, for he passes the pointer to functions
> >> that take a (uint16 *), however there are initializations that look
> >> like this.
> >>
> >> typedef unsigned short int ucs2_char;
> >
> >The correct type for UCS2 characters is wchar_t.
>
> wchar_t is something -- perhaps ucs-2 or utf-16, or something else
> entirely.
> i agree wchar_t should be used, but if each character must be a ucs-2
> code-point then wchar_t is not appropriate, and neither should L"" be used
> for a literal string.


Good point; wchar_t is UCS-2 on every platform I've used so I didn't
consider it might differ on another platform. Either way, I think it's what
the original author (and our poster) intended to use, and it's the simplest
and most portable solution for dealing with Unicode.

S


 
Lew Pitcher
 
      06-08-2004
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stephen Sprunk wrote:

> "those who know me have no need of my name" <not-a-real->
> wrote in message news:...
>
>>in comp.lang.c i read:
>>
>>>"Me" <> wrote in message
>>>news. nokia.com...
>>>
>>>>I am trying to compile some code Ive gotten from another and I know I
>>>>need a 16 bit unicode string, for he passes the pointer to functions
>>>>that take a (uint16 *), however there are initializations that look
>>>>like this.
>>>>
>>>>typedef unsigned short int ucs2_char;
>>>
>>>The correct type for UCS2 characters is wchar_t.
>>
>>wchar_t is something -- perhaps ucs-2 or utf-16, or something else
>>entirely.
>>i agree wchar_t should be used, but if each character must be a ucs-2
>>code-point then wchar_t is not appropriate, and neither should L"" be used
>>for a literal string.
>
> Good point; wchar_t is UCS-2 on every platform I've used so I didn't
> consider it might differ on another platform.


FWIW, I believe that wchar_t can refer to one of the IBM
double-byte-character-set (DBCS) EBCDICs when used in IBM's C compiler
on the mainframe.


- --

Lew Pitcher, IT Consultant, Enterprise Application Architecture
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed here are my own, not my employer's)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)

iD8DBQFAxfSVagVFX4UWr64RAlt4AKDyngVYstrafTQ42C0mFI i3jdVo6gCfTrNf
RI88VNCppIvIsrV9LFTNPpk=
=uB9O
-----END PGP SIGNATURE-----
 
Me
 
      06-08-2004

Thanks for your input, but here is the problem.
First, I work for a big telecom company (you are probably using one of
their phones right now).
In my project I am porting the phone code to run on Linux so that
developers can debug it.
The CDMA specification uses two-byte Unicode characters, and much of the
code uses the L"" initializer.

They create a type called ucs2_char that is an unsigned short.

At first I made ucs2_char a wchar_t, but I found out that on Linux
wchar_t is 4 bytes in size (4-byte Unicode, UTF-32).

What should I do? Any suggestions?

Also, is there a type on Linux for 2-byte Unicode (UTF-16)?

And is the L"" initializer, on Linux, only for 4-byte Unicode, or can I
configure this in gcc or Linux?
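For readers with a newer compiler, a later standard answers the type question directly. A sketch assuming C11 (for gcc, the default -std=gnu11 or later): <uchar.h> declares char16_t, a typedef for uint_least16_t, and a u"..." literal is an array of char16_t, UTF-16 encoded on implementations that define __STDC_UTF_16__. None of this existed in the compilers under discussion in 2004. The helper name u16_len is hypothetical.

```c
#include <stddef.h> /* size_t */
#include <uchar.h>  /* char16_t (C11) */

/* 16 bits wide wherever uint_least16_t is exactly 16 bits, as on Linux. */
typedef char16_t ucs2_char;

/* u"" literals have char16_t elements, so this compiles even where
 *     static const ucs2_char form_feed[] = L"\f";
 * would not, because wchar_t is 32 bits. */
static const ucs2_char form_feed[] = u"\f";

/* A non-ASCII character, U+00E9 (e with acute accent), written as a
 * universal character name so the source file encoding does not matter. */
static const ucs2_char e_acute[] = u"\u00E9";

/* Hypothetical helper: number of code units before the terminating zero. */
size_t u16_len(const ucs2_char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

On glibc, uint_least16_t is unsigned short, so the project's existing typedef unsigned short int ucs2_char would also accept these literals; the L"..." prefixes would still have to be changed to u"...".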



On Tue, 08 Jun 2004 00:01:59 GMT, Stephen Sprunk <>
wrote:

> The correct type for UCS2 characters is wchar_t. Fix the code to use the
> correct type.




 
Stephen Sprunk
 
      06-08-2004
"Me" <> wrote in message
news ia.com...
> I at first made the ucs2_char to be wchar_t but I found out that in linux
> wchar_t is 4 bytes in size (4 byte unicode UTF-32).
>
> What do I do....?.... any suggestions?
>
> Also, is there a type in linux for a 2 byte unicode (UTF-16)?
>
> And....is the L"" initializer, in Linux, only for 4 byte unicode or can I
> configure this in gcc or linux?


-fshort-wchar will give you a 2-byte wchar_t (UTF-16, not UCS-2) with gcc
2.97 and later. I haven't tested whether this makes wide string literals
compatible with unsigned short *, but it seems likely.
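A sketch of one way to adopt the flag while keeping the sources buildable without it, assuming gcc or clang, both of which predefine __SIZEOF_WCHAR_T__. With -fshort-wchar, gcc makes wchar_t a short unsigned int, so the original L"" initializers are accepted as written. The GCC manual also warns that code compiled with this flag is not binary compatible with code compiled without it; glibc's wide-character functions such as wcslen still expect a 4-byte wchar_t, so strings built this way should not be passed to them. The macro name UCS2_FROM_WIDE_LITERALS is hypothetical.

```c
#include <stddef.h> /* wchar_t */

/* The poster's 16-bit code unit type. */
typedef unsigned short int ucs2_char;

/* This branch assumes a 2-byte wchar_t is unsigned short, which is what
 * -fshort-wchar selects. */
#if defined(__SIZEOF_WCHAR_T__) && __SIZEOF_WCHAR_T__ == 2
#define UCS2_FROM_WIDE_LITERALS 1
static const ucs2_char form_feed[] = L"\f"; /* the original initializer */
#else
#define UCS2_FROM_WIDE_LITERALS 0
/* wchar_t is wider (32 bits by default on Linux): spell out the units. */
static const ucs2_char form_feed[] = { 0x000C, 0x0000 };
#endif
```

Building once with and once without -fshort-wchar should then give identical form_feed contents, which makes a cheap sanity check while porting.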

Any further questions on gcc should be directed to gnu.gcc.help, but this
should get you started:
http://gcc.gnu.org/onlinedocs/gcc-3....0Gen%20Options

S


 
kal
 
      06-09-2004
Lew Pitcher <> wrote in message news:<uwmxc.19904$ >...

> FWIW, I believe that wchar_t can refer to one of the IBM
> double-byte-character-set (DBCS) EBCDICs when used in IBM's C compiler
> on the mainframe.


Perhaps so, insofar as the size of characters (in bits) is concerned.
Even in this regard, what are called DBCS are sometimes actually MBCS
(multibyte character sets).

EBCDIC descended from punched cards: it went from 6-bit BCD to 8-bit
Extended BCD (EBCDIC). ASCII descended from the telegraph: it went from
5-bit telegraph codes to 7-bit ASCII and then to various 8-bit extensions
of ASCII. The two schemes assign entirely different code points.

Now, wchar_t almost always refers to UCS-2 or UTF-16. The differences
between UCS-2 and UTF-16 were worked out a few years ago, and as far as
code values in the Basic Multilingual Plane (U+0000 to U+FFFF) are
concerned, they are the same at present.
The first 128 characters of both are the same as 7-bit ASCII.
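The remaining difference lies above U+FFFF: UCS-2 cannot represent those code points at all, while UTF-16 encodes each one as a surrogate pair. A minimal encoder sketch (the function names utf16_encode and utf32_to_utf16 are hypothetical) shows that a BMP code point outside the surrogate range U+D800..U+DFFF becomes the same single 16-bit unit in both, so ASCII values pass through unchanged, and that anything larger needs two units.

```c
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint16_t, uint32_t */

/* Encode one Unicode code point as UTF-16 into out[0..1].
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is not
 * a valid scalar value (a surrogate, or above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                /* surrogates are not characters */
    if (cp <= 0xFFFF) {          /* BMP: same single unit as UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {        /* beyond UCS-2: surrogate pair */
        cp -= 0x10000;           /* 20 bits remain */
        out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high 10 bits */
        out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low 10 bits */
        return 2;
    }
    return 0;
}

/* Convert a zero-terminated UTF-32 string to UTF-16, writing at most cap
 * units including the terminator. Returns the number of units before the
 * terminator, or (size_t)-1 on invalid input or insufficient room. */
size_t utf32_to_utf16(const uint32_t *in, uint16_t *out, size_t cap)
{
    size_t n = 0;
    for (; *in != 0; in++) {
        uint16_t units[2];
        size_t k = utf16_encode(*in, units);
        if (k == 0 || n + k + 1 > cap)
            return (size_t)-1;
        for (size_t i = 0; i < k; i++)
            out[n++] = units[i];
    }
    if (cap == 0)
        return (size_t)-1;
    out[n] = 0;
    return n;
}
```

On Linux, where wchar_t holds UTF-32 code points, a loop like this could in principle be fed wchar_t values, which is one possible way to keep L"" literals in the sources and produce 16-bit strings at run time.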
 