mbtowc - combining character

 
 
Old Wolf
Guest
Posts: n/a
 
      04-03-2007
As far as I can see, mbtowc and mbstowcs assume that there is
exactly one wide character for each multi-byte sequence. How are
you meant to cope with MBS that correspond to two wide characters?

For example, if it is Unicode and the MBS represents a letter
with a combining diacritic.

 
 
 
 
 
klaushuotari@gmail.com
Guest
Posts: n/a
 
      04-03-2007
On 4 April, 02:18, "Old Wolf" <(E-Mail Removed)> wrote:
> As far as I can see, mbtowc and mbstowcs assume that there is
> exactly one wide character for each multi-byte sequence. How are
> you meant to cope with MBS that correspond to two wide characters?
>
> For example, if it is Unicode and the MBS represents a letter
> with a combining diacritic.


You aren't. That's purely implementation-defined.

 
 
 
 
 
CBFalconer
Guest
Posts: n/a
 
      04-04-2007
Old Wolf wrote:
>
> As far as I can see, mbtowc and mbstowcs assume that there is
> exactly one wide character for each multi-byte sequence. How are
> you meant to cope with MBS that correspond to two wide characters?
>
> For example, if it is Unicode and the MBS represents a letter
> with a combining diacritic.


The same way you convert '\n' to a cr/lf output sequence.
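One way to read this analogy: when a single logical unit on one side maps to more than one unit on the other, the code doing the output handles the expansion explicitly. A minimal sketch of the newline case follows; the name `expand_newlines` and its buffer convention are illustrative assumptions, not anything from the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, writing "\r\n" for every '\n'.
   Returns the output length (excluding the terminator), or (size_t)-1
   if dst, which holds cap bytes, is too small. */
static size_t expand_newlines(const char *src, char *dst, size_t cap)
{
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\n') ? 2 : 1;
        if (o + need >= cap)          /* keep one byte for the terminator */
            return (size_t)-1;
        if (*src == '\n')
            dst[o++] = '\r';          /* one input char becomes two */
        dst[o++] = *src;
    }
    if (o >= cap)                     /* covers cap == 0 with empty input */
        return (size_t)-1;
    dst[o] = '\0';
    return o;
}
```

The caller sizes the buffer for the worst case (every character doubling), which is the same kind of bookkeeping a one-to-many character conversion would need.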

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>




 
 
Harald van Dĳk
Guest
Posts: n/a
 
      04-04-2007
Old Wolf wrote:
> As far as I can see, mbtowc and mbstowcs assume that there is
> exactly one wide character for each multi-byte sequence.


There is exactly one wide character for each multi-byte sequence.

> How are
> you meant to cope with MBS that correspond to two wide characters?
>
> For example, if it is Unicode and the MBS represents a letter
> with a combining diacritic.


Those are two separate multi-byte sequences. The C functions work on
the character level, not on the glyph level.
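To illustrate the point, here is a rough sketch that decodes a string one multibyte character at a time with `mbrtowc`. It assumes a UTF-8 locale is installed; the locale names tried and the helper names `use_utf8_locale` and `decode_each` are assumptions for this example, since locale names are not standardized.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Try a few common UTF-8 locale names (assumed, not guaranteed to exist).
   Returns 1 if one could be selected, 0 otherwise. */
static int use_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Decode s one multibyte character at a time, storing at most max wide
   characters in out. Returns the number stored, or (size_t)-1 on an
   invalid or incomplete sequence. */
static size_t decode_each(const char *s, wchar_t *out, size_t max)
{
    mbstate_t st;
    size_t left, count = 0;

    assert(s != NULL && out != NULL);
    left = strlen(s);
    memset(&st, 0, sizeof st);        /* start in the initial shift state */

    while (left > 0 && count < max) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;
        if (n == 0)                   /* decoded a null wide character */
            break;
        out[count++] = wc;            /* one wide char per multibyte char */
        s += n;
        left -= n;
    }
    return count;
}
```

In a UTF-8 locale, `decode_each("e\xCC\x81", out, 4)` should store two wide characters: the letter `e`, then the combining acute accent, each produced by its own call to `mbrtowc`. Whether the second value is numerically U+0301 depends on the implementation's `wchar_t` encoding (it is where `__STDC_ISO_10646__` is defined).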

 
 
Boudewijn Dijkstra
Guest
Posts: n/a
 
      04-04-2007
On Wed, 04 Apr 2007 01:18:46 +0200, Old Wolf
<(E-Mail Removed)> wrote:
> As far as I can see, mbtowc and mbstowcs assume that there is
> exactly one wide character for each multi-byte sequence. How are
> you meant to cope with MBS that correspond to two wide characters?
>
> For example, if it is Unicode and the MBS represents a letter
> with a combining diacritic.


Perform canonical decomposition before converting.
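The C standard library has no normalization routine, so in practice this would come from a Unicode library. As a toy sketch only, the idea can be shown on bare code points with a hand-written table; the `decomp` table below covers just four characters and the function name `decompose` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny illustrative subset of canonical decompositions; a real table
   would come from the Unicode Character Database. */
struct decomp { uint32_t composed, base, mark; };

static const struct decomp table[] = {
    { 0x00E8, 0x0065, 0x0300 },   /* e with grave     -> e + U+0300 */
    { 0x00E9, 0x0065, 0x0301 },   /* e with acute     -> e + U+0301 */
    { 0x00F1, 0x006E, 0x0303 },   /* n with tilde     -> n + U+0303 */
    { 0x00FC, 0x0075, 0x0308 },   /* u with diaeresis -> u + U+0308 */
};

/* Hypothetical helper: write the decomposition of cp to out and return
   how many code points were written (1 if cp is not in the table). */
static size_t decompose(uint32_t cp, uint32_t out[2])
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].composed == cp) {
            out[0] = table[i].base;
            out[1] = table[i].mark;
            return 2;
        }
    }
    out[0] = cp;                      /* no decomposition known: pass through */
    return 1;
}
```

After a step like this, every code point in the text is a separate character, so each one corresponds to exactly one wide character when converted.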



--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
 
 
 
 