"christian.bau" writes:
>> I'm using an utf8 state-machine I made to check and handle unicode
>> strings, and was wondering if strncmp could be used for comparing the
>> after check or if I should roll my own?
>
> strcmp will compare strings and return a result assuming that the data
> is signed char.

No, it won't.

strcmp's arguments are of type const char*; plain char may be either
signed or unsigned.  But even if plain char is signed, 7.21.4p1 says:

    The sign of a nonzero value returned by the comparison functions
    memcmp, strcmp, and strncmp is determined by the sign of the
    difference between the values of the first pair of characters
    (both interpreted as unsigned char) that differ in the objects
    being compared.

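To make that concrete, here is a minimal sketch (mine, not from the standard). The helper names sign_of and check_unsigned_comparison are made up for illustration. It assumes 8-bit bytes. On an implementation where plain char is signed, the byte 0x80 is negative as a char, yet the comparison functions must still treat it as 128 and order it after 0x7F.

```c
#include <assert.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or 1; the standard specifies
   only the sign of the result, not its magnitude. */
static int sign_of(int r)
{
    return (r > 0) - (r < 0);
}

/* Hypothetical self-check: 0x7F is 127 and 0x80 is 128 when read as
   unsigned char, so "\x7f" must compare less than "\x80" whether
   plain char is signed or unsigned. */
static void check_unsigned_comparison(void)
{
    const char low[]  = "\x7f";
    const char high[] = "\x80";

    assert(sign_of(strcmp(low, high)) < 0);
    assert(sign_of(strncmp(low, high, 1)) < 0);
    assert(sign_of(memcmp(low, high, 1)) < 0);
}
```

Calling check_unsigned_comparison() from main should pass silently on any conforming implementation. A hand-rolled loop that subtracts plain char values would get the opposite sign where char is signed.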
[...]
> The main problem is that with Unicode, just comparing code points
> isn't very meaningful. You'd have to put the code points into a
> canonical order at least to get any meaningful result. And when you do
> that, using strcmp is quite pointless.
I *think* that strcmp() returns correctly ordered results for UTF-8
strings, in the sense that its byte-wise ordering matches the ordering
of the code points.  UTF-8 was carefully designed to make this work.
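
Here is a rough sketch of how one could check that claim. The names utf8_encode, cmp_sign and utf8_order are hypothetical helpers written for this post, not library functions. The encoder assumes its argument is a valid, nonzero Unicode scalar value. It only demonstrates code point order; it says nothing about canonical equivalence or locale-aware collation.

```c
#include <assert.h>
#include <string.h>

/* Encode one code point as NUL-terminated UTF-8 in out (at least
   5 bytes).  Assumes cp is nonzero, <= 0x10FFFF, and not a surrogate. */
static void utf8_encode(unsigned long cp, char out[5])
{
    unsigned char *p = (unsigned char *)out;

    assert(cp != 0 && cp <= 0x10FFFFul && (cp < 0xD800ul || cp > 0xDFFFul));
    if (cp < 0x80) {
        *p++ = (unsigned char)cp;
    } else if (cp < 0x800) {
        *p++ = (unsigned char)(0xC0 | (cp >> 6));
        *p++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *p++ = (unsigned char)(0xE0 | (cp >> 12));
        *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else {
        *p++ = (unsigned char)(0xF0 | (cp >> 18));
        *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (cp & 0x3F));
    }
    *p = '\0';
}

/* Reduce a comparison result to -1, 0, or 1. */
static int cmp_sign(int r)
{
    return (r > 0) - (r < 0);
}

/* Sign of strcmp applied to the UTF-8 encodings of code points a and b. */
static int utf8_order(unsigned long a, unsigned long b)
{
    char ea[5], eb[5];

    utf8_encode(a, ea);
    utf8_encode(b, eb);
    return cmp_sign(strcmp(ea, eb));
}
```

For example, utf8_order(0x7FF, 0x800) compares the bytes DF BF against E0 A0 80 and comes out negative, matching 0x7FF < 0x800. The same holds across each of the other length boundaries, because longer encodings always start with a larger lead byte.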
--
Keith Thompson (The_Other_Keith)
kst- <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"