Problems with UTF-8 on Windows
Hello Friends,
I am working on a project to support internationalization for an existing project.

While supporting UTF-8 I am facing a problem while doing a POC.

I have a C string which I have declared as
const char* utf8buf = "Bienvenue à l'anglais ";

I want to support UTF-8 for I/O and wchar_t strings for internal manipulations. So I am setting the locale with setlocale(LC_CTYPE,"UTF8"); before I start with the main code for string handling.

Then I am using MultiByteToWideChar (with CP_UTF8 as the code page) to convert it to a wstring.

Then, before output, I am converting the string back to UTF-8 using WideCharToMultiByte.

The problem is that when I print the UTF-8 string I get back from this round trip, I get "Bienvenue l'anglais" as output, which is not the same as the input utf8buf.

Does the C++ string class support UTF-8?

In the real environment, we are planning to get the UTF-8 strings from a MySQL database.

How can I correct this?

Is there any other way in C/C++ to represent UTF-8 strings?

Thanks,
Aman
Re: Problems with UTF-8 on Windows
amandeep.bhat...@gmail.com wrote:
> Hello Friends,
>
> I am working on a project to support internationalization for an
> existing project.
>
> While supporting UTF-8 I am facing a problem while doing a POC.
>
> I have a C string which I have declared as
> const char* utf8buf = "Bienvenue à l'anglais ";

The above is not valid UTF-8.

> I want to support UTF-8 for I/O and wchar_t strings for internal
> manipulations. So I am setting the locale with setlocale(LC_CTYPE,"UTF8");
> before I start with the main code for string handling.

Now we enter implementation-defined territory.

> Then I am using MultiByteToWideChar (with CP_UTF8 as the code page) to
> convert it to a wstring.

And this is not C++ but Windows, and thus off-topic.

> Then, before output, I am converting the string back to UTF-8
> using WideCharToMultiByte.

Once again off-topic.

> The problem is that when I print the UTF-8 string I get back from this
> round trip, I get "Bienvenue l'anglais" as output, which is not the
> same as the input utf8buf.
>
> Does the C++ string class support UTF-8?

Well... the short answer is no. You will have no problem storing a UTF-8 buffer in a std::string, but access to individual characters is off: string[n] might be a character, but it could also be part of a multi-byte sequence.

> In the real environment, we are planning to get the UTF-8 strings from
> a MySQL database.

There is no problem getting UTF-8 from a MySQL database, but I doubt that there is any reason to store it in a std::string (but it will not lead to an incorrect program).

> How can I correct this?

Correct what? The problem with the missing à above could very well be related to the fact that the string above is not valid UTF-8, but you should go to the platform-specific group (perhaps something like microsoft.public.internationalization?) for that part.

> Is there any other way in C/C++ to represent UTF-8 strings?

You can store it in a variety of ways.
The most natural way for many applications would be to convert at APIs, for instance at the point you get the data from your database. If you expect to keep large amounts of strings in memory and if you expect UTF-8 would be a smart internal format, you should look for a UTF-8 string class. Most probably there will already be some nice classes out there, and I vaguely remember having read something about UTF-8 strings in Boost (and that is always the first place I look).

/Peter
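
To illustrate two points from the reply (that string[n] in a std::string holding UTF-8 is a byte rather than a character, and that one can convert at the API boundary), here is a minimal portable sketch. The helper names count_code_points and decode_utf8 are hypothetical and do not come from the thread or from any library. The sketch assumes the à is spelled with explicit byte escapes (0xC3 0xA0) so that the literal does not depend on the source file's encoding. It does not reject overlong forms or surrogates, so it is not a full validator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points by skipping UTF-8 continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is well-formed UTF-8; no validation is done here.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical helper: decodes UTF-8 into UTF-32 for internal manipulation.
// A bad lead byte or a truncated sequence becomes U+FFFD. Overlong forms
// and surrogate values are NOT rejected in this sketch.
std::u32string decode_utf8(const std::string& utf8) {
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; extra = 3;
        } else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(replacement);
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k, ++i) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                ok = false;  // leave i on the offending byte so it is re-read
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        out.push_back(ok ? cp : replacement);
    }
    return out;
}
```

With the text written as "Bienvenue \xC3\xA0" " l'anglais", size() reports 22 bytes while count_code_points reports 21 characters, and element 10 of the decoded string is U+00E0 (à). The literal is split in two so that the hex escape cannot absorb any following characters.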