On Friday, 22 February 2013 17:13:52 UTC+2, Andy Champ wrote:
> On 21/02/2013 21:48, Öö Tiib wrote:
> > What is the "this"? It should work. Currently most people use std::string
> > (that actually contains UTF-8 encoded text) for storing texts. I fully
> > agree with you that it is loose and unsafe thing. However it is unlikely
> > that some revolution is coming. Billions of lines of code and millions of
> > interfaces all over the world use that std::string and problems are
> > consistently elsewhere.
>
> std::string does not contain UTF-8 encoded text. It contains chars. If
> your implementation treats those chars as UTF-8 encoded characters, then
> fine - but that is NOT part of the standard, it's just something that
> *nix operating systems tend to do.
I did in fact describe the most widespread practice. By the C++ standard a
char is a byte; which encoding you keep in it, the standard does not say.
The other possibility is to use std::wstring for text if wchar_t can contain
UTF-16LE. That might help on Windows or with Qt as the GUI. It is anyway
a minority, maybe 20% of the C++ code written.
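To illustrate "a byte container that happens to hold UTF-8": a minimal sketch, assuming the string already contains well-formed UTF-8 (nothing in std::string checks that). The helper name utf8_length is mine, not anything from the standard library. It shows that size() counts bytes while the number of code points has to be computed separately.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes `s` is well-formed UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    assert(n <= s.size());  // never more code points than bytes
    return n;
}
```

For example, the two-letter text "\xC3\x96\xC3\xB6" (U+00D6, U+00F6) has size() 4 but utf8_length 2.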
> You might like to consider what happens when you resize a string to
> remove part of a multibyte character. There's nothing there to make it
> UTF safe...
There are no alternatives. Such difficulties, like all others, are normal
work. That is what developers are for.
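The resize case can be handled by hand, for instance. A rough sketch, assuming the input is well-formed UTF-8; utf8_truncate is a made-up name, not a standard function. It backs the cut position up to the start of a character instead of cutting in the middle of a multibyte sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns at most max_bytes bytes of `s` without
// splitting a multibyte UTF-8 sequence. Assumes `s` is well-formed UTF-8.
std::string utf8_truncate(const std::string& s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) {
        return s;  // nothing to cut
    }
    std::size_t cut = max_bytes;
    // If the first dropped byte is a continuation byte (10xxxxxx), the cut
    // would split a character, so move back to that character's lead byte.
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    assert(cut <= max_bytes);
    return s.substr(0, cut);
}
```

With s = "a\xC3\xB6" "b", utf8_truncate(s, 2) gives "a" rather than "a" plus half of a character. It works on code points only; combining sequences are a further difficulty it does not address.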
> I suspect this is why fstream::open takes a char* - someone assumed that
> a char* was utf-8, and for those operating systems where a filename is
> unicode it's broken.
I repeat ... there is no serious support for Unicode in C++. fstream was
likely designed when no one thought that file names could be anything but
ASCII. UTF-8 is the most popular encoding. The majority of the HTML and XML
you see on the internet is in it. So it makes sense to use something that
you do not have to convert.