Re: Best way to handle UTF-8 in C++

 
 
thomas
Guest
Posts: n/a
 
      05-18-2010
On May 18, 11:39 am, Peter Olcott <NoS...@OCR4Screen.com> wrote:
> On 5/17/2010 10:19 PM, thomas wrote:
>
> >>> wstring uses 16bit characters. string uses 8bit
> >>> characters. Both are not suitable for variable size
> >>> characters like utf8 char.
> >>>
> >>> Regards
> >>>
> >>> Marek
>
> > In addition, the basic_ios class can control formatting, *locale*,
> > access to buffers. So the support is complete IMHO.
>
> Can it validate a string of bytes representing UTF-8 and then correctly
> translate them to UTF-32?


The std::codecvt locale facet can meet your needs.
Check the link: http://www.cplusplus.com/reference/std/locale/codecvt/
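
For comparison, here is a minimal hand-written sketch of what "validate and translate to UTF-32" involves, independent of codecvt and of any installed locale. The function name utf8_to_utf32 is hypothetical, not a standard library call, and std::u32string assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard library function).
// Decodes UTF-8 bytes into UTF-32 code points. Returns false on malformed
// input: invalid lead bytes, stray or missing continuation bytes, truncated
// sequences, overlong forms, surrogates, and values above U+10FFFF.
// On failure, `out` holds the code points decoded before the error.
bool utf8_to_utf32(const std::string& in, std::u32string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        char32_t min = 0;  // smallest code point allowed for this length
        if (lead < 0x80)                { cp = lead;        len = 1; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; min = 0x10000; }
        else return false;  // continuation byte or invalid lead byte

        if (in.size() - i < len) return false;  // truncated sequence

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return false;                      // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate

        out.push_back(cp);
        i += len;
    }
    return true;
}
```

Called on the bytes 41 C3 A9 E2 82 AC, this fills `out` with U+0041, U+00E9, U+20AC and returns true; the overlong pair C0 80 is rejected.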
 
 
 
 
 
Joshua Maurice
Guest
Posts: n/a
 
      05-18-2010
On May 17, 8:39 pm, thomas <freshtho...@gmail.com> wrote:
> On May 9, 10:07 pm, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:
>
> > "James Kanze" <james.ka...@gmail.com> wrote in message
> >news:5b6ec73a-f863-4f41-915f-....
> >
> > > On May 8, 6:15 am, "DaveB" <DBu...@usenet.net> wrote:
> > >> Victor Bazarov wrote:
> > >    [...]
> > >> Victor, why not just "see" what he is asking and give him
> > >> what he needs: the answer!
> > >
> > > Because he can't give an answer if the question isn't
> > > clear.
> > >
> > >> Most things do not need a "dancing around it" style.
> > >> Assess it as best you can in the first message, then
> > >> blurt out an "answer"!
> > >
> > > That would be rather irresponsible, don't you think? Make
> > > a random guess as to what is being asked, and then answer
> > > that?
> > >
> > > The "obvious" answer is that std::string does support
> > > UTF-8. Until we know what is actually meant by "support"
> > > UTF-8, that's the only possible answer. I rather suspect
> > > that the original poster wanted more support than
> > > std::string gives, but until he specifies what, it's
> > > impossible to give an answer.
> > >
> > > --
> > > James Kanze
> >
> > My original question was sufficiently complete. I said that
> > I wanted a string class that provided the std::string
> > interface and had an underlying utf-8 representation. It
> > doesn't take psychic powers to know that substr() must be
> > implemented differently. I took Victor's request for more
> > information to simply be head games so I ignored the
> > request.
>
> You'd better provide an example I guess.


Can it properly take substrings based on grapheme clusters? Can it
properly sort strings using the German phonebook sort rule or the
French sort rule? Can this sort be configured to eliminate duplicate
strings which may differ in encoded Unicode code points but be
equivalent by the collation rule? How fast are these operations?

Also, how portable are these operations? From my experience, locales
are not terribly portable, and it's not terribly practical to just
tell your customer to "install X and Y locales", especially when you
want to work on Unix-like systems and Windows.
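
To make the "differ in encoded code points but be equivalent" case concrete, here is a small sketch. The names precomposed, decomposed and count_code_points are made up for illustration. It only shows what plain std::string operations see; it does not attempt normalization, collation, or grapheme segmentation, which need a Unicode library.

```cpp
#include <cstddef>
#include <string>

// Two UTF-8 spellings of the same visible text, "é".
const std::string precomposed = "\xC3\xA9";   // U+00E9
const std::string decomposed  = "e\xCC\x81";  // U+0065 followed by U+0301

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes the input is already valid UTF-8.
std::size_t count_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

Here precomposed == decomposed is false, the sizes are 2 and 3 bytes, and the code point counts are 1 and 2, although a reader sees one character in both cases. decomposed.substr(0, 1) yields a bare "e" with the accent dropped, and precomposed.substr(0, 1) yields a lone lead byte that is not valid UTF-8.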
 
 
 
 
 
Marek Borowski
Guest
Posts: n/a
 
      05-18-2010
On 2010-05-17 06:44, DaveB wrote:
> Marek Borowski wrote:
>> On 2010-05-14 06:00, DaveB wrote:
>>> Marek Borowski wrote:
>>>> On 09-05-2010 16:36, Sam wrote:
>>>>> Marek Borowski writes:
>>>>>
>>>
>>>> Character is representation of number
>>>
>>> You were right before you spoke a second time. The immediately above
>>> is just wrong (of course, but following your preceding statement, it
>>> is inexcusably wrong).
>>>
>>>> - byte or
>>>> set of bytes.
>>>
>>> Make up your mind!!!
>>>

>> You divided my sentence and changed meaning.

>
> Really? Do explain if it is important, else realize I hardly ever post in
> here and you may be waiting for a reply for a long time.
>

What a pity. I don't care. You write nothing useful.

>> I mean that in the modern world a character type is treated as an
>> abstract type, not a machine type.

>
> You're not doing much better in trying to convince me of that either. Are
> you surrrre? I know this answer, I just don't know that you do or don't,
> but I'm leaning toward you don't and are just accepting of the status
> quo. I'm not having fun at your expense and hopefully everyone (including
> you) is not so fast to jump to the conclusion that their livelihood (as a
> consultant for an obsolete programming language perhaps) is over and that
> they cannot retrain or retire or something. I've written entire programs
> and been on system development projects large and small doing only parts
> and sometimes leading sometimes following. That is the past. You couldn't
> pay me enough money to do that or go back there again (I don't care how
> much the offer is). How much time does it take before someone figures
> stuff out? For some apparently, a lifetime! And then they diss everything
> as a threat to their secure position with prowess with the obsolete
> thing? I have no advice for them, nor for anyone else here. I'm not that
> kind of consultant. Use your "social network" or something.
>

I wouldn't pay you even $1; it's hard to understand what you are talking
about (in almost every post of yours). You are a frustrated man.
You write too much about yourself, not about C++ or programming. Most people
are probably not interested in your private problems. Go to a psychologist.

>> The same as Point, Figure, or whatever you implement.
>>

If you were more factual you would comment on that.

>>>> Logically string is collection of characters not bytes.
>>>
>>> Consider thinking before you speak next time instead of just
>>> blurting out an answer and then thinking aloud in USENET text. (God,
>>> that's annoying).

>> Look at the first word. LOGICALLY! A string is a collection of any type
>> which can be interpreted as a character.

>
> I was being facetious having responded to a number of posts in the thread
> with the assumption that people would read the thread and listen to the
> harmonies instead of just the individual notes. My bad? "string is...
> blah, blah", is subjective semantics.
>
>>
>>>>
>>>> You could say that there is no such thing as "unicode char".
>>>
>>> Don't even tempt me (OK, not temptation required: I pretty much
>>> already asserted such).
>>> (See, I knew this was going to be a fun thread! Don't diss me so
>>> soon next time, eh, go ahead, it's all good.)
>>>

>> Yes, really funny. I see that for some it's really hard to understand
>> that the world is moving forward.

>
> You see "funny", and I see "sad" that no one is serious about engineering
> anymore.
>

National characters are still a real problem for software engineering.
I know what I am saying because I have to use them. You could add some
value, but you don't want to, and you prefer to write boring stories.

>> Wake up. OS/360 times have gone.

>
> Why don't YOU teach me here about this new fangled Unicode thing and why
> it is so good and of course the end all of all things large and small?
> Hmm.
>
>

First you will have to understand that English is not the only
language. Second, ask a question. Unicode is a must. I am not happy with
that, but it's better to have Unicode than 10 code pages for one language:
I was getting data for my system in 10 code pages without any
information about which one was used. UTF-8 is a good compromise between
having all characters and wasting space. I wish C++ supported it.


Regards

Marek
 
 
Marek Borowski
Guest
Posts: n/a
 
      05-18-2010
On 2010-05-18 05:17, thomas wrote:
>>> wstring uses 16bit characters. string uses 8bit
>>> characters. Both are not suitable for variable size
>>> characters like utf8 char.

>
> Haven't finished all the threads. But the stream library of STL
> already provides wide character streams like basic_ostream<wchar_t>.
> So why bother inventing a new wheel?

To invent a track. wstring is good, but it requires adding a
conversion from utf8 to wstring and back again, which may potentially be a
performance issue. Second, to save memory: utf8 strings are smaller than
wchar_t strings.


Regards

Marek
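
As a rough sketch of the "and back again" direction mentioned above, the following encodes code points as UTF-8. It uses std::u32string (C++11) instead of wstring, because the width of wchar_t is platform dependent. The names append_utf8 and utf32_to_utf8 are hypothetical helpers, not standard library functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends one code point to `out` as UTF-8.
// Returns false (leaving `out` unchanged) for surrogates and for
// values above U+10FFFF, which have no valid UTF-8 encoding.
bool append_utf8(std::string& out, char32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Encodes a whole UTF-32 string; stops and returns false at the first
// code point that cannot be encoded.
bool utf32_to_utf8(const std::u32string& in, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (!append_utf8(out, in[i])) return false;
    }
    return true;
}
```

The cost is one pass over the input with a few shifts and masks per code point, so whether it matters for performance depends on how often strings cross the boundary between the two representations.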


 
 
Joshua Maurice
Guest
Posts: n/a
 
      05-18-2010
On May 18, 1:31 pm, Marek Borowski <marek_remo...@borowski.com> wrote:
> On 2010-05-18 05:17, thomas wrote:
> >>> wstring uses 16bit characters. string uses 8bit
> >>> characters. Both are not suitable for variable size
> >>> characters like utf8 char.
> >
> > Haven't finished all the threads. But the stream library of STL
> > already provides wide character streams like basic_ostream<wchar_t>.
> > So why bother inventing a new wheel?
>
> To invent a track. wstring is good, but it requires adding a
> conversion from utf8 to wstring and back again, which may potentially be a
> performance issue. Second, to save memory: utf8 strings are smaller than
> wchar_t strings.


Could people stop implying that wchar_t and wstrings actually mean a
specific size or encoding? Their size is entirely platform dependent.
Either you are proposing to not write portable code, or you are
proposing to use 16 bit "chars" on some systems and 32 bit "chars" on
other systems, which is just as silly.

utf8 strings are always smaller than utf32 strings for all spoken
languages today. utf16 strings are also always smaller than utf32
strings for all spoken languages today. However, for non-Latin scripts,
utf16 can be, and for some languages always is, a more space efficient
encoding than utf8.
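
The size comparison can be checked with a short sketch like the one below. The struct and function names are invented for illustration, and the input is assumed to contain only valid code points (no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in each encoding form.
// Assumes `cp` is a valid Unicode scalar value.
std::size_t utf8_bytes(char32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

std::size_t utf16_bytes(char32_t cp)
{
    return cp < 0x10000 ? 2 : 4;  // one code unit, or a surrogate pair
}

std::size_t utf32_bytes(char32_t)
{
    return 4;
}

// Hypothetical aggregate holding the three totals for a piece of text.
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

EncodedSizes encoded_sizes(const std::u32string& text)
{
    EncodedSizes s = {0, 0, 0};
    for (std::size_t i = 0; i < text.size(); ++i) {
        s.utf8  += utf8_bytes(text[i]);
        s.utf16 += utf16_bytes(text[i]);
        s.utf32 += utf32_bytes(text[i]);
    }
    return s;
}
```

For U"hello" this gives 5, 10 and 20 bytes. For the two CJK characters U+4E2D U+6587 it gives 6, 4 and 8 bytes, so utf16 is the smallest there. For a single code point outside the Basic Multilingual Plane, such as U+1F600, all three encodings need 4 bytes.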
 
 
 
 