Re: Best way to handle UTF-8 in C++

 
 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Peter Olcott wrote:
> "Victor Bazarov" <(E-Mail Removed)> wrote in
> message news:hruhqu$hqt$(E-Mail Removed)-september.org...
>> On 5/6/2010 9:45 AM, Peter Olcott wrote:
>>> I am looking for a way to handle UTF-8 text in my C++
>>> application. The ideal case would be an STL class that
>>> handles UTF-8. What is the next best thing?

>>
>> What do you mean by "handle"? STL class? Don't you have
>> the compiler documentation? If there is one, you already
>> have all information you need. Want more? Buy a book on
>> the Standard library. There are several that many
>> consider decent. Next best thing? Google.

>
> I must be able to use UTF-8 strings in my C++ application. I
> want to know the best way to do this. I prefer an interface
> that works the same way as the STL interface.


"the same way"? Really? What way is that (obfuscation)? Do explain
please.



 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Victor Bazarov wrote:
> On 5/6/2010 10:11 AM, Peter Olcott wrote:
>> "Victor Bazarov"<(E-Mail Removed)> wrote in
>> message news:hruhqu$hqt$(E-Mail Removed)-september.org...
>>> On 5/6/2010 9:45 AM, Peter Olcott wrote:
>>>> I am looking for a way to handle UTF-8 text in my C++
>>>> application. The ideal case would be an STL class that
>>>> handles UTF-8. What is the next best thing?
>>>
>>> What do you mean by "handle"? STL class? Don't you have
>>> the compiler documentation? If there is one, you already
>>> have all information you need. Want more? Buy a book on
>>> the Standard library. There are several that many
>>> consider decent. Next best thing? Google.

>>
>> I must be able to use UTF-8 strings in my C++ application. I
>> want to know the best way to do this. I prefer an interface
>> that works the same way as the STL interface.

>
> What do you mean by "use" and in what way can't you "use" the UTF-8
> strings already? There is no such thing as "STL interface", perhaps
> you can explain what you mean by "the same way". I can start
> guessing, but it's much better if you just specify what exactly
> you're trying to accomplish. Try to refrain from using such generic
> terms as "STL interface" or "use". For example, you can say, "I need
> to be able to figure out whether there are uppercase characters in my
> 'string', like the standard function 'isupper' does"...


I just responded with the same to the OP. My post was much more concise
though. But then I didn't have the side goal of seeking employment from
the question that you undoubtedly do Victor Borza?


 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Peter Olcott wrote:
> "Victor Bazarov" <(E-Mail Removed)> wrote in
> message news:hruqhc$lo6$(E-Mail Removed)-september.org...
>> On 5/6/2010 10:11 AM, Peter Olcott wrote:
>>> "Victor Bazarov"<(E-Mail Removed)> wrote in
>>> message news:hruhqu$hqt$(E-Mail Removed)-september.org...
>>>> On 5/6/2010 9:45 AM, Peter Olcott wrote:
>>>>> I am looking for a way to handle UTF-8 text in my C++
>>>>> application. The ideal case would be an STL class that
>>>>> handles UTF-8. What is the next best thing?
>>>>
>>>> What do you mean by "handle"? STL class? Don't you
>>>> have
>>>> the compiler documentation? If there is one, you
>>>> already
>>>> have all information you need. Want more? Buy a book
>>>> on
>>>> the Standard library. There are several that many
>>>> consider decent. Next best thing? Google.
>>>
>>> I must be able to use UTF-8 strings in my C++
>>> application. I
>>> want to know the best way to do this. I prefer an
>>> interface
>>> that works the same way as the STL interface.

>>
>> What do you mean by "use" and in what way can't you "use"
>> the UTF-8 strings already? There is no such thing as "STL
>> interface", perhaps you can explain what you mean by "the
>> same way". I can start guessing, but it's much better if
>> you just specify what exactly you're trying to accomplish.
>> Try to refrain from using such generic terms as "STL
>> interface" or "use". For example, you can say, "I need to
>> be able to figure out whether there are uppercase
>> characters in my 'string', like the standard function
>> 'isupper' does"...

>
> I want a string class that works exactly the same way as
> std::string, except implements UTF-8. This means that the
> interface can remain the same, (all of the member functions
> have the same name and same parameters) but the underlying
> meaning may be different.
>


One can only hope that there will be support and help for you when the
war is over. (If you survive it, that is).


 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Victor Bazarov wrote:
> On 5/6/2010 1:39 PM, Peter Olcott wrote:
>> "Victor Bazarov"<(E-Mail Removed)> wrote in
>> message news:hruqhc$lo6$(E-Mail Removed)-september.org...
>>> On 5/6/2010 10:11 AM, Peter Olcott wrote:
>>>> "Victor Bazarov"<(E-Mail Removed)> wrote in
>>>> message news:hruhqu$hqt$(E-Mail Removed)-september.org...
>>>>> On 5/6/2010 9:45 AM, Peter Olcott wrote:
>>>>>> I am looking for a way to handle UTF-8 text in my C++
>>>>>> application. The ideal case would be an STL class that
>>>>>> handles UTF-8. What is the next best thing?
>>>>>
>>>>> What do you mean by "handle"? STL class? Don't you
>>>>> have
>>>>> the compiler documentation? If there is one, you
>>>>> already
>>>>> have all information you need. Want more? Buy a book
>>>>> on
>>>>> the Standard library. There are several that many
>>>>> consider decent. Next best thing? Google.
>>>>
>>>> I must be able to use UTF-8 strings in my C++
>>>> application. I
>>>> want to know the best way to do this. I prefer an
>>>> interface
>>>> that works the same way as the STL interface.
>>>
>>> What do you mean by "use" and in what way can't you "use"
>>> the UTF-8 strings already? There is no such thing as "STL
>>> interface", perhaps you can explain what you mean by "the
>>> same way". I can start guessing, but it's much better if
>>> you just specify what exactly you're trying to accomplish.
>>> Try to refrain from using such generic terms as "STL
>>> interface" or "use". For example, you can say, "I need to
>>> be able to figure out whether there are uppercase
>>> characters in my 'string', like the standard function
>>> 'isupper' does"...

>>
>> I want a string class that works exactly the same way as
>> std::string, except implements UTF-8.

>
> ...as opposed to *what*? UTF-8 is an encoding scheme. 'std::string'
> does *not* have an encoding scheme, it's a mere container of 'char'.
> Nothing more, nothing less. What *exactly* in it doesn't work NOW for
> you? Have you tried making the default 'char' unsigned? If your
> platform has 8-bit chars, and you make them unsigned, you got yourself
> UTF-8 storage type. And 'std::string' will provide functionality for
> storing elements of that type (by virtue of being defined as
> 'std::basic_string<char>'), and operations to manipulate that storage
> (append to, erase from, enumerate, etc.)
>
> So, once again, what do you mean by "implements UTF-8"?
>
>> This means that the
>> interface can remain the same, (all of the member functions
>> have the same name and same parameters) but the underlying
>> meaning may be different.

>
> "May be different"? If I rewrite 'std::string' for you and just make
> all functions return 0 and do nothing, would that be acceptable? That's
> a rhetorical question, BTW. If you just allow the "meaning"
> to be different, you still haven't specified anything. Does it *have
> to* be different? In what way?
>
> Could it be that you don't know yet what you *need* from your
> class, which you hope will "handle" UTF-8? What *operations* do you
> hope it will help you perform on your "UTF-8" strings?
>


Victor, why not just "see" what he is asking and give him what he needs:
the answer! Most things do not need a "dancing around it" style. Assess
it as best you can in the first message, then blurt out an "answer"! Who
cares if it is or you are wrong? Make some headway fast. For example, if
you think UTF-8 sucks, reply "UTF-8 sucks." and move on and let him
figure it out. This dreary milking of syntactical and even semantical
nothingness is ... well it sucks!


 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Joshua Maurice wrote:
>
> Let me try to explain. std::string has member functions like find and
> substring. When used to store UTF-8 for an 8 bit char, the indexes are
> in terms of 8 bit encoding units. However, generally a user does not
> want to work with indexes in terms of encoding units. They want to
> work with indexes in terms of encoded Unicode code points, or more
> probably Unicode grapheme clusters.


Good thing I'm not relevant to evaluating your resume. JK, your
corporate-coding "experience" shows up in the stock tickers though. I
dunno what to think anymore. I'll be cliche: you get what you pay for,
and easy come/easy go.


 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Peter Olcott wrote:
> "Joshua Maurice" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
> On May 6, 11:09 am, Victor Bazarov
> <(E-Mail Removed)> wrote:
>> On 5/6/2010 1:39 PM, Peter Olcott wrote:
>> Could it be that you don't know yet what you *need*
>> from your class,
>> which you hope will "handle" UTF-8? What *operations* do
>> you hope it
>> will help you perform on your "UTF-8" strings?

> Preferably, our product would work with a Unicode string
> abstraction


In reality, wouldn't someone with a stake in the actual problem, if they
could not state it themselves, find someone who could instead of ending up
in their current situation with you? (No offense, I'm sure you mean well,
but you have to understand your limitations.) Your question has all the
warning signs of a "programming project gone awry".



 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Ian Collins wrote:
> On 05/ 7/10 10:13 AM, Peter Olcott wrote:
>
> Peter, please do us all a favour and fix your quoting!
>
> How can you develop a complex application if you can't fix your
> (admittedly brain dead) news client??


He's not the stakeholder. Duh. Hopefully he is an employee and not a
consultant, because the latter, he is not!


 
DaveB
Guest
Posts: n/a
 
      05-08-2010
Peter Olcott wrote:
> "Ian Collins" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> On 05/ 7/10 11:44 AM, Peter Olcott wrote:
>>> "Ian Collins"<(E-Mail Removed)> wrote in message
>>> news:(E-Mail Removed)...
>>>> On 05/ 7/10 10:13 AM, Peter Olcott wrote:
>>>>
>>>> Peter, please do us all a favour and fix your quoting!
>>>>
>>>> How can you develop a complex application if you can't
>>>> fix
>>>> your (admittedly brain dead) news client??
>>>
>>> I briefly tried Thunderbird and it was far too sluggish.
>>> I
>>> tried the recommended patch and it didn't work. I have no
>>> more time for these trivial aesthetic things.

>>
>> Then I'm sure I'm not the only one who has no more time
>> for deciphering your mangled posts.
>>
>> Remember, Usenet is a write once, read many medium.
>>
>> --
>> Ian Collins

>
> Which newsgroup reader works the best?
>
> Also I have many years worth of newsgroup posts stored on my
> hard drive using Outlook Express.
>
> When I briefly tried Thunderbird it looked like it suffered
> the same sort of problems as Open Office word. Open Office
> word, sometimes took several minutes to page up to the
> previous page. When you add up the cost of this (over a
> lifetime months of one's life are wasted) the "free" open
> office is far too expensive.
>
> There is also the learning curve cost. I also don't want to
> spend dozens of hours evaluating alternatives just because
> of inconsequential aesthetics.
>
> I mark all of the threads that I create so that Outlook
> express filters these messages to sort to the top. I don't
> want to wade through hundreds of irrelevant posts just to
> see my replies. This feature is essential to me, not wanting
> to burn up months of my life doing unnecessary work.


Wow, this thread is a good case study. Stakeholders, beware!


 
Paul Bibbings
Guest
Posts: n/a
 
      05-08-2010
"Peter Olcott" <(E-Mail Removed)> writes:

> The absence of other answers leads to the answer of build it
> myself.


Careful. That means `work', and work requires `time' - /valuable/ time!

Regards

Paul Bibbings
 
James Kanze
Guest
Posts: n/a
 
      05-08-2010
On May 6, 6:39 pm, "Peter Olcott" <(E-Mail Removed)> wrote:
> "Victor Bazarov" <(E-Mail Removed)> wrote in
> message news:hruqhc$lo6$(E-Mail Removed)-september.org...


[...]
> I want a string class that works exactly the same way as
> std::string, except implements UTF-8.


I think Victor's point is that std::string does implement UTF-8.
And ISO 8859-1, and EBCDIC, and any other encoding which uses
char (as opposed to UTF32, for example, which requires 32 bit
entities).

And I think he's only right to a point: in the end, an
std::string doesn't handle characters, it handles small
integers. In a single byte encoding, however, those small
integers are the same as your characters, with one character per
integer. So to advance one character, you can simply use ++ on
an std::string::iterator. UTF-8 does require more. And there's
no support for that "more" in C++ (including, as far as I know,
C++0x---in C++0x, you can have UTF-8 string literals, but you
can't take an std::string::iterator and advance it one UTF-8
character).
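
To make the byte-versus-character distinction concrete, here is a minimal sketch, assuming 8-bit char and well-formed UTF-8 input. The helper name count_code_points is hypothetical, not a standard library function. It walks the string with an ordinary std::string::const_iterator, which advances one byte per ++, and counts only the bytes that start a new code point.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): counts Unicode
// code points in a string that is assumed to hold well-formed UTF-8.
std::size_t count_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        // ++it moves one byte (one code unit), not one character.
        // Continuation bytes have the form 10xxxxxx; every other byte
        // starts a new code point, so only those are counted.
        if ((static_cast<unsigned char>(*it) & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the string "na" "\xC3\xAF" "ve" (the word "naive" with a two-byte i-diaeresis, U+00EF), size() reports 6 while count_code_points reports 5. Note that this counts code points, not grapheme clusters.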

> This means that the interface can remain the same, (all of the
> member functions have the same name and same parameters) but
> the underlying meaning may be different.


It's not that easy. You can't simply implement something like
    utf8_string_iterator::operator++()
    {
        underlying_iter += size(*underlying_iter);
    }
since there might not be enough bytes in the string pointed to
by underlying_iter.
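
One way around the truncation problem is to give the increment access to the end of the underlying string and clamp to it. The following is a minimal sketch, assuming 8-bit char; the names utf8_sequence_length and utf8_next are hypothetical, and the sketch does not validate continuation bytes or reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: sequence length implied by a UTF-8 lead byte.
// Returns 1 for ASCII and also for bytes that cannot start a sequence,
// so a caller always makes progress on malformed input.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // stray continuation or invalid byte
}

// Hypothetical bounds-checked advance: moves past one encoded character
// but never beyond `end`, even if the final sequence is truncated.
std::string::const_iterator utf8_next(std::string::const_iterator it,
                                      std::string::const_iterator end)
{
    if (it == end)
        return end;
    const std::ptrdiff_t len = static_cast<std::ptrdiff_t>(
        utf8_sequence_length(static_cast<unsigned char>(*it)));
    const std::ptrdiff_t remaining = end - it;
    return it + (len < remaining ? len : remaining);
}
```

The cost is that the iterator (or the free function here) has to carry the end position around, which is one reason such a class cannot have exactly the same interface as std::string::iterator.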

--
James Kanze
 