What to prefer - TCHAR arrays, std::string or std::wstring ?

 
 
rohitpatel9999@yahoo.com
 
      08-02-2006
Hi

While developing any software, a developer needs to think about its
possible enhancement for international usage and consider UNICODE.

I have read many nice articles/items in advanced C++ books (Effective
C++, More Effective C++, Exceptional C++, More Exceptional C++, C++
FAQs, Addison Wesley 2nd Edition)

Authors of these books have not considered UNICODE. So many of their
suggestions/guidelines confuse developers regarding what to use for
character-string members of class (considering exception-safety,
reusability and maintenance of code).

Many books have stated that:
Instead of using character arrays, always prefer using std::string.

My question is:

While developing generic Win32 app using C++ for Windows
(98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
What to prefer - TCHAR arrays, std::string or std::wstring
for character-string members (name, address, city, state, country etc.)

of classes like Address, Customer, Vendor, Employee ?

What to prefer - TCHAR arrays, std::string or std::wstring ?

I truly appreciate any help or guideline.
Anand

 
 
 
 
 
Marcus Kwok
 
      08-02-2006
rohitpatel9999 wrote:
> My question is:
>
> While developing generic Win32 app using C++ for Windows
> (98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
> What to prefer - TCHAR arrays, std::string or std::wstring
> for character-string members (name, address, city, state, country etc.)
>
> of classes like Address, Customer, Vendor, Employee ?
>
> What to prefer - TCHAR arrays, std::string or std::wstring ?
>
> I truly appreciate any help or guideline.


Standard C++ does not know about the TCHAR type (I know what it
represents, but it is not a standard language feature), and formally
also does not know about Unicode (std::wstring isn't quite Unicode).
Handling Unicode can be a complex topic, and one in which I cannot claim
to be well versed.

Your question is probably better suited for a Windows newsgroup.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
 
 
 
 
 
Phlip
 
      08-02-2006
rohitpatel9999 wrote:

> While developing any software, a developer needs to think about its
> possible enhancement for international usage and consider UNICODE.


Negative. Programmers must prepare for _anything_. The requirement for
Unicode may or may not come next.

Prepare for anything by writing copious unit tests, and by folding as much
duplication as possible. If you duplicate the word "the" in two strings,
fold them into one.

If you then need to localize, read this:

http://flea.sourceforge.net/TFUI_localization.doc

Then incrementally move your strings into a pluggable resource, and
incrementally widen or convert your string variables. "Incrementally" means
one at a time, passing all tests after each small edit.

The myth that some important decisions must be made early, to avoid the cost
of a late change, is a self-fulfilling prophecy of defeat.

> Authors of these books have not considered UNICODE. So many of their
> suggestions/guidelines confuse developers regarding what to use for
> character-string members of class (considering exception-safety,
> reusability and maintenance of code).


Right. They all push std::string because many programmers learned C first,
where a character array is still the simplest and most robust way to
represent a fixed-length string. So std::string should be the default
unless you have a real reason to use anything else. Such a reason could then
switch you to TCHAR, or to std::wstring, or to something else.

> My question is:
>
> While developing generic Win32 app using C++ for Windows
> (98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
> What to prefer - TCHAR arrays, std::string or std::wstring
> for character-string members (name, address, city, state, country etc.)


Ask your "customer liaison", the person authorized to request features,
whether you should spend 9 days working on their next feature, or 18 days
working on that feature + internationalization.

If they need only English, then use std::string everywhere you possibly can,
and something like CString for the remainder.

When they schedule a port to another language, you obtain a glossary for
that language _first_. Then you refactor your code to use something like
std::basic_string<TCHAR>.

If you truly need TCHAR in its WCHAR mode, then you must configure your
tests to run (and pass) with the _UNICODE version of your binary. You should
always pass all such tests, each time you change anything. Otherwise you
might make an innocent change that works in one mode, but breaks in another.

Further, not all languages need WCHAR or wchar_t. Spanish, for example,
uses the same code-page as English. Greek uses a different code-page, but it
still uses 8-bit bytes. So you should only enable the few features you need
to support another language, and not all those languages need Unicode. Some
versions of Chinese don't need it.

If you truly need "one binary that presents all languages, mixed together",
then you need Unicode. And if you need a rare language like Sanskrit or
Inuit, that has no independent 8-bit code-page, then you will need Unicode.
Otherwise you probably don't.

From here, you must read a book on internationalization. Yet you don't do
_any_ of that research until your business side has selected a target
language. Otherwise you will just be writing speculative features that
_might_ work with any language.

So default to std::string, and keep your programming velocity high. That
helps ensure that your clients will be _able_ to eventually target the
international markets...

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!


 
 
rohitpatel9999@yahoo.com
 
      08-03-2006
Thank you for helpful suggestions.
Suggestion of using std::basic_string<TCHAR> is also good.

The client is sure that they will need UNICODE for a few languages (e.g.
Japanese).
The client's requirements document did specify making the C++ code generic
with UNICODE in mind (but it should not use the MFC-specific CString).

So (in Microsoft Visual C++)
the application build for Win98/ME will have _MBCS defined, and
the application build for Win2000/NT/2003/XP will have UNICODE and _UNICODE
defined.

Please guide me, (considering exception-safety, reusability and
maintenance of code).

What to prefer - TCHAR arrays, std::string or std::wstring ?

or Which of the following three classes is preferable ?

e.g.

/* Option 1: fixed-size TCHAR arrays */
class Address
{
    _TCHAR name[30];
    _TCHAR addressline1[30];
    _TCHAR addressline2[30];
    _TCHAR city[30];
};


/* Option 2: std::basic_string instantiated on TCHAR */
class Address
{
    std::basic_string<TCHAR> name;
    std::basic_string<TCHAR> addressline1;
    std::basic_string<TCHAR> addressline2;
    std::basic_string<TCHAR> city;
};


/* Option 3: typedef selected by the UNICODE macro */
#ifdef UNICODE
typedef std::wstring tstring;
#else
typedef std::string tstring;
#endif
class Address
{
    tstring name;
    tstring addressline1;
    tstring addressline2;
    tstring city;
};

Thanks again.
Anand (Rohit)

 
 
Kirit Sælensminde
 
      08-03-2006

rohitpatel9999 wrote:
> Hi
>
> While developing any software, a developer needs to think about its
> possible enhancement for international usage and consider UNICODE.
>
> I have read many nice articles/items in advanced C++ books (Effective
> C++, More Effective C++, Exceptional C++, More Exceptional C++, C++
> FAQs, Addison Wesley 2nd Edition)
>
> Authors of these books have not considered UNICODE. So many of their
> suggestions/guidelines confuse developers regarding what to use for
> character-string members of class (considering exception-safety,
> reusability and maintenance of code).
>
> Many books have stated that:
> Instead of using character arrays, always prefer using std::string.
>
> My question is:
>
> While developing generic Win32 app using C++ for Windows
> (98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
> What to prefer - TCHAR arrays, std::string or std::wstring
> for character-string members (name, address, city, state, country etc.)
>
> of classes like Address, Customer, Vendor, Employee ?
>
> What to prefer - TCHAR arrays, std::string or std::wstring ?
>
> I truly appreciate any help or guideline.
> Anand


I don't use TCHAR as it's a horrid kludge and has problems of its own.
Although it pretends to support both wchar_t and char it's slightly
broken. The _T macro that may or may not put the L in front of string
literals is even more broken.

As you're developing on Windows then just use wchar_t (and tell MSVC to
define it as a base type, not a typedef to short). You will get exactly
zero benefit from trying to compile the same program with and without
Unicode support.

It is normally much better to just use Unicode internally and then
convert to eight bit in whatever localised form you need when you have
to do so. You will find that you have to do all of this anyway for any
non-trivial program.
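
A minimal sketch of that boundary conversion, assuming UTF-8 as the eight-bit form (the post does not name one). The helper name to_utf8 is hypothetical, and the code does no validation of unpaired surrogates. It treats wchar_t as UTF-16 when it is 16 bits wide (as on Windows) and as UTF-32 otherwise:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): encode a wide string as UTF-8.
// Assumes wchar_t holds UTF-16 when 16 bits wide and UTF-32 otherwise.
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);

        // With 16-bit wchar_t, fold a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF
            && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The class members stay std::wstring throughout; only code that writes to a file, socket or 8-bit API would call a function like this. On Windows the Win32 call WideCharToMultiByte does the same job and also covers legacy code-pages.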


K

 
 
Phlip
 
      08-03-2006
rohitpatel9999 wrote:

> Client is sure that they will need UNICODE for few languages (e.g.
> Japanese).


There are requirements and then there are requirements.

I once ported an application to Greek. The original author had added lots of
calls to convert between code-pages. Then the program never converted to any
code pages - it all worked in Western Europe with just one code-page.

I had a lot of fun diagnosing and fixing each bug, the first time any of
these conversion functions ever got called. Oh, and I was implicitly blamed
for the slow velocity, not the original programmer.

So, has this client arranged to provide a real Japanese locale, with a
glossary, for you to port the app to _now_?

Without the critical step of actually using this speculative code, the
client will instead order you to waste time twice, now when you proactively
code for Unicode, and later when you actually provide a new locale.

> Client req. document did specify to make code C++ generic for UNICODE
> consideration (but should not use MFC specific CString).
>
> So (in Microsoft Visual C++)
> the application build for Win98/ME will have _MBCS defined, and
> the application build for Win2000/NT/2003/XP will have UNICODE and _UNICODE
> defined.
>
> Please guide me, (considering exception-safety, reusability and
> maintenance of code).


From here on, I can't. The question is now only on-topic for, roughly,
news:microsoft.public.vc.language , or possibly a localization forum
thereof. However, MBCS might provide for as much Japanese as UNICODE would.
You need to ask your client for a real Japanese locale, and then you need to
match your work to it. (And don't get me started about UCS.)

If they give you a glossary in the JIS201 code-page, then an 8-bit non-MBCS
would work for both the Win95s and the WinNTs. If you first enabled UNICODE,
and only then discover your glossary is in JIS201, then you would have
wasted that effort.

(You could use iconv to convert the glossary to UNICODE or back. The goal is
to match which code-page Japanese customers will accept. Has your client
actually researched this?)

> What to prefer - TCHAR arrays, std::string or std::wstring ?


Joel Spolsky sez "there's no such thing as raw text". The rejoinder is that
wchar_t does not a localized application make.

If you need UNICODE, and if you truly need to pack all kinds of text into
any string, then you need a kind of UTF to encode it. UNICODE is a character
set, not an encoding. And if you can go with UTF-8, even on a Win95 machine,
then you don't need std::wstring.
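
As a rough illustration of "character set, not encoding", the sketch below stores UTF-8 in a plain std::string and counts characters separately from bytes. The helper name utf8_code_points is hypothetical, and the code assumes the input is already valid UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a UTF-8 std::string.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that is
// NOT a continuation byte starts a new code point.
// Assumes valid UTF-8; no validation is performed.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the two-character Japanese word "\xE6\x97\xA5\xE6\x9C\xAC" (U+65E5 U+672C), size() reports 6 bytes while utf8_code_points reports 2. That gap is why std::string with UTF-8 works for storage and transport, but any code that indexes or truncates by character needs extra care.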

> _TCHAR name[30];


Never. The fixed-length string itself will cause untold horror.
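
A small sketch of one such horror, using char in place of _TCHAR so it compiles anywhere. FixedAddress and set_name are hypothetical names, not code from the thread:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical record mirroring Option 1, with char standing in for _TCHAR.
struct FixedAddress {
    char name[30];
};

// Every write into the fixed buffer has to guard against overflow, and the
// only safe outcome for over-long input is silent truncation.
void set_name(FixedAddress& a, const char* value)
{
    std::strncpy(a.name, value, sizeof a.name - 1);
    a.name[sizeof a.name - 1] = '\0';  // strncpy does not terminate on truncation
}
```

A 35-character surname passed to set_name comes back as 29 characters, whereas a std::string member would simply hold all 35. In an MBCS build the cut can also land in the middle of a double-byte character.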

> std::basic_string<TCHAR> name;


Only if you actually test both modes, as you program.

And please introduce a typedef:

typedef std::basic_string<TCHAR> tstring;

> /* Option 3 */
> #ifdef UNICODE
> typedef std::wstring tstring;


This is a clumsy version of Option 2.
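
The reason is that std::basic_string<TCHAR> already is std::string or std::wstring, depending on what TCHAR expands to, so the #ifdef adds nothing. A minimal sketch of Option 2 with the typedef follows. The constructor and accessors are illustrative additions, and the non-Windows fallback definitions of TCHAR and _T are assumptions made only so the sketch compiles elsewhere:

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <tchar.h>   // _T, and TCHAR in MSVC's default (non-strict) mode
#else
// Assumption: stand-ins so the sketch also compiles off Windows.
typedef char TCHAR;
#define _T(x) x
#endif

// One typedef covers both builds: char in an MBCS build, wchar_t when
// _UNICODE is defined.
typedef std::basic_string<TCHAR> tstring;

class Address
{
public:
    Address(const tstring& name, const tstring& city)
        : name_(name), city_(city) {}

    const tstring& name() const { return name_; }
    const tstring& city() const { return city_; }

private:
    tstring name_;
    tstring city_;
};
```

Every literal passed in must be wrapped, as in Address a(_T("Anand"), _T("Springfield")), and the same source only stays correct if the tests are run in both build modes.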

The next complaint is that neither wchar_t nor WCHAR is "UNICODE". Sometimes
they are UTF-16. (And on some compilers wchar_t is UTF-32.)

The more you seek a simple answer, the harder this problem will get. The
answer would be simple if you had enough evidence to back up your decision.
Always get as much evidence as possible - preferably from live deployed
code - before making hard and irreversible decisions. Your client clearly
has experience with source code that created problems when it localized.
They _cannot_ fix this by just guessing you will need the _UNICODE flag
turned on. You must work with them to either defer the requirement, and
write clean code, or promote the requirement, targeting a real release
candidate that a real international user will accept.

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!


 
 
Phlip
 
      08-03-2006
Kirit Sælensminde wrote:

> As you're developing on Windows then just use wchar_t (and tell MSVC to
> define it as a base type, not a typedef to short). You will get exactly
> zero benefit from trying to compile the same program with and without
> Unicode support.


Except that turning on _UNICODE will automagically make the compiler and
program interpret your RC file in UTF-16 instead of a code-paged 8-bit
encoding.

> It is normally much better to just use Unicode internally and then
> convert to eight bit in whatever localised form you need when you have
> to do so. You will find that you have to do all of this anyway for any
> non-trivial program.


The OP also has the requirement to target the Win95s, which can't run in
Wide mode.

Aren't there strap-on DLL sets that provide a kind of Wide mode for the
Win95s? If so, the OP could deploy these with the application, build
everything for UNICODE, and safely neglect to enable any other code-pages.

--
Phlip
http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!


 
 
loufoque
Guest
Posts: n/a
 
      08-03-2006
rohitpatel9999 wrote:

> What to prefer - TCHAR arrays, std::string or std::wstring ?


Just make anything Unicode-aware without using any specific stupidity
from the win32 API.
However, if you rely heavily on that API it may be annoying to interface
with it if you don't follow its internationalization concepts.
But anyway if you rely that much on it you're coding something so
specific that you should ask in another group.

std::wstring will allow UCS-2 (on win32) and UCS-4 (on most unices).
You can use std::string for 'unsafe' utf-8, which is in most of the
cases enough.
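
A quick way to see the platform difference is to count how many wchar_t units one character outside the Basic Multilingual Plane occupies. This is only a sketch: the function name is hypothetical, and the exact encoding of wide literals is compiler-defined, so the expected values below describe the common compilers rather than a guarantee:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of wchar_t units used for U+1D11E (MUSICAL SYMBOL G CLEF), a
// character that does not fit in 16 bits.
std::size_t units_for_non_bmp()
{
    const std::wstring clef = L"\U0001D11E";
    return clef.size();
}
```

With a 32-bit wchar_t (typical on Unix) this would be expected to return 1. With a 16-bit wchar_t (win32) it would be expected to return 2, a surrogate pair, which means size() no longer equals the number of characters even in a std::wstring.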

Or you could use ICU or glibmm for advanced Unicode support.
 
 
Bo Persson
 
      08-03-2006

"Phlip" wrote in message:
> Kirit Sælensminde wrote:
>
>> As you're developing on Windows then just use wchar_t (and tell MSVC to
>> define it as a base type, not a typedef to short). You will get exactly
>> zero benefit from trying to compile the same program with and without
>> Unicode support.
>
> Except that turning on _UNICODE will automagically make the compiler
> and program interpret your RC file in UTF-16 instead of a code-paged
> 8-bit encoding.


You can turn that option on as well, if it has any advantage. Using
wchar_t and std::wstring in your application makes it independent of
those settings.

>> It is normally much better to just use Unicode internally and then
>> convert to eight bit in whatever localised form you need when you have
>> to do so. You will find that you have to do all of this anyway for any
>> non-trivial program.
>
> The OP also has the requirement to target the Win95s, which can't
> run in Wide mode.


Windows 95, 98, and NT are officially unsupported both as OSs and as
targets for the present compiler. All currently supported Windows
versions use wchar_t internally. New applications could do that as
well.

Using TCHAR to optionally compile a new application for a dead OS
doesn't seem very useful to me.

>
> Aren't there strap-on DLL sets that provide a kind of Wide mode for
> the Win95s? If so, the OP could deploy these with the application,
> build everything for UNICODE, and safely neglect to enable any other
> code-pages.


Except that these are as dead as their OSs. Can't be distributed after
their end-of-life.


Bo Persson


 
 
loufoque
 
      08-03-2006
Phlip wrote :

> The OP also has the requirement to target the Win95s, which can't run in
> Wide mode.


Actually, you can probably do it with MSLU (the Microsoft Layer for
Unicode on Windows 95, 98, and Me systems)

 