Velocity Reviews


Alf P. Steinbach 11-22-2011 06:28 PM

Oh! Unicode, console windows, Windows! That must be fun! :-)
 
After about a year of non-blogging I just posted about this: Unicode
console programs.

http://alfps.wordpress.com/2011/11/2...io-approaches/

It's interesting that there is not much whining about how Windows
consoles do not properly support international programs. Perhaps console
programs are not as popular as they once were? Perhaps students nowadays
start directly with GUI programs, in some other language?

Anyway, here's the summary in the posting:

<summary>
Above I introduced two approaches to Unicode handling in small Windows
console programs:

* The all UTF-8 approach where everything is encoded as UTF-8, and
where there are no BOM encoding markers.

* The wide string approach where all external text (including the
C++ source code) is encoded as UTF-8, and all internal text is encoded
as UTF-16.

The all UTF-8 approach is the approach used in a typical Linux
installation. With this approach a novice can remain unaware that he is
writing code that handles Unicode: it Just Works™ – in Linux. However,
we saw that it mass-failed in Windows:

* Input with active codepage 65001 (UTF-8) failed due to various bugs.

* Console output with Visual C++ produced gibberish due to the
runtime library’s attempt to help by using direct console output.

* I mentioned how wide string literals with non-ASCII characters
are incorrectly translated to UTF-16 by Visual C++ due to the necessary
lying to Visual C++ about the source code encoding (which is
accomplished by not having a BOM at the start of the source code file).

The wide string approach, on the other hand, was shown to have special
support in Visual C++, via the _O_U8TEXT file mode, which I called a
UTF-8 stream mode. But I mentioned that as of Visual C++ 10 this special
file mode is not fully implemented and/or it has some bugs: it cannot be
used directly but needs some scaffolding and fixing. That’s what part 2
is about.
</summary>
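For readers who want to see what the wide string approach does at its boundary, here is a minimal, portable sketch of decoding UTF-8 external text into UTF-16 internal text. It is not the Visual C++ `_O_U8TEXT` machinery mentioned above, only an illustration of the conversion such a mode has to perform. The function name `utf8ToUtf16` is hypothetical, and the code assumes well-formed UTF-8 input (it does no validation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into UTF-16.
// Assumption: the input is valid UTF-8; malformed sequences are not detected.
std::u16string utf8ToUtf16( std::string const& in )
{
    std::u16string out;
    std::size_t i = 0;
    while( i < in.size() )
    {
        unsigned char const lead = static_cast<unsigned char>( in[i] );
        char32_t cp;
        std::size_t extra;          // number of continuation bytes that follow
        if( lead < 0x80 )      { cp = lead;        extra = 0; }
        else if( lead < 0xE0 ) { cp = lead & 0x1F; extra = 1; }
        else if( lead < 0xF0 ) { cp = lead & 0x0F; extra = 2; }
        else                   { cp = lead & 0x07; extra = 3; }
        for( std::size_t k = 1; k <= extra; ++k )
        {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>( in[i + k] ) & 0x3F);
        }
        i += 1 + extra;

        if( cp < 0x10000 )
        {
            out += static_cast<char16_t>( cp );     // BMP: one code unit
        }
        else
        {
            cp -= 0x10000;                          // outside BMP: surrogate pair
            out += static_cast<char16_t>( 0xD800 + (cp >> 10) );
            out += static_cast<char16_t>( 0xDC00 + (cp & 0x3FF) );
        }
    }
    return out;
}
```

The UTF-8 bytes F0 9D 84 9E (U+1D11E, the G clef used later in this thread) come out as the two UTF-16 code units D834 DD1E.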


Cheers,

- Alf

Liviu 11-22-2011 07:46 PM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com> wrote...
>
> After about a year of non-blogging I just posted about this: Unicode
> console programs.
>
> http://alfps.wordpress.com/2011/11/2...io-approaches/
>
> It's interesting that there is not much whining about how Windows
> consoles do not properly support international programs. [...]


Off-topicness aside, but the Windows console itself is as
"international" as one cares to make use of.

Console WinRar (rar) supports UTF-16 response files, and console
7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
editor (epsilonC) handles a bunch of encodings, including all flavors
of Unicode, just fine. The ZTree file manager (a console app) is fully
Unicode enabled for both browsing and viewing files.
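To illustrate the file-contents side of this (a separate question from the console's own I/O), a console program commonly decides how to read a response file by sniffing a byte-order mark. The sketch below is an assumption about how such a check can look, not a description of what rar or 7za actually do; `TextEncoding` and `detectBom` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of a text file's encoding by its byte-order mark.
enum class TextEncoding { unknown, utf8, utf16le, utf16be };

// Returns the encoding indicated by a leading BOM, or TextEncoding::unknown if
// there is none. Also reports the BOM length so the caller can skip it.
TextEncoding detectBom( std::string const& bytes, std::size_t& bomLength )
{
    auto starts = [&]( char const* sig, std::size_t n ) {
        return bytes.size() >= n && bytes.compare( 0, n, sig, n ) == 0;
    };
    if( starts( "\xEF\xBB\xBF", 3 ) ) { bomLength = 3; return TextEncoding::utf8; }
    if( starts( "\xFF\xFE", 2 ) )     { bomLength = 2; return TextEncoding::utf16le; }
    if( starts( "\xFE\xFF", 2 ) )     { bomLength = 2; return TextEncoding::utf16be; }
    bomLength = 0;
    return TextEncoding::unknown;
}
```

A file with no BOM comes back as `unknown`, and the program then has to fall back on some default (the active codepage, UTF-8, or a command-line switch). Note that FF FE is also how a UTF-32LE BOM begins; this sketch ignores UTF-32.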

So I think what you meant is that the C/C++ runtime libraries of the
popular Windows compilers do not offer particularly easy or complete
coverage of the Windows console built-in "international" capabilities.
That is (unfortunately) correct, indeed.

Cheers,
Liviu



Alf P. Steinbach 11-22-2011 08:16 PM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
On 22.11.2011 20:46, Liviu wrote:
> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>
>> After about a year of non-blogging I just posted about this: Unicode
>> console programs.
>>
>> http://alfps.wordpress.com/2011/11/2...io-approaches/
>>
>> It's interesting that there is not much whining about how Windows
>> consoles do not properly support international programs. [...]

>
> Off-topicness aside, but the Windows console itself is as
> "international" as one cares to make use of.
>
> Console WinRar (rar) supports UTF-16 response files, and console
> 7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
> editor (epsilonC) handles a bunch of encodings, including all flavors
> of Unicode, just fine. The ZTree file manager (a console app) is fully
> Unicode enabled for both browsing and viewing files.


Well, what encodings programs can deal with in files has nothing to do
with the Windows console subsystem's Unicode support.

Given any program that deals with Unicode files, such as Microsoft Word,
if you have the source you can always link it as a console subsystem
program, and that changes nothing except that as a console program it
gets an automatic console window if it's not started from one. I think
you'll agree that Microsoft Word's ability to deal with Unicode files
has nothing to do with the Windows console subsystem. In spite of the
possibility of linking Word as a console program.

To see a bit of the limitations of the console subsystem, try to issue
these two commands (where 65001 is the UTF-8 codepage) in sequence:

<example>
W:\> chcp 65001
Active code page: 65001

W:\> more
Not enough memory.

W:\> _
</example>

I hope this helps you.

Maybe you'll even read my blog posting, heh.


> So I think what you meant is that the C/C++ runtime libraries of the
> popular Windows compilers do not offer particularly easy or complete
> coverage of the Windows console built-in "international" capabilities.
> That is (unfortunately) correct, indeed.


That too, that too.


Cheers & hth.,

- Alf

Liviu 11-22-2011 09:36 PM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com> wrote...
> On 22.11.2011 20:46, Liviu wrote:
>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>
>>> It's interesting that there is not much whining about how Windows
>>> consoles do not properly support international programs. [...]

>>
>> Off-topicness aside, but the Windows console itself is as
>> "international" as one cares to make use of.
>>
>> Console WinRar (rar) supports UTF-16 response files, and console
>> 7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
>> editor (epsilonC) handles a bunch of encodings, including all flavors
>> of Unicode, just fine. The ZTree file manager (a console app) is
>> fully Unicode enabled for both browsing and viewing files.

>
> Well, what encodings programs can deal with in files has nothing to do
> with the Windows console subsystem's Unicode support.
>
> Given any program that deals with Unicode files, such as Microsoft
> Word [...]


Sorry, but not sure what the point of your argument is here.

The first two programs I listed (rar, 7za) are console apps, and I
included them as (counter)examples to the notion that the "active
codepage" somehow limits what encodings console apps can use.

The last two programs (epsilonC, ZTree) are fully fledged interactive
console apps, as in "can be started at the cmd prompt, and run in
the parent console". Both are blissfully Unicode. Both take advantage
of the Win32 console subsystem builtin support for Unicode. Hope
you'll agree that once it provably happens, it most likely exists ;-)
If in doubt feel free to inspect both of them closely, they are genuine
console apps, not GUIs disguised to mimic a text mode window.

> To see a bit of the limitations of the console subsystem, try to issue
> these two commands (where 65001 is the UTF-8 codepage) in sequence:
>
> <example>
> W:\> chcp 65001
> Active code page: 65001
>
> W:\> more
> Not enough memory.
>
> W:\> _
> </example>


Not all codepages are valid with 'chcp' and 65001 is one of those that
aren't (same goes for 65000 btw), so you are invoking what's essentially
UB in the CLI. See also
http://www.microsoft.com/resources/d...n-us/chcp.mspx.

> Maybe you'll even read my blog posting, heh.


I did, in fact, and remain of the opinion that...

>> So I think what you meant is that the C/C++ runtime libraries of the
>> popular Windows compilers do not offer particularly easy or complete
>> coverage of the Windows console built-in "international"
>> capabilities. That is (unfortunately) correct, indeed.

>
> That too, that too.


...it's not "too", it's "only" ;-)

Cheers,
Liviu



Alf P. Steinbach 11-22-2011 11:18 PM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
On 22.11.2011 22:36, Liviu wrote:
> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>> On 22.11.2011 20:46, Liviu wrote:
>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>>
>>>> It's interesting that there is not much whining about how Windows
>>>> consoles do not properly support international programs. [...]
>>>
>>> Off-topicness aside, but the Windows console itself is as
>>> "international" as one cares to make use of.
>>>
>>> Console WinRar (rar) supports UTF-16 response files, and console
>>> 7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
>>> editor (epsilonC) handles a bunch of encodings, including all flavors
>>> of Unicode, just fine. The ZTree file manager (a console app) is
>>> fully Unicode enabled for both browsing and viewing files.

>>
>> Well, what encodings programs can deal with in files has nothing to do
>> with the Windows console subsystem's Unicode support.
>>
>> Given any program that deals with Unicode files, such as Microsoft
>> Word [...]

>
> Sorry, but not sure what the point of your argument is here.


How a program deals with files has nothing to do with Unicode support in
the console system.

The console system Unicode support has five main aspects:

* Text buffer. This is limited to UCS2, that is, the Basic
Multilingual Plane of Unicode.

* Presentation. Console windows don't deal with glyphs that
need two cells. E.g. some Chinese ideograms.

* Command line. This works nicely, it's UTF-16 encoded.

* Standard i/o streams. They're broken. E.g. below you state
that console windows do not support conversion to/from UTF-8
in the standard streams (that's what the active codepage is,
the assumed encoding of those streams). In fact your
statement refers to documentation saying that one cannot
even get a console window to translate to/from Windows ANSI,
which happily is incorrect, but that's what you maintain.

* Support commands. Most of them fail to handle UTF-8. Some
of them, like 'more' and 'csc' (the C# compiler), crash.
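To make the first bullet concrete: under the one-16-bit-code-per-cell model it describes, a code point can be stored in a single cell only if it lies in the Basic Multilingual Plane and is not in the surrogate range. A minimal sketch with the hypothetical helper `fitsInUcs2Cell` (whether the real buffer is limited this way is disputed later in the thread):

```cpp
// Hypothetical check, modelling the one-16-bit-code-per-cell view described
// above: can code point cp be stored as a single UCS-2 value?
constexpr bool fitsInUcs2Cell( char32_t cp )
{
    // UCS-2 covers the Basic Multilingual Plane (U+0000..U+FFFF) except the
    // range U+D800..U+DFFF, which UTF-16 reserves for surrogate code units.
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

static_assert( fitsInUcs2Cell( U'\u00F8' ), "o-slash is in the BMP" );
static_assert( !fitsInUcs2Cell( U'\U0001D11E' ), "G clef is outside the BMP" );
```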


> The first two programs I listed (rar, 7za) are console apps, and I
> included them as (counter)examples to the notion that the "active
> codepage" somehow limits what encodings console apps can use.


Do you seriously think that I'm writing a series of blog articles about
how to do something that I think is impossible?


> The last two programs (epsilonC, ZTree) are fully fledged interactive
> console apps, as in "can be started at the cmd prompt, and run in
> the parent console". Both are blissfully Unicode. Both take advantage
> of the Win32 console subsystem builtin support for Unicode. Hope
> you'll agree that once it provably happens, it most likely exists ;-)


You seem to be arguing against a statement of impossibility.

The only alternative I can see to you being an idiot, is that you're
trying to convey a false impression.

I believe the latter.


> If in doubt feel free to inspect both of them closely, they are genuine
> console apps, not GUIs disguised to mimic a text mode window.


Again, this sounds only like misdirection and reader manipulation.

You are arguing against something that nobody's argued for.

You are doing that in order to misdirect and deceive.


>> To see a bit of the limitations of the console subsystem, try to issue
>> these two commands (where 65001 is the UTF-8 codepage) in sequence:
>>
>> <example>
>> W:\> chcp 65001
>> Active code page: 65001
>>
>> W:\> more
>> Not enough memory.
>>
>> W:\> _
>> </example>

>
> Not all codepages are valid with 'chcp' and 65001 is one of those that
> aren't (same goes for 65000 btw), so you are invoking what's essentially
> UB in the CLI. See also
> http://www.microsoft.com/resources/d...n-us/chcp.mspx.


The Microsoft documentation has lots of bugs. For example, the list
you're referring to incorrectly lists only OEM codepages.


Cheers & hth.,

- Alf

Liviu 11-23-2011 12:13 AM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com> wrote...
> On 22.11.2011 22:36, Liviu wrote:
>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>> On 22.11.2011 20:46, Liviu wrote:
>>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>>>
>>>>> It's interesting that there is not much whining about how Windows
>>>>> consoles do not properly support international programs. [...]
>>>>
>>>> Off-topicness aside, but the Windows console itself is as
>>>> "international" as one cares to make use of.
>>>>
>>>> Console WinRar (rar) supports UTF-16 response files, and console
>>>> 7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
>>>> editor (epsilonC) handles a bunch of encodings, including all
>>>> flavors of Unicode, just fine. The ZTree file manager (a console
>>>> app) is fully Unicode enabled for both browsing and viewing files.
>>>
>>> Well, what encodings programs can deal with in files has nothing to
>>> do with the Windows console subsystem's Unicode support.
>>>
>>> Given any program that deals with Unicode files, such as Microsoft
>>> Word [...]

>>
>> Sorry, but not sure what the point of your argument is here.

>
> How a program deals with files has nothing to do with Unicode support
> in the console system.


OK, maybe (+), but this has little to do with what your _original_ post
stated, which is what I was replying to.

(+) Except filenames themselves can be Unicode, too. There are apps
which support Unicode filenames, contents, and both. The 3 categories
overlap, but are not one and the same.

> The console system Unicode support has five main aspects:
>
> * Text buffer. This is limited to UCS2, that is, the Basic
> Multilingual Plane of Unicode.


True as of Windows 2000, no longer since XP. See for example
http://blogs.msdn.com/b/michkap/arch...11/416552.aspx

> * Presentation. Console windows don't deal with glyphs that
> need two cells. E.g. some Chinese ideograms.


Yes, they do. See for example
http://blogs.msdn.com/b/buckh/archiv...11/463427.aspx

> * Command line. This works nicely, it's UTF-16 encoded.


Generally yes, but depends on the caller. For example, piping output
from another program at a cmd prompt started without "/u" incurs a
double translation.

> * Standard i/o streams. They're broken. E.g. below you state
> that console windows do not support conversion to/from UTF-8
> in the standard streams (that's what the active codepage is,
> the assumed encoding of those streams). In fact your
> statement refers to documentation saying that one cannot
> even get a console window to translate to/from Windows ANSI,
> which happily is incorrect, but that's what you maintain.


You either misunderstood or misrepresent what I wrote. And sorry,
but "get a console window to translate to/from Windows ANSI" makes
no sense whatsoever.

> * Support commands. Most of them fail to handle UTF-8. Some
> of them, like 'more' and 'csc' (the C# compiler), crash.


Right. Can you point to documentation stating that they are/should
support UTF-8?

Note that the 1st is a CLI builtin, and the 2nd is a standalone app.
Just because they chose not to support UTF-8 has no bearing at all
on whether the Win32 console subsystem supports Unicode or not.

>> The first two programs I listed (rar, 7za) are console apps, and I
>> included them as (counter)examples to the notion that the "active
>> codepage" somehow limits what encodings console apps can use.

>
> Do you seriously think that I'm writing a series of blog articles
> about how to do something that I think is impossible?


No, I think you are just confused ;-)

>> The last two programs (epsilonC, ZTree) are fully fledged interactive
>> console apps, as in "can be started at the cmd prompt, and run in
>> the parent console". Both are blissfully Unicode. Both take advantage
>> of the Win32 console subsystem builtin support for Unicode. Hope
>> you'll agree that once it provably happens, it most likely exists ;-)

>
> You seem to be arguing against a statement of impossibility.
>
> The only alternative I can see to you being an idiot, is that you're
> trying to convey a false impression.
>
> I believe the latter.


You seem to have conveniently forgotten your opening assertion:

|| It's interesting that there is not much whining about how Windows
|| consoles do not properly support international programs.

And you seem to be confusing what the Win32 console subsystem
("Windows consoles") supports with what Microsoft's CLI (command
line interpreter, lest one misreads that, too) actually implements.

Consider this my closing post in this thread.

Cheers,
Liviu



Alf P. Steinbach 11-23-2011 01:16 AM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
On 23.11.2011 01:13, Liviu wrote:
> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>> On 22.11.2011 22:36, Liviu wrote:
>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>> On 22.11.2011 20:46, Liviu wrote:
>>>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>>>>
>>>>>> It's interesting that there is not much whining about how Windows
>>>>>> consoles do not properly support international programs. [...]
>>>>>
>>>>> Off-topicness aside, but the Windows console itself is as
>>>>> "international" as one cares to make use of.
>>>>>
>>>>> Console WinRar (rar) supports UTF-16 response files, and console
>>>>> 7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
>>>>> editor (epsilonC) handles a bunch of encodings, including all
>>>>> flavors of Unicode, just fine. The ZTree file manager (a console
>>>>> app) is fully Unicode enabled for both browsing and viewing files.
>>>>
>>>> Well, what encodings programs can deal with in files has nothing to
>>>> do with the Windows console subsystem's Unicode support.
>>>>
>>>> Given any program that deals with Unicode files, such as Microsoft
>>>> Word [...]
>>>
>>> Sorry, but not sure what the point of your argument is here.

>>
>> How a program deals with files has nothing to do with Unicode support
>> in the console system.

>
> OK, maybe (+), but this has little to do with what your _original_ post
> stated, which is what I was replying to.


You should have referred to something real here, and quoted it.


> (+) Except filenames themselves can be Unicode, too. There are apps
> which support Unicode filenames, contents, and both. The 3 categories
> overlap, but are not one and the same.
>
>> The console system Unicode support has five main aspects:
>>
>> * Text buffer. This is limited to UCS2, that is, the Basic
>> Multilingual Plane of Unicode.

>
> True as of Windows 2000, no longer since XP.


That's incorrect. The console text buffer is UCS2. There are exactly 16
bits per character cell.

The memory layout for one cell of a console text buffer is specified at

http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx


> See for example
> http://blogs.msdn.com/b/michkap/arch...11/416552.aspx


That's an incorrect associative interpretation. The article does not
support your contention.


>> * Presentation. Console windows don't deal with glyphs that
>> need two cells. E.g. some Chinese ideograms.

>
> Yes, they do. See for example
> http://blogs.msdn.com/b/buckh/archiv...11/463427.aspx


That's again an incorrect pure associative interpretation.

You seem to be conflating two different issues, namely UTF-16 surrogate
pairs, and glyphs that are wider than one cell.

The article you're linking to describes a bug where a single character
represented as a surrogate pair (two 16-bit values) is displayed using
two cells, even if it would need just one, as in the program
below. I am not at all sure that the g-clef character represented by the
surrogate pair in the code below should display in a single cell, but
the program just illustrates that with a valid surrogate pair that
logically is one character, the console subsystem incorrectly treats it
as two characters -- because consoles are UCS2, not UTF-16.

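<code>
#undef UNICODE
#define UNICODE
#include <windows.h>

int main()
{
    HANDLE const out = GetStdHandle( STD_OUTPUT_HANDLE );

    // U+1D11E MUSICAL SYMBOL G CLEF as a UTF-16 surrogate pair; the same
    // string could be written L"\U0001D11E\n" (capital U, eight hex digits).
    wchar_t const gClefManual[] = L"\xD834\xDD1E\n";

    // Write three UTF-16 code units: high surrogate, low surrogate, newline.
    DWORD nCharsWritten;
    WriteConsole( out, gClefManual, 3, &nCharsWritten, nullptr );
}
</code>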

That bug is not the feature of displaying a single glyph over two cells.
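The arithmetic behind `L"\xD834\xDD1E"` can be sketched portably: UTF-16 stores a code point above U+FFFF by subtracting 0x10000 and putting the top 10 bits of the result in a high surrogate and the bottom 10 bits in a low surrogate. The two counting functions contrast a UCS-2 reader, which sees two characters, with a UTF-16-aware reader, which sees one. All three function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-16 (assumes cp <= 0x10FFFF and
// that cp is not itself a surrogate code point).
std::u16string toUtf16( char32_t cp )
{
    if( cp < 0x10000 )
    {
        return std::u16string( 1, static_cast<char16_t>( cp ) );
    }
    char32_t const v = cp - 0x10000;                                      // 20 bits remain
    char16_t const high = static_cast<char16_t>( 0xD800 + (v >> 10) );    // top 10 bits
    char16_t const low  = static_cast<char16_t>( 0xDC00 + (v & 0x3FF) );  // bottom 10 bits
    return std::u16string{ high, low };
}

// What a UCS-2 consumer sees: one "character" per 16-bit unit.
std::size_t ucs2CharCount( std::u16string const& s ) { return s.size(); }

// What a UTF-16-aware consumer sees: a high surrogate followed by a low
// surrogate counts as a single character.
std::size_t utf16CharCount( std::u16string const& s )
{
    std::size_t n = 0;
    for( std::size_t i = 0; i < s.size(); ++i, ++n )
    {
        bool const isHigh = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if( isHigh && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF )
        {
            ++i;    // skip the low surrogate of this pair
        }
    }
    return n;
}
```

For U+1D11E, `toUtf16` yields D834 DD1E; `ucs2CharCount` reports 2 and `utf16CharCount` reports 1.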


>> * Command line. This works nicely, it's UTF-16 encoded.

>
> Generally yes, but depends on the caller. For example, piping output
> from another program at a cmd prompt started without "/u" incurs a
> double translation.


That's incorrect: the command line is not translated. It is not
translated because it is not passed via the i/o streams. It is instead
(ultimately) passed to the new process as a UTF-16 encoded argument of
the CreateProcess API function, and it is available to the process as a
UTF-16 encoded string via the GetCommandLine API function.


>> * Standard i/o streams. They're broken. E.g. below you state
>> that console windows do not support conversion to/from UTF-8
>> in the standard streams (that's what the active codepage is,
>> the assumed encoding of those streams). In fact your
>> statement refers to documentation saying that one cannot
>> even get a console window to translate to/from Windows ANSI,
>> which happily is incorrect, but that's what you maintain.

>
> You either misunderstood or misrepresent what I wrote. And sorry,
> but "get a console window to translate to/from Windows ANSI" makes
> no sense whatsoever.


Maybe you did not understand what you wrote about the `chcp` command.

The translation I referred to is between the narrow character encoding
employed for a process' standard i/o streams such as standard output,
and the console window's UCS2 encoded text buffer.

This translation is specified by the console window's active codepage,
or if you want to get really detailed, by the active codepages for input
and output (but at the ordinary user level one thinks of just one).
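A sketch of why that translation matters. It models the narrow-to-UCS2 step under a single-byte codepage, using ISO-8859-1 (Latin-1) as a stand-in because each byte value there equals its code point; real OEM and ANSI codepages such as 850 or 1252 use lookup tables (1252 agrees with Latin-1 only for bytes 0x00-0x7F and 0xA0-0xFF). `decodeLatin1` is a hypothetical name.

```cpp
#include <string>

// Hypothetical stand-in for the console's narrow-to-UCS-2 translation under a
// single-byte codepage. Assumption: ISO-8859-1 (Latin-1), where byte value
// 0xNN maps to code point U+00NN. Real OEM/ANSI codepages use lookup tables.
std::u16string decodeLatin1( std::string const& bytes )
{
    std::u16string cells;
    cells.reserve( bytes.size() );
    for( char const c : bytes )
    {
        // One byte in, one 16-bit cell value out.
        cells += static_cast<char16_t>( static_cast<unsigned char>( c ) );
    }
    return cells;
}
```

With this model, a program that writes the UTF-8 bytes C3 B8 (for "ø") to a console expecting a single-byte codepage gets two cells, "Ã¸", instead of one.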


>> * Support commands. Most of them fail to handle UTF-8. Some
>> of them, like 'more' and 'csc' (the C# compiler), crash.

>
> Right. Can you point to documentation stating that they are/should
> support UTF-8?


That's meaningless.

You don't need documentation stating that a program should not crash.

It is an implicit requirement of any program that it should not crash.


>> ... like 'more' and 'csc' (the C# compiler), crash.
>

> Note that the 1st is a CLI builtin, and the 2nd is a standalone app.
> Just because they chose not to support UTF-8 has no bearing at all
> on whether the Win32 console subsystem supports Unicode or not.


That's incorrect. As of Windows 7 "the 1st", namely the 'more' command,
is the program [more.com] residing in the system folder. Despite the
name it's not a COM format executable but an ordinary PE format executable.

If you're interested in academic word games then it's true that there
exists an interpretation of "Win32 console subsystem" where such support
programs are not part of it, which is why I omitted the "sub", so for
the purposes of word gaming the crashes don't matter, I guess.

But for any practical consideration the fact that Windows' standard
programs crash, is pretty significant.


>>> The first two programs I listed (rar, 7za) are console apps, and I
>>> included them as (counter)examples to the notion that the "active
>>> codepage" somehow limits what encodings console apps can use.

>>
>> Do you seriously think that I'm writing a series of blog articles
>> about how to do something that I think is impossible?

>
> No, I think you are just confused ;-)


Everything you've written has so far been nearly void of real meaning,
and mostly incorrect technically.

But it has been full to the brim of misleading, manipulative and
deceptive wordage.

Thus, I know you for a liar.


Cheers & hth.,

- Alf

Alf P. Steinbach 11-23-2011 01:19 AM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
On 23.11.2011 01:23, Leigh Johnston wrote:
>
> What a surprise: Alf trolls the newsgroup again hoping that his latest
> blog gives him some legitimacy just as with his "The unsigned types are
> for bit-level operations (only)," manifesto.


I do not discount the notion that "Liviu" is a sock puppet of yours.

Just for other readers:

Leigh is a known troll. I killfiled him when he posted rather negative
opinions about the secretary of the standardization committee's library
group. Now on a new machine I can see his postings again.


- Alf


Liviu 11-23-2011 03:01 AM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
A little knowledge is a dangerous thing.
And a big ego only makes it worse ;-)

P.S. I neither play nor appreciate sock puppets. Better save your cheap
shots for cases where you have at least the shade of a leg to stand on.

"Alf P. Steinbach" <alf.p.steinbach+usenet@gmail.com> wrote...
> On 23.11.2011 01:13, Liviu wrote:
>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>> On 22.11.2011 22:36, Liviu wrote:
>>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>>> On 22.11.2011 20:46, Liviu wrote:
>>>>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>>>>>
>>>>>>> It's interesting that there is not much whining about how
>>>>>>> Windows consoles do not properly support international programs.
>>>>>>
>>>>>> Off-topicness aside, but the Windows console itself is as
>>>>>> "international" as one cares to make use of.
>>>>>>
>>>>>> Console WinRar (rar) supports UTF-16 response files, and console
>>>>>> 7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
>>>>>> editor (epsilonC) handles a bunch of encodings, including all
>>>>>> flavors of Unicode, just fine. The ZTree file manager (a console
>>>>>> app) is fully Unicode enabled for both browsing and viewing
>>>>>> files.
>>>>>
>>>>> Well, what encodings programs can deal with in files has nothing
>>>>> to do with the Windows console subsystem's Unicode support.
>>>>>
>>>>> Given any program that deals with Unicode files, such as Microsoft
>>>>> Word [...]
>>>>
>>>> Sorry, but not sure what the point of your argument is here.
>>>
>>> How a program deals with files has nothing to do with Unicode
>>> support in the console system.

>>
>> OK, maybe (+), but this has little to do with what your _original_
>> post stated, which is what I was replying to.

>
> You should have referred to something real here, and quoted it.


It's still quoted at the top of this followup. It was quoted again
towards the bottom of my previous reply. For your convenience,
here it is one more time, in your own words...

|| It's interesting that there is not much whining about how Windows
|| consoles do not properly support international programs.

>>> The console system Unicode support has five main aspects:
>>>
>>> * Text buffer. This is limited to UCS2, that is, the Basic
>>> Multilingual Plane of Unicode.

>>
>> True as of Windows 2000, no longer since XP.

>
> That's incorrect. The console text buffer is UCS2. There are exactly 16
> bits per character cell.


Don't know what gave you that (wrong) idea.

> memory layout for one cell of a console text buffer is specified at
>
> http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx


FYI what that describes is the Unicode "character". I see no mention of
"cells" or console screen glyphs.

>> See for example
>> http://blogs.msdn.com/b/michkap/arch...11/416552.aspx

>
> That's an incorrect associative interpretation. The article does not
> support your contention.


My interpretation and contention are both correct. For example, U+61A8
is one _character_ displayed across two _cells_ in the console (yes,
even at the cmd prompt if you enable Alt+ Unicode input).
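Which characters take two cells is governed, in Unicode terms, by the East Asian Width property (Unicode Standard Annex #11). The sketch below uses a deliberately tiny sample of Wide/Fullwidth ranges, enough to show that U+61A8 falls in a two-cell range. It is not a claim about how the console host actually decides, and `cellWidth` is a hypothetical name.

```cpp
// Hypothetical, deliberately incomplete approximation of Unicode's
// East Asian Width property (UAX #11): returns how many console cells a
// code point would occupy if wide characters take two cells.
int cellWidth( char32_t cp )
{
    struct Range { char32_t first, last; };
    // A small sample of Wide/Fullwidth ranges; the real table is much longer.
    static Range const wideRanges[] = {
        { 0x1100, 0x115F },     // Hangul Jamo leading consonants
        { 0x3400, 0x4DBF },     // CJK Unified Ideographs Extension A
        { 0x4E00, 0x9FFF },     // CJK Unified Ideographs (includes U+61A8)
        { 0xAC00, 0xD7A3 },     // Hangul Syllables
        { 0xFF01, 0xFF60 },     // Fullwidth ASCII variants and punctuation
    };
    for( Range const& r : wideRanges )
    {
        if( cp >= r.first && cp <= r.last ) { return 2; }
    }
    return 1;
}
```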

Funny, this was from one of my test files. Didn't realize that it meant
"foolish, silly, coquettish" in Chinese until I googled it now ;-)

>>> * Support commands. Most of them fail to handle UTF-8. Some
>>> of them, like 'more' and 'csc' (the C# compiler), crash.

>>
>> Right. Can you point to documentation stating that they are/should
>> support UTF-8?

>
> That's meaningless.
>
> You don't need documentation stating that a program should not crash.


If you expect a program to support a certain feature, you'd better base
that expectation on something other than blind hope.

> It is an implicit requirement of any program that it should not crash.


Exiting with an error is not a crash. It's just one of many possible UB
outcomes when you pass unsupported arguments.

> But for any practical consideration the fact that Windows' standard
> programs crash, is pretty significant.


I thought the point was about what 3rd-party console programs (such as
you yourself would write in C/C++) could achieve with the Win32 console
API. You sounded like "not much". I've given you examples of such real
programs which do achieve full Unicode compliance. Granted, they are
not written by you. And sorry if you are still in denial.

Out of here for good now, and Happy Turkey everyone.

Liviu




Alf P. Steinbach 11-23-2011 03:56 AM

Re: Oh! Unicode, console windows, Windows! That must be fun! :-)
 
On 23.11.2011 04:01, Liviu wrote:
> A little knowledge is a dangerous thing.
> And a big ego only makes it worse ;-)
>
> P.S. I neither play nor appreciate sock puppets. Better save your cheap
> shots for cases where you have at least the shade of a leg to stand on.
>
> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>> On 23.11.2011 01:13, Liviu wrote:
>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>> On 22.11.2011 22:36, Liviu wrote:
>>>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>>>> On 22.11.2011 20:46, Liviu wrote:
>>>>>>> "Alf P. Steinbach"<alf.p.steinbach+usenet@gmail.com> wrote...
>>>>>>>>
>>>>>>>> It's interesting that there is not much whining about how
>>>>>>>> Windows consoles do not properly support international programs.
>>>>>>>
>>>>>>> Off-topicness aside, but the Windows console itself is as
>>>>>>> "international" as one cares to make use of.
>>>>>>>
>>>>>>> Console WinRar (rar) supports UTF-16 response files, and console
>>>>>>> 7-Zip (7za) takes UTF-8 ones. Console mode variant of the Epsilon
>>>>>>> editor (epsilonC) handles a bunch of encodings, including all
>>>>>>> flavors of Unicode, just fine. The ZTree file manager (a console
>>>>>>> app) is fully Unicode enabled for both browsing and viewing
>>>>>>> files.
>>>>>>
>>>>>> Well, what encodings programs can deal with in files has nothing
>>>>>> to do with the Windows console subsystem's Unicode support.
>>>>>>
>>>>>> Given any program that deals with Unicode files, such as Microsoft
>>>>>> Word [...]
>>>>>
>>>>> Sorry, but not sure what the point of your argument is here.
>>>>
>>>> How a program deals with files has nothing to do with Unicode
>>>> support in the console system.
>>>
>>> OK, maybe (+), but this has little to do with what your _original_
>>> post stated, which is what I was replying to.

>>
>> You should have referred to something real here, and quoted it.

>
> It's still quoted at the top of this followup. It was quoted again
> towards the bottom of my previous reply. For your convenience,
> here it is one more time, in your own words...
>
> || It's interesting that there is not much whining about how Windows
> || consoles do not properly support international programs.


It's impossible that you disagree with any of that.

No one who reads [clc++] thinks that Windows consoles do support
international programs very well, and I hope that none of them think
that there's been much whining about it. I think that most readers would
probably disagree with my "interesting" assessment. But that's irony,
and I think they agree with the irony...

In short, when you try to give the impression that you disagree with any
of that, you're just again portraying yourself as free of knowledge of
the technical, but above average on creating impressions.


>>>> The console system Unicode support has five main aspects:
>>>>
>>>> * Text buffer. This is limited to UCS2, that is, the Basic
>>>> Multilingual Plane of Unicode.
>>>
>>> True as of Windows 2000, no longer since XP.

>>
>> That's incorrect. The console text buffer is UCS2. There is exactly 16
>> bits per character cell.

>
> Don't know what gave you that (wrong) idea.


I meant that there are exactly 16 bits per character code, with one such
character code, as a wchar_t value, per cell.

There are 16 bits because wchar_t in Windows is 16 bits.
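To make the 16-bit point concrete, here is a small sketch (not from the original posts) of UTF-16 encoding of a single code point. A code point in the Basic Multilingual Plane, such as U+61A8, needs one 16-bit unit and so fits in one cell's character field, while a code point above U+FFFF needs a surrogate pair of two units, which a single 16-bit cell cannot hold. The names `encode_utf16` and `fits_in_one_cell` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// BMP code points (U+0000..U+FFFF, excluding surrogates) yield one unit;
// supplementary code points (U+10000..U+10FFFF) yield a surrogate pair.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp)
{
    // Reject surrogate code points and values beyond the Unicode range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp <= 0xFFFF) {
        return {static_cast<std::uint16_t>(cp)};
    }
    const std::uint32_t v = cp - 0x10000;  // 20 significant bits remain
    return {
        static_cast<std::uint16_t>(0xD800 + (v >> 10)),   // high surrogate
        static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))  // low surrogate
    };
}

// True if the code point fits in one 16-bit (UCS-2) character cell.
bool fits_in_one_cell(std::uint32_t cp)
{
    return encode_utf16(cp).size() == 1;
}
```

For example, `encode_utf16(0x61A8)` yields the single unit 0x61A8, whereas `encode_utf16(0x1F600)` (an emoji outside the BMP) yields the two units 0xD83D and 0xDE00, so the latter has no single-cell representation in a UCS-2 buffer.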

The layout of a console screen buffer closely follows the layout of the
original IBM PC screen memory. The main difference is that instead of
just 8 bits per character code, you have 16 bits, in order to
accommodate the original 16-bit Unicode. There is a small functional
difference in that (at least by default) all the attribute bits control
color instead of blinking or underlining as in the original IBM PC
screen adapter.


>> The memory layout for one cell of a console text buffer is specified at
>>
>> http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx

>
> FYI what that describes is the Unicode "character". I see no mention of
> "cells" or console screen glyphs.


That's inconsistent (which is revealing): a bit further down you have no
problem understanding what a cell in the console window text buffer is.

The console window text buffer is an array of lines consisting of
character position cells.

Each cell has a character code (16 bits, for UCS2) and an attribute that
controls foreground and background color for this cell.
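The cell structure described above can be modelled portably as below. This is only a sketch that mirrors the documented shape of the Windows `CHAR_INFO` structure (a 16-bit character code plus a 16-bit `Attributes` word); the names `Cell`, `ScreenBuffer` and the `kFg*`/`kBg*` constants are hypothetical, although the bit values are the same as the `FOREGROUND_*` and `BACKGROUND_*` color flags in `<wincon.h>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Color bits, with the same values as the FOREGROUND_*/BACKGROUND_* flags in <wincon.h>.
constexpr std::uint16_t kFgBlue      = 0x0001;
constexpr std::uint16_t kFgGreen     = 0x0002;
constexpr std::uint16_t kFgRed       = 0x0004;
constexpr std::uint16_t kFgIntensity = 0x0008;
constexpr std::uint16_t kBgBlue      = 0x0010;
constexpr std::uint16_t kBgGreen     = 0x0020;
constexpr std::uint16_t kBgRed       = 0x0040;
constexpr std::uint16_t kBgIntensity = 0x0080;

// One character position: a single 16-bit character code and a color attribute.
struct Cell {
    char16_t      ch   = u' ';
    std::uint16_t attr = kFgRed | kFgGreen | kFgBlue;  // light gray on black
};

// The text buffer: lines of cells, stored row-major in one array.
class ScreenBuffer {
public:
    ScreenBuffer(std::size_t width, std::size_t height)
        : width_(width), height_(height), cells_(width * height) {}

    Cell&       at(std::size_t x, std::size_t y)       { return cells_[y * width_ + x]; }
    const Cell& at(std::size_t x, std::size_t y) const { return cells_[y * width_ + x]; }

    std::size_t width() const  { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t       width_;
    std::size_t       height_;
    std::vector<Cell> cells_;
};

// Two 16-bit fields per cell, as with CHAR_INFO.
static_assert(sizeof(Cell) == 4, "16-bit character code + 16-bit attribute");
```

Because `Cell::ch` is a single `char16_t`, any character that needs a surrogate pair simply has no place in such a cell, which is the UCS-2 limitation being discussed.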


>>> See for example
>>> http://blogs.msdn.com/b/michkap/arch...11/416552.aspx

>>
>> That's an incorrect associative interpretation. The article does not
>> support your contention.

>
> My interpretation and contention are both correct. For example, U+61A8
> is one _character_ displayed across two _cells_ in the console (yes,
> even at the cmd prompt if you enable Alt+ Unicode input).


That's incorrect.

It displays as a single-cell rectangle in a Windows 7 console window on
my machine.

There may be software that can display it correctly, but that is not the
default.


> Funny, this was from one of my test files. Didn't realize that it meant
> "foolish, silly, coquettish" in Chinese until I google'd it now ;-)


Well, it does not take much guts to hide behind a nick.

Have you noticed that *every* technical assertion you have made has
been shown to be incorrect?

It's near impossible to be that 100% consistently incompetent, so I
think it's part of your trolling, that is, that you endeavor to make as
many false but plausible-sounding technical assertions as you can in
order to rile me up.


>>>> * Support commands. Most of them fail to handle UTF-8. Some
>>>> of them, like 'more' and 'csc' (the C# compiler), crash.
>>>
>>> Right. Can you point to documentation stating that they are/should
>>> support UTF-8?

>>
>> That's meaningless.
>>
>> You don't need documentation stating that a program should not crash.

>
> If you expect a program to support a certain feature, you'd better base
> that expectation on something other than blind hope.


Asserting that one needs blind hope to expect that a program does not
crash under ordinary conditions is stupid beyond belief -- unless the
person who says this is already identified as a troll and a liar.


>> It is an implicit requirement of any program that it should not crash.

>
> Exiting with an error is not a crash. It's just one of many possible UB
> outcomes, when you pass unsupported arguments.


That's incorrect: no arguments were passed in the example command
sequence I asked you to try.

One does not expect any program to crash like that.

Especially not Microsoft's own, like `more` and `csc`.


>> But for any practical consideration the fact that Windows' standard
>> programs crash, is pretty significant.

>
> I thought the point was about what 3rd party console programs (such as
> you yourself would write in C/C++) could achieve with the Win32 console
> API.


There's a difference between "could" and practical reality.

I think the worst imaginable system is one where anything "could" be
achieved.

Because then it's difficult to get rid of.


> You sounded like "not much". I've given you examples of such real
> programs which do achieve full Unicode compliance. Granted, they are
> not written by you. And sorry if you are still in denial.


File handling, which you've focused on, has nothing to do with consoles.

But much is possible to do even with Windows consoles.

Again, but this time for the benefit of other readers, do you seriously
think that I would be writing blog articles about how to do something
that I regarded as impossible?

You have to know that that supposition does not make any sense at all.

Yet you have persisted in pressing it.

Which means that you're a troll, a liar.


Cheers & hth.,

- Alf

