Alf P. Steinbach
After about a year of non-blogging I just posted about this: Unicode
console programs.
http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
It's interesting that there is not much whining about how Windows
consoles do not properly support international programs. Perhaps console
programs are not as popular as they once were? Perhaps students nowadays
start directly with GUI programs, in some other language?
Anyway, here's the summary in the posting:
<summary>
Above I introduced two approaches to Unicode handling in small Windows
console programs:
* The all UTF-8 approach where everything is encoded as UTF-8, and
where there are no BOM encoding markers.
* The wide string approach where all external text (including the
C++ source code) is encoded as UTF-8, and all internal text is encoded
as UTF-16.
The all UTF-8 approach is the approach used in a typical Linux
installation. With this approach a novice can remain unaware that he is
writing code that handles Unicode: it Just Works™ – in Linux. However,
we saw that it mass-failed in Windows:
* Input with active codepage 65001 (UTF-8) failed due to various bugs.
* Console output with Visual C++ produced gibberish due to the
runtime library’s attempt to help by using direct console output.
* I mentioned how wide string literals with non-ASCII characters
are incorrectly translated to UTF-16 by Visual C++, due to the necessity
of lying to Visual C++ about the source code encoding (which is
accomplished by not having a BOM at the start of the source code file).
The wide string approach, on the other hand, was shown to have special
support in Visual C++, via the _O_U8TEXT file mode, which I called a
UTF-8 stream mode. But I mentioned that as of Visual C++ 10 this special
file mode is not fully implemented and/or it has some bugs: it cannot be
used directly but needs some scaffolding and fixing. That’s what part 2
is about.
</summary>
Cheers,
- Alf