Stefan Ram
When you edit Java source in a Windows-1252 editor and the
Java process then prints a string from it to a Windows CP-850
console, umlauts like »ü« are not rendered correctly: the
console shows »³«, the character whose code in CP 850 is the
code that »ü« has in Windows-1252.
ü ---Windows 1252---> 252 ---CP 850---> ³
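The mapping above can be reproduced in a few lines. This is only a sketch: the class and method names are made up for illustration, and it assumes the JRE provides the charsets »windows-1252« and »IBM850« (Java's name for CP 850). The umlaut is written as a Unicode escape and the results are printed as numbers, so neither the encoding of this source file nor that of the console matters.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    // Encode the text with the editor's charset, then decode the same
    // bytes with the console's charset, as a mismatched console would.
    static String misread(String text, String editorCharset, String consoleCharset) {
        byte[] bytes = text.getBytes(Charset.forName(editorCharset));
        return new String(bytes, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        String u = "\u00FC"; // ü, as an escape that does not depend on the file encoding

        // Code of ü in Windows-1252.
        int code = u.getBytes(Charset.forName("windows-1252"))[0] & 0xFF;
        System.out.println(code); // 252

        // What a CP-850 console makes of that byte.
        String shown = misread(u, "windows-1252", "IBM850");
        System.out.println((int) shown.charAt(0)); // 179, i.e. U+00B3 (³)
    }
}
```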
Until Windows XP and Java 1.6, one could edit the Java
source code in a console with CP 850 (using the Windows
console program »EDIT«), thus writing the »"ü"« in CP 850.
When such a program is executed, it prints a correct »ü« to
the CP-850 console, because the literal byte in the source
code has the value »129«, which is the code of »ü« in
CP 850. I used this in my classes as a quick way to show
that Java is able to print umlauts to the console: one
just needs to use an editor with the same codepage as the
console.
In Windows 7 with Java 1.7, this now gives an error. Java
tries to outsmart me: it detects that the byte 129, which
encodes »ü« in the CP-850 source code, does not exist in the
supposed charset Windows-1252, and gives me an error message.
If I then compile with »-encoding CP-850«, the error is gone,
but now Java is too smart: it recognizes that byte 129 means
»ü« in CP 850 and converts it to the value »ü« has in
Unicode. When the program runs, this value is printed using
the platform default charset (Windows-1252 here), that is,
as byte 252. This neutralizes the intended effect of using
the CP-850 editor »EDIT« and gives me »³« again.
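The steps just described can be followed byte by byte. This is a rough sketch, not a description of how javac is implemented: the class and helper names are invented, a strict »CharsetDecoder« stands in for the compiler's reading of the source, and the charsets »windows-1252« and »IBM850« are assumed to be available in the JRE. The last line shows one possible way out that is not part of the observations above: a »PrintStream« created with the console's charset emits byte 129 again.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class SourceEncodingDemo {
    // Strictly decode bytes; returns null if a byte has no meaning in the charset.
    static String decodeStrict(byte[] bytes, String charsetName) {
        try {
            return Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // The bytes that print(text) writes to a stream using the given charset.
    static byte[] printedBytes(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[] source = { (byte) 129 }; // the ü typed in a CP-850 editor

        // Read as Windows-1252: byte 129 is unassigned, hence the error.
        System.out.println(decodeStrict(source, "windows-1252")); // null

        // Read as CP 850: the byte becomes U+00FC (ü) in Unicode.
        String literal = decodeStrict(source, "IBM850");
        System.out.println((int) literal.charAt(0)); // 252

        // Printed with a Windows-1252 default: byte 252, shown as ³ in CP 850.
        System.out.println(printedBytes(literal, "windows-1252")[0] & 0xFF); // 252

        // Printed through a stream that uses the console's charset: byte 129.
        System.out.println(printedBytes(literal, "IBM850")[0] & 0xFF); // 129
    }
}
```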
So, this change might make some source code invalid or
change its behavior.
Well.