Martin said:
If you do "chcp 1251" (not "chcp1251") in the console, and then
run python.exe in the same console, what is the value of
sys.stdout.encoding?
Correctly: 'cp1252' in my case; Cyrillic chars break "print". (On a PC
Linux 2.2 tty, sys.stdout.encoding does not exist.)
I live with this in site(customize):
    # tolerant unicode output ... #
    import sys
    _stdout = sys.stdout
    if (sys.platform == 'win32' and
            'pywin.framework.startup' not in sys.modules):
        # outside PythonWin: use the console encoding if there is one
        _stdoutenc = getattr(_stdout, 'encoding', sys.getdefaultencoding())
        class StdOut:
            def write(self, s):
                _stdout.write(s.encode(_stdoutenc, 'backslashreplace'))
        sys.stdout = StdOut()
    elif sys.platform.startswith('linux'):
        import locale
        # getdefaultlocale() may report no encoding; fall back to the default
        _stdoutenc = locale.getdefaultlocale()[1] or sys.getdefaultencoding()
        class StdOut:
            def write(self, s):
                _stdout.write(s.encode(_stdoutenc, 'backslashreplace'))
        sys.stdout = StdOut()
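To make the difference between the error handlers discussed here concrete, the following is a minimal sketch. It is written for present-day Python 3 (an assumption; the snippet above is Python 2), and the sample text and variable names are purely illustrative.

```python
# Cyrillic sample text that cp1252 cannot represent ("Privet" in Cyrillic).
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

# 'strict' raises as soon as a character has no mapping in the target codec.
try:
    text.encode("cp1252", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'replace' silently substitutes '?', so the original characters are lost.
replaced = text.encode("cp1252", "replace")

# 'backslashreplace' keeps the information as \uXXXX escapes.
escaped = text.encode("cp1252", "backslashreplace")

print(strict_failed, replaced, escaped)
```

With 'backslashreplace' the output stays recoverable, which is presumably why the wrapper above prefers it over 'replace'.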
I think it is good.
Errors should never pass silently.
Unless explicitly silenced.
A political question. Arguments:
* Web browsers, for example, have to display defective HTML as well as
possible, unknown unicode chars as "?" and so on... Users got very
angry in the early days of browsers when 'strict' programmers displayed
their exception error boxes ...
* at least the "print" statement has to go through - the costs (for
angry users and developers; e.g.
http://blog.ianbicking.org/do-i-hate-unicode-or-do-i-hate-ascii.html)
are much higher when apps suddenly break in simple print/display-output
when the system picks up alien unicode chars somewhere (e.g.
occasionally in filenames, ...). No one is really angry when
Chinese chars are occasionally displayed cryptically on non-Chinese
computers. One can investigate, add fonts, ... to improve, or do
nothing in most cases, but apps do not break on every print statement!
This is true not only for tty output, but also for log-file redirects
and almost any common situation for print/normal
stdout/file-(write) output.
* anything is nicely printable in python by default, why not
unicode strings!? If the decision for default 'strict' encoding on
stdout stands, we at least have to discuss print-repr for
unicode.
* the need for having technical strings 'strict' is much rarer. And
programmers are very aware in such situations anyway, e.g. by writing
asciifile.write( us.encode(xy,'strict') ).
* on Windows, for example, the (good) mbcs_encode is tolerant anyway:
unknown chars are mapped to '?'. I never had any objection to this.
Some recommendations - soft to hard:
* make print-repr for unicode strings tolerant (and in PythonWin
always use the tolerant 'mbcs' encoding)
* make stdout/files have 'replace'-mode encoding by default
(similar to what my code above does)
* set site.py/encoding=('ascii', 'replace') # if not
utf-8/mbcs/locale ;enable a tuple
* save sys._setdefaultencoding by default
* I would also live perfectly well with .encode(enc) running 'replace' by
default, and 'strict' on demand. None of my apps and scripts would
break because of this; they would only gain. A programmer is naturally
very aware when he wants 'strict'. Can you name realistic cases where
'replace' behavior would be so critical that a program damages something?
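For the "'replace'-mode by default" recommendation, here is a minimal sketch of what such a stream looks like in present-day Python 3 (an assumption; the thread itself is about Python 2). It uses io.TextIOWrapper over an in-memory buffer instead of the real stdout so it can be run anywhere; the names raw, out and data are illustrative only.

```python
import io

# An in-memory binary stream stands in for a terminal or log file.
raw = io.BytesIO()

# errors="replace" makes every write tolerant; errors="strict" would raise.
out = io.TextIOWrapper(raw, encoding="ascii", errors="replace", newline="\n")

out.write("na\u00efve\n")   # contains one non-ASCII character
out.flush()

data = raw.getvalue()
print(data)
```

The non-ASCII character comes out as '?', and the write itself never fails. In Python 3.7 and later, the same error handler can be set on the real stream with sys.stdout.reconfigure(errors="replace").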
Not sure why you aren't using sys.stdout.encoding on Linux. I would do
    import codecs, sys
    try:
        c = codecs.getwriter(sys.stdout.encoding)
    except (AttributeError, TypeError, LookupError):
        # no encoding attribute, encoding is None, or codec unknown
        c = codecs.getwriter('ascii')
    sys.stdout = c(sys.stdout, 'replace')
Also, I wouldn't edit site.py, but instead add sitecustomize.py.
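The codecs.getwriter approach can be tried without touching the real stdout. The sketch below assumes Python 3 and an in-memory byte stream; tolerant_writer is a hypothetical helper name, not something from the thread or the standard library.

```python
import codecs
import io

def tolerant_writer(stream, encoding):
    """Wrap a binary stream in a StreamWriter that replaces unencodable chars."""
    try:
        writer_class = codecs.getwriter(encoding)
    except (TypeError, LookupError):
        # encoding is None or names an unknown codec: fall back to ASCII
        writer_class = codecs.getwriter("ascii")
    return writer_class(stream, "replace")

buf = io.BytesIO()
w = tolerant_writer(buf, None)   # simulates a stream without an encoding
w.write("\u0416uk")              # Cyrillic ZHE followed by "uk"
fallback = buf.getvalue()
print(fallback)
```

With a usable encoding such as 'cp1251' the same helper writes the character properly; only unencodable input degrades to '?'.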
I have more problems with the shape of sys.path in different
situations, multiple sitecustomize.py files from other apps, environments,
OS / users, cxfreeze, py2exe ... sitecustomize is not easily stackable: a
horror solution. The need is for a callable _function_ or for a general
change in python behaviour.
Modifying site.py is better and stable for me (I have my
patch/module todo list handy each time I install a new python), as I
always want tolerant behaviour. In code I check for
site.encoding/_setdefaultencoding (I save this). Thus I get one central
error if the setup is not correct, but not evil unicode errors somewhere
deep in the app once on a Russian computer in the future...
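One way to read the "callable function" idea is a helper that an application calls itself, and that is safe to call more than once, so it does not depend on which sitecustomize.py happens to be on sys.path. The following is only a sketch under that assumption, written for Python 3; TolerantWriter and make_tolerant are hypothetical names.

```python
import io

class TolerantWriter:
    """Encodes text for a binary stream using a tolerant error handler."""

    def __init__(self, raw, encoding, errors="backslashreplace"):
        self.raw = raw
        self.encoding = encoding
        self.errors = errors

    def write(self, text):
        self.raw.write(text.encode(self.encoding, self.errors))

    def flush(self):
        self.raw.flush()

def make_tolerant(stream, encoding="ascii", errors="backslashreplace"):
    """Return a tolerant writer for stream; a no-op if it is already wrapped."""
    if isinstance(stream, TolerantWriter):
        return stream
    return TolerantWriter(stream, encoding, errors)

buf = io.BytesIO()
out = make_tolerant(buf, "cp1252")
out = make_tolerant(out, "cp1252")   # second call does not stack another layer
out.write("caf\u00e9 \u041f")        # e-acute is in cp1252, Cyrillic PE is not
result = buf.getvalue()
print(result)
```

Because the second call returns the existing wrapper, several modules can each call make_tolerant without double-encoding the output.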
Because the author of the application wouldn't know that there
is a bug in the application, and that information was silently
discarded. Users might only find out much later that they have
question marks in places where users originally entered data,
and they would have no way of retrieving the original data.
If you can accept that data loss: fine, but you should silence
the errors explicitly.
This is black/white theory - not real and practical (as python
wants to be). See above.
Robert