A python IDE for teaching that supports cyrillic i/o

Kirill Simonov

Hi,

Could anyone suggest a simple IDE suitable for teaching Python as a
first programming language to high school students? It must have
good support for input/output in Cyrillic.

Unfortunately, most IDEs I tried failed miserably in this respect. My
test was simple: I ran the code

name = raw_input("What's your name? ") # written in Russian
print "Hello, %s!" % name # in Russian as well

both from the shell and as a standalone script. This either caused a
UnicodeError or just printed invalid characters.

For the record, I've checked IDLE, PythonWin, Eric, DrPython, SPE, and
WingIDE. The only ones that worked are WingIDE and IDLE (under Linux,
but not under Windows).


Thanks,
Kirill
 
Leo Kislov

IDLE on Windows works fine for your example in the interactive console:
u'\u041b\u0435\u043e\u043d\u0438\u0434'

and as a script:

What's your name? Леонид
Hello, Леонид!

That is IDLE + Python 2.4 on Windows, so I'm not sure what the
problem is. In other messages you seem to be talking about the system
console. Why? It's not part of the IDE.

And another question: are you aware that the recommended way to
handle non-ASCII characters is to use the unicode type? Most IDEs should
work fine with unicode.
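
The idea of using text (unicode) internally can be sketched roughly as follows. This is only an illustration written for modern Python 3, not the Python 2.4 used in this thread; the greet() helper and the choice of cp1251 for the incoming bytes are assumptions made for the example.

```python
# Sketch: decode bytes on input, work with text, encode on output.
# greet() and the cp1251 input encoding are illustrative assumptions.

def greet(raw_name, encoding="cp1251"):
    """Decode a byte string typed by the user and build a greeting as text."""
    name = raw_name.decode(encoding)    # bytes -> text at the input boundary
    return "Привет, %s!" % name         # all processing happens on text

raw = "Леонид".encode("cp1251")         # stands in for bytes read from a console
message = greet(raw)
encoded = message.encode("utf-8")       # text -> bytes only when writing out
print(encoded)                          # shows the raw bytes that would be written
```

The point of the design is that the encoding appears only at the two boundaries, so the code in between never has to care which code page the console uses.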

-- Leo
 
Alan Franzoni

Kirill Simonov had fun writing:
Unfortunately, most IDEs I tried failed miserably in this respect. My
test was simple: I've run the code
name = raw_input("What's your name? ") # written in Russian
print "Hello, %s!" % name # in Russian as well
both from the shell and as a standalone script. This either caused a
UnicodeError or just printed invalid characters.

I highly dislike asking stupid questions, and this might be stupid
indeed... but did you write

# -*- coding: iso-8859-5 -*-

or

# -*- coding: koi8_r -*-

(they both seem suited to the Russian language, but I don't know the
difference)

as the first line in your .py file?

Personally, I use Eclipse+Pydev (a bit steep to learn at the beginning, and
quite memory- and CPU-hungry since it's a Java-based IDE; don't use it on
old or slow computers with less than 512MB RAM, and don't use a version
older than 3.2 either), and it uses that very line to recognize the
character set in use. You may check other encodings as well.

http://docs.python.org/lib/standard-encodings.html

It does work on Windows indeed.

UPDATE:
I tried with Eclipse+Pydev: using koi8_r, I seem to be able to simply
copy and paste a piece of the ixbt.com homepage into the editor, save it,
and use it correctly.


--
Alan Franzoni <[email protected]>
-
Remove .xyz from my address in order to contact me.
-
GPG Key Fingerprint (Key ID = FE068F3E):
5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
 
Kirill Simonov

Alan Franzoni wrote:

I highly dislike asking stupid questions, and this might be stupid
indeed... but did you write

# -*- coding: iso-8859-5 -*-

or

# -*- coding: koi8_r -*-

(they both seem suited to the Russian language, but I don't know the
difference)

They are different encodings for the same character set. The latter is
mostly used under Unices, and the former is not used anywhere as far as
I know. There are two more Cyrillic encodings: cp866 and cp1251, for
DOS and Windows respectively.
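
To make "different encodings for the same character set" concrete, here is a small sketch in modern Python 3 (not the Python 2.4 of this thread). It encodes one Russian word with the four codecs named above; the variable names are made up for the example.

```python
# The same text in the four Cyrillic encodings mentioned above.
text = "Привет"
codecs_to_try = ["iso-8859-5", "koi8_r", "cp866", "cp1251"]

encoded = {name: text.encode(name) for name in codecs_to_try}
for name in codecs_to_try:
    # Each codec yields different bytes for the same six letters.
    print(name, encoded[name])

# Decoding with the matching codec always recovers the original text.
round_trips = [encoded[name].decode(name) for name in codecs_to_try]
```

Every codec produces six bytes, one per letter, but the byte values differ, which is why a program has to know which encoding a file or console uses.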

as the first line in your .py file?

No, I would prefer the editor to save .py files that contain non-ASCII
characters in UTF-8, adding the BOM at the beginning of the
file. This would allow the interpreter to detect the file encoding
correctly and would save a teacher from explaining what an encoding is
and why it is needed.
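
For reference, this is roughly what the BOM looks like at the byte level. The sketch uses modern Python 3 and the standard "utf-8-sig" codec; the one-line source string is just an example.

```python
import codecs

source = 'print("Привет")\n'

# "utf-8-sig" writes the three-byte BOM before the UTF-8 data.
with_bom = source.encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)
print(codecs.BOM_UTF8)
print(has_bom)

# Decoding with "utf-8-sig" strips the BOM again; plain "utf-8" keeps it
# as a leading U+FEFF character.
clean = with_bom.decode("utf-8-sig")
raw = with_bom.decode("utf-8")
```

An editor that saves with the BOM is in effect prepending those three bytes to every file.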

Personally, I use Eclipse+Pydev (a bit steep to learn at the beginning, and
quite memory and cpu hogging since it's a java-based ide; don't use it on
old/slow computers with less than 512MB RAM, and don't use version < 3.2
either) and it uses that very line to recognize the actual character set
employed. You may check with other encodings as well.

Unfortunately, the Eclipse CPU/memory requirements are extremely high
for a high school, so I haven't even tried it.
 
Alan Franzoni

Kirill Simonov had fun writing:
No, I would prefer the editor to save the .py files with non-ASCII
characters in UTF-8 encoding adding the BOM at the beginning of the
file. This will allow the interpreter to detect the file encoding
correctly and would save a teacher from explaining what an encoding is
and why it is needed.

You'll run into encoding problems anyway in your programmer's life. I don't
think it's a workaround, try this article:

http://www.joelonsoftware.com/articles/Unicode.html

I think it's highly useful, you could tell that to your students.

BTW, not every editor supports the BOM. Have you tried with the explicit
encoding line:

# -*- coding: utf-8 -*-

Eclipse+Pydev seems to work with that. I'm not able to check with other
editors right now, but it seems you're experiencing a simple encoding
problem; your editor doesn't know which encoding you'd like to use, so it
defaults to ascii or iso-8859-1 leading to such problems.
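
How the coding line and the BOM are picked up can be tried out with the standard library. This sketch relies on tokenize.detect_encoding from modern Python 3, which does not exist in the Python 2.4 discussed here; the declared_encoding() helper and the sample byte strings are made up for the illustration.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to read this source file."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

with_cookie = b"# -*- coding: koi8_r -*-\nname = 'x'\n"
with_bom = b"\xef\xbb\xbfname = 'x'\n"
plain = b"name = 'x'\n"

for label, data in [("cookie", with_cookie), ("BOM", with_bom), ("neither", plain)]:
    print(label, declared_encoding(data))
```

Note that the fallback when neither marker is present is UTF-8 in Python 3, whereas Python 2 assumed ASCII.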



--
Alan Franzoni <[email protected]>
-
Remove .xyz from my address in order to contact me.
-
GPG Key Fingerprint (Key ID = FE068F3E):
5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
 
Kirill Simonov

IDLE on Windows works fine for your example in the interactive console:

Have you tried to use Cyrillic characters in a Python string in
the interactive console? When I do, I get the "Unsupported characters in
input" error. For instance,
Unsupported characters in input

And another question: are you aware that the recommended way to
handle non-ASCII characters is to use the unicode type? Most IDEs should
work fine with unicode.

Using the unicode type usually gives you more headaches than benefits
unless you are careful enough to never mix unicode and str objects.
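
The mixing problem can be reproduced in a few lines. The sketch below is for modern Python 3, where text and bytes never mix implicitly and the attempt raises TypeError; in Python 2 the same mix triggered an implicit ASCII decode, which failed with UnicodeDecodeError on non-ASCII bytes. The variable names are illustrative.

```python
text = "Привет, "                      # text (unicode in Python 2 terms)
data = "Леонид".encode("cp1251")       # bytes (str in Python 2 terms)

try:
    greeting = text + data             # Python 3 refuses to mix the two
except TypeError as exc:
    greeting = None
    print("mixing failed:", type(exc).__name__)

# Decoding explicitly, with the right codec, makes the types agree.
greeting_ok = text + data.decode("cp1251")
```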

Anyway, I just want the interactive console of an IDE to behave like a
real Python console under a UTF-8 terminal (with sys.stdout.encoding ==
'utf-8').
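
What a stream encoding of 'utf-8' means in practice can be shown with an in-memory stream. This is a sketch for modern Python 3 using io.TextIOWrapper; console_bytes() is a hypothetical helper, not something an IDE provides.

```python
import io

def console_bytes(text, encoding):
    """Return the bytes a text stream with this encoding would emit."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    data = raw.getvalue()
    stream.detach()          # leave `raw` alone when the wrapper is discarded
    return data

utf8_out = console_bytes("Привет", "utf-8")
cp1251_out = console_bytes("Привет", "cp1251")
print(utf8_out)
print(cp1251_out)
```

An IDE console that emulates sys.stdout has to make the same choice of encoding somewhere, and a mismatch with what the display expects is what produces the garbage characters.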


Thanks,
Kirill
 
Kirill Simonov

Alan Franzoni wrote:

You'll run into encoding problems anyway in your programmer's life. I don't
think it's a workaround, try this article:

http://www.joelonsoftware.com/articles/Unicode.html

I think it's highly useful, you could tell that to your students.

Please remember that most of the students will not become professional
programmers. Personally I think that encoding problems are a nightmare in
general, and in Python in particular, and I would like to avoid explaining
them for as long as possible. Besides, it won't be me, it will be a teacher
who explains it, and the teacher might themselves have only a vague notion
of this topic.

Eclipse+Pydev seems to work with that. I'm not able to check with other
editors right now, but it seems you're experiencing a simple encoding
problem; your editor doesn't know which encoding you'd like to use, so it
defaults to ascii or iso-8859-1 leading to such problems.

No, my problem is that the emulation of sys.stdin/sys.stdout under
an IDE's interactive console doesn't work like the real
sys.stdin/sys.stdout under a real terminal.


Thanks,
Kirill.
 
Leo Kislov

Kirill said:
Have you tried to use cyrillic characters in a Python string in
interactive console? When I do it, I get the "Unsupported characters in
input" error. For instance,

Unsupported characters in input

That works for me in Win XP English, with Russian locale and Russian
language for non-unicode programs. Didn't you say you want to avoid
unicode? If so, you need to set proper locale and language for
non-unicode programs.

Usually using unicode type gives you much more headache than benefits
unless you are careful enough to never mix unicode and str objects.

For a professional programmer life is full of headaches like this :)
For high school students it could be troublesome and annoying, I agree.

Anyway, I just want the interactive console of an IDE to behave like a
real Python console under a UTF-8 terminal (with sys.stdout.encoding ==
'utf-8').

Do you realize that a UTF-8 locale makes the len() function and slicing of
byte strings look strange to high school students?

hi = u"Привет".encode("utf-8")
r = u"р".encode("utf-8")
print len(hi) # prints 12
print hi[1] == r # prints False
for char in hi:
    print char # prints garbage
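
The same effect can be checked in modern Python 3, where text and bytes are separate types. This is only a sketch of the point being made, not the Python 2 code above, and the variable names are made up.

```python
hi_text = "Привет"
hi_bytes = hi_text.encode("utf-8")

print(len(hi_text))     # 6 characters
print(len(hi_bytes))    # 12 bytes: each letter takes two bytes in UTF-8

# Slicing bytes can cut a letter in half; slicing text cannot.
first_letter = hi_text[:1]
half_letter = hi_bytes[:1]
try:
    half_letter.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```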

As I see it, you have several options:
1. Set Russian locale and Russian language for non-unicode programs on
Windows.
2. Introduce students to unicode.
3. Wait for Python 3.0.
4. Hack some IDE to make a unicode-friendly environment: unicode
literals by default, type("Привет") == unicode, unicode
stdin/stdout, open() using utf-8 by default for text files,
etc.

-- Leo
 
Kirill Simonov

That works for me in Win XP English, with Russian locale and Russian
language for non-unicode programs. Didn't you say you want to avoid
unicode? If so, you need to set proper locale and language for
non-unicode programs.

Thanks. After I set Russian language for non-unicode programs, the
`print "Привет"` expression started to work correctly.

On the other hand, doesn't display "Привет". The output looks as if a
CP1251-encoded string were displayed using the latin1 character set.
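
The symptom described here (CP1251 bytes shown as if they were latin1) is easy to reproduce. The sketch below is modern Python 3 and only illustrates the effect; it does not show what IDLE does internally.

```python
text = "Привет"
cp1251_bytes = text.encode("cp1251")

# Reading CP1251 bytes as Latin-1 gives accented Latin letters instead.
mojibake = cp1251_bytes.decode("latin-1")
print(ascii(mojibake))

# Because Latin-1 maps every byte to a character, the damage is reversible.
repaired = mojibake.encode("latin-1").decode("cp1251")
```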

It seems that the interactive interpreter in IDLE uses the CP1251
codepage.

Anyway, I just want the interactive console of an IDE to behave like a
real Python console under a UTF-8 terminal (with sys.stdout.encoding ==
'utf-8').

Do you realize that utf-8 locale makes len() function and slicing of
byte strings look strange for high school students?

hi = u"Привет".encode("utf-8")
r = u"р".encode("utf-8")
print len(hi) # prints 12
print hi[1] == r # prints False
for char in hi:
    print char # prints garbage

No, it slipped my mind...

As I see it, you have several options:
1. Set Russian locale and Russian language for non-unicode programs on
Windows.

I guess I will go this route. It looks like IDLE works reasonably well
in a CP1251 locale.


Thanks,
Kirill
 
