pyw program not displaying unicode characters properly

jjmeric

Hi everybody!

Our language lab at INALCO is using a nice language parsing and analysis
program written in Python. As you well know, a lot of languages use
characters that can only be handled by Unicode.

Here is an example of the problem we have on some Windows computers.
In the attached screenshot (DELETED),
the Bambara character (a sort of epsilon) is displayed as a square.

The fact that it works fine on some computers and fails to display the
characters on others suggests that it is a user configuration issue.
Recent observations: it's OK on Windows 7 but not on Vista computers;
it's OK on some Windows XP computers but not on other Windows XP computers.

On the computers where it fails, we've tried changing options in the
International settings, but have not been able to fix it.

Any idea that would help us go in the right direction, or just fix it,
is welcome!

Thanks!
I ni ce! (in Bambara, a language spoken in Mali, West Africa)
 
Alain Ketterlin

jjmeric said:
Our language lab at INALCO is using a nice language parsing and analysis
program written in Python. As you well know, a lot of languages use
characters that can only be handled by Unicode.

Here is an example of the problem we have on some Windows computers.
In the attached screenshot (DELETED),

Usenet has no attachments. Place your document on some publicly
accessible web server, if needed.

jjmeric said:
the Bambara character (a sort of epsilon) is displayed as a square.

The fact that it works fine on some computers and fails to display the
characters on others suggests that it is a user configuration issue.
Recent observations: it's OK on Windows 7 but not on Vista computers;
it's OK on some Windows XP computers but not on other Windows XP computers.

You need a font that has glyphs for all Unicode characters (at least the
ones you use). See http://en.wikipedia.org/wiki/Unicode_font for a
start. I don't know enough about Windows to give you a name. Anyone?

-- Alain.

P.S. This has not much to do with Python, which will happily send
out any Unicode character and cannot know which ones your terminal (or
whatever displays the text) will be able to show.
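
To make the point above concrete, here is a minimal sketch (assuming Python 3; the function name and the sample string are made up for illustration) that lists which non-ASCII code points a text contains. Python stores and encodes them without complaint; the list only tells you which glyphs the display font has to provide.

```python
def codepoints_needed(text):
    """Return the distinct non-ASCII code points in text as 'U+XXXX' strings."""
    return sorted({"U+%04X" % ord(ch) for ch in text if ord(ch) > 127})


# Hypothetical sample: capital and small open E mixed with plain ASCII.
sample = "\u0190 and \u025b in plain text"

# Python handles the characters fine; encoding to UTF-8 just works.
encoded = sample.encode("utf-8")

print(codepoints_needed(sample))
```

Whether those code points show up as letters or as squares is then decided by the font the program's window uses, not by Python.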
 
MRAB

jjmeric said:
Hi everybody!

Our language lab at INALCO is using a nice language parsing and analysis
program written in Python. As you well know, a lot of languages use
characters that can only be handled by Unicode.

Here is an example of the problem we have on some Windows computers.
In the attached screenshot (DELETED),
the Bambara character (a sort of epsilon) is displayed as a square.

The fact that it works fine on some computers and fails to display the
characters on others suggests that it is a user configuration issue.
Recent observations: it's OK on Windows 7 but not on Vista computers;
it's OK on some Windows XP computers but not on other Windows XP computers.

On the computers where it fails, we've tried changing options in the
International settings, but have not been able to fix it.

Any idea that would help us go in the right direction, or just fix it,
is welcome!

Thanks!
I ni ce! (in Bambara, a language spoken in Mali, West Africa)

A square is shown when the font being used doesn't contain a visible
glyph for the codepoint.

Which codepoint is it? What is the codepoint's name?

Here's how to find out:
'LATIN CAPITAL LETTER OPEN E'
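
The interpreter input that produced this output did not survive in the archived post. A minimal sketch of such a lookup, assuming Python 3, the standard-library unicodedata module, and code point U+0190 (the number mentioned elsewhere in this thread), might look like this:

```python
import unicodedata

# Assumption: the character in question is U+0190.
ch = "\u0190"

# Show the code point in the usual U+XXXX notation, then its official name.
print("U+%04X" % ord(ch))
print(unicodedata.name(ch))
```

Pasting the actual character from the program's data in place of the escape works the same way.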
 
jjmeric

Alain, MRAB,
Thank you for the prompt responses.

What they suggest to me is that I should look into which font is being used by
this Python for Windows program.
I am not the programmer, so I have no idea where to look.
The program settings do not include a choice of display font.

The font used for display resembles a sort of Helvetica, but I have no
idea how to check this.

Is there some sort of default font, or is there in Python or Python for
Windows any ini file where the font used can be seen and possibly changed
to a more appropriate one with all the required glyphs (as Lucida Sans
Unicode has)?

Thanks again...
 
Roy Smith

MRAB said:
Which codepoint is it? What is the codepoint's name?

Here's how to find out:

'LATIN CAPITAL LETTER OPEN E'

Wow, I never knew you could do that. I usually just google for "unicode
0190" :)
 
Steven D'Aprano

Alain Ketterlin said:
Usenet has no attachments.

*snarfle*

You almost owed me a new monitor. I nearly sprayed my breakfast all over
it.

"Usenet has no attachments" -- that's like saying that the Web has no
advertisements. Maybe the websites you visit have no advertisements, but
there's a *vast* (and often disturbing) part of the WWW that has
advertisements; some sites are nothing but advertisements.

And so it is with Usenet: there is a vast (and often disturbing) area of
Usenet containing attachments, and often nothing but attachments. The
volume of all these attachments is such that it is getting hard to
find ISPs that provide free access to binary newsgroups, but some still
do, and dedicated for-fee Usenet providers do too.
 
Ian Kelly

jjmeric said:
Is there some sort of default font, or is there in Python or Python for
Windows any ini file where the font used can be seen and possibly changed
to a more appropriate one with all the required glyphs (as Lucida Sans
Unicode has)?

No, this is up to the program and the GUI framework it uses. Do you
have any idea which one that would be (e.g. Tkinter, wxPython, PyQt,
etc.)?
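
For someone who is not the programmer, one rough way to answer that question is to look at what the program's source imports. The following is only a sketch, assuming Python 3; the module table is not exhaustive, and the function name is made up for illustration. It scans source text for import lines and reports which well-known GUI toolkits they mention.

```python
import re

# Top-level module names of some common Python GUI toolkits (not exhaustive).
GUI_MODULES = {
    "Tkinter": "Tkinter",
    "tkinter": "Tkinter",
    "wx": "wxPython",
    "PyQt4": "PyQt",
    "PyQt5": "PyQt",
    "PySide": "PySide",
    "gtk": "PyGTK",
}

# Matches the first module name after "import" or "from" at the start of a line.
# Limitation: for "import a, b" only "a" is seen.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def guess_gui_frameworks(source):
    """Return the sorted names of GUI toolkits imported in the given source text."""
    names = IMPORT_RE.findall(source)
    return sorted({GUI_MODULES[name] for name in names if name in GUI_MODULES})


# Hypothetical source text standing in for the program's main .pyw file.
example = "import sys\nimport wx\nfrom wx import grid\n"
print(guess_gui_frameworks(example))
```

In practice the text would come from reading the program's main .pyw file; once the toolkit is known, its documentation says how the program can set the display font.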
 
jjmeric

Ian Kelly said:
No, this is up to the program and the GUI framework it uses. Do you
have any idea which one that would be (e.g. Tkinter, wxPython, PyQt,
etc.)?

Thanks, Ian.
I have no idea, but - thanks to you - I now have an interesting question
to ask back to the team who works on this in Russia... more later!
 
Alain Ketterlin

Steven D'Aprano said:
Usenet has no attachments.

*snarfle*

You almost owed me a new monitor. I nearly sprayed my breakfast all over
it. [...]

I owe you nothing, and you can do whatever you want with your breakfast.

Steven D'Aprano said:
"Usenet has no attachments" -- that's like saying that the Web has no
advertisements. Maybe the websites you visit have no advertisements, but
there's a *vast* (and often disturbing) part of the WWW that has
advertisements, some sites are nothing but advertisements.[...]

I really don't know what you are ranting about here. See Dennis' response.

Any idea about a reasonably complete Unicode font on Windows? /That/
would be helpful.

-- Alain.
 
Steven D'Aprano

Dennis Lee Bieber said:
Classically, NNTP did not have "attachments" as seen in MIME email.

It did have "binaries" in some encoding -- UUE, BASE64, or some
newer format, but these encodings were the raw body of the post(s), not
something "attached" as a separate file along with a text body.


"A rose by any other name..."


A mere implementation detail. The intention is identical: to attach a
non-text file to a message that otherwise would be text. And the interface is
close enough as makes no difference.

You can even have a text part and binary parts in the same news posting.
 
Roy Smith

Dennis Lee Bieber said:
Classically, NNTP did not have "attachments" as seen in MIME email.

NNTP (Network News Transfer Protocol) and SMTP (Simple Mail Transfer
Protocol) are both just ways of shipping around messages. Neither one
really knows about attachments. In both mail and news, "attachments"
are a higher-level concept encoded inside the message content and
managed by the various user applications.

Dennis Lee Bieber said:
It did have "binaries" in some encoding -- UUE, BASE64, or some
newer format, but these encodings were the raw body of the post(s), not
something "attached" as a separate file along with a text body.

This is all true of both mail and news, with only trivial changes of the
formats and names of the encodings.
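
As a small illustration of that point, here is a sketch (assuming Python 3 and its standard-library email package; the subject line, file name and bytes are invented for the example) that builds a message with an "attachment" and then looks at what would actually be shipped. The result is a single block of text in which the binary part appears as base64; the transport only ever sees that text.

```python
from email.message import EmailMessage

# Build a message with a text body and one binary "attachment".
msg = EmailMessage()
msg["Subject"] = "screenshot"
msg.set_content("The character is displayed as a square.")

# Made-up bytes standing in for an image file.
fake_image = bytes(range(16))
msg.add_attachment(
    fake_image,
    maintype="application",
    subtype="octet-stream",
    filename="screenshot.bin",
)

# What goes over the wire: plain text, with the binary part base64-encoded.
wire_text = msg.as_string()
print(wire_text)

# The receiving application decodes the part back into the original bytes.
recovered = next(msg.iter_attachments()).get_content()
```

The MIME structure is entirely inside the message content, which is why the same approach works whether the message travels by SMTP or NNTP.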
 
