kdialog and unicode

Dumbkiwi

I'm trying to get python, unicode and kdialog to play nicely together.
This is a linux machine, and kdialog is a way to generate dialog boxes in
kde with which users can interact (for example input text), and you can
use the outputted text in your script.

Anyway, what I'm doing is reading from a utf-8 encoded text file using the
codecs module, and using the following:

data = codecs.open('file', 'r', 'utf-8')
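
For anyone following along today, here is a minimal sketch of the read step. It assumes Python 3 (the thread itself is about Python 2.3/2.4), where the built-in open() takes an encoding argument and does the job of codecs.open(). The helper name and the sample file contents are made up for illustration. Note that the object returned by open() is a file object; the decoded text comes from calling read() on it.

```python
import os
import tempfile


def read_utf8_text(path):
    """Return the whole file decoded from UTF-8 as a text string."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Write a tiny sample file (hypothetical content) and read it back.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write("\u017ar\u00f3d\u0142o\n".encode("utf-8"))

sample_text = read_utf8_text(sample_path)
os.remove(sample_path)

# ascii() keeps the output printable even on an ASCII-only terminal.
print(ascii(sample_text))
```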

I then manipulate the data to break it down into text snippets.

Then I run this command:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017a' in
position 272: ordinal not in range(128)
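
The traceback can be reproduced in isolation. In Python 2, handing a unicode object to something that wants a byte string triggers an implicit encode with the default codec, which is ascii. The sketch below uses Python 3 syntax (an assumption about what a present-day reader runs) and simply makes that encode explicit; the variable names are illustrative only.

```python
# U+017A is the Polish letter z-with-acute, the character named in the traceback.
text = "\u017a"

try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# The same character encodes without trouble once a capable codec is named.
utf8_bytes = text.encode("utf-8")

print(error_message)
print(utf8_bytes)
```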

I would really like kdialog to display the text as utf-8. However, it seems
that python is trying to pass the utf-8 encoded data as ascii, which
obviously fails because it can't deal with the utf-8 encoded text. Is it
possible to pass the text out to kdialog as utf-8, rather than ascii?

Or have I completely misunderstood the whole process, in which case, can
you please enlighten me.

Matt
 
Peter Otten

Dumbkiwi said:
I'm trying to get python, unicode and kdialog to play nicely together.
This is a linux machine, and kdialog is a way to generate dialog boxes in
kde with which users can interact (for example input text), and you can
use the outputted text in your script.

Anyway, what I'm doing is reading from a utf-8 encoded text file using the
codecs module, and using the following:

data = codecs.open('file', 'r', 'utf-8')

data is now a unicode string.
I then manipulate the data to break it down into text snippets.

Then I run this command:

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017a' in
position 272: ordinal not in range(128)

I would really like kdialog to display the text as utf-8. However, it seems
that python is trying to pass the utf-8 encoded data as ascii, which
obviously fails because it can't deal with the utf-8 encoded text. Is it
possible to pass the text out to kdialog as utf-8, rather than ascii?

Just encode the data in the target encoding before passing it to os.popen():

test = os.popen('kdialog --inputbox %s' % data.encode("utf-8"))
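
A rough Python 3 rendering of the same idea, for comparison. This is a sketch under assumptions, not a drop-in replacement: the two helper names are hypothetical, it uses subprocess with an argument list rather than os.popen() with a formatted string (so the shell never splits the text on spaces), and it assumes kdialog is on the PATH and prints the entered text to stdout. Only the argument list is built here; the dialog itself is not launched.

```python
import subprocess


def build_kdialog_command(text, encoding="utf-8"):
    """Return the argument list for `kdialog --inputbox <text>` as bytes."""
    return [b"kdialog", b"--inputbox", text.encode(encoding)]


def ask_with_kdialog(text, encoding="utf-8"):
    """Run kdialog and return what the user typed (needs a running KDE session)."""
    result = subprocess.run(
        build_kdialog_command(text, encoding),
        stdout=subprocess.PIPE,
    )
    return result.stdout.decode(encoding).rstrip("\n")


# Build (but do not run) the command for a short Polish word.
command = build_kdialog_command("Za\u017c\u00f3\u0142\u0107")
print(command)
```

Encoding happens in exactly one place, so trying a different target encoding is a one-argument change.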

Peter
 
Dumbkiwi

data is now a unicode string.



Just encode the data in the target encoding before passing it to
os.popen():

test = os.popen('kdialog --inputbox %s' % data.encode("utf-8"))

Peter

I had tried that, but then the text looks like crap. The text I'm using
for this is Polish, and there are a lot of non-English characters in
there. Using this method results in some strange characters - basically it
looks like a file encoded in utf-8, but displayed using iso-8859-1.
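
The symptom described here (UTF-8 bytes shown as if they were ISO-8859-1) is easy to reproduce, which can help when comparing against what the dialog actually shows. A small Python 3 sketch with illustrative variable names:

```python
original = "\u017a\u0142"  # two Polish letters: z-acute and l-stroke

utf8_bytes = original.encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")  # wrong decoder: one character per byte
recovered = utf8_bytes.decode("utf-8")     # right decoder: the original text

# Two characters turn into four when every byte is treated as a character.
print(len(original), len(misread))
print(ascii(misread))
```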

Is this the best I can do?

Thanks for your help.

Matt
 
Peter Otten

I had tried that, but then the text looks like crap. The text I'm using
for this is Polish, and there are a lot of non-English characters in
there. Using this method results in some strange characters - basically it
looks like a file encoded in utf-8, but displayed using iso-8859-1.

Is this the best I can do?

I've just tried the setup you described (with German umlauts instead of
Polish characters) on my Suse 9.1, and it works as expected with both
Python 2.3 and 2.4. Perhaps the target encoding you need is not UTF-8. I
would try other popular encodings used for Polish text (no idea what these
are). sys.stdout.encoding might give you a clue.

Peter
 
dumbkiwi

Peter Otten said:
I've just tried the setup you described (with German umlauts instead of
Polish characters) on my Suse 9.1, and it works as expected with both
Python 2.3 and 2.4. Perhaps the target encoding you need is not UTF-8. I
would try other popular encodings used for Polish text (no idea what these
are). sys.stdout.encoding might give you a clue.

Peter

Both sys.stdout.encoding and sys.stdin.encoding give:

ANSI_X3.4-1968

which is ascii (I think).
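
That guess can be checked from Python itself: the codec registry resolves the name to its canonical form. The sketch below is Python 3 (an assumption; the thread uses Python 2), and the variable names are illustrative. ANSI_X3.4-1968 is what is typically reported when a process runs under the C/POSIX locale, so the second half prints what the current environment reports.

```python
import codecs
import locale
import sys

reported_name = "ANSI_X3.4-1968"

# Ask the codec registry which codec this name maps to.
canonical_name = codecs.lookup(reported_name).name
print(canonical_name)

# What this particular process believes about its environment.
print("stdout encoding:", sys.stdout.encoding)
print("locale preferred encoding:", locale.getpreferredencoding(False))
```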

I'd be interested to see what your default encoding is, and why your
output was different.

Anyway, from your post, I've done some more digging, and found the
command:

sys.setappdefaultencoding()

which I've used, and it's fixed the problem (I think).

Thanks for your help.

Matt
 
Peter Otten

dumbkiwi said:
I'd be interested to see what your default encoding is,
ascii

and why your output was different.

If only I knew.
Anyway, from your post, I've done some more digging, and found the
command:

sys.setappdefaultencoding()

That is an alias for sys.setdefaultencoding() created by your IDE (Eric),
and therefore may not always be available.

Peter
 
John Machin

Anyway, from your post, I've done some more digging, and found the
command:

sys.setappdefaultencoding()

which I've used, and it's fixed the problem (I think).

Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.

In any case, how could the magical sys.setappdefaultencoding() fix
your problem? From your description, your problem appeared to be that
you didn't know what encoding to use.

What is the essential difference between

send(u_data.encode('polish'))

and

sys.setappdefaultencoding('polish')
...
send(u_data)

[1]: Now that's *TWO* contenders for TautologyOTW :)

Cheers,

John
 
dmbkiwi

Peter said:
If only I knew.


That is an alias for sys.setdefaultencoding() created by your IDE (Eric),
and therefore may not always be available.

Peter

Hmmm. That's disappointing. I've also discovered that you can do:

import sys
reload(sys)

and then get access to sys.setdefaultencoding().
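
Some background on why the reload is needed: Python 2's site module deletes sys.setdefaultencoding during startup, and reload(sys) brings it back. For readers on Python 3 (an assumption about the reader, not about this thread), the function does not exist at all, so code relying on it cannot be ported as-is. A quick check, with illustrative variable names:

```python
import sys

# On Python 3 there is no setter, and the default is not configurable.
has_setter = hasattr(sys, "setdefaultencoding")
default_encoding = sys.getdefaultencoding()

print(has_setter, default_encoding)
```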

Will that get me into trouble?

Matt
 
dmbkiwi

John said:
Peter Otten wrote in message
Anyway, from your post, I've done some more digging, and found the
command:

sys.setappdefaultencoding()

which I've used, and it's fixed the problem (I think).

Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.

Hmmm. See post above, seems to be something generated by eric3. So
this may not be the fix I'm looking for.
In any case, how could the magical sys.setappdefaultencoding() fix
your problem? From your description, your problem appeared to be that
you didn't know what encoding to use.

I knew what encoding to use, the problem was that the text was being
passed to kdialog as ascii. The .encode('utf-8') at least allows
kdialog to run, but the text still looks like crap. Using
sys.setappdefaultencoding() seemed to help. The text looked a bit
better - although not entirely perfect - but I think that's because the
font I was using didn't have the correct characters (they came up as
square boxes).
What is the essential difference between

send(u_data.encode('polish'))

and

sys.setappdefaultencoding('polish')
...
send(u_data)

Not sure - I'm new to character encoding, and most of this seems like
black magic to me.
[1]: Now that's *TWO* contenders for TautologyOTW :)

Cheers,

John

Matt
 
John Machin

John said:
Peter Otten wrote in message
Dumbkiwi wrote:

Just encode the data in the target encoding before passing it to
os.popen():
Anyway, from your post, I've done some more digging, and found the
command:

sys.setappdefaultencoding()

which I've used, and it's fixed the problem (I think).

Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.

Hmmm. See post above, seems to be something generated by eric3. So
this may not be the fix I'm looking for.
In any case, how could the magical sys.setappdefaultencoding() fix
your problem? From your description, your problem appeared to be that
you didn't know what encoding to use.

I knew what encoding to use,

Would you mind telling us (a) what that encoding is (b) how you came
to that knowledge (c) why you just didn't do

test = os.popen('kdialog --inputbox %s'
%(data.encode('that_encoding')))

instead of

test = os.popen('kdialog --inputbox %s' %(data.encode('utf-8')))
the problem was that the text was being
passed to kdialog as ascii.

It wasn't being passed to kdialog; there was an attempt which failed.
The .encode('utf-8') at least allows
kdialog to run, but the text still looks like crap. Using
sys.setappdefaultencoding() seemed to help. The text looked a bit
better - although not entirely perfect - but I think that's because the
font I was using didn't have the correct characters (they came up as
square boxes).

And the font you *were* using is what? And the font you are now using
is what? What facilities do you have to use different fonts?
Not sure - I'm new to character encoding, and most of this seems like
black magic to me.

The essential difference is that setting a default encoding is a daft
idea.
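
To make the contrast concrete, here is a sketch of the explicit style in Python 3. send() is a hypothetical stand-in for any byte-oriented sink, and iso-8859-2 is used only as an example of a legacy encoding that covers Polish; neither comes from the original code. The point is that the target encoding is visible at each call site rather than hidden in process-wide state.

```python
def send(payload):
    """Stand-in for a byte-oriented sink (a pipe, a socket, a file)."""
    if not isinstance(payload, bytes):
        raise TypeError("send() expects bytes, got %s" % type(payload).__name__)
    return len(payload)


u_data = "\u017ar\u00f3d\u0142o"  # six characters, three of them non-ASCII

# Same text, two target encodings, each chosen explicitly at the call.
sent_utf8 = send(u_data.encode("utf-8"))
sent_latin2 = send(u_data.encode("iso-8859-2"))
print(sent_utf8, sent_latin2)

# Passing unencoded text is rejected instead of being silently converted.
try:
    send(u_data)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```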

[1]: Now that's *TWO* contenders for TautologyOTW :)

Before I retract that back to one contender, I'll give it one more
shot:

1. Your data: you say it is Polish text, and is utf-8. This implies
that it is in Unicode, encoded as utf-8. What evidence do you have?
Have you been able to display it anywhere so that it "looks good"?
If it's not confidential, can you show us a dump of the first say 100
bytes of text, in an unambiguous form, like this:

print repr(open('polish.text', 'rb').read(100))

2. Your script: You say "I then manipulate the data to break it down
into text snippets" - uh-huh ... *what* manipulations? Care to tell
us? Care to show us the code?

3. kdialog: I know nothing of KDE and its toolkit. I would expect
either (a) it should take utf-8 and be able to display *any* of the
first 64K (nominal) Unicode characters, given a Unicode font or (b)
you can encode your data in a legacy charset, *AND* tell it what that
charset is, and have a corresponding font or (c) you have both
options. Which is correct, and what are the details of how you can
tell kdialog what to do -- configuration? command-line arguments?
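
The byte dump asked for in point 1 can be paired with a quick validity check. The following is a Python 3 sketch (the one-liner above is Python 2); describe_bytes is a hypothetical helper, and the sample bytes stand in for the real file, which is not available here. Decoding cleanly as UTF-8 is strong evidence that the data is UTF-8, not proof.

```python
def describe_bytes(raw, limit=100):
    """Return (printable dump of the first `limit` bytes, whether all of raw is valid UTF-8)."""
    try:
        raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return repr(raw[:limit]), valid


# Stand-in data: the same kind of text in UTF-8 and in a legacy encoding.
good_sample = "Za\u017c\u00f3\u0142\u0107".encode("utf-8")
bad_sample = "Za\u017c".encode("iso-8859-2")

good_dump, good_valid = describe_bytes(good_sample)
bad_dump, bad_valid = describe_bytes(bad_sample)
print(good_dump, good_valid)
print(bad_dump, bad_valid)
```

With a real file, the input would come from open(path, 'rb').read().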

HTHYTHYS,

John
 
dmbkiwi

John said:
On 26 Apr 2005 13:39:26 -0700, dumbkiwi wrote:

Peter Otten wrote in message
Dumbkiwi wrote:

Just encode the data in the target encoding before passing it to
os.popen():


Anyway, from your post, I've done some more digging, and found the
command:

sys.setappdefaultencoding()

which I've used, and it's fixed the problem (I think).


Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.

Hmmm. See post above, seems to be something generated by eric3. So
this may not be the fix I'm looking for.
In any case, how could the magical sys.setappdefaultencoding() fix
your problem? From your description, your problem appeared to be that
you didn't know what encoding to use.

I knew what encoding to use,

Would you mind telling us (a) what that encoding is (b) how you came
to that knowledge (c) why you just didn't do

(a) utf-8
(b) I asked the author of the text, and it displays properly in other
parts of the script when not using kdialog. Is there a way to test it
otherwise - I presume that there is.
test = os.popen('kdialog --inputbox %s'
%(data.encode('that_encoding')))

instead of

test = os.popen('kdialog --inputbox %s' %(data.encode('utf-8')))

Because, "that_encoding" == "utf-8" (as far as I was aware).
It wasn't being passed to kdialog; there was an attempt which failed.

Quite right.
And the font you *were* using is what? And the font you are now using
is what? What facilities do you have to use different fonts?

The font I was using was bitstream vera sans. The font I'm now using
is verdana.
The essential difference is that setting a default encoding is a daft
idea.
Because it achieves nothing more than what I can do with
.encode('that_encoding')?
[1]: Now that's *TWO* contenders for TautologyOTW :)

Before I retract that back to one contender, I'll give it one more
shot:
Aaah, there's nothing better than a bit of cheerful snarkiness on a
newsgroup.
1. Your data: you say it is Polish text, and is utf-8. This implies
that it is in Unicode, encoded as utf-8. What evidence do you have?

See above.
Have you been able to display it anywhere so that it "looks good"?

Yes. What I am doing here is a theme for a superkaramba widget (see
http://netdragon.sourceforge.net). It displays fine everywhere else on
the widget, it's just in the kdialog boxes that it doesn't display
correctly.
If it's not confidential, can you show us a dump of the first say 100
bytes of text, in an unambiguous form, like this:

Can't do it now, because I'm at work. I can do it when I get home
tonight.
print repr(open('polish.text', 'rb').read(100))

2. Your script: You say "I then manipulate the data to break it down
into text snippets" - uh-huh ... *what* manipulations? Care to tell
us? Care to show us the code?

Manipulation is simply breaking the text down into dictionary pairs.
It is basically a translation file for my widget, with English text,
and a corresponding Polish text. I use the re module to parse the
file, and create dictionary pairs between the English text, and the
corresponding Polish text.
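
Since the actual code was not posted, here is only a guess at what such a parser might look like, in Python 3. The file format (one "English = Polish" pair per line, with # comments), the regular expression, and the function name are all assumptions made for illustration.

```python
import re

# Hypothetical line format: English text, an equals sign, Polish text.
PAIR_RE = re.compile(r"^\s*(?P<english>[^=#]+?)\s*=\s*(?P<polish>.+?)\s*$")


def parse_translations(text):
    """Map each English phrase to its Polish translation."""
    pairs = {}
    for line in text.splitlines():
        match = PAIR_RE.match(line)
        if match:  # blank lines and comment lines do not match
            pairs[match.group("english")] = match.group("polish")
    return pairs


sample = "Open = Otw\u00f3rz\nClose = Zamknij\n\n# comment line\n"
translations = parse_translations(sample)
print(ascii(translations))
```

As long as the input is decoded text, the dictionary values stay text too, and encoding can be deferred until a value is handed to kdialog.
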
3. kdialog: I know nothing of KDE and its toolkit. I would expect
either (a) it should take utf-8 and be able to display *any* of the
first 64K (nominal) Unicode characters, given a Unicode font or (b)
you can encode your data in a legacy charset, *AND* tell it what that
charset is, and have a corresponding font or (c) you have both
options. Which is correct, and what are the details of how you can
tell kdialog what to do -- configuration? command-line arguments?

That's what I was hoping someone here might be able to tell me. Having
searched on line, I cannot find any information about kdialog and
encoding. I have left a message on the relevant kde mailing list, but
have had no response. The command line options are found with kdialog
--help, but as you don't have kde, it will be difficult for you to look
at those. Having examined them at length, there is no option for
encoding.
HTHYTHYS,

John

Thanks for your help and interest.

Matt
 
