playing with pyGoogle - strange codec error


Brian Blazer

Hello,

I am playing around with pyGoogle and have run into an error that I have
never seen before, and I am unsure how to fix it. Here is a code
snippet:

for r in data.results:
    print 'Title: ',r.title
    print 'URL: ',r.URL
    print 'Summary: ',r.snippet
    print

Everything works fine until I get to r.snippet. Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in
position 119: ordinal not in range(128)

Any help is appreciated.

Thanks,
Brian
 

Brian Blazer

On 2005-04-04 10:06:23 -0500, Brian Blazer said:

<snip>

You know, I am beginning to think that I MAY have stumbled on a bug
here. At first I thought this issue was related to the offending
character being out of range for the Mac. Then I tried it on an MS
machine and a Linux box, all with the same error.

This did not happen when I wrote the same script in Java. This is
making me wonder if there is an issue with the wrapper for the Google
API that was originally done in Java.

For the sake of it, here is the full code (minus my Google key). It is
going to look weird, but those print statements are there so that I
don't have to open the file it is writing to every time I want to see
stuff. It has my name hard-coded into the search query. The commented-out
r.snippet.encode('mac_roman') was there to see if I could make it work by
changing the encoding (no luck). I also tried putting

#-*- coding: utf-8 -*-

right after the shebang line (as described in PEP 263:
http://www.python.org/peps/pep-0263.html). Again, no help.
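
Neither attempt could have helped, for two separate reasons. The PEP 263 line only declares how the interpreter should decode the source file itself; it does not change how text is encoded when it is printed or written. And encode() returns a new byte string rather than modifying r.snippet in place, so a bare r.snippet.encode('mac_roman') has no lasting effect. A minimal sketch, assuming Python 3 and a hypothetical snippet value standing in for r.snippet:

```python
# -*- coding: utf-8 -*-
# The PEP 263 declaration above only tells the interpreter how to decode
# this source file; it has no effect on how text is encoded on output.

snippet = u"Copyright \xa9 2005"  # hypothetical stand-in for r.snippet

# encode() returns a new byte string and leaves the original untouched,
# so a bare call like the commented-out line has no lasting effect.
snippet.encode("mac_roman")

# Keep the return value if you actually want the encoded bytes.
encoded = snippet.encode("mac_roman")

print(repr(snippet))  # still a text string containing U+00A9
print(repr(encoded))  # bytes in the Mac Roman encoding
```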

Anyway, here is the code ------------------------>

import google

google.LICENSE_KEY = 'insertKeyHere'
#print google.doSpellingSuggestion('helllo')
data = google.doGoogleSearch('Brian Blazer')
print 'Found %d results' % len(data.results)

searchData = open('searchData.txt', 'w')

for r in data.results:
    # r.snippet.encode('mac_roman')
    searchData.write('Title: ' + r.title + '\n' + '\n')
    searchData.write('URL: ' + r.URL + '\n' + '\n')
    searchData.write('Snippet: ' + r.snippet + '\n' + '\n' + '\n')
    print r.URL
    print r.title
    print r.snippet
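
The file-writing half of the script can be made to work by opening the file with an explicit encoding, so each string is encoded as UTF-8 on write instead of going through the ASCII codec. A sketch assuming Python 3 (io.open also exists in Python 2.6+). Since pyGoogle and the Google SOAP search API are not available here, Result and its sample data are hypothetical stand-ins for data.results, and the output path in the temp directory is likewise an assumption:

```python
import io
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the result objects in pyGoogle's data.results.
Result = namedtuple("Result", ["title", "URL", "snippet"])

results = [
    Result(u"Brian Blazer", u"http://example.com/", u"Copyright \xa9 2005 Brian Blazer"),
]

# Hypothetical output location; the original wrote searchData.txt to the cwd.
path = os.path.join(tempfile.gettempdir(), "searchData.txt")


def write_results(results, path):
    """Write each result to a UTF-8 text file.

    io.open() in text mode encodes every string with the declared
    encoding, so U+00A9 is stored as UTF-8 rather than being forced
    through the ASCII codec.
    """
    with io.open(path, "w", encoding="utf-8") as out:
        for r in results:
            out.write(u"Title: " + r.title + u"\n\n")
            out.write(u"URL: " + r.URL + u"\n\n")
            out.write(u"Snippet: " + r.snippet + u"\n\n\n")


write_results(results, path)
```

The print statements are a separate matter: they still depend on what the terminal's encoding can represent.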
 

Erik Max Francis

Brian said:
You know, I am beginning to think that I MAY have stumbled on a bug
here. At first I thought this issue was related to the offending
character being out of range for the Mac. Then I tried it on an MS
machine and a Linux box, all with the same error.

The problem, common to all three, is that you're using a terminal whose
default encoding cannot represent the copyright character (in the first
case the default encoding is 'ascii', and that is likely true of the
others as well).

When you print a Unicode string, it is by default encoded using your
default encoding. This cannot be done faithfully for a string containing
a non-ASCII character (like the copyright sign, which is what is actually
triggering it for you), so the encoding fails with an error.
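
The failure can be reproduced in isolation by encoding a string containing U+00A9 to ASCII. A small sketch assuming Python 3; first_unencodable is a hypothetical helper name, and the snippet text is made up. The start attribute of UnicodeEncodeError is the "position 119" from the original traceback:

```python
def first_unencodable(text, encoding="ascii"):
    """Return the index of the first character that `encoding` cannot
    represent, or None if the whole string encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the "position N" shown in the error message
        return exc.start
    return None


snippet = u"Copyright \xa9 2005"  # hypothetical snippet text

print(first_unencodable(snippet))             # → 10 (ASCII has no copyright sign)
print(first_unencodable(snippet, "latin-1"))  # → None (Latin-1 does)
```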

What you probably want here is either to use another encoding, or to
specify what to do in the case that the encoding is not possible.
Either encode to a different encoding (one which you know your terminal
supports even though it is not detected, e.g., 'latin-1'), or specify
what to do with errors in the encoding (e.g., 'ignore', which removes
the offending characters, or 'replace', which replaces them with
question marks):

aUnicodeString.encode('latin-1')
aUnicodeString.encode('ascii', 'replace')
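
For comparison, here is what each of the encoding options just described produces. The sketch assumes Python 3, where encode() returns bytes, and uses a hypothetical snippet:

```python
snippet = u"Copyright \xa9 2005"  # hypothetical snippet text

as_latin1 = snippet.encode("latin-1")            # U+00A9 exists in Latin-1 as byte 0xA9
as_ignore = snippet.encode("ascii", "ignore")    # the copyright sign is dropped
as_replace = snippet.encode("ascii", "replace")  # the copyright sign becomes '?'

for label, data in [("latin-1", as_latin1), ("ignore", as_ignore), ("replace", as_replace)]:
    print(label, repr(data))
```

The 'latin-1' route only displays correctly if the terminal really interprets bytes as Latin-1; otherwise byte 0xA9 may show up as a different glyph.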

Brian said:
This did not happen when I wrote the same script in Java. This is
making me wonder if there is an issue with the wrapper for the Google
API that was originally done in Java.

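Java does not handle Unicode the same way: when its default writers hit
a character the output encoding cannot represent, they substitute a
replacement character (typically '?') instead of raising an error.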
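
To get Java-like lenient output in Python, where unencodable characters become a placeholder instead of an exception, one option is to attach the error handler to the output stream rather than to each string. A minimal sketch assuming Python 3, with io.BytesIO standing in for the terminal or file:

```python
import io

raw = io.BytesIO()  # stands in for a terminal or file opened in binary mode

# Every write is encoded as ASCII; characters ASCII cannot represent
# become '?' instead of raising UnicodeEncodeError.
out = io.TextIOWrapper(raw, encoding="ascii", errors="replace")

out.write(u"Copyright \xa9 2005")
out.flush()  # push the buffered text through the encoder into `raw`

print(raw.getvalue())
```

On Python 3.7 and later, sys.stdout.reconfigure(errors='replace') applies the same policy to the real standard output.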
 
