Problems in Using C-API for Unicode handling

abhi

Hi,
I am trying to handle Unicode objects in C (Python 2.5.2). I receive
arbitrary PyObjects and want to coerce them to unicode objects. The
documentation provides two APIs for that:

PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding,
const char *errors)
PyUnicode_FromObject(PyObject *obj)

(http://www.python.org/doc/2.5.2/api/unicodeObjects.html)
Now I want UTF-16, so I am trying to use the first one, but it returns
NULL when the PyObject is already of Unicode type, which is expected.
What puzzles me is that PyUnicode_FromObject(PyObject *obj) succeeds
irrespective of the type of the PyObject. The documentation says it is a
shortcut for PyUnicode_FromEncodedObject(obj, NULL, "strict"), but if I
use that, it returns NULL, whereas PyUnicode_FromObject works.

Is there any way I can take in any PyObject and convert it to a
UTF-16 object? Any help is appreciated.

Thanks,
Abhigyan
 
Terry Reedy

Whether Unicode objects are UTF-16 or UTF-32 depends on your Python
build. You can always convert a byte string representation of an object
to unicode.
 
abhi

Hi,
I agree with you. I have a Python unicode object in C (I don't know
which UTF) and I want to convert it explicitly to UTF-16. Is there
any way to do this?
The documentation for PyUnicode_FromEncodedObject(PyObject *obj, const
char *encoding, const char *errors) says that obj can't be a unicode
type, so I guess I can't use this one. Does anybody know any other
method by which I can achieve my goal?

Thanks,
Abhigyan
 
Steve Holden

I suspect that a "Python Unicode object in C" will be using either a
UCS-2 or a UCS-4 representation, depending on the options your
interpreter was built with. So whatever else it is, it won't be
UTF-anything. I don't know whether that helps or not.

regards
Steve
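
To make the UCS-4 versus UTF-16 distinction concrete, here is a minimal sketch in plain C of what a UTF-16 encoder has to do with code points above U+FFFF: split them into a surrogate pair. The function names utf16_encode_cp and ucs4_to_utf16 are hypothetical helpers written for this illustration; they are not part of the Python C-API and say nothing about how CPython implements its codec.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one Unicode code point as UTF-16 code units.
 * Writes 1 or 2 units to out[] and returns how many were written.
 * Returns 0 if cp is not encodable (a surrogate value or above U+10FFFF).
 */
size_t utf16_encode_cp(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                      /* surrogate values are reserved */
    if (cp > 0x10FFFF)
        return 0;                      /* outside the Unicode range */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;         /* BMP: one unit, same as UCS-2 */
        return 1;
    }
    cp -= 0x10000;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/*
 * Hypothetical helper: encode n UCS-4 code points into dst.
 * dst must have room for 2 * n units. Returns the number of units
 * written, or (size_t)-1 if any code point is not encodable.
 */
size_t ucs4_to_utf16(const uint32_t *src, size_t n, uint16_t *dst)
{
    size_t i, written = 0;
    for (i = 0; i < n; i++) {
        size_t k = utf16_encode_cp(src[i], dst + written);
        if (k == 0)
            return (size_t)-1;
        written += k;
    }
    return written;
}
```

In real extension code there should be no need to hand-roll this: the 2.5 C-API documents PyUnicode_AsUTF16String(), which returns a byte string in UTF-16 (native byte order, starting with a BOM) regardless of whether the build is UCS-2 or UCS-4.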
 
Stefan Behnel

abhi said:
Now I want UTF-16, so I am trying to use the first one, but it returns
NULL when the PyObject is already of Unicode type, which is expected.
What puzzles me is that PyUnicode_FromObject(PyObject *obj) succeeds
irrespective of the type of the PyObject. The documentation says it is a
shortcut for PyUnicode_FromEncodedObject(obj, NULL, "strict"), but if I
use that, it returns NULL, whereas PyUnicode_FromObject works.

Is there any way I can take in any PyObject and convert it to a
UTF-16 object? Any help is appreciated.

Use PyUnicode_FromObject() to convert the (non-string) object to a unicode
object, then encode the unicode object as UTF-16 using the respective
functions in the codecs API (see the bottom of the C-API docs page for the
unicode object).

Note, however, that you will not succeed in converting a byte string to the
corresponding unicode string using PyUnicode_FromObject(), except in the
simple case where the string is ASCII encoded. Doing this right requires
explicit decoding using a byte encoding that you must specify (again, see
the codecs API).

Stefan
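
As a rough illustration of the ASCII caveat above: in Python 2.x the default encoding is normally ASCII, so an implicit conversion of a byte string can only succeed if every byte is below 0x80. The sketch below is plain C with a hypothetical helper name, first_non_ascii; it is not a Python C-API function, only a way to see which inputs fall into the "simple case".

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first byte outside the
 * ASCII range (>= 0x80), or -1 if every byte in buf[0..len) is ASCII.
 */
long first_non_ascii(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            return (long)i;    /* implicit ASCII decoding would fail here */
    }
    return -1;                 /* pure ASCII: the simple case */
}
```

If this returns anything other than -1, the data needs an explicit decode, for example PyUnicode_FromEncodedObject() called with an encoding name that actually matches the bytes; which encoding that is must come from knowledge of the data source.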
 
