Codecs

Ivan Van Laningham · Jul 9, 2005

Hi All--
As far as I can tell, after looking only at the documentation (and not
searching peps etc.), you cannot query the codecs to give you a list of
registered codecs, or a list of possible codecs it could retrieve for
you if you knew enough to ask for them by name.

Why not? It seems to me that if I want to try to read an unknown file
using an exhaustive list of possible encodings, the best place to keep
the most current list is the codec registry itself, not in the
documentation for the codec module.

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.andi-holmes.com/
http://www.foretec.com/python/workshops/1998-11/proceedings.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

Martin v. Löwis · Jul 10, 2005

Ivan said:
Hi All--
As far as I can tell, after looking only at the documentation (and not
searching peps etc.), you cannot query the codecs to give you a list of
registered codecs, or a list of possible codecs it could retrieve for
you if you knew enough to ask for them by name.

Why not?

There are several answers to that question. Which of them is true,
I don't know. In order of likelihood:
1. When the API was designed, that functionality was forgotten.
   It was not possible to add it later on (because of 2).
2. Registration builds on the notion of lookup functions. The
   lookup function gets a codec name, and either succeeds in
   finding the codec, or raises an exception.
   Now, a lookup function, in principle, might not "know" in
   advance what codecs it supports, or the number of encodings
   it supports might not be finite. So asking such a lookup
   function for the complete list of codecs might not be
   implementable.

As an example of a lookup function that doesn't know what
encodings it supports, look at my iconv module. This module
provides all codecs that iconv_open(3) supports, yet there
is no standard way to query the iconv library in advance
for a list of all supported codecs.

As an example for a lookup function that supports an infinite
number of codecs, consider the (theoretical) encrypt/password
encoding, which encrypts a string with a password, and the
password is part of the codec name. Each password defines
a new encoding, and there is an infinite number of them.
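
A minimal sketch of such an open-ended lookup function, using the real
codecs.register() and codecs.CodecInfo. The "xorpw_" naming scheme and the
xor_search function are hypothetical names invented for this illustration,
and XOR stands in for real encryption only as a toy. Note that in the actual
API a search function reports failure by returning None; it is
codecs.lookup() that then raises LookupError.

```python
import codecs
from itertools import cycle

PREFIX = "xorpw_"  # hypothetical naming scheme, not a real codec family


def xor_search(name):
    """Return a CodecInfo for 'xorpw_<password>', or None for any other name."""
    if not name.startswith(PREFIX) or len(name) == len(PREFIX):
        return None  # not ours; let other search functions try
    key = name[len(PREFIX):].encode("ascii", "replace")

    def transform(data, errors="strict"):
        # XOR is its own inverse, so one function serves as encoder and decoder.
        data = bytes(data)
        out = bytes(b ^ k for b, k in zip(data, cycle(key)))
        return out, len(data)

    return codecs.CodecInfo(encode=transform, decode=transform, name=name)


codecs.register(xor_search)

# Every distinct password names a distinct codec; none can be listed in advance.
secret = codecs.encode(b"hello", "xorpw_swordfish")
plain = codecs.decode(secret, "xorpw_swordfish")
```

The registry normalizes names before calling a search function (lower case,
with some punctuation turned into underscores), so the sketch sticks to
lower-case alphanumeric passwords.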

Now, if the functionality in 1) had been considered from the start, it
might have been possible to design the API in a way that didn't support
all the cases that the current API supports. Alas, somebody must have
misplaced the time machine.
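
Within those limits, the names known to the standard library's own search
function can still be enumerated on a best-effort basis. The sketch below
assumes that the encodings.aliases.aliases table is a good enough
approximation. It says nothing about codecs provided by any other registered
search function, which is exactly the gap described above.

```python
import codecs
import encodings.aliases

# The alias table maps alias -> codec module name inside the encodings package.
module_names = sorted(set(encodings.aliases.aliases.values()))

usable = []
for name in module_names:
    try:
        codecs.lookup(name)
    except LookupError:
        # Some entries are platform-specific or absent from this build.
        continue
    usable.append(name)

print(len(usable), "codecs found via the alias table")
```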

Regards,
Martin

John Machin · Jul 11, 2005

Ivan said:
It seems to me that if I want to try to read an unknown file
using an exhaustive list of possible encodings ...

Supposing such a list existed:

What do you mean by "unknown file"? That the encoding is unknown?

Possibility 1:
You are going to try to decode the file from "legacy" to Unicode --
until the first 'success' (defined how?)? But the file could be decoded
by *several* codecs into Unicode without an exception being raised. Just
a simple example: each of the encodings ['iso-8859-' + x for x in '12459']
defines a character for *all* 256 possible byte values.
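
A quick check of that point, as a sketch: the same bytes decode without
error under every one of those five encodings, yet each produces a different
string, so "no exception" cannot serve as the definition of success.

```python
data = bytes(range(256))  # every possible byte value, once each
candidates = ["iso-8859-" + x for x in "12459"]

decoded = {}
for enc in candidates:
    # None of these calls raises UnicodeDecodeError.
    decoded[enc] = data.decode(enc)

# Count how many different strings the five "successful" decodes produced.
distinct_results = len(set(decoded.values()))
print(distinct_results)
```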

There are various language-guessing algorithms based on e.g. frequency
of ngrams ... try Google.

Possibility 2:
You "know" the file is in a Unicode-encoding e.g. utf-8, have
successfully decoded it to Unicode, and are going to try to encode the
file in a "legacy" encoding but you don't know which one is appropriate?
Sorry, same "But".
