[RELEASED] Python 3.1 final

Benjamin Peterson

On behalf of the Python development team, I'm thrilled to announce the first
production release of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of the features and
changes that Python 3.0 introduced. For example, the new I/O system has been
rewritten in C for speed. File system APIs that use unicode strings now handle
paths with undecodable bytes in them. Other features include an ordered
dictionary implementation, a condensed syntax for nested with statements, and
support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1,
see http://doc.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python
distribution.

To download Python 3.1 visit:

http://www.python.org/download/releases/3.1/

The 3.1 documentation can be found at:

http://docs.python.org/3.1

Bugs can always be reported to:

http://bugs.python.org


Enjoy!
 
Nobody

Python 3.1 focuses on the stabilization and optimization of the features and
changes that Python 3.0 introduced. For example, the new I/O system has been
rewritten in C for speed. File system APIs that use unicode strings now
handle paths with undecodable bytes in them.

That's a significant improvement. It still decodes os.environ and sys.argv
before you have a chance to call sys.setfilesystemencoding(), but it
appears to be recoverable (with some effort; I can't find any way to re-do
the encoding without manually replacing the surrogates).

However, sys.std{in,out,err} are still created as text streams, and AFAICT
there's nothing you can do about this from within your code.

All in all, Python 3.x still has a long way to go before it will be
suitable for real-world use.
 
Martin v. Löwis

That's a significant improvement. It still decodes os.environ and sys.argv
before you have a chance to call sys.setfilesystemencoding(), but it
appears to be recoverable (with some effort; I can't find any way to re-do
the encoding without manually replacing the surrogates).

See PEP 383.

However, sys.std{in,out,err} are still created as text streams, and AFAICT
there's nothing you can do about this from within your code.

That's intentional, and not going to change. You can access the
underlying byte streams if you want to, as you could already in 3.0.

Regards,
Martin

P.S. Please identify yourself on this newsgroup.
 
Paul Moore

2009/6/28 Martin v. Löwis said:
That's intentional, and not going to change. You can access the
underlying byte streams if you want to, as you could already in 3.0.

I had a quick look at the documentation, and couldn't see how to do
this. It's the first time I'd read the new IO module documentation, so
I probably missed something obvious. Could you explain how I get the
byte stream underlying sys.stdin? (That should give me enough to find
what I was misunderstanding in the docs).

Thanks,
Paul.
 
Benjamin Peterson

Nobody said:
Such as not trying to shoe-horn every byte string it encounters into
Unicode. Some of them really are *just* byte strings.


You're certainly allowed to convert them back to byte strings if you want.
 
Terry Reedy

Nobody said:
Such as not trying to shoe-horn every byte string it encounters into
Unicode. Some of them really are *just* byte strings.

Let's ignore the disinformation. So false it is hardly worth refuting.
 
Benjamin Peterson

Paul Moore said:
The "buffer" attribute doesn't seem to be documented in the docs for
the io module. I'm guessing that the TextIOBase class should have a
note that you get at the buffer through the "buffer" attribute?


Good point. I've now documented it, and the "raw" attribute of BufferedIOBase.
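
For readers wondering what that looks like in practice, here is a minimal sketch that is not taken from the thread. It assumes the text stream is an io.TextIOWrapper-style object exposing the "buffer" attribute mentioned above. The helper names read_all_bytes and write_bytes are made up for illustration, and in-memory streams stand in for sys.stdin and sys.stdout so the example stays deterministic.

```python
import io


def read_all_bytes(text_stream):
    """Read the remaining input as bytes from the binary layer under a text stream."""
    return text_stream.buffer.read()


def write_bytes(text_stream, data):
    """Write raw bytes through the binary layer under a text stream."""
    # Flush pending text first so text and bytes reach the output in order.
    text_stream.flush()
    text_stream.buffer.write(data)
    text_stream.buffer.flush()


# Hypothetical stand-in for sys.stdin: a text wrapper over bytes that are
# not valid UTF-8, so reading them as text would fail under strict decoding.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"caf\xe9\n"), encoding="utf-8")
raw_input_bytes = read_all_bytes(fake_stdin)

# Hypothetical stand-in for sys.stdout.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
fake_stdout.write("text ")
write_bytes(fake_stdout, b"\xff\n")
raw_output_bytes = fake_stdout.buffer.getvalue()
```

With the real streams the same calls would be read_all_bytes(sys.stdin) and write_bytes(sys.stdout, data), provided those objects have not been replaced by something without a "buffer" attribute.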
 
Aahz

You're certainly allowed to convert them back to byte strings if you want.

Yes, but do you get back the original byte strings? Maybe I'm missing
something, but my impression is that this is still an issue for the email
module as well as command-line arguments and environment variables.
 
Benjamin Peterson

Aahz said:
Yes, but do you get back the original byte strings? Maybe I'm missing
something, but my impression is that this is still an issue for the email
module as well as command-line arguments and environment variables.

The email module is, yes, broken. You can recover the bytestrings of
command-line arguments and environment variables.
 
Nobody

The email module is, yes, broken. You can recover the bytestrings of
command-line arguments and environment variables.

1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

Most of the issues can be worked around by calling
sys.setfilesystemencoding('iso-8859-1') at the start of the program, but
sys.argv and os.environ have already been converted by this point.
 
Benjamin Peterson

Nobody said:
1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

import sys

fs_encoding = sys.getfilesystemencoding()
bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

What's a non-invertible encoding? I can't find a reference to the term.
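
To make the round trip concrete, here is a small sketch of how the "surrogateescape" error handler from PEP 383 is meant to behave. It is an illustration under assumptions, not code from the thread: the encoding is fixed to UTF-8 rather than taken from sys.getfilesystemencoding() so the result does not depend on the machine, and the helper names to_text and to_bytes are hypothetical.

```python
def to_text(raw, encoding="utf-8"):
    """Decode bytes, mapping each undecodable byte to a lone surrogate."""
    return raw.decode(encoding, "surrogateescape")


def to_bytes(text, encoding="utf-8"):
    """Encode text, turning escaped surrogates back into the original bytes."""
    return text.encode(encoding, "surrogateescape")


# 0xE9 followed by "." is not valid UTF-8, so it cannot be decoded strictly.
original = b"caf\xe9.txt"

text = to_text(original)        # the bad byte becomes the surrogate U+DCE9
recovered = to_bytes(text)      # the surrogate becomes the byte 0xE9 again
```

The round trip only works if the same encoding is used in both directions, which is why the snippet quoted above asks for the file system encoding first.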
 
Hallvard B Furuseth

Benjamin said:
Nobody said:
On Sun, 28 Jun 2009 19:21:49 +0000, Benjamin Peterson wrote:
1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

fs_encoding = sys.getfilesystemencoding()
bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

What's a non-invertible encoding? I can't find a reference to the term.

Different ISO-2022 strings can map to the same Unicode string.
Thus you can convert back to _some_ ISO-2022 string, but it won't
necessarily match the original.
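
A short sketch of that point, assuming Python's built-in "iso2022_jp" codec as one representative of the ISO-2022 family (the thread does not name a specific variant). The second byte string starts with the escape sequence ESC ( B, which selects ASCII even though ASCII is already the initial state, so it is redundant but still legal input.

```python
# Two different byte strings in an ISO-2022 encoding.
plain = b"abc"
redundant = b"\x1b(Babc"  # ESC ( B selects ASCII, which is already active

text_from_plain = plain.decode("iso2022_jp")
text_from_redundant = redundant.decode("iso2022_jp")

# Both inputs decode to the same text...
same_text = text_from_plain == text_from_redundant

# ...so re-encoding cannot tell which byte string was the original.
round_trip = text_from_redundant.encode("iso2022_jp")
lost_original = round_trip != redundant
```

Because the decoder discards the redundant escape sequence, no error handler applied at encode time can restore it.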
 
Martin v. Löwis

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

ISO-2022 cannot be used as a system encoding.

Please do read the responses I write, and please do identify yourself.

Regards,
Martin
 
Gerhard Häring

Scott said:
Fortunately, I have assiduously avoided the real word, and am happy to
embrace the world from our 'bot overlords.

Congratulations on another release from the hydra-like world of
multi-head development.

+1 QOTW

-- Gerhard
 
Nobody

1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

fs_encoding = sys.getfilesystemencoding()
bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

This results in an internal error:
>>> "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

[FWIW, the error corresponds to _PyBytes_Resize, which has a
cautionary comment almost as large as the code.]

The documentation gives the impression that "surrogateescape" is only
meaningful for decoding.

What's a non-invertible encoding? I can't find a reference to the term.

One where different inputs can produce the same output.
 
Nobody

See PEP 383.

Okay, that's useful, except that it may have some bugs:
>>> r = "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

Trying a few random test cases suggests that the ratio of valid to invalid
bytes has an effect. Strings which consist mostly of invalid bytes trigger
the error, those which are mostly valid don't.

The error corresponds to _PyBytes_Resize(), which has the following
words of caution in a preceding comment:

/* The following function breaks the notion that strings are immutable:
it changes the size of a string. We get away with this only if there
is only one module referencing the object. You can also think of it
as creating a new string object and destroying the old one, only
more efficiently. In any case, don't use this if the string may
already be known to some other part of the code...
Note that if there's not enough memory to resize the string, the original
string object at *pv is deallocated, *pv is set to NULL, an "out of
memory" exception is set, and -1 is returned. Else (on success) 0 is
returned, and the value in *pv may or may not be the same as on input.
As always, an extra byte is allocated for a trailing \0 byte (newsize
does *not* include that), and a trailing \0 byte is stored.
*/

Assuming that this gets fixed, it should make most of the problems with
3.0 solvable. OTOH, it wouldn't have killed them to have added e.g.
sys.argv_bytes and os.environ_bytes.

That's intentional, and not going to change. You can access the
underlying byte streams if you want to, as you could already in 3.0.

Okay, I've since been pointed to the relevant information (I was looking
under "File Objects"; I didn't think to look at "sys").
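
On the wish for sys.argv_bytes and os.environ_bytes: the names in this sketch (argv_as_bytes, environ_as_bytes) are hypothetical helpers, not part of any Python release. It assumes Python 3.2 or later, where os.fsencode, os.environb and os.supports_bytes_environ exist; none of them are available in the 3.1 release discussed in this thread.

```python
import os
import sys


def argv_as_bytes(argv=None):
    """Return command-line arguments as bytes, using the file system encoding."""
    if argv is None:
        argv = sys.argv
    # os.fsencode applies the file system encoding and its error handler,
    # so bytes that were escaped while decoding are restored.
    return [os.fsencode(arg) for arg in argv]


def environ_as_bytes():
    """Return a bytes-to-bytes copy of the process environment."""
    if os.supports_bytes_environ:
        # On platforms with a native bytes environment, use it directly.
        return dict(os.environb)
    # Elsewhere, fall back to encoding the text environment.
    return {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}


args = argv_as_bytes(["prog", "data.bin"])
env = environ_as_bytes()
```

Passing an explicit list to argv_as_bytes is only there to keep the example independent of how the script was started; calling it with no argument uses sys.argv.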
 
