python 3's adoption

P

Paul Rubin

Jonathan Gardner said:
You're referring to the O(N**2) bit, right? I am sure he knew it was O
(N*log(N)), which is still worse than O(N) for key.

It's O(n log n) for both key and cmp. key is usually more convenient
and usually gives a constant-factor speedup (DSU pattern), but it uses
more memory and there are some types of orderings that it can't handle
without a bunch of contortion.
 
C

Carl Banks

You're referring to the O(N**2) bit, right? I am sure he knew it was O
(N*log(N)), which is still worse than O(N) for key.

If he didn't, well, Python has some fundamental flaws in its basic
sort algorithm.

Quicksort is O(N**2) worst case, perhaps that's what he meant. I'm
not sure about timsort but since it's based on quicksort I'd guess it
is O(N**2) or worse, worst case, as well.

Typical case of a sort is still O(N*log(N)) whether using key or cmp.
People recommended against cmp not because of overall algorithmic time
considerations, but because of the number of function calls. cmp
function was called a minimum of N-1 times and typically around N*log2
(N) times, whereas key was called exactly N times. Since the overhead
of a Python function call was very large compared to the built-in cmp
operation, it usually outweighed the algorithmic time considerations.


Carl Banks
 
S

Steven D'Aprano

The main problem with the incompatibility is for porting code, not for
writing code from scratch.

Correct. It's a trivial problem, but still a problem.
It's also a problem wrt. learning the language.

This makes no sense. Why is it harder to learn

print(x, y, z)

than this?

print x, y, z

The first case is like the other functions you have to learn, like len().
In fact, many newbies to Python put unneeded parentheses around arguments
to print simply because they assume it is a function.

I would argue that the new print function is *simpler* to learn. It is
more consistent with other built-ins, and has fewer magic idioms to
learn. Instead of:

print >>fileObj, x, y, z

you use regular function syntax with a meaningful keyword:

print(x, y, z, file=fileObj)

If you want suppress the newline at the end of each print:

print x, y, z, # note the final comma

compared to:

print(x, y, z, end='')

If you want to change the space between elements, instead of:

sys.stdout.write(str(x) + "*" + str(y) + "*" + str(z) + '\n')

you use:

print(x, y, z, sep='*')


If you want to override the behaviour of print in a module, instead of
having to edit the source code of the module (which might not even be
available), all you need to do is monkey-patch it:

import module
module.print = myprint
 
S

Steven D'Aprano

Why not? GCC lets you use any statement in an expression:

I stand corrected. Whether it is a good idea or not is another story.


What about assert, import, and pass?

Both assert and pass are special and need to be treated specially by the
compiler. Arguably we could replace pass with None, but pass makes the
intent more obvious.

Although assert could be a function, it is treated specially by the
compiler and so making it a statement emphases that it is special.

The regular form of import could easily be a function returning the
module object. After all, that's exactly what __import__ is! But using
__import__ for anything but the simplest cases is quite awkward, hence we
have syntactic sugar in the form of the import statement.

You also missed del as another statement-based command.

All the current statements have good strong reasons for being statements.
print on the other hand lacks a strong reason for being a statement, and
the only good argument against changing is to avoid an incompatible
change. But since it is not a gratuitous incompatible change, the long
term benefit out-weighs the short-term pain, in my opinion.
 
S

Steven D'Aprano

You're referring to the O(N**2) bit, right? I am sure he knew it was O
(N*log(N)), which is still worse than O(N) for key.

If he didn't, well, Python has some fundamental flaws in its basic sort
algorithm.


I may have the details wrong, but the performance of sort with cmp is
very poor.

sort with key calls the key function once for each element in the list,
which is O(N), and then sorts O(N*log N), giving a total of O(N*log N).

sort with cmp has to compare the comparison function multiple times for
each element. I *thought* it ended up calling it N times each, giving
O(N**2), but I may be delusional. Whatever it is though, it's quite slow.
Here's my test code (for Python 2.5):



from timeit import Timer
from string import letters
from random import shuffle

def mycmp(s, t):
return cmp(s.upper(), t.upper())

def randstr():
L = list(letters)
shuffle(L)
return ''.join(L)

setup = "from __main__ import mycmp, data"

N = 300
data = [randstr() for _ in xrange(N)]
t1 = Timer("d = data[:]; d.sort()", setup) # raw sort
t2 = Timer("d = data[:]; d.sort(cmp=mycmp)", setup)
t3 = Timer("d = data[:]; d.sort(key=str.upper)", setup)

for t in (t1, t2, t3):
print min(t.repeat(number=500))

N *= 10 # Do it again with bigger N.
data = [randstr() for _ in xrange(N)]
for t in (t1, t2, t3):
print min(t.repeat(number=500))



And here are the results on my PC:

# N = 300
0.0631489753723
2.36756515503
0.226485967636

# N = 3000
1.03457903862
35.3092911243
2.77242517471



This doesn't support my claim of O(N**2), but it does demonstrate that
sort with cmp should be avoided if at all possible.
 
S

Steven D'Aprano

Quicksort is O(N**2) worst case, perhaps that's what he meant.

No, apparently I was delusional.

I'm not
sure about timsort but since it's based on quicksort I'd guess it is
O(N**2) or worse, worst case, as well.

timsort is actually a mergesort, not a quicksort.

You may like to read this:

http://svn.python.org/projects/python/trunk/Objects/listsort.txt

Typical case of a sort is still O(N*log(N)) whether using key or cmp.
People recommended against cmp not because of overall algorithmic time
considerations, but because of the number of function calls. cmp
function was called a minimum of N-1 times and typically around N*log2
(N) times, whereas key was called exactly N times. Since the overhead
of a Python function call was very large compared to the built-in cmp
operation, it usually outweighed the algorithmic time considerations.

Yes, that sounds more like it. Thanks for the corrections.
 
G

Gib Bogle

Luis said:
"Ad hominem"?
Please, operor non utor lingua non notus per vulgaris populus.
Gratias ago vos...

"ad hominem" is "lingua notus per vulgaris populus", the vulgar pop of these
parts, anyway.
 
J

Jonathan Gardner

What about assert and pass?

If you're going to have statements, you're going to need the null
statement. That's "pass". It could be renamed "null_statement" but
"pass" is a better description. "None" and "pass" are cousins of
sorts, since "None" is the null expression and "pass" the null
statement.

I agree on "assert". I don't like running a program in test mode and
then running it in production mode with different code. I would rather
test what I am going to actually run. "assert" should be a function,
and support for removing assert statements should be eliminated. I
simply don't use assert statements at all.
 
P

Paul Rubin

Jonathan Gardner said:
If you're going to have statements, you're going to need the null
statement. That's "pass".

Why? Expressions are statements, so you could just say "pass" (in
quotes, denoting a string literal), or 0, or None, os anything else like
that, instead of having a special statement.
 
A

Alf P. Steinbach

* Steven D'Aprano:
Correct. It's a trivial problem, but still a problem.


This makes no sense. Why is it harder to learn

print(x, y, z)

than this?

print x, y, z

I think it's actually easier to learn just the 3.x form.

But it's more difficult to learn that there are /two different/ syntactic forms,
depending on which version of Python you're using and in addition depending on
__future__ declarations!

E.g., picking up some example code from the web, and it does not work...

The first case is like the other functions you have to learn, like len().
In fact, many newbies to Python put unneeded parentheses around arguments
to print simply because they assume it is a function.

I would argue that the new print function is *simpler* to learn. It is
more consistent with other built-ins, and has fewer magic idioms to
learn.

Yes, yes, yes, I agree.

Instead of:

print >>fileObj, x, y, z

you use regular function syntax with a meaningful keyword:

print(x, y, z, file=fileObj)

If you want suppress the newline at the end of each print:

print x, y, z, # note the final comma

compared to:

print(x, y, z, end='')

Actually I thought the final comma thing was nice. It was like Basic. I think
the 2.x 'print' must have been modeled on Basic's 'print'.

If you want to change the space between elements, instead of:

sys.stdout.write(str(x) + "*" + str(y) + "*" + str(z) + '\n')

you use:

print(x, y, z, sep='*')


If you want to override the behaviour of print in a module, instead of
having to edit the source code of the module (which might not even be
available), all you need to do is monkey-patch it:

import module
module.print = myprint
Traceback (most recent call last):


Cheers,

- Alf
 
L

Lie Ryan

Why? Expressions are statements, so you could just say "pass" (in
quotes, denoting a string literal), or 0, or None, os anything else like
that, instead of having a special statement.

or, if the null statement "pass" is removed, you could define:
pass = None
or
pass = object() # sentinel

and have essentially the same thing.

hmm... on second thought, not special-casing pass means doing a
LOAD_GLOBAL or LOAD_CONST for operation that is supposed to be doing
nothing.
 
L

Lie Ryan

* Steven D'Aprano:

Actually I thought the final comma thing was nice. It was like Basic. I
think the 2.x 'print' must have been modeled on Basic's 'print'.

if that was true, then python missed the final semicolon
Traceback (most recent call last):

Monkey patching follows (or should follow) the same rule as class
inheritance overriding: the overrider's input domain must be a superset
of the overriden's input domain and the overrider's output range must be
a subset of the overriden's output range. 666 object (int) is not even
remotely compatible with function object.
 
A

Alf P. Steinbach

* Lie Ryan:
Monkey patching follows (or should follow) the same rule as class
inheritance overriding: the overrider's input domain must be a superset
of the overriden's input domain and the overrider's output range must be
a subset of the overriden's output range. 666 object (int) is not even
remotely compatible with function object.

Yes, that's my point.

A 'print' replacement should ideally provide all the guarantees on behavior that
standard 'print' does.

And with that it's not so easy to put in a /correct/ replacement of 'print'; in
particular, it has to honor the 'file' keyword argument.

Thus the 3.x design makes it easy to replace 'print' incorrectly.

I'd rather still had 'print' as keyword... ;-)


Cheers,

- Alf
 
T

Terry Reedy

Why? Expressions are statements, so you could just say "pass" (in
quotes, denoting a string literal), or 0, or None, os anything else like
that, instead of having a special statement.

As Python is currently compiled, you are right, pass is not needed.
A string becomes the doc attribute, and becomes local var 0, but 0 is
just ignored. I actually expected a load_const but that is now optimized
away. I am not sure this was always true. Perhaps 'pass' is easier than
'0' for mewcomers reading the tutorial, but I have no data.
1 0 LOAD_CONST 1 (None)
3 RETURN_VALUE 1 0 LOAD_CONST 0 (None)
3 RETURN_VALUE 1 0 LOAD_CONST 0 (None)
3 RETURN_VALUE
Terry Jan Reedy
 
S

Steven D'Aprano

As Python is currently compiled, you are right, pass is not needed. A
string becomes the doc attribute, and becomes local var 0, but 0 is just
ignored. I actually expected a load_const but that is now optimized
away. I am not sure this was always true. Perhaps 'pass' is easier than
'0' for mewcomers reading the tutorial, but I have no data.


As I said earlier, a dedicated statement `pass` indicates the intent of
the author better than some arbitrary constant.

if flag:
0
else:
do_something()


could be a mistake, perhaps the programmer intended to write "x = 0", but
there is no ambiguity with the pass statement. I would hope that
PyChecker and the equivalent would flag the use of bare constants in code
and give a warning.

In any case, while such a idiom works in code, it plays havoc with the
interactive interpreter:
.... "pass"
....
'pass'
'pass'
'pass'
'pass'
'pass'

Traceback (most recent call last):
File "<stdin>", line 2, in <module>
KeyboardInterrupt


I have trimmed the output for convenience -- in reality, it fills my
terminal's output buffer faster than I can hit ctrl-C. Using a constant
as a proxy for the pass statement would get obnoxious really fast.
 
P

Paul Boddie

So, for practical reasons, i think a “key” parameter is fine. But
chopping off “cmp” is damaging. When your data structure is complex,
its order is not embedded in some “key”. Taking out “cmp” makes it
impossible to sort your data structure.

What would annoy me if I used Python 3.x would be the apparent lack of
the __cmp__ method for conveniently defining comparisons between
instances of my own classes. Having to define all the rich comparison
methods frequently isn't even as much fun as it sounds.

Paul
 
S

Steven D'Aprano

* Lie Ryan:

Yes, that's my point.

A 'print' replacement should ideally provide all the guarantees on
behavior that standard 'print' does.

And with that it's not so easy to put in a /correct/ replacement of
'print'; in particular, it has to honor the 'file' keyword argument.

Oh come on, that's easy. This is Python we're talking about. Any object
with a write() method is valid.

.... def __init__(self):
.... self.buffer = []
.... def write(self, *args):
.... self.buffer.extend(args)
....['Say goodnight Gracie', ' ', 'Goodnight Gracie!', '\n']


In many cases, probably most, you won't be writing your own full-blown
print replacement, but merely decorating the existing function. How easy
is that? Trivial. Here's how to shut down a noisy module that prints too
much:
.... def inner(*args, **kwargs):
.... out = kwargs.get('file')
.... if out in (None, sys.stdout):
.... pass
.... else:
.... func(*args, **kwargs)
.... return inner
....['Say goodnight Gracie', ' ', 'Goodnight Gracie!', '\n', 'spam spam
spam', '\n']

(except of course you would monkey-patch the module).

Because print is a built-in, getting the original version back is even
easier:
'spam spam spam'


Thus the 3.x design makes it easy to replace 'print' incorrectly.

This is Python -- the compiler won't stop you from shooting yourself in
the foot.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable

I'd rather still had 'print' as keyword... ;-)

We've shown many benefits of print as a function. Can you give any
benefits of it being a statement, other than backward compatibility for
those who learned it from Python 2.x?

If you were designing your own language from scratch, so that backwards
compatibility wasn't an issue, why would you make print a statement?
 
S

Steven D'Aprano

I agree on "assert". I don't like running a program in test mode and
then running it in production mode with different code. I would rather
test what I am going to actually run. "assert" should be a function, and
support for removing assert statements should be eliminated.

But then it wouldn't be an assert, it would be a test, and tests are best
written explicitly.

The whole point of assertions (outside of test code) is that they should
ALWAYS pass. Since they should always pass, they're safe to optimize
away, if the user explicitly runs Python with the "optimize" runtime
switch. This is under the end-user's control, not the application writer.
If you don't trust optimized code, don't use it.

Assertions should be treated as "this will never happen" code, and are
only there to catch coding errors and logic mistakes. If you expect that
a test could fail, it shouldn't be written as an assert but as an
explicit test.
I simply don't use assert statements at all.

Outside of explicit test code (e.g. unit tests or equivalent), asserts
should be rare. If you never use them at all, it probably means you're
either not programming defensively enough, or you're needlessly writing
explicit tests to cover situations that can't happen. assert exists to
cover the middle ground between those two extremes.
 
P

Paul Rubin

Steven D'Aprano said:
If you were designing your own language from scratch, so that backwards
compatibility wasn't an issue, why would you make print a statement?

As another real estate analogy, my apartment has some problems with its
plumbing, plus an ugly spot on the kitchen wall that could use a coat of
paint to fix. The plumbing problem is fairly serious (they keep
shutting off the water in the building when it acts up) but the kitchen
spot is a minor annoyance that is mostly hidden by the refrigerator
anyway. Fixing either of those things would be a hassle: I'd have to
cover all my stuff with plastic and stay out of the apartment for a day
or so while the maintenance guys hacked on things, and I'd expect some
of my stuff to get damaged inadvertently no matter how careful everyone
was.

It's worth dealing with the repair hassles to get the plumbing fixed,
and if I'm going to have to deal with that disruption anyway, then sure,
I'd like them to paint the kitchen spot at the same time. But if they
say they want me to cover up all my stuff and leave so they can JUST fix
the kitchen spot, and then do the same thing a second time so they can
fix the plumbing at some unspecified date in the future, then I'd rather
just live with the kitchen spot the way it is. Yes it's a cosmetic
blemish, but it's not causing any real problems, and I'd rather not
deal with the hassle and risk of fixing it if there's no other benefit.

In Python terms, the print statement is the spot on the wall, while the
plumbing is something like the GIL and the legacy codebase that would
break in a hundred ways if Python had real parallelism and a tracing
garbage collector and a native-code compiler and the various language
changes it would take to make all that stuff really fly rather than just
limp along. If they are going to make everyone deal with the disruption
of migrating to an incompatible version, they should do it once rather
than twice.

In short, there's a mythical Python 4 that only exists in my
imagination, but it already interests me a lot more than Python 3.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,176
Messages
2,570,950
Members
47,503
Latest member
supremedee

Latest Threads

Top