On Thu, Sep 12, 2013 at 10:25 AM, Mark Janssen
Well now, this is an area that is not actually well-defined. I would
say 16-bit Unicode is binary data if you're encoding in base 65,536,
just as 8-bit ASCII is binary data if you're encoding in base 256.
Which is to say: there is no intervening data to suggest a TYPE.
Unicode is not 16-bit any more than ASCII is 8-bit. And you used the
word "encod[e]", which is the standard way to turn Unicode into bytes
anyway. No, a Unicode string is a series of codepoints - it's more
similar to a list of ints than to a stream of bytes.
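To make that concrete, here's a small Python 3 illustration (the sample
string is arbitrary): iterating over a str gives you code points, and
bytes only appear once you encode.

s = "Σπάμ"
print([ord(c) for c in s])       # [931, 960, 940, 956] -- code points, like ints
print(s.encode("utf-8"))         # b'\xce\xa3\xcf\x80\xce\xac\xce\xbc' -- bytes
print(len(s), len(s.encode("utf-8")))   # 4 code points, 8 bytes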
And not necessarily ints, for that matter.
Let's be clear: the most obvious, simple, hardware-efficient way to
implement a Unicode string holding arbitrary characters is as an array of
32-bit signed integers restricted to the range 0x0 - 0x10FFFF. That gives
you a one-to-one mapping of int <-> code point.
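Here's a rough sketch of that view in Python, using the array module
(the sample string is made up; type code "I" is an unsigned int,
typically 32 bits on current platforms):

from array import array

s = "A\u0376\U0010FFFF"
buf = array("I", (ord(c) for c in s))       # one integer per code point
print(list(buf))                            # [65, 886, 1114111]
print("".join(chr(cp) for cp in buf) == s)  # True: the mapping is reversible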
But it's not the only way. One could implement Unicode strings using any
similar one-to-one mapping. Taking a leaf out of the lambda calculus, I
might implement each code point like this:
NULL pointer <=> Code point 0
^NULL <=> Code point 1
^^NULL <=> Code point 2
^^^NULL <=> Code point 3
and so on, where ^ means "pointer to".
Obviously this is mathematically neat, but hopelessly impractical: code
point U+10FFFF would require a chain of 1,114,111 pointers-to-pointers
before the NULL. But it would work. Or alternatively, I might
choose to use floats, mapping (say) 0.25 <=> U+0376. Or whatever.
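Just to show the pointer-chain idea really would work, here's a toy
Python sketch (the function names are invented for illustration), with
None playing the part of NULL and a one-element list playing "pointer to":

def to_chain(codepoint):
    node = None
    for _ in range(codepoint):
        node = [node]              # one level of ^ per unit
    return node

def from_chain(node):
    count = 0
    while node is not None:
        node = node[0]
        count += 1
    return count

assert from_chain(to_chain(0x0376)) == 0x0376
# ...but to_chain(0x10FFFF) builds 1,114,111 nested lists, hence "impractical".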
What we can say, though, is that to represent the full Unicode charset
requires 21 bits per code-point, although you can get away with fewer
bits if you have some out-of-band mechanism for recognising restricted
subsets of the charset. (E.g. you could use just 7 bits if you only
handled the characters in ASCII, or just 4 bits if you only cared about
the decimal digits.) In practice, computers tend to be much faster when
working with multiples of 8 bits, so we use 32 bits instead of 21. In
that sense, Unicode is a 32-bit character set.
But Unicode is absolutely not a 16-bit character set.
And of course you can use *more* bits than 21, or 32. If you had a
computer where the native word-size was (say) 50 bits, it would make
sense to use 50 bits per character.
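The arithmetic is easy to check in Python:

print((0x10FFFF).bit_length())   # 21 -- bits needed for the full Unicode range
print((127).bit_length())        # 7  -- enough for ASCII
print((9).bit_length())          # 4  -- enough for the ten decimal digits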
As for the question of "binary data versus text", well, that's a thorny
one, because really *everything* in a computer is binary data, since it's
stored using bits. But we can choose to *interpret* some binary data as
text, just as we interpret some binary data as pictures, sound files,
video, PowerPoint presentations, and so forth. A reasonable way of
defining a text file might be:
If you decode the bytes making up an alleged text file into
code-points, using the correct encoding (which needs to be
known a priori, or stored out of band somehow), then provided
that none of the code-points have Unicode General Category Cc,
Cf, Cs, Co or Cn (control, format, surrogate, private-use,
non-character/reserved), you can claim that it is at least
plausible that the file contains text.
Whether that text is meaningful is another story.
You might wish to allow Cf and possibly even Co (format and private-use),
depending on the application.
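Here's a rough Python sketch of that test (the function name and the
UTF-8 default are just illustrative; the encoding still has to be known
out of band). I've also whitelisted newline, carriage return and tab,
which are technically category Cc but obviously belong in text files:

import unicodedata

REJECT = {"Cc", "Cf", "Cs", "Co", "Cn"}   # drop Cf and/or Co if you allow them
WHITELIST = {"\n", "\r", "\t"}            # Cc, but clearly text

def plausibly_text(raw, encoding="utf-8"):
    try:
        decoded = raw.decode(encoding)
    except UnicodeDecodeError:
        return False                      # not text in that encoding at all
    return all(ch in WHITELIST or unicodedata.category(ch) not in REJECT
               for ch in decoded)

print(plausibly_text(b"Hello, world!\n"))          # True
print(plausibly_text(b"\x00\x01\x02 not text"))    # False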