Why is dictionary.keys() a list and not a set?

  • Thread starter Christoph Zwerschke
  • Start date
C

Christoph Zwerschke

Alex said:
Absolutely not in my use cases. The typical reason I'm asking for
.values() is because each value is a count (e.g. of how many time the
key has occurred) and I want the total, so the typical idiom is
sum(d.values()) -- getting a result set would be an utter disaster, and
should I ever want it, set(d.values()) is absolutely trivial to code.

Agree. This reason alone is enough for values() to be a list, besides
the problem that it can contain unhashable values.
Note that in both cases I don't need a LIST, just an ITERATOR, so
itervalues would do just as well (and probably faster) except it looks
more cluttered and by a tiny margin less readable -- "the sum of the
values" rolls off the tongue, "the sum of the itervalues" doesn't!-)
So, the direction of change for Python 3000, when backwards
compatibility can be broken, is to return iterators rather than lists.
At that time, should you ever want a list, you'll say so explicitly, as
you do now when you want a set, i.e. list(d.values())

Sounds reasonable.
In Python today 'in' doesn't necessarily mean set membership, but some
fuzzier notion of "presence in container"; e..g., you can code

'zap' in 'bazapper'

and get the result True (while 'paz' in 'bazapper' would be False, even
though, if you thought of the strings as SETS rather than SEQUENCES of
characters, that would be absurd). So far, strings (plain and unicode)
are the only containers that read 'in' this way (presence of subsequence
rather than of single item), but one example should be enough to show
that "set membership" isn't exactly what the 'in' operator means.

In this case, I would interpret the set on which 'in' operates as the
set of all substrings of the given string to have peace for my soul ;-)
You appear to have a strange notion of "derived data type". In what
sense, for example, would a list BE-A set? It breaks all kind of class
invariants, e.g. "number of items is identical to number of distinct
items", commutativity of addition, etc..

Sorry. Your're right. In the computer world, sets are derived from lists
(collections). In mathematics, lists are derived from sets. Here, you
would define the list [1, 1] as the set {1, {1, 1}} (you see, the
cardinality is the same). Yes, mathematics is strange ;-) Actually, in
mathematics, everything is a set and set theory is the "theory of
everything". When I grew up pedagogues here in Germany even believed it
would be best if kids learn set theory and draw venn diagrams before
they learn arithmetics... We were tortured with that kind of things in
the first class. Probably I'm still suffering from the late damages of
that treatment.

-- Christoph
 
A

Alex Martelli

Christoph Zwerschke said:
mathematics, everything is a set and set theory is the "theory of
everything". When I grew up pedagogues here in Germany even believed it
would be best if kids learn set theory and draw venn diagrams before

An alternative theory, of course, is "God made the natural numbers; all
else is the work of man" -- and that one is by a German, too (Kronecker,
if I recall correctly). The hope to found all of mathematics on set
theory was primarily a _British_ effort, as I see it (Russell and
Whitehead), and failed a long time ago... I'm not sure what, if
anything, a mathematician of today would propose as the foundational
theory... perhaps modal logic, but I'm really just guessing, being,
myself, an engineer rather than a mathematician (perhaps category
theory? it's hard to say...).

What, if anything, is the theoretically proper foundation, is of course
a separate issue from where is it best to start _teaching_ maths... with
geometry as the Greeks would have it, with arithmetic as in traditional
schooling, or with sets as the modern pedagogues would have it. Me, I'm
partial to arithmetic, but sets are just fine with me if they turn out
to work better (I wonder if proper controlled studies have ever been
done before revolutionizing the teaching of elementary maths,
though...!). Again you see my engineer's bias: "whatever works"!-)

But OO really requires a different mindset, particularly when operating
under a regime of "mutable" objects. "A circle IS-AN ellipse" in
Euclidean geometry... but inheriting Circle from Ellipse doesn't work in
OO if the objects are changeable, since you can, e.g., change
eccentricity in an Ellipse but not in a Circle...


Alex
 
C

Christoph Zwerschke

Mike said:
If the hash changes, you've screwed up the set. What it really should
say is "collection of objects with fixed hashes", or words to that
effect. Do you want to file a PR on this?

I fear I have not enough understanding of Python's hashing concepts to
make a suggestion for improvement here.
How so? As I demonstrated, you can subclass any class that doesn't
have a hash to add one, and then use the subclass, which - except for
having a hash - will have exactly the same behavior as your original
class.

But you surely wouldn't want to do this to the list of items just to be
able to return it as a set.

-- Christoph
 
C

Christoph Zwerschke

Alex said:
An alternative theory, of course, is "God made the natural numbers; all
else is the work of man" -- and that one is by a German, too (Kronecker,
if I recall correctly).

Yes, it was Kronecker. But even natural numbers are usually constructed
with sets using Peano's axioms.
> The hope to found all of mathematics on set theory was primarily a
> _British_ effort, as I see it (Russell and Whitehead), and failed a
> long time ago... I'm not sure what, if anything, a mathematician of
> today would propose as the foundational theory...

Perhaps "string theory" ;-) So probably strings should become the
fundamental datatype in Python (immutable strings of course) :)
But OO really requires a different mindset, particularly when operating
under a regime of "mutable" objects. "A circle IS-AN ellipse" in
Euclidean geometry... but inheriting Circle from Ellipse doesn't work in
OO if the objects are changeable, since you can, e.g., change
eccentricity in an Ellipse but not in a Circle...

Good example.

-- Christoph
 
M

Mike Meyer

Christoph Zwerschke said:
I fear I have not enough understanding of Python's hashing concepts to
make a suggestion for improvement here.

Well, I just made a suggestion. You found the problem - you want to do
the PR, or should I?
But you surely wouldn't want to do this to the list of items just to
be able to return it as a set.

Well, I don't have a use case for this, but it's not hard. Write a
class whose __new__ either returns the original object (if it's
hashable), or creates an instance of the class, which will proxy for
that object. Something like:

# Untested code
class Hash(object):
def __new__(cls, obj):
if hasattr(obj, '__hash__'):
return obj
self.__obj = obj
return object.__new__()
def __getattr__(self, name):
return getattr(self.__obj, name)
def __setattr(self, name, value):
setattr(self.__obj, name, value)
def __hash__(self):
return id(self.__obj)

and then do:

set([Hash(i) for i in lst])

If you need to turn a list of arbitrary, possibly unhashable, objects
into a set, is there a problem with the above?

<mike
 
A

Alex Martelli

Christoph Zwerschke said:
Yes, it was Kronecker. But even natural numbers are usually constructed
with sets using Peano's axioms.

Peano's axioms are perfectly abstract, as far as I recall. Russell and
Whitehead did try to construct naturals from sets (defining, e.g., '5'
as "the set of all sets with five items"), but that was before the
inherent contradictions of set theory were widely known (though Russell
himself had destroyed Frege's attempts at theorization by pointing out
one such contradiction, the one wrt the "set of all sets that don't
include themselves as a member" if I recall correctly). Later, Goedel
showed that any purely formal theory that's powerful enough to model
natural arithmetic cannot be both complete and consistent...

Perhaps "string theory" ;-) So probably strings should become the

Some physicists, maybe, surely not mathematicians though!


Alex
 
?

=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=

Christoph said:
I know. But this does not answer the question: If the keys *are* already
a set according to their semantics, why aren't they returned as a set
from the beginning?

Notice that this was not the question you originally asked. This
one has a clear answer: because that was always the case, and because
it was never changed.

Your original question was "could it be changed, and should it be
changed?" As the answer to the first part is already "no", the
second part really doesn't raise.
This, of course, in turn raises the question ;-) Would it be desirable
to have an additional, more general set datatype that can contain
mutable objects? I know about the perfomance drawbacks. And I think I
know the answer: It would not make much sense, since the implementation
would be actually not different from a list. So you could take a list
anyway. Which gives the answer why values() and items() return a list.

Actually, this answer is only partial. The real question is a semantical
one:

In a set, all elements are different. In Python, this means that, for
any two elements X and Y of a set, X!=Y. Now, consider this set:

a = [1]
b = [1,2]
S = set(a,b)
a.append(2)

Now, a==b, so the inherent set property breaks. In theory, this should
cause S to contain only a single element; the other one should
disappear. This is not only hard to implement (removal from a set
would have to remove all duplicates, iterating over a set would have
to skip over duplicates) - there is also another semantical issue:
If one element is skipped/dropped, which of these (a or b)?

Regards,
Martin
 
?

=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=

Christoph said:
Sorry. Your answer was good; I missed the point and thought you wrote
set(d.keys()). Is it documented anywhere that set(d) = set(d.keys())? I
think this should go into the Python Doco where keys() are explained.

It follows from what is documented. set(<iterable object>) creates a
set that contains all elements in the iterable object:

http://docs.python.org/lib/built-in-funcs.html#l2h-63

Now, is a dictionary an iterable object? Yes, it is:

http://www.python.org/peps/pep-0234.html

Together, this gives the property I demonstrated.

Unfortunately, the PEP apparently hasn't made it into the
implementation.

Regards,
Martin
 
P

Paul Rubin

Peano's axioms are perfectly abstract, as far as I recall. Russell and
Whitehead did try to construct naturals from sets (defining, e.g., '5'
as "the set of all sets with five items"), but that was before the
inherent contradictions of set theory were widely known (though Russell
himself had destroyed Frege's attempts at theorization by pointing out
one such contradiction, the one wrt the "set of all sets that don't
include themselves as a member" if I recall correctly).

That's only what's called "naive set theory". Axiomatic set theory
(the kind they use now) doesn't have these issues.

http://en.wikipedia.org/wiki/Naive_set_theory
http://en.wikipedia.org/wiki/Axiomatic_set_theory
Later, Goedel showed that any purely formal theory that's powerful
enough to model natural arithmetic cannot be both complete and
consistent...

Yes, it's not considered terribly troublesome. Set theory (and
arithmetic) are generally accepted as being consistent but incomplete.
So there will always be something for mathemeticians to do ;-).
 
?

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

Mike said:
If the hash changes, you've screwed up the set. What it really should
say is "collection of objects with fixed hashes", or words to that
effect. Do you want to file a PR on this?

It's more than that: you really mean "hashable values", where "hashable"
does not just imply that the hash value is fixed, but also that the
equal objects hash same.

Regards,
Martin
 
M

Mike Meyer

Martin v. Löwis said:
It's more than that: you really mean "hashable values", where "hashable"
does not just imply that the hash value is fixed, but also that the
equal objects hash same.

I would have phrased it as "identical" instead of equal. Sets should
hold unique elements, where "unique" depends on the use at hand. For
some uses, "==" might be appropriate. For others, "is" might be
appropriate.

Would you care to suggest an improved wording?

<mike
 
C

Christoph Zwerschke

Mike said:
> Well, I just made a suggestion. You found the problem - you want to do
> the PR, or should I?

Please go ahead if you have the time. By the way, the doco may be also
inexact about the keys of dictionaries.
# Untested code
class Hash(object):
def __new__(cls, obj):
if hasattr(obj, '__hash__'):
return obj
self.__obj = obj
return object.__new__()
def __getattr__(self, name):
return getattr(self.__obj, name)
def __setattr(self, name, value):
setattr(self.__obj, name, value)
def __hash__(self):
return id(self.__obj)

This will not work since even lists seem to have a __hash__ attribute.
Also, you will get an infinite recursion when trying to access
self.__obj. But principally, it should work. Ad hoc solution:

class HashProxy:
def __init__(self, obj):
self.__obj = obj
def __getattr__(self, name):
return getattr(self.__obj, name)
def __setattr(self, name, value):
setattr(self.__obj, name, value)
def __hash__(self):
return id(self.__obj)

def Hash(obj):
try:
hash(obj)
except TypeError:
return HashProxy(obj)
else:
return obj
> If you need to turn a list of arbitrary, possibly unhashable, objects
> into a set, is there a problem with the above?

Well, first you don't get the real unhashable objects, but proxied
objects. This will lead to problems if you want to check the type in the
following - you will always get the same type. Second, as you have
already mentioned, the elements of the sets will not be guaranteed to be
different anymore (only different in terms of identity (is), but they
still may be equal (==)). This would not be what you expect from a set.

-- Christoph
 
C

Christoph Zwerschke

Martin said:
> Your original question was "could it be changed, and should it be
> changed?" As the answer to the first part is already "no", the
> second part really doesn't raise.

Right of course. The second part of the question was rather hypothetical
(in the sense of "if we could start with Python from scratch, would it
then be desirable").
> In a set, all elements are different. In Python, this means that, for
> any two elements X and Y of a set, X!=Y. Now, consider this set:
> a = [1]
> b = [1,2]
> S = set(a,b)
> a.append(2)
> Now, a==b, so the inherent set property breaks. In theory, this should
> cause S to contain only a single element; the other one should
> disappear.

I just started to see these problems as well. I believed that the
immutability of set elements had only technical reasons (hashing etc.)
but your're right, it has also semantical reasons. In the above case, a
or b would have to magically disappear, and even if that would be
possible it would be completely unclear which of the two, and what would
happen if you subsequently do a.append(3)? Should it magically appear
again? So I understand you will get into very hot water if you try to
implement sets with mutable elements.
> This is not only hard to implement (removal from a set
> would have to remove all duplicates, iterating over a set would have
> to skip over duplicates) - there is also another semantical issue:
> If one element is skipped/dropped, which of these (a or b)?

I should have read your posting fully before writing the above ;-)

-- Christoph
 
C

Christoph Zwerschke

Paul said:
Yes, it's not considered terribly troublesome. Set theory (and
arithmetic) are generally accepted as being consistent but incomplete.
So there will always be something for mathemeticians to do ;-).

Somehow I found this being comforting. Even in the realm of Mathematics
there are things which can only be believed, but not be proven. Just as
in Physics there are things which can only be predicted with a certain
probability, but they are not predetermined completely.

But now we're really getting off topic.

-- Christoph
 
C

Christoph Zwerschke

Martin said:
It follows from what is documented. set(<iterable object>) creates a
set that contains all elements in the iterable object:
http://docs.python.org/lib/built-in-funcs.html#l2h-63
Now, is a dictionary an iterable object? Yes, it is:
http://www.python.org/peps/pep-0234.html
Together, this gives the property I demonstrated.

You need to know a third thing, namely that as an iterable object, a
dictionary returns the keys only, not the key/value tuples which would
also make some sense. If it had been designed that way, you could write

for k, v in d:
print k, v

instead of:

for k in d:
print k, d[k]

What I wanted to say is that the doco could mention this possibility to
get the keys as a set at the place where it explains the keys() method.

-- Christoph
 
B

bonono

Christoph said:
You can already get a set from a dictionary's keys in an efficient manner:
l = dict.fromkeys(range(10))
set(l)
Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.
puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?
 
M

Mike Meyer

Christoph said:
You can already get a set from a dictionary's keys in an efficient manner:
l = dict.fromkeys(range(10))
set(l)
Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.
puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?

Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.

<mike
 
B

bonono

Mike said:
Christoph said:
(e-mail address removed) schrieb:
You can already get a set from a dictionary's keys in an efficient manner:
l = dict.fromkeys(range(10))
set(l)
Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.
puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?

Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.
ah, thanks. I was thinking that tuples being immutable object must be
hashable. So it is not about mutable but hashable.
 
B

bonono

Mike said:
Christoph said:
(e-mail address removed) schrieb:
You can already get a set from a dictionary's keys in an efficient manner:
l = dict.fromkeys(range(10))
set(l)
Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.
puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?

Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.
A related issue, from the doc :

Set elements are like dictionary keys; they need to define both
__hash__ and __eq__ methods.

and dir(any_tuple) would show __hash__ and __eq__, would that be a bit
confusing as even though tuple has these two properties(or whatever
terms it should be called), it really depends what it contains ?

Or in other words, tuple is hashable if and only if every object it
contains is also hashable ?

If that is the case, would it be better to correct the doc to say such
thing ?
 
B

bonono

Mike said:
Christoph Zwerschke wrote:
(e-mail address removed) schrieb:
You can already get a set from a dictionary's keys in an efficient manner:
l = dict.fromkeys(range(10))
set(l)
Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the latter would not work with mutable values. See
discussion with Martin.
puzzled. items() return tuples which I believe can be element of set ?
Or I misread you ?

Not all tuples can be elements of a set. Elements of a set have to be
hashable. Tuples compute their hash by hashing their contents. If
their contents aren't hashable, the tuple isn't hashable, and hence
can't be an element of a set. If the values in the dictionary aren't
hashable, then the tuples returned by items() won't be hashable
either, and hence can't be elements of a set.
A related issue, from the doc :

Set elements are like dictionary keys; they need to define both
__hash__ and __eq__ methods.

and dir(any_tuple) would show __hash__ and __eq__, would that be a bit
confusing as even though tuple has these two properties(or whatever
terms it should be called), it really depends what it contains ?

Or in other words, tuple is hashable if and only if every object it
contains is also hashable ?

If that is the case, would it be better to correct the doc to say such
thing ?

And this, again from the doc(about mapping objects):

A mapping object maps immutable values to arbitrary objects.

Seems that is questionable too.

a=(1,[])
d={}
d[a]=1

again would give TypeError, list object are unhashable.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,738
Latest member
JinaMacvit

Latest Threads

Top