why () is () and [] is [] work in other way?

R

rusi

"is" is never ill-defined. "is" always, without exception, returns True
if the two operands are the same object, and False if they are not. This
is literally the simplest operator in Python.

Circular definition: In case you did not notice, 'is' and 'are' are
(or is it is?) the same verb.
 
T

Thomas Rachel

Am 24.04.2012 08:02 schrieb rusi:
Circular definition: In case you did not notice, 'is' and 'are' are
(or is it is?) the same verb.

Steven's definition tries not to define the "verb" "is", but it defines
the meanung of the *operator* 'is'.

He says that 'a is b' iff a and be are *the same objects*. We don't need
to define the verb "to be", but the target of the definition is the
entity "object" and its identity.


Thomas
 
R

rusi

Am 24.04.2012 08:02 schrieb rusi:



Steven's definition tries not to define the "verb" "is", but it defines
the meanung of the *operator* 'is'.

He says that 'a is b' iff a and be are *the same objects*. We don't need
to define the verb "to be", but the target of the definition is the
entity "object" and its identity.

Identity, sameness, equality and the verb to be are all about the same
concept(s) and their definitions are *intrinsically* circular; see
http://plato.stanford.edu/entries/identity/#2

And the seeming simplicity of the circular definitions hide the actual
complexity of 'to be'
for python: http://docs.python.org/reference/expressions.html#id26
(footnote 7)
for math/philosophy: http://www.math.harvard.edu/~mazur/preprints/when_is_one.pdf
 
T

Thomas Rachel

Am 24.04.2012 15:25 schrieb rusi:
Identity, sameness, equality and the verb to be are all about the same
concept(s) and their definitions are *intrinsically* circular; see
http://plato.stanford.edu/entries/identity/#2

Mybe in real life language. In programming and mathematics there are
several forms of equality, where identity (≡) is stronger than equality (=).

Two objects can be equal (=) without being identical (≡), but not the
other way.

As the ≡ is quite hard to type, programming languages tend to use other
operators for this.

E.g., in C, you can have

int a;
int b;
a = 4;
b = 4;

Here a and b are equal, but not identical. One can be changed without
changing the other.

With

int x;
int *a=&x, *b=&x;

*a and *b are identical, as they point to the same location.

*a = 4 results in *b becoming 4 as well.


In Python, you can have the situations described here as well.

You can have a list and bind it to 2 names, or you can take 2 lists and
bind them to that name.

a = [3]
b = [3]

Here a == b is True, while a is b results in False.


Thomas
 
N

Nobody

If I can't predict the output of

print (20+30 is 30+20) # check whether addition is commutative print
(20*30 is 30*20) # check whether multiplication is commutative

by just reading the language definition and the code, I'd have to say "is"
is ill-defined.

If anything is ill-defined, then it's "+" and "*", i.e. it's unspecified
whether the value which they return is a unique object or a reference to
some other object.

More accurately, the existence of "is", "is not" and "id" cause many other
constructs to have "ill-defined" behaviour.
Whether a and b are the same object is implementation-dependent.

And what's wrong with that? If you want a language which precisely
specifies all observable behaviour, you're going to end up with a rather
useless language. For a start, it can't have a time() function. For
similar reasons, you can't have networking or any form of preemptive
concurrency (which includes any form of inter-process communication on an
OS which uses preemptive multi-tasking).
 
A

Adam Skutt

If anything is ill-defined, then it's "+" and "*", i.e. it's unspecified
whether the value which they return is a unique object or a reference to
some other object.

Such a definition precludes meaningful operator overloading and is
highly problematic for floating-point numbers. There's also no way to
enforce it, but I think you know that too. :)

Identity and equality are distinct concepts in programming languages.
There's nothing that can be done about that, and no particularly good
reason to force certain language behaviors because some "programmers"
have difficulty with the distinction.

Though, maybe it's better to use a different keyword than 'is' though,
due to the plain English
connotations of the term; I like 'sameobj' personally, for whatever
little it matters. Really, I think taking away the 'is' operator
altogether is better, so the only way to test identity is:
id(x) == id(y)
Though I would prefer:
addr(x) == addr(y)
myself, again, for what little it matters. The right thing to do when
confronted with this problem is teach the difference and move on.

As an aside, the whole problem with 'is' and literals is perhaps the
only really good argument for a 'new' keyword/operator like C++ and
Java have. Then it's more explicit to the programmer that they've
created two objects (in this case, anyway).
More accurately, the existence of "is", "is not" and "id" cause many other
constructs to have "ill-defined" behaviour.



And what's wrong with that? If you want a language which precisely
specifies all observable behaviour, you're going to end up with a rather
useless language. For a start, it can't have a time() function. For
similar reasons, you can't have networking or any form of preemptive
concurrency (which includes any form of inter-process communication on an
OS which uses preemptive multi-tasking).

Fully specified does not mean fully deterministic. What makes a
specification of "Any value in the range 0 through N" less 'full' than
a specification of "X" or a constant?

Adam
 
S

Steven D'Aprano

Though, maybe it's better to use a different keyword than 'is' though,
due to the plain English
connotations of the term; I like 'sameobj' personally, for whatever
little it matters. Really, I think taking away the 'is' operator
altogether is better, so the only way to test identity is:
id(x) == id(y)

Four reasons why that's a bad idea:

1) The "is" operator is fast, because it can be implemented directly by
the interpreter as a simple pointer comparison (or equivalent). The id()
idiom is slow, because it involves two global lookups and an equality
comparison. Inside a tight loop, that can make a big difference in speed.

2) The "is" operator always has the exact same semantics and cannot be
overridden. The id() function can be monkey-patched.

3) The "is" idiom semantics is direct: "a is b" directly tests the thing
you want to test, namely whether a is b. The id() idiom is indirect:
"id(a) == id(b)" only indirectly tests whether a is b.

4) The id() idiom already breaks if you replace names a, b with
expressions:
id([1,2]) == id([3,4])
True



Though I would prefer:
addr(x) == addr(y)

But that's absolutely wrong. id(x) returns an ID, not an address. It just
happens that, as an accident of implementation, the CPython interpreter
uses the object address as an ID, because objects can't move. That's not
the case for all implementations. In Jython, objects can move and the
address is not static, and so IDs are assigned on demand starting with 1:


steve@runes:~$ jython
Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19)
[OpenJDK Client VM (Sun Microsystems Inc.)] on java1.6.0_18
Type "help", "copyright", "credits" or "license" for more information.3


Other implementations may make other choices. I don't believe that the
language even defines the id as a number, although I could be wrong about
that.

Personally, I prefer the Jython approach, because it avoids those
annoying questions like "How do I dereference the address of an
object?" (answer: Python is not C, you can't do that), and IDs are
globally unique and never reused for the lifetime of the process.
 
A

Adam Skutt

Four reasons why that's a bad idea:

1) The "is" operator is fast, because it can be implemented directly by
the interpreter as a simple pointer comparison (or equivalent). The id()
idiom is slow, because it involves two global lookups and an equality
comparison. Inside a tight loop, that can make a big difference in speed.

The runtime can optimize the two operations to be equivalent, since
they are logically equivalent operations. If you removed 'is',
there's little reason to believe it would do otherwise.
2) The "is" operator always has the exact same semantics and cannot be
overridden. The id() function can be monkey-patched.

I can't see how that's useful at all. Identity is a fundamental
property of an object; hence retrieval of it must be a language
operation. The fact Python chooses to do otherwise is unfortunate,
but also irrelevant to my position.
3) The "is" idiom semantics is direct: "a is b" directly tests the thing
you want to test, namely whether a is b. The id() idiom is indirect:
"id(a) == id(b)" only indirectly tests whether a is b.

The two expressions are logically equivalent, so I don't see how this
matters, nor how it is true.
4) The id() idiom already breaks if you replace names a, b with
expressions:
id([1,2]) == id([3,4])

True

It's not broken at all. The lifetime of temporary objects is
intentionally undefined, and that's a /good/ thing. What's
unfortunate is that CPython optimizes temporaries differently between
the two logically equivalent expressions.

As long as this holds:.... def __del__(self):
.... print "Farewell to: %d" % id(self)
....Farewell to: 4146953292
Farewell to: 4146953260
FalseFarewell to: 4146953420
Farewell to: 4146953420
True

then there's nothing "broken" about the behavior of either expression.
I personally think logically equivalent expressions should give the
same results, but since both operations follow the rules of object
identity correctly, it's not the end of the world. It's only
surprising to the programmer if:
1) They don't understand identity.
2) They don't understand what objects are and are not temporaries.

Code that relies on the identity of a temporary object is generally
incorrect. This is why C++ explicitly forbids taking the address
(identity) of temporaries. As such, the language behavior in your
case is inconsequential. Making demons fly out of the programmer's
nose would be equally appropriate.

The other solution is to do what Java and C# do: banish id() entirely
and only provide 'is' (== in Java, Object.ReferenceEquals() in C#).
That seems just as fine, really, Practically, it's also probably the
better solution for CPython, which is fine by me. My preference for
keeping id() and removing 'is' probably comes from my background as a C
++ programmer, and I already said it matters very little.
But that's absolutely wrong. id(x) returns an ID, not an address.
It just
happens that, as an accident of implementation, the CPython interpreter
uses the object address as an ID, because objects can't move. That's not
the case for all implementations. In Jython, objects can move and the
address is not static, and so IDs are assigned on demand starting with 1:

steve@runes:~$ jython
Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19)
[OpenJDK Client VM (Sun Microsystems Inc.)] on java1.6.0_18
Type "help", "copyright", "credits" or "license" for more information.>>>id(42)
1
3

An address is an identifier: a number that I can use to access a
value[1]. I never said that id() must return an address the host CPU
understands (virtual, physical, or otherwise). Most languages use
addresses that the host CPU cannot understand without assistance at
least sometimes, including C on some platforms.
Other implementations may make other choices. I don't believe that the
language even defines the id as a number, although I could be wrong about
that.

http://docs.python.org/library/functions.html#id says it must be an
integer of some sort. Even if it didn't say that, it hardly seems as
a practical imposition.
Personally, I prefer the Jython approach, because it avoids those
annoying questions like "How do I dereference the address of an
object?" (answer: Python is not C, you can't do that),

The right way to solve that question isn't to fix the runtime, but to
teach people what pointer semantics actually mean, much like the
identity problem we're discussing now.

Adam

[1] I'd be more willing to accept a more general definition that
allows for non-numeric addresses, but such things are rare.
 
J

John Nagle

Four reasons why that's a bad idea:

1) The "is" operator is fast, because it can be implemented directly by
the interpreter as a simple pointer comparison (or equivalent).

This assumes that everything is, internally, an object. In CPython,
that's the case, because Python is a naive interpreter and everything,
including numbers, is "boxed". That's not true of PyPy or Shed Skin.
So does "is" have to force the creation of a temporary boxed object?

The concept of "object" vs. the implementation of objects is
one reason you don't necessarily want to expose the implementation.

John Nagle
 
S

Steven D'Aprano

This assumes that everything is, internally, an object.

No it doesn't. It assumes that everything provides the external interface
of an object. Internally, the implementation could be anything you like,
so long as it simulates an object when observed from Python.
 
S

Steven D'Aprano

The runtime can optimize the two operations to be equivalent, since they
are logically equivalent operations. If you removed 'is', there's
little reason to believe it would do otherwise.

I'm afraid you are mistaken there. *By design*, Python allows shadowing
and monkey-patching of built-ins. (Although not quite to the same degree
as Ruby, and thank goodness!)

Given the language semantics of Python, "id(a) == id(b)" is NOT
equivalent to "a is b" since the built-in id() function can be shadowed
by some other function at runtime, but the "is" operator cannot be.

An extremely clever optimizing implementation like PyPy may be able to
recognise at runtime that the built-in id() is being called, but that
doesn't change the fact that PyPy MUST support this code in order to
claim to be a Python compiler:

id = lambda x: 999
id(None) == id("spam") # returns True

If your runtime doesn't allow that, it isn't Python.

I can't see how that's useful at all. Identity is a fundamental
property of an object; hence retrieval of it must be a language
operation. The fact Python chooses to do otherwise is unfortunate, but
also irrelevant to my position.

It's useful for the same reason that shadowing any other builtin is
useful. id() isn't special enough to complicate the simple, and
effective, execution model just to satisfy philosophers.

The two expressions are logically equivalent, so I don't see how this
matters, nor how it is true.

They are not *logically* equivalent. First you have to define what you
mean by identity, then you have to define what you mean by an ID, and
then you have to decide whether or not to enforce the rule that identity
and IDs are 1:1 or not, and if so, under what circumstances. My library
card ID may, by coincidence, match your drivers licence ID. Doesn't mean
we're the same person. Entities may share identities, or may have many
identities. The Borg design pattern, for example, would be an excellent
candidate for ID:identity being treated as many-to-one.

Even if you decide that treating IDs as 1:1 is the only thing that makes
sense in your philosophy, in practice that does not hold for Python. IDs
may be reused by Python. There are circumstances where different objects
get the same ID. Hence, comparing IDs is not equivalent to identity
testing.

But I was actually referring to something more fundamental than that. The
statement "a is b" is a *direct* statement of identity. "John is my
father." "id(a) == id(b)" is *indirect*: "The only child of John's
grandfather is the parent of the mother-in-law of my sister-in-law" sort
of thing. (Excuse me if I got the relationships mixed up.)

4) The id() idiom already breaks if you replace names a, b with
expressions:
id([1,2]) == id([3,4])
True

It's not broken at all.

It is broken in the sense that "id(a) == id(b)" is to be treated as
equivalent to "a is b". The above example demonstrates that you CANNOT
treat them as equivalent.


[...]
The other solution is to do what Java and C# do: banish id() entirely

Solution to *what problem*?

I do not believe that there is a problem here that needs to be solved.
Both id() and "is" are perfectly fine for what they do.

But that's absolutely wrong. id(x) returns an ID, not an address. It
just
happens that, as an accident of implementation, the CPython interpreter
uses the object address as an ID, because objects can't move. That's
not the case for all implementations. In Jython, objects can move and
the address is not static, and so IDs are assigned on demand starting
with 1:

steve@runes:~$ jython
Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19) [OpenJDK Client VM
(Sun Microsystems Inc.)] on java1.6.0_18 Type "help", "copyright",
"credits" or "license" for more information.>>> id(42) 1
id("Hello World!") 2
id(None)

3
An address is an identifier: a number that I can use to access a
value[1].

Then by your own definition, Python's id() does not return an address,
since you cannot use it to access a value.
 
A

Adam Skutt

I'm afraid you are mistaken there. *By design*, Python allows shadowing
and monkey-patching of built-ins. (Although not quite to the same degree
as Ruby, and thank goodness!)
Yes, I understand that. You still haven't explained why this behavior
is correct in this particular situation. Arguing from the position
of, "What Python does must be correct" isn't a valid tactic, I'm
afraid.
It's useful for the same reason that shadowing any other builtin is
useful. id() isn't special enough to complicate the simple, and
effective, execution model just to satisfy philosophers.

If overriding id() is useful, then overriding 'is' must be useful
too. Python is still broken. Unless you can prove the two operations
shouldn't be logically equivalent (and you don't), you can't
meaningfully argue for different semantics for them. You still end up
with a broken language either way.
They are not *logically* equivalent. First you have to define what you
mean by identity, then you have to define what you mean by an ID, and
then you have to decide whether or not to enforce the rule that identity
and IDs are 1:1 or not, and if so, under what circumstances.

You're going to have to explain the value of an "ID" that's not 1:1
with an object's identity, for at least the object's lifecycle, for a
programmer. If you can't come up with a useful case, then you haven't
said anything of merit.

Plainly, to show they're not logically equivalent, you need to explain
why the guarantee provided by id() is improper. Then, you need to
generalize it to all programming languages. Python's concept of
identity is not unique nor special. It uses the exact same rules as C+
+, C#, Java, and many other languages.
My library
card ID may, by coincidence, match your drivers licence ID. Doesn't mean
we're the same person.

I don't know why you even remotely think this is relevant. All it
does is further demonstrate that you don't understand the object-
oriented concept of identity at all. Comparing library IDs and
drivers license IDs is an improper operation. Languages that can have
multiple IDs, possibly overlapping, /do/ disallow such idiocy.
identities. The Borg design pattern, for example, would be an excellent
candidate for ID:identity being treated as many-to-one.

How would inheritance work if I did that?
Even if you decide that treating IDs as 1:1 is the only thing that makes
sense in your philosophy, in practice that does not hold for Python. IDs
may be reused by Python.

They may be reused in all languages I can think of. They're only
unique for the lifetime of the object, because that's all we need as
programmers.
There are circumstances where different objects
get the same ID. Hence, comparing IDs is not equivalent to identity
testing.

Two objects only get the same ID if one of the objects is dead. The
results of such an comparison are obviously meaningless. Some
runtimes even try very hard to prevent you from doing such silly
things.
But I was actually referring to something more fundamental than that. The
statement "a is b" is a *direct* statement of identity. "John is my
father." "id(a) == id(b)" is *indirect*: "The only child of John's
grandfather is the parent of the mother-in-law of my sister-in-law" sort
of thing. (Excuse me if I got the relationships mixed up.)

Again, the fact that you somehow think this absurd family tree is
relevant only shows you're fundamentally confused about what object
oriented identity means. That's rather depressing, seeing as I've
given you a link to the definition.

In a mathematical sense, you're saying that given f(x) = x+2, using
f(x) is somehow more "direct" (whatever the hell that even means) than
using 'x+2'. That's just not true. We freely and openly interchange
them all the time doing mathematics. Programming is no different.
It is broken in the sense that "id(a) == id(b)" is to be treated as
equivalent to "a is b". The above example demonstrates that you CANNOT
treat them as equivalent.

They give different results, yes, and that's unfortunate. However, my
code clearly demonstrates they are logically equivalent operations:
the temporaries have different addresses in these cases. Both
expressions return the correct answer based on their inputs.

Since there are side-effects involved here, you cannot simply compare
the results and conclude they are different. You must account for the
side-effects, and when you do, then we conclude they have the same
semantics.
Solution to *what problem*?
This confusion that many people have over what 'is' does, including
yourself.
An address is an identifier: a number that I can use to access a
value[1].

Then by your own definition, Python's id() does not return an address,
since you cannot use it to access a value.

The fact Python lacks explicit dereferencing doesn't change the fact
that id() returns an address. Replace 'can' with 'could' or 'could
potentially' or the whole phrase with 'represents' if you wish. It's
a rather pointless thing to quibble over.

Would you call the result of casting a C pointer to an int an
address? If so, you must call the result of id() an address as well--
you can't dereference either of them. If not, then you need to
provide an alternate name for the result of casting a C pointer to an
int.

Adam
 
A

Adam Skutt

    This assumes that everything is, internally, an object.  In CPython,
that's the case, because Python is a naive interpreter and everything,
including numbers, is "boxed".  That's not true of PyPy or Shed Skin.
So does "is" have to force the creation of a temporary boxed object?

That's what C# does AFAIK. Java defines '==' as value comparison for
primitives and '==' as identity comparison for objects, but I don't
exactly know how one would do that in Python.

Adam
 
A

Adam Skutt

Why should we take from Java one of its worst misfeatures and disfigure
Python for life?

There are a lot of misfeatures in Java. Lack of operating overloading
really isn't one of them. I prefer languages that include operator
overloading, but readily understand and accept the arguments against
it. Nor is the differing behavior for '==' between primitives and
objects a misfeature.

C# and Python do have a misfeature: '==' is identity comparison only
if operator== / __eq__ is not overloaded. Identity comparison and
value comparison are disjoint operations, so it's entirely
inappropriate to combine them.

I don't necessarily mind if the two operations have the same symbol,
as long as there's some other way in-context to determine which
operation is occurring. This is the case in C and C++, for example.
Python's way is much much cleaner.

Nope. Automatically substituting identity equality for value equality
is wrong. While rare, there are legitimate reasons for the former to
be True while the latter is False. Moreover, it means that class
authors must remember to write an __eq__ when appropriate and won't
get any sort of error when they forget to do so. That can lead to
bugs.

Adam
 
R

rusi

In a mathematical sense, you're saying that given f(x) = x+2, using
f(x) is somehow more "direct" (whatever the hell that even means) than
using 'x+2'.  That's just not true.  We freely and openly interchange
them all the time doing mathematics.  Programming is no different.

If f(x) and x+2 are freely interchangeable then you have referential
transparency, a property that only purely functional languages have.
In python:
a = [1,2]
m1 = [a, a]
m2 = [[1,2],[1,2]]

may make m1 and m2 seem like the same until you assign to m1[0][0].
eg One would not be able to distinguish m1 and m2 in Haskell.

On the whole I whole-heartedly agree that 'a is b' be replaced by
id(a) == id(b), with id itself replaced by something more obviously
implementational like addrof.

The reasons are fundamental: The word 'is' is arguably the primal
existential verb. Whereas 'is' in python is merely a leakage of
implementation up to the programmer level:
http://www.joelonsoftware.com/articles/LeakyAbstractions.html

Such a leakage may be justified just as C allowing inline asm may be
justified.
However dignifying such a leakage with the primal existential verb
causes all the confusions seen on this thread (and the various
stackoverflow questions etc).
 
A

Adam Skutt

If f(x) and x+2 are freely interchangeable then you have referential
transparency, a property that only purely functional languages have.
In python:

I think you misunderstood what I was trying to explain. Steven is
trying to claim that there's some sort of meaningful difference
between calling an operation/algorithm/function by some name versus
handing out its definition. I was merely pointing out that we
routinely substitute the two when it is appropriate to do so.

My apologies if you somehow took that to mean that I was implying
there was referential transparency here. I couldn't think of a better
example for what I was trying to say.

Adam
 
R

rusi

I think you misunderstood what I was trying to explain.  Steven is
trying to claim that there's some sort of meaningful difference
between calling an operation/algorithm/function by some name versus
handing out its definition.  I was merely pointing out that we
routinely substitute the two when it is appropriate to do so.

My apologies if you somehow took that to mean that I was implying
there was referential transparency here.  I couldn't think of a better
example for what I was trying to say.

Adam

And my apologies... I forgot to state my main point:
Programmer accessible object identity is the principal impediment to
referential transparency.
In a functional language one can bind a name to a value -- period.
There is nothing more essence-ial -- its platonic id -- to the name
than that and so the whole can of worms connected with object identity
remains sealed within the language implementation.
 
J

John Nagle

That's what C# does AFAIK. Java defines '==' as value comparison for
primitives and '==' as identity comparison for objects, but I don't
exactly know how one would do that in Python.

I would suggest that "is" raise ValueError for the ambiguous cases.
If both operands are immutable, "is" should raise ValueError.
That's the case where the internal representation of immutables
shows through.

If this breaks a program, it was broken anyway. It will
catch bad comparisons like

if x is 1000 :
...

which is implementation dependent.

John Nagle
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,146
Messages
2,570,832
Members
47,374
Latest member
EmeliaBryc

Latest Threads

Top