why () is () and [] is [] work in other way?

Steven D'Aprano · Apr 27, 2012

On Apr 26, 5:10 am, Steven D'Aprano <steve
(e-mail address removed)> wrote:
This confusion that many people have over what 'is' does, including
yourself.

I have no confusion over what "is" does. The "is" operator returns True
if and only if the two operands are the same object, otherwise it returns
False.
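A small sketch of the distinction Steven draws, which also touches on the thread's title question. The variable names are arbitrary. The one line marked implementation-dependent relies on CPython caching the empty tuple, which the language does not promise.

```python
# `is` compares object identity; `==` compares values.
a = [1, 2]
b = [1, 2]
c = a

print(a == b)   # True: equal contents
print(a is b)   # False: two distinct list objects
print(a is c)   # True: c is just another name for the same object

# Each list display creates a new list, and both operands are alive during
# the comparison, so two empty lists are never the same object.
x, y = [], []
print(x is y)   # False

# Whether two empty tuples are the same object is an implementation detail:
# CPython happens to cache the empty tuple, other implementations need not.
p, q = (), ()
print(p is q)   # implementation-dependent
```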

If you think that "is" does something different, you are wrong.

An address is an identifier: a number that I can use to access a
value[1].


Then by your own definition, Python's id() does not return an address,
since you cannot use it to access a value.


The fact Python lacks explicit dereferencing doesn't change the fact
that id() returns an address. Replace 'can' with 'could' or 'could
potentially' or the whole phrase with 'represents' if you wish. It's a
rather pointless thing to quibble over.

You can't treat id() as an address. Did you miss my post when I
demonstrated that Jython returns IDs generated on demand, starting from
1? In general, there is *no way even in principle* to go from a Python ID
to the memory location (address) of the object with that ID, because in
general objects *may not even have a fixed address*. Objects in Jython
don't, because the Java virtual machine can move them in memory.

The fact that CPython happens to use the memory address of objects,
suitably converted to an int object, is a red-herring. It leads to
nothing but confusion.

Would you call the result of casting a C pointer to an int an address?
If so, you must call the result of id() an address as well-- you can't
dereference either of them. If not, then you need to provide an
alternate name for the result of casting a C pointer to an int.

I don't need to do anything of the sort. It was *your* definition, not
mine. Don't put the responsibility on me for your definition being broken.

(And for the record, in C you can cast an integer into a pointer,
although the results are implementation-specific. There's no equivalent
in Python.)

Adam Skutt · Apr 27, 2012

Yes, object identity is implemented almost everywhere by comparing
the value of two pointers (references)[1]. I've already said I'm not
really sure how else one would go about implementing it.

You might tell me that that's just an implementation detail, but when an
implementation detail is easier to understand and makes more sense than
the whole abstraction which is built upon it, something is seriously wrong.


I'm not sure what abstraction is being built here. I think you have
me confused for someone else, possibly Steven.


The abstraction is this:
- There are primitives and objects.
- Primitives are not objects. The converse is also true.
- Primitives can become objects (boxing).
- Two primitives x and y are equal iff x == y.
- Two objects x and y are equal iff x.equals(y).
- Two objects are the same object iff x == y.
- If x is a primitive, then y = x is a deep copy.
- If x is an object, then y = x is a shallow copy.
- ...

This is not an abstraction at all, but merely a poor explanation of
how things work in Java. Your last statement is totally incorrect, as
no copying of the object occurs whatsoever. The reference is merely
reseated to refer to the new object. If you're going to chide me for
ignoring the difference between the reference and the referent object,
then you shouldn't ignore it either, especially in the one case where
it actually matters! If we try to extend this to other languages,
then it breaks down completely.
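The thread is about Java here, but the same point can be checked in Python, where assignment binds a name and copies nothing. The sketch below contrasts plain assignment with the standard library's `copy.copy` (shallow) and `copy.deepcopy`; the list contents are arbitrary.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                 # assignment: a second name, nothing is copied
shallow = copy.copy(original)    # shallow copy: new outer list, same inner lists
deep = copy.deepcopy(original)   # deep copy: new outer and inner lists

print(alias is original)          # True
print(shallow is original)        # False
print(shallow[0] is original[0])  # True: inner lists are shared
print(deep[0] is original[0])     # False: inner lists were copied too

original[0].append(99)
print(alias[0], shallow[0], deep[0])  # [1, 2, 99] [1, 2, 99] [1, 2]
```

Only the real copy operations create new objects; `alias = original` behaves like the reseated reference Adam describes.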

The truth:
- Primitives can be references.
- Two primitives are equal iff x == y.
- Operator '.' automatically dereferences references.

You have the first statement backwards. References are a primitive
construct, not the other way around. While true, it's still a bad way
to think about what's going on. It breaks down once we add C++ /
Pascal reference types to the mix, for example.

It's better to think about variables (names) and just recognize that
not all variables have the same semantics. It avoids details that are
irrelevant to writing actual programs and remains consistent.

Equality or equivalence is a relation which is:
- reflexive
- symmetric
- transitive
Everything else... is something else. Call it semi-equality,
tricky-equality or whatever, but not equality, please.

Sure, but then it's illegal to allow the usage of '==' with floating
point numbers, which will never have these properties in any usable
implementation[1]. So we're back to what started this tangent, and we
end up needing 'equals()' methods on our classes to distinguish
between the different forms of equality. That's precisely what you
want to avoid.

Or we can just accept that '==' doesn't always possess those
properties, which is what essentially every programming language does,
and call it (value) equality. As long as we don't cross incompatible
meanings, it's hard to believe that this isn't the right thing to do.
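Python's own floats give a concrete case of an `==` that is not reflexive. The sketch uses only built-ins and `math.isnan`. The container behaviour shown last follows the Python reference's note that built-in containers assume identical objects are equal and check identity first.

```python
import math

nan = float("nan")

# IEEE 754 NaN is not equal to itself, so float '==' is not reflexive.
print(nan == nan)        # False
print(math.isnan(nan))   # True: the reliable way to test for NaN

# Identity, by contrast, is always reflexive.
print(nan is nan)        # True

# Built-in containers check identity before calling '==' on elements,
# so a list holding the *same* NaN object compares equal to itself...
print([nan] == [nan])    # True
# ...while two distinct NaN objects still compare unequal.
print([float("nan")] == [float("nan")])  # False
```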

When everything is "white", the word "white" becomes redundant.
So the fact that everything in Python has reference semantics means
that we can't stop thinking about value and reference semantics.

Nope. The behavior of variables is absolutely essential to writing
correct programs. If I write a program in Python that treats
variables as if they were values, it will be incorrect.
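Adam doesn't give an example here. One common illustration (not taken from the thread) is building a grid as if `[row] * 3` made three independent row values:

```python
# Looks like a 3x3 grid of zeros, but all three rows are the same list object.
row = [0] * 3
grid_wrong = [row] * 3
grid_wrong[0][0] = 1
print(grid_wrong)   # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# Correct: create a new list object for each row.
grid_right = [[0] * 3 for _ in range(3)]
grid_right[0][0] = 1
print(grid_right)   # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```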

By defining a copy constructor.

Then write me a working one. I'll wait. To save yourself some time,
you can start with std::fstream.

Python is already without pointers (*).
A world where everyone is a lawyer is a world without lawyers (really,
there isn't any other way we can get rid of them).

(*) By the way, some would argue that references are not pointers.

They would be completely and utterly wrong, and probably imbuing
pointers with properties they don't actually possess. Unless you're
talking about C++ / Pascal references, which really aren't pointers
and do possess a different set of semantics (alias might be a better
term for them).

Adam

[1] Not in any fashion that's useful to the programmer, at any rate.

Adam Skutt · Apr 27, 2012

I gave an example earlier, but you seem to have misunderstood it, so I'll
give more detail.

In the Borg design pattern, all Borg instances share state and are
indistinguishable, with only one exception: object identity. We can
distinguish two Borg instances by using "is".

Since the whole point of the pattern is for Borg instances to be
indistinguishable, the existence of a way to distinguish Borg instances
is a flaw and may be undesirable. At least, it's exposing an
implementation detail which some people argue should not be exposed.
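For readers unfamiliar with the pattern, here is a minimal sketch of Borg in the form it is commonly written: each instance rebinds its `__dict__` to one class-level dictionary. The attribute name `mood` is only illustrative.

```python
class Borg:
    """Every instance shares one attribute dictionary (the "Borg" idiom)."""

    _shared_state = {}

    def __init__(self):
        # Rebind this instance's attribute dict to the class-wide shared dict.
        self.__dict__ = self._shared_state


a = Borg()
b = Borg()
a.mood = "assimilated"

print(b.mood)                    # assimilated: state is shared
print(a.__dict__ is b.__dict__)  # True
print(a is b)                    # False: identity still tells them apart
```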

Then people should stop with such idiocy like the Borg pattern. It's a
bad example from an even worse idea.

Why should the caller care whether they are dealing with a singleton
object or an unspecified number of Borg objects all sharing state? A
clever interpreter could make many Borg instances appear to be a
singleton. A really clever one could also make a singleton appear to be
many Borg instances.

Trivial: to break cyclical references in a deep copy operation.
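The standard library's `copy.deepcopy` is an example of code that depends on identity: it keeps a memo of objects already copied during the pass (CPython's implementation keys that memo by `id()`). The hand-written `deep_copy_list` below is a hypothetical, simplified version of the same technique for nested lists only.

```python
import copy

# A self-referential list: a contains itself.
a = [1]
a.append(a)

b = copy.deepcopy(a)
print(b is a)     # False: a genuinely new object
print(b[1] is b)  # True: the cycle was reproduced, not followed forever


def deep_copy_list(obj, memo=None):
    """Hypothetical sketch: deep-copy nested lists, preserving shared and cyclic references."""
    if memo is None:
        memo = {}
    if not isinstance(obj, list):
        return obj                # treat non-lists as immutable leaves in this sketch
    if id(obj) in memo:           # identity test: already copied this exact object?
        return memo[id(obj)]
    new = []
    memo[id(obj)] = new           # register before recursing so cycles find it
    for item in obj:
        new.append(deep_copy_list(item, memo))
    return new


c = deep_copy_list(a)
print(c[1] is c)  # True
```

Keying by `id()` is safe here only because every original object stays alive for the whole copy.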

John's argument is that Python should raise an exception if you compare
"2 is 2", or for that matter "3579 is 3579", which is foolish.

You don't inherit from Borg instances, and instances inherit from their
class the same as any other instance.

I think you misunderstood me. Define a Borg class where somehow
identity is the same for all instances. Inherit from that class and
add per-instance members. Now, identity can't be the same for all
instances. As a result, you've just violated the Liskov Substitution
Principle: code that relies on all Borg class instances having the
same identity will fail when passed an instance of the subclass.

It's impossible to combine identities and not violate LSP, unless you
forbid subclasses. Your idea violates one of the most fundamental
tenets of object-oriented programming. This is because object
identity is one of the fundamental consequences of object-oriented
programming. You can't do away with it, and any attempt to do so
really just suggests that you don't understand OOP at all.

Adam

Steven D'Aprano · Apr 27, 2012

If I write a program in Python that treats variables as if they were
values, it will be incorrect.

I'm sorry, it is unclear to me what distinction you are making between
variables and values. Can you give simple examples of both incorrect and
correct code demonstrating what you mean?

(I know what distinction *I* would make, but I'm not sure if it is the
same one you are making.)

Adam Skutt · Apr 27, 2012

I have no confusion over what "is" does.

False. If you did, then you would not have suggested the difference
in True/False result between "id([1,2]) == id([1, 2])" and "[1, 2] is
[1, 2]" matters. You would understand that the result of an identity
test with temporary objects is meaningless, since identity is only
meaningful while the objects are alive. That's a fundamental
mistake.
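The point about lifetimes can be seen directly. The Python documentation for `id()` says that two objects with non-overlapping lifetimes may have the same id, so comparing ids of temporaries says nothing about identity:

```python
a = [1, 2]
b = [1, 2]

# While both objects are alive, the identity test and the id() test agree.
print(a is b, id(a) == id(b))   # False False

# With temporaries, the first list may be freed before the second is created,
# so the interpreter may reuse the id; the result is not meaningful.
print(id([1, 2]) == id([1, 2]))  # implementation-dependent
```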

An address is an identifier: a number that I can use to access a
value[1].
Then by your own definition, Python's id() does not return an address,
since you cannot use it to access a value.



The fact Python lacks explicit dereferencing doesn't change the fact
that id() returns an address. Replace 'can' with 'could' or 'could
potentially' or the whole phrase with 'represents' if you wish. It's a
rather pointless thing to quibble over.


You can't treat id() as an address. Did you miss my post when I
demonstrated that Jython returns IDs generated on demand, starting from
1? In general, there is *no way even in principle* to go from a Python ID
to the memory location (address) of the object with that ID, because in
general objects *may not even have a fixed address*. Objects in Jython
don't, because the Java virtual machine can move them in memory.

Yes, there is a way. You add a function deref() to the language. In
CPython, that simply treats the passed value as a memory address and
treats it as an object, perhaps with an optional check. In Jython,
it'd access a global table of numbers as keys with the corresponding
objects as values, and return them. The value of id() is absolutely
an address, even in Jython. The fact the values can move about is
irrelevant.
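Adam's proposed `deref()` can be sketched in pure Python with the "global table" approach he describes for Jython. Everything here is hypothetical: `IdRegistry`, `get_id` and `deref` are illustrative names, not a real Python or Jython API. IDs are handed out on demand starting from 1. The sketch only works for objects that support weak references (instances of user-defined classes do; `int` and `list` do not).

```python
import itertools
import weakref


class IdRegistry:
    """Hypothetical sketch: on-demand IDs starting at 1, plus a deref() lookup."""

    def __init__(self):
        self._next_id = itertools.count(1)
        # id(obj) -> (weak reference to obj, assigned ID); stale entries are never pruned
        self._by_object = {}
        # assigned ID -> obj; entries vanish once the object is garbage-collected
        self._by_id = weakref.WeakValueDictionary()

    def get_id(self, obj):
        entry = self._by_object.get(id(obj))
        if entry is not None and entry[0]() is obj:
            return entry[1]  # already registered and still the same live object
        new_id = next(self._next_id)
        self._by_object[id(obj)] = (weakref.ref(obj), new_id)
        self._by_id[new_id] = obj
        return new_id

    def deref(self, ident):
        """Return the live object with this ID; raises KeyError if it was collected."""
        return self._by_id[ident]


class Thing:
    """Any user-defined class supports weak references."""


reg = IdRegistry()
t1, t2 = Thing(), Thing()
print(reg.get_id(t1), reg.get_id(t2), reg.get_id(t1))  # 1 2 1
print(reg.deref(2) is t2)                               # True
```

Whether such a table makes the returned number "an address" is exactly what the thread disputes; the sketch only shows that the lookup is implementable.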

Again, if this wasn't possible, then you couldn't implement 'is'.
Implementing 'is' requires a mechanism for comparing objects that
doesn't involve ensuring the contents of the two operands in memory are
the same.

I don't need to do anything of the sort.

Yes, you do, because you called such a thing an address when talking
about CPython. Even if my definition is wrong (it's not), your
definition is wrong too.

(And for the record, in C you can cast an integer into a pointer,
although the results are implementation-specific. There's no equivalent
in Python.)

Yes, but the lack of that operation doesn't mean that id() doesn't
return an address.

Adam

Adam Skutt · Apr 27, 2012

Perhaps you failed to notice that this "absurd" family tree, as you put
it, consists of grandparent+parent+sibling+in-law. What sort of families
are you familiar with that this seems absurd to you?

No, I noticed, but who talks like that? It's not remotely comparable
to the sort of difference we're talking about.

I think you have inadvertently demonstrated the point I am clumsily
trying to make. Even when two expressions are logically equivalent, the
form of the expressions makes a big difference to the comprehensibility of
the text.

And if we were talking about 30, 20, 5, maybe even 2-line function
versus its name, you might have a point. We're not talking about such
things though, and it's pretty disingenuous to pretend otherwise.
Yet, that's precisely what you did with your absurd family
relationship.

Which would you rather read?

for item in sequence[1:]: ...

for item in sequence[sum(ord(c) for c in 'avocado') % 183:]: ...

The two are logically equivalent, so logically you should have no
preference between the two, yes?
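For anyone checking Steven's arithmetic: the character codes of 'avocado' sum to 733, and 733 % 183 is 1, so the second slice start is indeed 1. The list `sequence` below is an arbitrary example.

```python
offset = sum(ord(c) for c in "avocado")
print(offset)        # 733
print(offset % 183)  # 1

sequence = ["a", "b", "c"]  # arbitrary example data
print(sequence[1:] == sequence[offset % 183:])  # True
```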

No, they're not logically equivalent. The first won't even execute,
as sequence is undefined. You need two lines in the first case.

A statement is "direct" in the sense I mean if it explicitly states the
thing you intend it to state.

And in the case of the two ways to compare identity, both statements
state exactly what I intend to state. They're synonyms.

"a is b" is a direct test of whether a is b. (Duh.)

"id(a) == id(b)" is an indirect test of whether a is b, since it requires
at least three indirect steps: the knowledge of what the id() function
does, the knowledge of what the == operator does, and the knowledge that
equal IDs imply identity.

The problem is that using 'is' correctly requires understanding all of
those three things.

Adam

Steven D'Aprano · Apr 27, 2012

Trivial: to break cyclical references in a deep copy operation.

I asked why the *caller* should care. If the caller has to break cyclical
references manually, the garbage collector is not doing its job.

If you're going to propose underpowered or buggy environments as an
objection, then I'll simply respond that I'm not talking about any
specific (underpowered or buggy) implementation, I'm talking about what
is logically possible.

[...]

I think you misunderstood me. Define a Borg class where somehow
identity is the same for all instances. Inherit from that class and add
per-instance members.

I think that if you're talking about per-instance members of a Borg
class, you're confused as to what Borg means. Since all instances share
state, you can't have *per-instance* data.

Now, identity can't be the same for all
instances. As a result, you've just violated the Liskov Substitution
Principle: code that relies on all Borg class instances having the same
identity will fail when passed an instance of the subclass.

Not at all. Obviously each Borg subclass will have its own fake
identity. Code that requires instances of different types to be identical
is fundamentally broken, since the mere fact that they are different
types means they can't be identical.

I'll accept the blame for your confusion as I glossed over something
which I thought was obvious, but clearly wasn't.

When I said that Borg instances are indistinguishable except for
identity, I thought it was obvious that I was talking about instances
of a single type. Mea culpa.

Clearly if x is an instance of Borg, and y is an instance of
BorgSubclass, you can distinguish them by looking at the type. The point
is that you shouldn't be able to distinguish instances of a single type.

It's impossible to combine identities and not violate LSP, unless you
forbid subclasses. Your idea violates one of the most fundamental
tenets of object-oriented programming. This is because object identity
is one of the fundamental consequences of object-oriented programming.
You can't do away with it, and any attempt to do so really just suggests
that you don't understand OOP at all.

Oh please, enough of the religion of LSP.

Barbara Liskov first introduced this idea in 1987, twenty years after
Simula 67 first appeared and thirty years after MIT researchers came up
with the concept of object oriented programming. That's hardly
fundamental to the concept of OOP. People have, and still do, violate LSP
all the time.

LSP may be best practice but it's hardly essential. OOP was useful before
LSP and it will remain useful in the face of violations.

Besides:

- In real life, subtypes often violate LSP. An electric car is a type of
car, but it has no petrol tank. Wolf spiders have eyes, except for the
Kauaʻi cave wolf spider, which is a subtype of wolf spider but is
completely eyeless.

- Subclasses in Eiffel are not necessarily subtypes and may not be
substitutable for superclasses. If it's good enough for Eiffel, it's good
enough for my hypothetical Borg subclasses.

You can always declare that Bertrand Meyer doesn't "understand OOP at
all" too.

Adam Skutt · Apr 27, 2012

I asked why the *caller* should care. If the caller has to break cyclical
references manually, the garbage collector is not doing its job.

It's a necessary requirement to serialize any cyclical structure.
Garbage collection has nothing to do with it. If I have some
structure such that A --> B --> A, I need to be able to determine that
I've seen 'A' before in order to serialize the structure to disk, or I
will never write it out successfully.
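A hypothetical sketch of the kind of serializer Adam means, for nested lists only: it numbers each list the first time it is seen and writes a back-reference on later visits. The record format (`"list"`, `"ref"`, `"value"` tuples) is invented for illustration. The `id()`-keyed table is equivalent to an `is` test here because every object reachable from the root stays alive during the pass.

```python
def serialize(obj, seen=None, out=None):
    """Hypothetical serializer for nested lists that may contain cycles."""
    if seen is None:
        seen = {}
    if out is None:
        out = []
    if not isinstance(obj, list):
        out.append(("value", obj))
        return out
    if id(obj) in seen:                  # identity: have we already written *this* object?
        out.append(("ref", seen[id(obj)]))
        return out
    seen[id(obj)] = len(seen)            # number lists in the order they are first visited
    out.append(("list", seen[id(obj)], len(obj)))
    for item in obj:
        serialize(item, seen, out)
    return out


# A --> B --> A
A = []
B = [A]
A.append(B)

out = serialize(A)
print(out)  # [('list', 0, 1), ('list', 1, 1), ('ref', 0)]
```

Without the identity test, the recursion from A to B and back to A would never terminate.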

There are plenty of situations where we legitimately care whether two
pointers are the same and don't give one whit about the state of
objects they point to. You cannot conflate the two tests, and that's
precisely what your 'give all borg instances the same identity' idea
does.

I think that if you're talking about per-instance members of a Borg
class, you're confused as to what Borg means.

I'm not. I'm talking about per-instance members of a subclass of a
Borg class. There's nothing about the Borg pattern that forbids such
behavior, which is one of the reasons it's such a terrible idea in
general. Borg implies promises that it cannot readily keep.

Since all instances share state, you can't have *per-instance* data.

I most certainly can do so in a subclass. Shared state in a parent
doesn't mandate shared state in a child.
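One way a subclass of the classic Borg can carry per-instance data is sketched below. The use of `__slots__` is our choice, not something proposed in the thread. Slot values live in the instance itself, outside the shared `__dict__`. `TaggedBorg`, `tag` and `color` are hypothetical names.

```python
class Borg:
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state  # all instances share one dict


class TaggedBorg(Borg):
    # Hypothetical subclass: a slot is stored per instance, not in the shared
    # __dict__, so each instance gets its own value.
    __slots__ = ("tag",)

    def __init__(self, tag):
        super().__init__()
        self.tag = tag


x = TaggedBorg("x")
y = TaggedBorg("y")
x.color = "green"    # goes into the shared dict
print(y.color)       # green: parent state is still shared
print(x.tag, y.tag)  # x y: per-instance data in the child
```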

Not at all. Obviously each Borg subclass will have its own fake
identity.
When I said that Borg instances are indistinguishable except for
identity, I thought it was obvious that I was talking about instances
of a single type. Mea culpa.

Clearly if x is an instance of Borg, and y is an instance of
BorgSubclass, you can distinguish them by looking at the type. The point
is that you shouldn't be able to distinguish instances of a single type.

No, that's not the least bit obvious nor apparent, and it still
violates LSP. It means every function that takes a Borg as an
argument must know about every subclass in order to distinguish
between them.

The serialization function above would need to do so. Imagine an
object x that holds a Borg object and a BorgSubclass object. If the
serialization function keeps a list of objects it has seen before and
uses that to determine whether to write the object out, it will fail
to write out one or the other if we implemented your harebrained 'All
Borg objects have the same identity' idea.

Your idea means that 'x.borg is x.subborg' must return True. It also
means either x.borg isn't going to be written out, or x.subborg isn't
going to be written out. The program is broken.

If you modify your idea to ignore subtypes, then this function breaks:

    def write_many(value, channel1, channel2):
        channel1.write(value)
        if channel2 is not channel1:
            channel2.write(value)

Calling write_many("foo", x.borg, x.subborg) now gives different
behavior than write_many("foo", x.borg, x.borg). That's probably not
what the programmer intended!
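To make `write_many` runnable, the sketch below pairs it with a hypothetical `Channel` class that just records what it receives. It shows the behaviour the identity check is meant to guarantee with ordinary Python objects.

```python
class Channel:
    """Hypothetical stand-in for an output channel; records what it receives."""

    def __init__(self, name):
        self.name = name
        self.written = []

    def write(self, value):
        self.written.append(value)


def write_many(value, channel1, channel2):
    channel1.write(value)
    if channel2 is not channel1:  # skip the second write when both names are one object
        channel2.write(value)


c1 = Channel("one")
c2 = Channel("two")

write_many("foo", c1, c1)
print(c1.written)              # ['foo']: written once
write_many("bar", c1, c2)
print(c1.written, c2.written)  # ['foo', 'bar'] ['bar']
```

If two distinct objects were made to share one identity, the second call would silently skip `c2`.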

Like it or not, whether you have only one object with shared state or
infinite objects with the same shared state is not an implementation
detail. Just because you write code that doesn't care about that fact
does not make it an implementation detail. I can write code that
depends on that fact, and there's not a single thing you can do to
stop me.

This is why the Borg pattern is a bad idea in general, because it
encourages programmers to write code that is subtly wrong. If you
have a Borg class, you can't ignore the fact that you have multiple
objects even if you want to do so. You will eventually end up writing
incorrect code as a result. Yet, many people try to do precisely
that; your idea is attempting to do precisely that!

Oh please, enough of the religion of LSP.

Barbara Liskov first introduced this idea in 1987, twenty years after
Simula 67 first appeared and thirty years after MIT researchers came up
with the concept of object oriented programming. That's hardly
fundamental to the concept of OOP.

People have, and still do, violate LSP all the time.

People write code with security flaws all of the time too. This
doesn't even begin to approach being a reasonable argument. It's
completely disingenuous. People come up with ideas and fail to
properly formalize them all of the time. People come up with useful,
revolutionary ideas and get parts of them wrong all of the time.

If you violate LSP, then you enable interface users to write buggy
code. Correct class hierarchies must follow it; correct interface
implementations must follow it. There's nothing optional about it; it
applies even if you don't have objects at all. It's just extra
inescapable for the sort of class hierarchies most OOP languages use.

Besides:

- In real life, subtypes often violate LSP. An electric car is a type of
car, but it has no petrol tank. Wolf spiders have eyes, except for the
Kauaʻi cave wolf spider, which is a subtype of wolf spider but is
completely eyeless.

Yes, real life is more complicated than the simplistic relationships
that most class hierarchies support. So what? Where did I advocate
encoding such relationships using class hierarchies?

Adam

Adam Skutt · Apr 28, 2012

With shallow copy I meant exactly that. I didn't think that my using the
term with a more general meaning would cause such a reaction.

It has a very strict, well-defined meaning in these contexts,
especially in languages such as C++.

So you're saying that I said that "Primitive constructs are references".
Right...

No, still wrong. What I said is correct, "References are a form of
primitive construct". In C, an int is a primitive but not a
reference. An int* is a pointer (reference), and is also
(essentially) a primitive.

?

Assignment to a C++ reference (T&) affects the underlying object, not
the reference itself. A reference can never be reseated once it is
bound to an object. Comparing equality on two references directly is
the same as comparing two values (it calls operator==). Comparing
identity requires doing (&x == &y), like one would do with a value.
However, unlike a value, the object is not destroyed when the
reference goes out of scope. Most importantly, references to base
classes do not slice derived class objects, so virtual calls work
correctly through references.

As a result, normally the right way to think about a reference is as a
"temporary name" for an object and not worry about any of the details
about how the language makes it work.

Sure, but then it's illegal to allow the usage of '==' with floating
point numbers, which will never have these properties in any usable
implementation[1].


???

The operator == is called the equality operator. Floating-point
numbers don't really obey those properties in any meaningful fashion.
The result is that portions of your view contradict others. Either we
must give '==' a different name, meaning what you consider equality is
irrelevant, or we must use method names like 'equals', which you find
objectionable.

You misunderstood what I said. You wouldn't treat variables as if they
were values because you wouldn't even know what that means and that
that's even a possibility.

Well, one hopes that is true. I think we have a misunderstanding over
language: you said "value and reference semantics" when you really
meant "value vs. reference semantics".

I've never heard an old C programmer talk about "value semantics" and
"reference semantics". When everything is a value, your world is pretty
simple.

Except if that were true, the comp.lang.c FAQ wouldn't have this
question and answer: http://c-faq.com/ptrs/passbyref.html, and several
others.

Much as you may not like it, most code doesn't care about a pointer's
value, doesn't need to know anything about it, and would just as soon
pretend that it doesn't exist. All it really wants is a controlled
way to mutate objects in different scopes. Which is precisely why
references are preferred over pointers in C++, as they're a better
expression of programmer intent, and far safer as a result.

Peeking under the covers in an attempt to simplify the definition of
'==' is silly. As I've hopefully shown by now, it's pretty much a
fool's errand anyway.

Will you pay me for my time?

No, because you'll never finish. There's a reason why std::fstream
lacks one, and it isn't because the committee is lazy.

Your problem is that you think that copy semantics requires real
copying. I really don't see any technical difficulty in virtualizing the
whole thing.

Then you would have written one already. They do require real
copying, though.

Adam

OKB (not okblacke) · Apr 28, 2012

Adam said:
Yes, there is a way. You add a function deref() to the language.

This is getting pretty absurd. By that logic you could say "With
Python, you can end all life on earth! You just add a function to
the language called nuclear_winter() that remotely accesses warhead
launch sites in the US and Russia, enters the appropriate launch codes,
and launches the entire nuclear arsenal!"

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown

John Nagle · Apr 29, 2012

Mathematics is more than arithmetics with real numbers. We can use FP
too (we actually do that!). We can say that NaN = NaN but that's just an
exception we're willing to make. We shouldn't say that the equivalence
relation rules shouldn't be followed just because *sometimes* we break
them.

If you do a signaling floating point comparison on IEEE floating
point numbers, you do get an exception. On some FPUs, though,
signaling operations are slower. On superscalar CPUs, exact
floating point exceptions are tough to implement. They are
done right on x86 machines, mostly for backwards compatibility.
This requires an elaborate "retirement unit" to unwind the
state of the CPU after a floating point exception. DEC Alphas
didn't have that; SPARC and MIPS machines varied by model.
ARM machines in their better modes do have that.
Most game console FPUs do not have a full IEEE implementation.

Proper language support for floating point exceptions varies
with the platform. Microsoft C++ on Windows does support
getting it right. (I had to deal with this once in a physics
engine, where an overflow or a NaN merely indicated that a
shorter time step was required.) But even there, it's
an OS exception, like a signal, not a language-level
exception. Other than Ada, which requires it, few
languages handle such exceptions as language level
exceptions.
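Python's binary floats never trap on comparisons (a NaN comparison simply returns False), but the standard library's `decimal` module is a software example of a signaling comparison surfacing as an ordinary language-level exception. This is decimal arithmetic, not hardware IEEE binary floating point, so it only illustrates the idea John describes.

```python
from decimal import Decimal, InvalidOperation, localcontext

nan = Decimal("NaN")
one = Decimal(1)

print(nan == nan)  # False: equality with a quiet NaN does not signal

raised = False
try:
    nan < one  # ordering comparison with a NaN signals InvalidOperation
except InvalidOperation:
    raised = True  # the default context traps the signal, so Python raises
print(raised)  # True

with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False  # untrap the signal inside this block only
    print(nan < one)  # False: the comparison now fails quietly
```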

John Nagle

Albert van der Horst · May 1, 2012

If I can't predict the output of

print (20+30 is 30+20) # check whether addition is commutative
print (20*30 is 30*20) # check whether multiplication is commutative

by just reading the language definition and the code, I'd have to say
"is" is ill-defined.

The output depends whether the compiler is clever enough to realise
that the outcome of the expressions is the same, such that only
one object needs to be created.

What is ill here is the user's understanding of when it is appropriate
to use "is". Asking about identity of temporary objects fully
under control of the compiler is just sick.
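By way of contrast with Albert's example, here is a sketch of a case where `is` is the appropriate test: checking for a unique sentinel object, where identity is the whole point. `_MISSING` and `lookup` are hypothetical names; for computed numbers, `==` is the test the language actually guarantees.

```python
_MISSING = object()  # hypothetical module-level sentinel; unique by identity


def lookup(mapping, key, default=_MISSING):
    value = mapping.get(key, _MISSING)
    if value is _MISSING:          # identity: did we get the sentinel back?
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value


print(20 + 30 == 30 + 20)          # True: value equality is what commutativity means
print(lookup({"a": None}, "a"))    # None: a stored None is not mistaken for "missing"
print(lookup({}, "b", default=0))  # 0
```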

Groetjes Albert

All CRUD operations work except POST. Why?	2	May 28, 2023
Why is this WordPress comments form not submitting?	1	Jan 12, 2020
What version of glibc is Python using?	10	Oct 12, 2013
Php combine identical lines in text file	4	Oct 11, 2023
Drawing missing in bitmap in a pure C win32 program	4	Jun 3, 2023
is list comprehension necessary?	15	Oct 26, 2010
copy on write	42	Jan 13, 2012
No matter what I do, IDLE will not work...	7	Nov 10, 2011
