Explanation of list reference

Rustom Mody

Can you give an example of an ambiguous case? Fundamentally, the 'is'
operator tells you whether its two operands are exactly the same
object, nothing more and nothing less, so I assume your "ambiguous
cases" are ones where it's possible for two things to be either the
same object or two indistinguishable ones.

Fundamentally your definition above is circular: In effect
the python expr "a is b" is the same as a is b.

The only way to move ahead on that circularity is to 'leak out'
the underbelly of Python's object model.

My own preference: No is operator; only id when we deliberately need to
poke into the implementation.

Of course, I guess I am in a minuscule minority on that :)
 
Chris Angelico

Fundamentally your definition above is circular: In effect
the python expr "a is b" is the same as a is b.

It's not circular, it's stating the definition of the operator. And
since the definition is so simple, it's impossible - at that level -
for it to be ambiguous. It's possible for equality to be ambiguous, if
you have two types which define __eq__:

class Everyone:
    def __eq__(self, other): return True
class Noone:
    def __eq__(self, other): return False

>>> Everyone() == Noone()
True
>>> Noone() == Everyone()
False

But it's not possible for 'is' to behave like that.
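A runnable sketch of the point, reusing the two classes above: `==` is dispatched to `__eq__`, so a class can make it asymmetric or even deny that an object equals itself, while `is` has no hook a class can override.

```python
class Everyone:
    def __eq__(self, other):
        return True   # claims equality with anything

class Noone:
    def __eq__(self, other):
        return False  # denies equality with anything, even itself

a, b = Everyone(), Noone()

# == asks the left operand's __eq__ first, so the answer depends on order.
forward = a == b    # Everyone.__eq__ -> True
backward = b == a   # Noone.__eq__ -> False
self_eq = b == b    # Noone.__eq__ -> False, even for one and the same object

# `is` has no dunder method; it only asks whether both operands are one object.
print(forward, backward, self_eq)   # True False False
print(a is a, b is b, a is b)       # True True False
```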

ChrisA
 
Rustom Mody

It's not circular, it's stating the definition of the operator. And
since the definition is so simple, it's impossible - at that level -
for it to be ambiguous. It's possible for equality to be ambiguous, if
you have two types which define __eq__:

At what level can you explain the following?

As against


"Interning" you will say.
Is interning a simple matter for example at the level of questioning of the OP?
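The example that prompted this question did not survive, but a sketch of the same kind of comparison is below. The `is` results are CPython implementation details (its small-integer cache and string interning) and may differ between implementations and versions; only the `==` results are guaranteed by the language. The values are built at run time with `int()` and `str.join` so that the compiler cannot share constants; the helper name `equal_and_identical` is hypothetical.

```python
def equal_and_identical(make):
    """Create two values with the same recipe; report (==, is)."""
    a, b = make(), make()
    return a == b, a is b

# Values are built at run time so constant folding does not merge them.
cases = {
    "small int": lambda: int("6"),
    "large int": lambda: int("600000"),
    "short str": lambda: "".join(["ab", "c"]),
}

for label, make in cases.items():
    eq, same = equal_and_identical(make)
    # eq is always True; same only reflects caching/interning choices.
    print(f"{label:10} ==: {eq}  is: {same}")
```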
 
Rustom Mody

Well, for a start, I'd use Python 3, so there's no need to explain why
some numbers have an L after them :)

Nice point!
And it only sharpens what I am saying -- Python 3 is probably more confusing than
Python 2 with respect to object identity.
When it's utterly impossible for it to matter in any way, Python is
allowed to reuse objects.
I think that's simple enough to explain. There's nothing you can do to
distinguish one 6 from another, so Python's allowed to have them the
same.
Simple??
True

-------------
"utterly impossible to matter"...
"nothing you can do to distinguish one 6 from another"

All of these depend on doing one of these 3 things to deal with object identity:
1. Circular definition
2. Delve into implementation
3. Wildly wave the hands

As a teacher I've done more than my fair share of all of these, especially 3, but
if you have a 4th option I'd be interested to know!

Philosophically, "to be" (the copula) is such a knotty problem that there is
an entire movement to create a version of English without any form of
"is", "are", "be", etc.

http://en.wikipedia.org/wiki/E-Prime
 
Chris Angelico

Nice point!
And it only sharpens what I am saying -- Python 3 is probably more confusing than
Python 2 with respect to object identity.

How so? Py3 eliminates an unnecessary difference:
False

In Py3, this can't happen, because there simply is no distinction.

(That said, the Py3 unification does mean that small integers pay the
performance cost of long integers. I've mentioned before that it may
be worth having an "under the covers" optimization whereby small
integers are stored in a machine word - but this should be utterly
invisible to the user. As far as the programmer's concerned, an int is
an int is an int.)

In all three cases, Python is allowed to use separate objects. Nothing
forces them to be shared. But in all three cases, there's no way you
could distinguish one from another, so Python's allowed to reuse the
same object.
"utterly impossible to matter"...
"nothing you can do to distinguish one 6 from another"

All of these depend on doing one of these 3 things to deal with object identity:
1. Circular definition
2. Delve into implementation
3. Wildly wave the hands

How do you distinguish between any other identical things? Once you've
decided that they're equal, somehow you need to separate identity from
value. I could have three six-sided dice, all made from the same
mould, and yet each one is a separate object. If I hold all three in
my hand and toss them onto the table, can I recognize which one is
which? No, they're identical. Are they distinct objects? Yes.
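The dice analogy can be mirrored in a few lines. `Die` is a hypothetical class whose equality compares only the number of sides, so three dice from the same mould compare equal. Shuffling them (the toss onto the table) destroys any position-based way of telling them apart, yet `is` still picks out the one die we kept hold of.

```python
import random
from dataclasses import dataclass

@dataclass
class Die:
    sides: int = 6   # the generated __eq__ compares this field only

dice = [Die(), Die(), Die()]     # three dice from the same mould
first = dice[0]                  # keep a reference to one particular die

random.shuffle(dice)             # toss them onto the table

# Nothing in their value tells them apart...
print(all(d == first for d in dice))   # True
# ...yet exactly one of them is the die we kept hold of.
matches = [d for d in dice if d is first]
print(len(matches))                    # 1
```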

ChrisA
 
Rustom Mody

How so? Py3 eliminates an unnecessary difference:

In Py3, this can't happen, because there simply is no distinction.
(That said, the Py3 unification does mean that small integers pay the
performance cost of long integers. I've mentioned before that it may
be worth having an "under the covers" optimization whereby small
integers are stored in a machine word - but this should be utterly
invisible to the user. As far as the programmer's concerned, an int is
an int is an int.)

In all three cases, Python is allowed to use separate objects. Nothing
forces them to be shared. But in all three cases, there's no way you
could distinguish one from another, so Python's allowed to reuse the
same object.
How do you distinguish between any other identical things? Once you've
decided that they're equal, somehow you need to separate identity from
value.

Implicit circularity problem continues... See below
I could have three six-sided dice, all made from the same
mould, and yet each one is a separate object. If I hold all three in
my hand and toss them onto the table, can I recognize which one is
which? No, they're identical. Are they distinct objects? Yes.

In the case of physical objects like dice there is a fairly
unquestionable framing that makes identity straightforward --
4-dimensional space-time coordinates. If the space-time coordinates of
2 objects are all equal then the objects are identical, else not.

Now we analogize the space-time identity of physical objects to
computer identity of computer objects (so-called) and all sorts of
problems ensue.

To start with we say two objects are identical if they have the same
memory address.
Then what happens to the same memory address on different computers?
If you say nothing on two different computers are identical then how do you
define the correctness of a serialization protocol?

And is 'different computer' even well-defined? Think of clusters, COWs, NOWs,
and other beasties ending in the cloud...

IOW when we analogize 4-dim infinite space-time into the confines of
'a computer' we've bought bigger problems than we disposed of, because

- for space-time it is unreasonable to imagine a larger frame into which that is
embedded

- for computers that larger frame is a key part -- starting with the fact
that you and I are having a conversation right now

tl;dr Analogizing physical objects to computer 'objects' is a mistake
 
Ian Kelly

In the case of physical objects like dice there is a fairly
unquestionable framing that makes identity straightforward --
4-dimensional space-time coordinates. If the space-time coordinates of
2 objects are all equal then the objects are identical, else not.

Now we analogize the space-time identity of physical objects to
computer identity of computer objects (so-called) and all sorts of
problems ensue.

To start with we say two objects are identical if they have the same
memory address.

This is false. It happens to hold for CPython, but that's an
implementation detail. The definition of object identity does not
depend on memory address. It also doesn't have anything to do with
space-time coordinates. The concept of object identity is an
abstraction, not an analogy from physics.

The language reference states, "Every object has an identity, a type
and a value. An object's identity never changes once it has been
created; you may think of it as the object's address in memory."
Okay, so that quote does bring up memory address, but in my
interpretation that's just an analogy to introduce the concept. The
more important part of that sentence is the first part, which ties an
object's identity to its creation. If two objects share the same
creation, then they're the same object.
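A small illustration of tying identity to creation: each list display below creates a new object, while plain assignment creates nothing at all, only another reference to an existing object.

```python
a = [1, 2, 3]   # one creation
b = a           # no new object: a second reference to the first one
c = [1, 2, 3]   # a separate creation that happens to have an equal value

print(a == b, a is b)   # True True
print(a == c, a is c)   # True False

b.append(4)             # mutate through one reference...
print(a)                # [1, 2, 3, 4]  ...and the change shows through the other
print(c)                # [1, 2, 3]     c was created separately and is unaffected
```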
 
Chris Angelico

This is false. It happens to hold for CPython, but that's an
implementation detail. The definition of object identity does not
depend on memory address. It also doesn't have anything to do with
space-time coordinates. The concept of object identity is an
abstraction, not an analogy from physics.

With the restrictions of computer memory, I suspect that two objects
with the same address must be identical, simply because it's not
possible for it to be otherwise. However, the converse isn't
necessarily true; two objects may have the same identity while being
at different addresses (or, more likely, one object may occupy
different memory addresses over time, if the gc moves it around). But
since memory addresses are completely inaccessible to Python code, we
can't say whether two objects have the same address.

ChrisA
 
Steven D'Aprano

You've got it backwards. In Python, /everything/ is a reference.

What's a reference?

How is the value 23 a reference? What is it a reference to?

The
variable is just a "pointer" to the actual value. When you change a
variable, you're just changing the memory location it points to.

What do memory locations have to do with Python code? When I execute
Python code in my head, perhaps using a pencil and paper, or build a
quantum computer (or analog clockwork device) to execute Python code,
where are the memory locations?

I think you are conflating the *implementation* of Python's virtual
machine in a C-like language written for a digital computer with the
*defined behaviour* of the Python virtual machine. If you think about the
Python execution model, there is almost nothing about memory locations in
it. The only exception I can think of is the id() function, which uses
the memory address of the object as the ID, and even that is *explicitly*
described as an implementation detail and not a language feature. And in
fact Jython and IronPython assign IDs to objects consecutively from 1,
and PyPy has to go through heroic and complicated measures to ensure that
objects have the same ID at all times.
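The guarantee `id()` actually gives is narrow: the value is unique and constant among objects that exist at the same time. Under that guarantee, comparing the ids of two live objects agrees with `is`, while comparing ids of objects whose lifetimes do not overlap tells you nothing. A minimal sketch, which deliberately does not assume whether the number is an address or a counter:

```python
x = []
y = []

# x and y are both alive here, so comparing their ids agrees with `is`.
print((id(x) == id(y)) == (x is y))   # True
print((id(x) == id(x)) == (x is x))   # True

# Each temporary list below may be freed before the next one is created,
# so an implementation is free to reuse the id: the result is not meaningful.
print(id([]) == id([]))
```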

Thinking about the implementation of Python as written for certain types
of digital computing devices can be useful, but we must be very careful
to avoid mixing up details at the underlying C (or Java, or Haskell, or
Lisp, or ...) layer with questions about the Python execution model.

As soon as you mention "pointers", you're in trouble, because Python has
no pointers. There is nothing in Python that will give you a pointer to
an object, or dereference a pointer to get the object at the other end.
Pointers in the sense of C or Pascal pointers to memory addresses simply
don't have any existence in Python. Python compilers can even be written
in languages like Java that don't have pointers. The fact that the C
implementation of Python uses pointers internally is not very
interesting, any more than the fact that a Python implementation running
on a Turing Machine would use a pencil and eraser that can draw marks on
a very long paper tape.

Strings, ints, tuples, and floats behave differently because they're
/immutable/. That means that they CANNOT modify themselves. That's why
all of the string methods return a new string. It also means that, when
you pass one to a function, a /copy/ of it is made and passed instead.

Yes, strings etc. are immutable, but no, they are not copied when you
pass them to a function. We can be sure of this for two reasons:

(1) We can check the id() of the string from the inside and the outside
of the function, and see that they are the same; and

(2) We can create a HUGE string, hundreds of megabytes, and pass it to
dozens of functions, and see no performance slowdown. It might take a
second or five to build the initial string, and microseconds or less to
pass it to function after function after function.
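Reason (1) can be checked directly. The function names `inner_id` and `passthrough` are hypothetical; the point is that the object seen inside a call is the very object passed in, however large it is.

```python
def inner_id(s):
    # Report the identity of whatever object the parameter refers to.
    return id(s)

def passthrough(s):
    # Hand back exactly the object that was passed in.
    return s

big = "x" * 10_000_000   # a 10 MB string, built once

# Same id inside and outside the call: the string was not copied.
print(inner_id(big) == id(big))   # True
print(passthrough(big) is big)    # True
```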

So, back to the original subject. Everything is a reference.

To really understand Python's behaviour, we need to see that there are
two kinds of entities, names and values. Or another way to put it,
references and objects. Or another way to put it, there's actually only
one kind of thing in Python, that is, everything in Python is an object,
but Python *code* can refer to objects indirectly by names and other
references. Names aren't "things", but the things that names refer to are
things.

Objects have a clear definition in the Python world: an object is an entity
that has a type (e.g. a string), a set of behaviour (methods), and a
value ("Hello World").

References can be names like `mystring`, or list items `mylist[0]`, or
items in mappings `mydict["key"]`, or attributes `myobject.attr`, or even
expressions `x+y*(1-z)`. References themselves aren't "things" as such
(although in Python, *names* are implemented as string keys in
namespaces), but a way to indirectly refer to things (values, objects).
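To make the list of reference forms concrete, the sketch below reaches one object through a name, a list item, a mapping item and an attribute (`types.SimpleNamespace` is used only as a convenient attribute holder; the variable names are illustrative), then checks that all four arrive at the same object.

```python
from types import SimpleNamespace

obj = ["Hello World"]                   # one object...

myname = obj                            # ...referred to by a name,
mylist = [obj]                          # by a list item,
mydict = {"key": obj}                   # by a mapping item,
myobject = SimpleNamespace(attr=obj)    # and by an attribute.

print(myname is mylist[0] is mydict["key"] is myobject.attr)   # True

# Mutating through any one reference is visible through all of them.
mylist[0].append("again")
print(myobject.attr)   # ['Hello World', 'again']
```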

When you do this:

x = [1,2,3]
x = [4,5,6]

x now points to a different memory location.

Memory locations are irrelevant. Objects may not even have a single, well-
defined memory location. (If you think this is impossible, you're
focusing too much on a single computer architecture.) They might use some
sort of copy-on-write mechanism so that objects don't even exist until
you modify them. Who knows?

Instead, we should say that x now refers to a different object.
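Rebinding can be observed without any mention of memory: keep a second reference to the first list, rebind `x`, and the first list is still there, unchanged (the name `y` is added for illustration).

```python
x = [1, 2, 3]
y = x            # a second reference to the same list

x = [4, 5, 6]    # rebind the name x to a new list object

print(y)         # [1, 2, 3]  the original object was not touched
print(x is y)    # False      x now refers to a different object
```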

An analogy, the name "President of the United States" stopped referring
to George Bush Jr and started referring to Barack Obama a few years back,
but the "objects" (people) have an existence separate from the name used
to refer to them.

(By the way, I try to avoid using the term "points to" if I can, since it
has connotations to those familiar with C which don't apply to Python.)

And, when you do this:

x[0] = 99000
x[0] = 100

you're just changing the memory location that x[0] points to.

Again, I'd say that x[0] now refers to a different object.
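Item assignment is the same kind of rebinding one level down: the list slot is made to refer to another object, the list itself is mutated in place, and the int that used to sit in the slot is unaffected. `alias` and `old` are illustrative names.

```python
x = [4, 5, 6]
alias = x        # another reference to the same list
old = x[0]       # keep hold of the object currently in slot 0

x[0] = 99000     # slot 0 now refers to a different int object
x[0] = 100       # ...and now to yet another one

print(alias)       # [100, 5, 6]  the list was mutated in place, so alias sees it
print(old)         # 4            the old int object itself never changed
print(alias is x)  # True
```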
 
Rustom Mody

This is false. It happens to hold for CPython, but that's an
implementation detail. The definition of object identity does not
depend on memory address. It also doesn't have anything to do with
space-time coordinates. The concept of object identity is an
abstraction, not an analogy from physics.
The language reference states, "Every object has an identity, a type
and a value. An object's identity never changes once it has been
created; you may think of it as the object's address in memory."
Okay, so that quote does bring up memory address, but in my
interpretation that's just an analogy to introduce the concept. The
more important part of that sentence is the first part, which ties an
object's identity to its creation. If two objects share the same
creation, then they're the same object.

What is the notion of object identity? That is the question.
OK, so you reject the memory address as an 'implementation detail'.
Then you are obliged to provide some other way of understanding object identity.


With the restrictions of computer memory, I suspect that two objects
with the same address must be identical, simply because it's not
possible for it to be otherwise. However, the converse isn't
necessarily true; two objects may have the same identity while being
at different addresses (or, more likely, one object may occupy
different memory addresses over time, if the gc moves it around). But
since memory addresses are completely inaccessible to Python code, we
can't say whether two objects have the same address.

Nice point!
I earlier talked of the macro problems of identity, viz across machines.
You are bringing up a more 'micro' angle, viz gc.
An even more micro (or lower level) example would be the mismatch between
physical and virtual memory, DRAM and cache etc etc.
Is memory such a clear concept?

Just different examples to show that object identity is anything but
straightforward
 
Steven D'Aprano

Yes, sometimes for teaching reasons, you have to over-simplify or even
introduce artificial constructs. I'd recommend acknowledging them as
such.

The mathematician Ian Stewart and biologist Jack Cohen call these "lies
for children". They don't mean it as a pejorative. In fact, calling them
"lies for children" is itself a lie for children :)

(Lies for children are not lies, nor are they just for children.)
 
Steven D'Aprano

My own preference: No is operator; only id when we deliberately need to
poke into the implementation.

Of course, I guess I am in a minuscule minority on that :)

If I have understood you, I think that's a poor way of looking at it. We
can have an `is` operator to determine whether or not two objects are the
same, without having a concept of object IDs; but you can't have a
concept of object IDs without having distinct objects. `is` is more
fundamental than id().

IDs are a proxy for distinct objects. If you live in a country with an ID
card of some sort, then the ID acts as an identifier for each unique
individual. But countries without ID cards don't lack unique individual
people.
 
Chris Angelico

Nice point!
I earlier talked of the macro problems of identity, viz across machines.
You are bringing up a more 'micro' angle, viz gc.
An even more micro (or lower level) example would be the mismatch between
physical and virtual memory, DRAM and cache etc etc.
Is memory such a clear concept?

Just different examples to show that object identity is anything but
straightforward

Not really; they just show that object identity is nothing to do with
memory location. An object is itself, and is not anything else, and
neither of those truisms has anything to do with memory. I could
implement Python using a pencil and paper, using physical pieces of
string to create references; if a gust of wind blows all the paper
around, it won't change anything. (Though it might be a problem if I
have any weak references...) You could walk up to me and look at my
pieces of paper, and you'll know if two strings are linking to the
same paper; it's obvious that that paper is the same thing as itself.

ChrisA
 
Rustom Mody

If I have understood you, I think that's a poor way of looking at it. We
can have an `is` operator to determine whether or not two objects are the
same, without having a concept of object IDs; but you can't have a
concept of object IDs without having distinct objects. `is` is more
fundamental than id().


Pick 'is' or id; it matters little.
To define either, you need the notion of 'same'.
IDs are a proxy for distinct objects. If you live in a country with an ID
card of some sort, then the ID acts as an identifier for each unique
individual. But countries without ID cards don't lack unique individual
people.

With humans, as with Chris's dice, the notion of 'same' is unexceptionable
[well, sci-fi, teleportation etc. aside :) ]

To define is, id or 'same', you need to do one of:
1. use machine details
2. do mathematics
3. wave our hands -- Don't we all know that same is same? <Wild Wave>

My bet is that python's id/is cannot be defined with 2 so we use a
combo of 1 and 3
 
Rustom Mody

Rustom Mody writes:
How about: Every object has an identity, which is unique among all
concurrently-existing objects. The 'is' operator queries whether two
references are referring to objects with the same identity, which
implies they are actually referring to the same object.
Is that sufficient?

Are you explaining or defining identity?

As an explanation it's OK, though a bit tautologous.
As a definition it's circular.
[Just for context remember the OP -- a noob father-son duo confused by
python's memory model]
Python doesn't make any promises about object identity beyond the
current run-time of a single instance of a program. So while the problem
you describe is interesting, it isn't relevant when talking about Python
object identity.

Hard as nails, the problem persists -- not promising anything about identity
implies that we understand it!!
 
Steven D'Aprano

In the case of physical objects like dice there is a fairly
unquestionable framing that makes identity straightforward --
4-dimensional space-time coordinates. If the space-time coordinates of 2
objects are all equal then the objects are identical, else not.

That simply is not correct.

(1) General relativity tells us that not all observers will agree on the
space-time coordinates of two objects, since not all observers agree on a
single frame of reference.

(2) Quantum mechanics tells us that objects are not located at a single
space-time coordinate. Objects are "smeared out" over space (and time).
We cannot really talk about the location of an object, but only about the
probability of a measurement registering the object at a certain location.

Now we analogize the space-time identity of physical objects to computer
identity of computer objects (so-called) and all sorts of problems
ensue.

To start with we say two objects are identical if they have the same
memory address.

Not quite. We say that two objects are identical (that is, that they
actually are the same object) if they exist simultaneously, in the same
process, and have the same ID. Normally we don't bother to say that they
must be in the same process, as that is implied by our understanding of
how computers normally work.

In fact, even the above is a simplification, since processes may share
certain objects. This is why Python prohibits Ruby-style monkey-patching
of built-in types. If we could add or remove methods from (say) lists,
that could affect more than one Python process.
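Whatever the precise motivation, the restriction itself is easy to observe in CPython: adding an attribute to a built-in type raises `TypeError`, while the same operation on an ordinary class succeeds. The exact error message varies between versions, so this sketch only checks the exception type; `try_patch`, `shout` and `MyList` are hypothetical names.

```python
def try_patch(cls):
    """Attempt to add a method to cls; return True if it was allowed."""
    try:
        cls.shout = lambda self: "HELLO"
    except TypeError:
        return False
    return True

class MyList(list):   # a user-defined subclass of list
    pass

print(try_patch(list))    # False: the built-in type rejects new attributes
print(try_patch(MyList))  # True:  a user-defined class accepts them
print(MyList().shout())   # HELLO
```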

But ignoring complications like that, we can say that for a wide range of
implementations, the condition "two objects exist at the same time and
have the same ID" is equivalent to saying that they exist at the same
time with the same memory address.

Then what happens to the same memory address on different computers? If
you say nothing on two different computers are identical then how do you
define the correctness of a serialization protocol?

I would not expect that the None object running on my computer is the
same None object running on another computer. None is a singleton *within
a single running process*, not over the entire universe of Python virtual
machines.
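A sketch of how this plays out with a real serialization protocol, the standard `pickle` module: a round-trip produces an equal but distinct list, so correctness is judged by equality of value, while `None` comes back as the receiving process's own `None` singleton.

```python
import pickle

original = [1, 2, 3]
restored = pickle.loads(pickle.dumps(original))

print(restored == original)   # True:  same value
print(restored is original)   # False: a newly created object

# None is written as a marker and read back as this process's None.
print(pickle.loads(pickle.dumps(None)) is None)   # True
```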

And is 'different computer' even well-defined? Think of clusters, COWs,
NOWs, and other beasties ending in the cloud...

But for all of these things, we can create the abstraction of "a single
process running at a single moment". True, observers in orbit around a
black hole a thousand light-years away will disagree about that single
moment, but we're not talking to them and don't care what they think.

IOW when we analogize 4-dim infinite space-time into the confines of 'a
computer' we've bought bigger problems than we disposed of, because

- for space-time it is unreasonable to imagine a larger frame into which
that is embedded

- for computers that larger frame is a key part -- starting with the
fact that you and I are having a conversation right now

tl;dr Analogizing physical objects to computer 'objects' is a mistake

Over-philosophising abstractions without the sanity check of what happens
in real life is an even bigger mistake.

You may or may not choose to lie awake at night obsessing over whether
changing a tyre on your car makes it a different car (see the paradox of
the Ax of my Grandfather), but the rest of us are too busy actually
programming to care about such edge cases. They make good discussions for
a philosophy class, but 99.999% of the time, the abstraction of computer
objects being like physical objects is a good one.

If you want to demonstrate the fact that this abstraction sometimes
leaks, you don't need to concern yourself with cloud computing, shared
process memory space, or any such thing. You just need to recognise that
objects can contain themselves:

py> L = []
py> L.append(L)
py> L in L
True
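The self-containing list can be inspected a little further. For lists, `x in y` is documented as equivalent to `any(x is e or x == e for e in y)`, so `L in L` succeeds on the identity check alone; `repr` detects the cycle and prints `[...]` instead of recursing forever.

```python
L = []
L.append(L)

print(L[0] is L)   # True: the list's only item is the list itself
print(L in L)      # True: found by the identity check
print(repr(L))     # [[...]]: the cycle is shown as [...]

# The documented equivalent of `L in L` for lists:
print(any(item is L or item == L for item in L))   # True
```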


Of course, if you are a fan of the Doctor Who television show, you won't
be concerned by this. If the TARDIS can materialise inside itself, then
there isn't any problem with lists containing themselves either.
 
Steven D'Aprano

References can be names like `mystring`, or list items `mylist[0]`, or
items in mappings `mydict["key"]`, or attributes `myobject.attr`, or
even expressions `x+y*(1-z)`.

I agree with most of what you've said, but I'm not sure I like that last
bit. The expression evaluates to an object, yes, but it's not itself a
reference... is it?

[snip discussion]

You may be right. I will have to think about it a little more. Or a lot
more. Ah wait, I got it: I chose a bad example for the expression. Here
is a better one:

myobj.alist[12]["some key"].attribute


I think it is fair to call that both an expression and a reference.
 
Steven D'Aprano

On Sat, 15 Feb 2014 15:37:20 +1100, Chris Angelico wrote:

[...]
This is why dice exist in a variety of colors [1]. Indistinguishable yet
distinct dice...

Since they have different colours, they can be distinguished and aren't
indistinguishable.

One might also distinguish three dice by position in space ("the die
closest to the wall counts as the HUNDREDS digit, the one closest to me
counts as the TENS digit, and the one closest to you counts as the UNITS
digit") or by time ("roll this die first for the HUNDREDS, then roll
that one for TENS, then this third one for UNITS"). Or scent, or texture,
or the typeface used for the numbers.
 
Chris Angelico

References can be names like `mystring`, or list items `mylist[0]`, or
items in mappings `mydict["key"]`, or attributes `myobject.attr`, or
even expressions `x+y*(1-z)`.

I agree with most of what you've said, but I'm not sure I like that last
bit. The expression evaluates to an object, yes, but it's not itself a
reference... is it?

You may be right. I will have to think about it a little more. Or a lot
more. Ah wait, I got it: I chose a bad example for the expression. Here
is a better one:

myobj.alist[12]["some key"].attribute

I think it is fair to call that both an expression and a reference.

Sure. If it helps, the part that's a reference is the ".attribute" at
the end; the rest is an expression which determines what object's
attribute you're looking at. Same with the earlier parts; the [12] is
a form of reference (albeit one that requires another object). But
yes, this is an expression, and it evaluates to a reference.

In any case, your main point is still valid: each of those forms will
yield a reference to something. (Aside from the possibility of raising
KeyError. As an expression, it has to yield a reference to an object.)
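One way to see the distinction: the whole chain can appear on the left of an assignment, just like a plain name. Everything before the final `.attribute` is evaluated as an expression to find an object, and only that last reference is rebound. The objects below are hypothetical stand-ins built with `types.SimpleNamespace`.

```python
from types import SimpleNamespace

leaf = SimpleNamespace(attribute="original")
# alist needs at least 13 items so that index 12 exists.
myobj = SimpleNamespace(alist=[None] * 12 + [{"some key": leaf}])

print(myobj.alist[12]["some key"].attribute)   # original

# As an assignment target, myobj.alist[12]["some key"] is evaluated first
# (it finds leaf), and then only the final .attribute is rebound.
myobj.alist[12]["some key"].attribute = "rebound"
print(leaf.attribute)                          # rebound

# A missing key makes the expression raise instead of yielding a reference.
try:
    myobj.alist[12]["other key"]
except KeyError as exc:
    print("KeyError:", exc)
```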

ChrisA
 
Steven D'Aprano

This is false. It happens to hold for CPython, but that's an
implementation detail. The definition of object identity does not
depend on memory address. It also doesn't have anything to do with
space-time coordinates. The concept of object identity is an
abstraction, not an analogy from physics.

Correct.

CPython does not move objects around in memory during their lifespan, so
CPython can reuse the memory address as the ID. Jython and IronPython do
move objects around, so they cannot use memory addresses as IDs. Instead
they number each object sequentially.
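A toy model of sequential IDs, only to illustrate that an ID can be a serial number fixed at creation rather than an address. `Tracked` is a hypothetical class; it says nothing about how Jython or IronPython actually implement `id()`.

```python
import itertools

class Tracked:
    """Hypothetical class that numbers its instances 1, 2, 3, ... at creation."""
    _serials = itertools.count(1)

    def __init__(self, value):
        self.value = value
        self.serial = next(Tracked._serials)   # assigned once, never changes

a = Tracked(6)
b = Tracked(6)   # equal value, separate creation, new serial
c = a            # no creation, so no new serial

print(a.serial, b.serial, c.serial)   # 1 2 1
# Within this toy model, equal serials coincide with `is`.
print((a.serial == c.serial) == (a is c), (a.serial == b.serial) == (a is b))
```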

PyPy is even more complicated. Objects, like particles in quantum
mechanics, can appear and disappear from existence between observations
as the optimizing compiler does its work. For example, a list of Python
float objects may be transparently converted into an array of machine
doubles, then turned back into a Python object when you try to access it
again. The PyPy compiler has to take great care to ensure that the list
(and the floats!) get the same IDs before and after, since that is
defined behaviour in the language spec.
 
