Steven said:
Since you haven't explained what you think is happening, I can only
guess.
Let me save you from guessing. I'm thinking of a piece of paper with a
little box on it and the name 'a' written beside it. There is an arrow
from that box to a bigger box.
                             +-------------+
      +---+                  |             |
    a | --+----------------->|             |
      +---+                  |             |
                             +-------------+
There is another little box labelled 'b'. After executing 'a = b', both
little boxes have arrows pointing to the same big box. [...]
In this model, a "reference" is an arrow. Manipulating references
consists of rubbing out the arrows and redrawing them differently.
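The boxes-and-arrows picture can be checked directly (a sketch, using a
list so that the shared object is observable):

```python
# After 'a = b', both little boxes have arrows to the same big box.
b = [1, 2, 3]
a = b            # rub out a's arrow, redraw it to b's object
print(a is b)    # True: two little boxes, one big box
a.append(4)
print(b)         # [1, 2, 3, 4]: the change shows through either name
```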
All very good, but that's not what takes place at the level of Python
code. It's all implementation. I think Hans Georg Schaathun made a good
objection to the idea that "Python has references":
In Pascal a pointer is a distinct data type, and you can
have variables of a given type or of type pointer to that
given type. That makes the pointer a concrete concept
defined by the language.
The same can't be said of "references" in Python. It's not part of Python
the language, although it might be part of Python's implementation.
Also
in this model, a "variable" is a little box. It's *not* the same thing
as a name; a name is a label for a variable, not the variable itself.
That's essentially the same model used when dealing with pointers. I've
used it myself, programming in Pascal. The "little box" named a or b is
the pointer variable, and the "big box" is the data that the pointer
points to.
It's not an awful model for Python: a name binding a = obj is equivalent
to sticking a reference (a pointer?) in box a that points to obj.
Certainly there are advantages to it.
But one problem is, the model is ambiguous with b = a. You've drawn
little boxes a and b both pointing to the big box (which I deleted for
brevity). But surely, if a = 1234 creates a reference from a to the big
box 1234, then b = a should create a reference from b to the box a?
                             +-------------+
      +---+                  |             |
    a | --+----------------->|             |
      +---+                  |             |
        ^                    +-------------+
        |
      +-|-+
    b | | |
      +---+
which is the reference (pointer) model as most people would recognise it.
That's how it works in C and Pascal (well, at least with the appropriate
type declarations). To get b pointing to the big box, you would need an
explicit dereference: "b = whatever a points to" rather than "b = a".
Of course, both of these concepts are models, which is another word for
"lies" *wink*. Your model is closer to what the CPython implementation
actually does, using actual C pointers, except of course you do need to
dereference the pointers appropriately. One of my objections to it is not
that it is wrong (all models are wrong) but that it will mislead some
people to reason incorrectly about Python's behaviour, e.g. that b now
points to the little box a, and therefore if you change what a points to,
b will follow along. The whole "call by reference" thing. I suppose you
might argue that you're not responsible for the misunderstandings of
blinkered C coders *wink*, and there's something to that.
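The misunderstanding in question, made concrete: 'b = a' copies the
arrow, it does not point b at the little box a. Rebinding a afterwards
leaves b untouched.

```python
a = [1, 2, 3]
b = a
a = "rebound"    # a's arrow is rubbed out and redrawn...
print(b)         # [1, 2, 3]: ...but b still points at the original list
```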
But there's another objection... take, say, the line of Python code:
n = len('hello world')
I can identify the little box "n", which ends up pointing to the big box
holding int 11; another little box "len", which points to a big box
holding a function; and a third big box holding the string 'hello world'.
But where is its little box?
If len were written in pure Python, then *inside* len's namespace there
would be a local little box for the argument. I expect that there is an
analogous local little box for built-in functions too. But I don't care
what happens inside len. What about outside len? Where's the little box
pointing to 'hello world'?
So it seems your model fails to deal with sufficiently anonymous objects.
I say "sufficiently", because of course your model deals fine with
objects without names inside, say, lists: the little box there is the
list slot rather than a named entry in a namespace.
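A quick illustration of that: an object can be perfectly reachable with
no name anywhere, because the reference lives in a list slot and you ask
for it by position.

```python
# The string 'NONAME' below is never bound to a name; the only
# reference to it sits in a slot of the list.
words = [("no" + "name").upper()]
print(words[0])   # NONAME: reachable only through the slot
```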
It's not just literals that your model fails to deal with, it's any
expression that isn't bound to a little box:
n = len('hello world') + func(y)
func(y) produces a new object, a big box. Where is the little box
pointing to it?
If we drop down an abstraction layer, we can see where objects live:
  1           0 LOAD_NAME                0 (len)
              3 LOAD_CONST               0 ('hello world')
              6 CALL_FUNCTION            1
              9 LOAD_NAME                1 (func)
             12 LOAD_NAME                2 (y)
             15 CALL_FUNCTION            1
             18 BINARY_ADD
             19 STORE_NAME               3 (n)
             22 LOAD_CONST               1 (None)
             25 RETURN_VALUE
Both the call to len and the call to func push their results onto the
stack. There's no little box pointing to the result. There's no little
box pointing to len, or y, or any of the others: there are just names and
abstract operations LOAD_NAME and friends.
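You can reproduce that disassembly yourself with the dis module (a
sketch; exact opcode names and offsets vary by CPython version -- newer
interpreters use BINARY_OP and CALL in place of BINARY_ADD and
CALL_FUNCTION):

```python
import dis

# Compile the line and list its opcode names. Note that nothing in the
# output names the intermediate results -- they exist only on the stack.
code = compile("n = len('hello world') + func(y)", "<example>", "exec")
ops = [ins.opname for ins in dis.get_instructions(code)]
print(ops)   # the LOAD/CALL/STORE_NAME steps, names depending on version
```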
Again, this is just an abstraction layer. Python objects can be huge,
potentially hundreds of megabytes or gigabytes. No way are they being
pushed and popped onto a stack, even if the virtual machine gives the
illusion that they are. For practical reasons, there must be some sort of
indirection. But that's implementation and not the VM's model.
It seems that you would prefer to eliminate the little boxes and arrows
and write the names directly beside the objects:
        +-------------+
    a   |             |
        |             |
    b   |             |
        +-------------+

        +-------------+
    c   |             |
        |             |
        |             |
        +-------------+
That's not a bad model. But again, it's incomplete, because it would
suggest that the big box should be able to read its own name tags, which
of course is impossible in Python. But I suppose one might say, if the
tag is on the outside of the object, the object can't use introspection
to see it, can it?
But note that this is really your model in disguise: if you imagine the
name tags are stuck on with a little piece of blutack, and you carefully
pull on the name and stretch it away, you get a name sitting over here
with a tiny thread of blutack attaching it to the big box over there,
just like in your model.
I actually prefer to keep the nature of the mapping between name and
object abstract: names map to objects. Objects float around in space,
wherever the interpreter happens to put them, and you can optionally give
them names. Objects are dumb and don't know their own name, but the
Python interpreter knows the names. Names are not the only way to ask the
interpreter for an object: e.g. you can put them in a list, and ask for
them by position.
If people then ask how the interpreter knows the names, I can add
more detail: names are actually strings in a namespace, which is usually
nothing more than a dict. Oh, and inside functions, it's a bit more
complicated still. And so on.
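"Names are strings in a namespace, which is usually just a dict" can be
checked directly: exec into a fresh dict and read the name back out as a
key.

```python
# A fresh dict serving as a namespace; the name y is simply the key 'y'.
namespace = {}
exec("y = len('hello')", namespace)
print(namespace["y"])   # 5
```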
There is a problem with my model of free-floating objects in space: it
relies on objects being able to be in two places at once, even *inside*
themselves (you can append a list to itself). If you hate that concept,
you'll hate my model. But if you're a science fiction fan from way back,
then you won't have any problem with the idea that objects can be inside
themselves.
Remember: it's just a model, and all models are lies. Abstractions all
leak. You can only choose where and how badly they break down.
Now, that's a good challenge for your model. Little boxes only point to
big boxes. So how do you model cycles, including lists that contain
themselves?
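The challenge, stated in code: a list really can contain itself, so any
model has to allow an arrow that loops back into its own box.

```python
L = [1, 2]
L.append(L)
print(L[2] is L)   # True: the list is its own third element
print(L)           # [1, 2, [...]] -- repr marks the cycle with [...]
```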
But what would you do about lists? With little boxes and arrows, you can
draw a diagram like this:
      +---+      +---+
    a | --+----->|   |      +-------------+
      +---+      +---+      |             |
                 | --+----->|             |
                 +---+      |             |
                 |   |      +-------------+
                 +---+
(Here, the list is represented as a collection of variables. That's why
variables and names are not the same thing -- the elements of the list
don't have textual names.)
But that's wrong! Names (little boxes) can't point to *slots in a list*,
any more than they can point to other names! This doesn't work:
L = [None, 42, None]
a = L[0]
L[0] = 23
print(a)  # if a pointed to the slot, this would print 23
None
a was bound to the object in the slot (None), not to the slot itself, so
reassigning the slot leaves a untouched.
It's a pity that Python doesn't actually have references. Imagine how
much time we'd save: all the unproductive time we spend arguing about
whether Python has references, we could be fixing bugs caused by the use
of references instead...
But without any little boxes or arrows, you can't represent the list
itself as a coherent object. You would have to go around and label
various objects with 'a[0]', 'a[1]', etc.
           +-------------+
    a[0]   |             |
           |             |
           |             |
           +-------------+

           +-------------+
    a[1]   |             |
           |             |
           |             |
           +-------------+
This is not very satisfactory. If the binding of 'a' changes, you have
to hunt for all your a labels, rub them out and rewrite them next to
different objects. It's hardly conducive to imparting a clear
understanding of what is going on, whereas the boxes-and-arrows model
makes it instantly obvious.
But I wouldn't do it like that. I'd do it like this:
         0        1        2        3        4
    +--------+--------+--------+--------+--------+
  a |        |        |        |        |        |
    |        |        |        |        |        |
    |        |        |        |        |        |
    +--------+--------+--------+--------+--------+
which conveniently happens to be the way Python lists actually are set
up. More or less.
[...]
Finally, there's another benefit of considering a reference to be a
distinct entity in the data model. If you think of the little boxes as
being of a fixed size, just big enough to hold a reference, then it's
obvious that you can only bind it to *one* object at a time. Otherwise
it might appear that you could draw more than one arrow coming out of a
name, or write the same name next to more than one object.
But that's pretty much an arbitrary restriction. Why are the boxes so
small? Just because. Why can't you carefully tease the thread of blutack
apart, into a bifurcated Y shaped thread? Just because.
If objects can be in two places at once, why can't names? Just because.
(Actually, because Guido decrees it so. Also because it would be kinda
crazy to do otherwise. I can't think of any language that has a many:many
mapping between names and values.)
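To spell out the asymmetry: the mapping is many names to one object,
never one name to many objects. A name binds exactly one object at a
time, though an object may wear any number of names.

```python
obj = object()
alias = obj
another = obj
print(alias is another is obj)   # True: three names, one object
alias = 42                       # rebinding one name...
print(another is obj)            # True: ...leaves the others alone
```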