Python's biggest compromises

Daniel Dittmar · Aug 1, 2003

Michael said:
<button nature="hot">
Reference counting *is* a form of garbage collection.
</button>

Saying "Ref. counting sucks, let's use GC instead" is a statement near
as dammit to meaningless.

This statement is not meaningless because most programmers will correctly
identify GC in this context as something like mark-and-sweep, generation
scavenging etc.

see also: DOS sucks, let's use an operating system instead.

current scheme's biggest drawback is its memory overhead, followed by
the cache-trashing tendencies of decrefs.

plus it doesn't support independent threads as all reference counting would
have to be protected, leading to poor performance.

But a lot of Python code depends on reference counting or more exactly it
depends on the timely call of the destructor. So even if a much better GC is
added to Python, reference counting would perhaps be kept for backwards
compatibility (see Python's biggest compromises)
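
A minimal sketch of the dependence described above, assuming CPython's reference counting. `TempResource` and the `events` log are made up for illustration and stand in for a file or socket wrapper. The first function relies on the destructor firing when the last reference disappears; the second releases the resource explicitly and so does not care which collector is in use.

```python
# Illustration only: TempResource is a hypothetical class, not a library API.
events = []

class TempResource:
    def __init__(self, name):
        self.name = name
        self.closed = False
        events.append("open " + name)

    def close(self):
        # Idempotent, so an explicit close followed by __del__ logs only once.
        if not self.closed:
            self.closed = True
            events.append("close " + self.name)

    def __del__(self):
        # Under CPython's reference counting this runs as soon as the last
        # reference goes away; other collectors may run it much later.
        self.close()

def implicit():
    res = TempResource("implicit")
    # No explicit close: depends on the destructor running at function exit.

def explicit():
    res = TempResource("explicit")
    try:
        pass  # work with res here
    finally:
        res.close()  # independent of the collector's timing

implicit()
explicit()
```

Code written in the first style is the kind that would change behaviour if reference counting were dropped.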

Daniel

Daniel Dittmar · Aug 1, 2003

John said:
There's a flaw in your reasoning. The various techniques that
descend from mark and sweep (which is what you're
calling garbage collection) depend on being able to
identify all of the objects pointed to. For objects that are
owned by Python, that's a lengthy (that is, inefficient)

That's what generation scavenging was developed for. One shouldn't argue by
tradition alone, but the fact that the major implementations of dynamic
languages like LISP and Smalltalk don't use reference counting should carry
some weight.

process, and it's not possible in general for objects that
are created by extensions.

This is generally handled by registering and unregistering objects in the
extension code. Error prone as well, but probably less so than reference
counting.

It's easy to say that various languages would be improved
by adding "real" garbage collection, but those techniques
impose significant design constraints on the implementation
model.

True. But one could review these constraints from time to time.

Daniel

Paul Rubin · Aug 1, 2003

Michael Hudson said:
True. But the major implementations of these languages are also
usually less portable, and something more of a fiddle to write C
extensions for (at least, for the implementations I know about, which
are mostly CL impls).

I'd say the opposite, the Lisp implementations I've worked on are
considerably easier to write C extensions for, partly BECAUSE you
don't have to worry about constantly tweaking ref counts. In GNU
Emacs Lisp, for example, if you cons a new heap object and put it in a
C variable and (iirc) then call eval, you have to call a macro that
tells the GC not to sweep the object. But many C functions don't make
new objects, and most don't call eval, and you don't have to remember
what objects you've called the macro for. There's another macro that
you call before your function returns, and that cleans up all the GC
records in your stack frame made by any invocations of the first macro.

Robin Becker · Aug 1, 2003

John Roth said:
The trick with JITs is that they don't depend on absolute type
consistency. They depend on the observation that 99.44% of your
code is type consistent, and that consistency will turn up at run time. So
the code they generate depends on that discovered consistency, and
checks in front of each section to discover if the types are what the
code expects.

If it is, they execute it, if it isn't, they abandon it and go back to
the interpreter to discover what happened.

John Roth

Yes I suspected they have to do that, but that implies that a discovered
'float' object must carry along a whole lot of baggage (I guess I mean
be a more generic object) to allow for the testing. Loops without method
or function calls would be good candidates for JIT as methods and
functions could alter attribute types.

Is the JIT object literally just a union of

type,values

or would it be an actual Python object? For example would an
innerproduct be over a pair of lists or would the magic convert these
into actual double arrays.

Ian Bicking · Aug 1, 2003

Worst of both indeed. Maybe the decision to choose reference
counting was driven by speed considerations.

Reference counting spreads the speed hit over the entire program, while
other techniques tend to hit performance hard every so often. But all
together I think reference counting is usually slower than a good GC
algorithm, and incremental garbage collection algorithms can avoid
stalling. And I'm sure that the current state -- reference counting
plus another kind of garbage collection for circular references -- must
be worse than either alone. The advantage is predictable collection
(unless you are using Jython), without memory leaks (due to circular
references).
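
The two mechanisms mentioned above can be watched side by side in CPython. This is a small sketch, not a benchmark: `Node` is a made-up class, and the automatic cycle collector is switched off temporarily so that only reference counting is at work until `gc.collect()` is called by hand.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a      # a <-> b: a circular reference
    return weakref.ref(a)        # a probe that does not keep `a` alive

gc.disable()                     # leave only reference counting active
try:
    probe = make_cycle()
    # The local names are gone, but each node still holds the other,
    # so neither reference count has dropped to zero.
    alive_after_refcounting = probe() is not None
    gc.collect()                 # run the cycle collector explicitly
    alive_after_cycle_gc = probe() is not None
finally:
    gc.enable()
```

Reference counting alone leaves the pair allocated; the separate cycle collector is what reclaims it.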

Oh well...

Ian

Anthony_Barker · Aug 1, 2003

What do you think Python's largest compromises are?

I don't view any of these as "compromises". That word suggests that
something was conceded, or that an intermediate position between two
extremes was chosen to appease. I don't think that either sense really
applies to these features.

The three items that you listed are merely design choices. While arguments
over them are continuous, two of the design choices (interpreter, dynamic
typing) are consistent with Python's intended use as a language which
excels at rapid prototyping. The third (white space) is merely a stylistic
choice which is designed to encourage readable programs.

"Compromises" in language design occur usually when a committee tries to
standardize a language, and each has differing views about how the language
should be used. While this occurs somewhat in Python, other languages
have suffered more mightily from this particular disorder.

Mark

Excellent points - you are correct the ones I listed are design
choices.

Some people could interpret them as design "compromises": the
kind of people who would like to use the same tool for all problems.

OKB (not okblacke) · Aug 1, 2003

Gerald said:
What Python really needs here is some means to make an expression
from a list of statements (called suite in the syntax definition).

I believe this is called a "function".

John Roth · Aug 1, 2003

Robin Becker said:
Yes I suspected they have to do that, but that implies that a discovered
'float' object must carry along a whole lot of baggage (I guess I mean
be a more generic object) to allow for the testing. Loops without method
or function calls would be good candidates for JIT as methods and
functions could alter attribute types.

Is the JIT object literally just a union of

type,values

or would it be an actual Python object? For example would an
innerproduct be over a pair of lists or would the magic convert these
into actual double arrays.

As far as I'm aware, the JIT code doesn't fiddle with the data;
it just does the equivalent of assert tests at the beginning of the
blocks to verify that it's got the type of object it expects.

In other words, it does a very large amount of run-time type
checking. This only pays off if it can save even more expense
by compiling the code.

Now, this is going to be difficult for short segments of code,
but it can be quite a time saver if the JIT generated code can
make intermediate objects vanish so they don't have to be
created just to be discarded a short time later.
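
The guard-then-specialize idea can be mimicked in plain Python. This is only a sketch of the control flow, not how any real JIT is written: `specialized_float_dot` stands in for code compiled under the assumption "everything is a float", and all the names here are invented for the example.

```python
def generic_dot(xs, ys):
    """Slow path: works for any objects that support * and +."""
    total = 0
    for x, y in zip(xs, ys):
        total = total + x * y
    return total

def specialized_float_dot(xs, ys):
    """Stand-in for compiled code that assumes every element is a float."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

stats = {"fast": 0, "fallback": 0}

def guarded_dot(xs, ys):
    # The guard: a run-time type check in front of the specialized block.
    if all(type(v) is float for v in xs) and all(type(v) is float for v in ys):
        stats["fast"] += 1
        return specialized_float_dot(xs, ys)
    # Guard failed: abandon the specialized code, use the general path.
    stats["fallback"] += 1
    return generic_dot(xs, ys)

a = guarded_dot([1.0, 2.0], [3.0, 4.0])   # guard holds
b = guarded_dot([1, 2], [3.0, 4.0])       # ints present, guard fails
```

The data itself is untouched in both cases; only the choice of code path depends on the check.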

John Roth

John Roth · Aug 1, 2003

Anthony_Barker said:
I have been reading a book about the evolution of the Basic
programming language. The author states that Basic - particularly
Microsoft's version is full of compromises which crept in along the
language's 30+ year evolution.

What do you think Python's largest compromises are?

I'm not sure if we've beaten this one to death or not, but a real
example of a compromise just floated through my head.

Consider <list>.sort() and <list>.reverse(). These two otherwise
admirable methods don't return the object, so they can't be chained.
Why not? Because they update the list in place, and Guido decided
that not returning the object was a cheap way to make it clear that
they were doing something unusual.

Now, *that's* a compromise. The worst of both worlds.
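
A short illustration of the behaviour described above. The `sorted()` and `reversed()` builtins used for the chained version were added to Python after this thread was written, so treat that part as a later workaround rather than something available at the time.

```python
names = ["carol", "alice", "bob"]

# In-place methods return None, so a chain such as
# names.sort().reverse() fails with an AttributeError.
sort_result = names.sort()

# Later Pythons offer functions that return new objects and do chain.
chained = list(reversed(sorted(["carol", "alice", "bob"])))
```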

John Roth

Hannu Kankaanpää · Aug 1, 2003

Michael Hudson said:
<button nature="hot">
Reference counting *is* a form of garbage collection.
</button>

You apparently have such a loose definition for garbage
collection, that even C programs have "a form of garbage
collection" on modern OSes: All garbage is reclaimed by
the OS when the program exits. It's just a very lazy collector.

I don't consider something a garbage collector unless it
collects all garbage (ref.counting doesn't) and is a bit more
agile than the one provided by OS.

Saying "Ref. counting sucks, let's use GC instead" is a statement near
as dammit to meaningless.

You, I and everyone knows what I was talking about, so it could
hardly be regarded as "meaningless".

Given the desires above, I really cannot think of a clearly better GC
strategy for Python than the one currently employed. AFAICS, the
current scheme's biggest drawback is its memory overhead, followed by
the cache-trashing tendencies of decrefs.

It's not "the one currently employed". It's the *two* currently
employed and that causes grief as I described in my previous post.
And AFAIK, Ruby does use GC (mark-and-sweep, if you wish) and
seems to be working. However, this is rather iffy knowledge. I'm
actually longing for real GC because I've seen it work well in
Java and C#, and I know that it's being used successfully in many
other languages.

What would you use instead?

A trick question?

Marc · Aug 1, 2003

In general, I suspect BASIC is more defined by compromises than
Python.

To me there is a compromise in Python's dependence on C. It seems that
at some point I will hit a performance or feature issue that will
require me to write a C extension. It seems to me VB6 has a similarly
awkward relationship with C++. Clearly the creators of Python were
expert C programmers; that should not be a requirement to become an
expert Python programmer.

- Marc

John Roth · Aug 1, 2003

Marc said:
In general, I suspect BASIC is more defined by compromises than
Python.

To me there is a compromise in Python's dependence on C. It seems that
at some point I will hit a performance or feature issue that will
require me to write a C extension. It seems to me VB6 has a similarly
awkward relationship with C++. Clearly the creators of Python were
expert C programmers; that should not be a requirement to become an
expert Python programmer.

As a number of people have said: if PyPy ever gets working...

John Roth

Andrew Dalke · Aug 1, 2003

Paul Rubin:

I'd say the opposite, the Lisp implementations I've worked on are
considerably easier to write C extensions for, partly BECAUSE you
don't have to worry about constantly tweaking ref counts.

I've mentioned in c.l.py before a library I used which can be called
from both C and FORTRAN. The latter doesn't support pointers,
so instead the library has a global instance table, indexed by integers.
The C/FORTRAN code just passes integers around.

In addition, there are dependencies between the objects, which
means that user code object deallocation can only occur in a
certain order.

With CPython it's possible to put a high-level OO interface to
that library, and provide hooks for the ref-counted gc to call
the proper deallocators in the correct order. This is done by
telling the finalizer how to do it and paying careful attention to
order.

The library also has Java bindings. As far as I can tell, it's
impossible to hook into Java's automatic gc. A C-level
gc like Boehm can't ever tell that data is no longer needed,
because the global table keeps a reference to every created
object, and Java's native gc doesn't make the proper
guarantees on finalization order.

Andrew

Andrew Dalke · Aug 1, 2003

Gerald Klix:

IMHO it is the lambda expression.
These "functions" are not real functions. You can not use
statements in them.

True. That's why they are "lambda expressions" and not
"lambda functions"

(Okay, that was a cheap shot ... but still true.)

Andrew

Andrew Dalke · Aug 1, 2003

Marc:

To me there is a compromise in Python's dependence on C.

Then explain Jython, which is an implementation of Python-the-language
on top of Java.

It seems that
at some point I will hit a performance or feature issue that will
require me to write a C extension.

change "will" to "may"

And if you write in C, at some point you may hit a performance
or feature issue that will require you to write assembly code.

Clearly the creators of Python were
expert C programmers; that should not be a requirement to become an
expert Python programmer.

Lack of knowledge of C does not strongly preclude becoming an
expert Python programmer.

To be an expert Python programmer, you should know how other
programming languages work too, but that can be done by
learning other languages: Eiffel, Java, APL, Haskell, Caml, Ada,
Prolog, ...

Andrew

Courageous · Aug 2, 2003

"Significant whitespace" isn't a "compromise," it's a design choice.
The Python interpreter actually inserts explicit scope tokens into
the symbol stream at the lexer; the parser deals with the symbols as
does any parser. It's really not all that hard, actually. One just
has to understand the bit that making the *parser* deal with the white
space is not the right thing.
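
The standard library's `tokenize` module makes those scope tokens visible. A small sketch, with a made-up three-line source string: the indented block shows up as explicit `INDENT` and `DEDENT` tokens in the stream, which is all a parser would need to see.

```python
import io
import tokenize

SOURCE = "if x:\n    y = 1\nz = 2\n"

def token_names(source):
    """Return the token type names the tokenizer produces for `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[tok.type] for tok in tokens]

names = token_names(SOURCE)
indent_count = names.count("INDENT")
dedent_count = names.count("DEDENT")
```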

C//

Gerald Klix · Aug 2, 2003

This is not a function. This is just the same as blocks in Algol 68,
because both do not defer execution until the function is called.

Of course you can define the semantics of such a block in terms
of a function definition and an immediate call to the function.
This is done in the Scheme language's let syntax.

HTH,
Gerald

OKB (not okblacke) wrote:
| Gerald Klix wrote:
|
|> What Python really needs here is some means to make an expression
|> from a list of statements (called suite in the syntax definition).
|
| I believe this is called a "function".
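
The "function definition plus immediate call" reading can be sketched in Python as follows. The names are invented for the example. A `lambda` could not hold the `if` statements, so the suite goes into a local `def` that is called on the spot and its result is used inside a larger expression.

```python
def describe(n):
    # The statements are wrapped in a local function ...
    def block():
        if n < 0:
            return "negative"
        if n == 0:
            return "zero"
        return "positive"

    # ... which is called immediately, so the suite's value can be used
    # as part of an expression, much like a Scheme `let` body.
    return "%d is %s" % (n, block())

results = [describe(n) for n in (-2, 0, 5)]
```

Unlike an Algol 68 block, this still needs a name and a separate call, which is the gap the discussion is about.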

Andy C · Aug 2, 2003

it isn't to a compiler). OTOH, *Python's* scheme is inferior to a

Well, that's been recognized as a problem for a long time. I believe
that the intention is to mandate spaces only in Python 3.0. On the
other hand, I'm not holding my breath waiting for Guido to decide

What's bad about tabs? I'm new to Python. Tabs are good because then you
can view the source however you want. I can write in 4 space tabs and
someone else can change it to 2 if they prefer. But I can see the problem
with continuation lines. Also it must be ALL tabs or ALL spaces. This
shouldn't be too hard -- most editors have the "tabify" option.
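
For what it is worth, Python 3 did not end up mandating spaces; it rejects indentation that mixes tabs and spaces inconsistently. A small sketch using the builtin `compile()`, with made-up source strings, shows the "ALL tabs or ALL spaces" rule being enforced:

```python
# Tab on line 2, eight spaces on line 3: consistent only if a tab is 8 wide.
MIXED = "if True:\n\tx = 1\n        y = 2\n"
TABS_ONLY = "if True:\n\tx = 1\n\ty = 2\n"
SPACES_ONLY = "if True:\n    x = 1\n    y = 2\n"

def indentation_error_name(source):
    """Return the name of the indentation error raised, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:   # TabError is a subclass of this
        return type(exc).__name__
    return None

mixed_result = indentation_error_name(MIXED)
tabs_result = indentation_error_name(TABS_ONLY)
spaces_result = indentation_error_name(SPACES_ONLY)
```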

Andy

Andy C · Aug 2, 2003

Many tools don't allow you to configure tabs, and of those that do,

each uses its own incompatible syntax and has its own quirks. In other
words, tabs may seem like a good thing if you use just one or two tools
that do what you want, but as soon as your program moves out into
the wild world, things change.

What are these quirks? By far the most common I've seen is mixing tabs and
spaces, but this should be relatively easily solved by requiring one or the
other (minus continuation lines, which are still a problem). Using spaces
has some disadvantages too, since not everyone will use the same number of
spaces, and editors don't behave as nicely. I like when you hit the arrow
key at a tab, and it jumps the full tab, rather than having to press an
arrow key 4 times.

Terry Reedy · Aug 2, 2003

Andy C said:
What's bad about tabs?

Because there is no 'official' meaning, despite claims to the
contrary, some software like Outlook Express ignores them on receipt.

<flameshield on> tjr
