You still dream of this, don't you? Type inference in dynamic languages
doesn't scale. It didn't scale in twenty years of research on
Smalltalk, and it doesn't scale in Python. However, there is no no-go theorem.
type inference sure is a difficult business, and I won't deny there are
scalability issues, but the situation has improved a lot since back in
the smalltalk days. since then, type inference theory has not stood
still: agesen's cartesian product algorithm and plevyak's iterative
flow analysis (both published around '96) have greatly improved the
situation; a 1000-fold or more increase in computer speeds has
additionally made actual type inference experimentation much more
practical. (for anyone interested in the techniques powering shed
skin, see agesen's and plevyak's phd theses for a relatively recent
update on the field.)
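(a toy sketch of the core idea behind the cartesian product algorithm, as I understand it -- this is not agesen's actual formulation, and the `infer_add` rule table is purely illustrative: instead of merging all possible argument types into one summary per parameter, the analysis considers each concrete combination of argument types as its own "template", so every template stays monomorphic:)

```python
from itertools import product

def cpa_templates(arg_type_sets):
    # one template per element of the cartesian product of the
    # possible argument types, instead of one merged summary
    return set(product(*arg_type_sets))

def infer_add(template):
    # toy result-type rule for `def add(a, b): return a + b`,
    # evaluated per monomorphic template
    a, b = template
    if a is int and b is int:
        return int
    if a is float or b is float:
        return float
    return None  # this template would be a type error

# suppose add() is called with ints and floats in various combinations
templates = cpa_templates([{int, float}, {int, float}])
results = {t: infer_add(t) for t in templates}
```

because each template is analyzed separately, (int, int) cleanly infers to int instead of being polluted by the float calls.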
but in any case, I believe there are several reasons why type
inference scalability is actually not _that_ important (as long as it
works and doesn't take infinite time):
-I don't think we want to do type inference on large Python programs.
this is indeed asking for problems, and it is not such a bad approach
to only compile the critical parts of a program (why would we want to
compile PyQt code, for example?) I do think type inference scales well
enough to analyze arbitrary programs of up to, say, 5,000 lines. I'm
not there yet with Shed Skin, but I don't think it's that far away (of
course I'll need to prove this now)
-type inference can be assisted by profiling (so dramatically fewer
iterations are necessary to arrive at a full proof). profiling doesn't
have to fully cover the code, because type inference fills in the gaps;
type inference can also be assisted by storing and reusing analysis
results, so profiling only has to be done once, or the analysis can be
made easier by running it multiple times during development. because
Shed Skin doesn't use profiling or memoization, and I know many
ways to improve the scalability of the type analysis, I'm confident it can
scale much further than the programs it works for now (see ss-
progs.tgz from the homepage for a collection of 27 programs, such as
ray tracers, chess engines, sat solvers, sudoku solvers, pystone and
richards..).
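to illustrate what I mean by profiling assistance, here's a minimal sketch (not Shed Skin code; the decorator and the `observed` table are hypothetical): record the concrete argument types seen at run time, and let the inferencer take them as seed facts instead of deriving everything from scratch:

```python
import functools

# function name -> set of observed argument-type tuples
observed = {}

def record_types(fn):
    # toy profiler: note the concrete types of every call's arguments;
    # an inferencer could later start from these facts
    @functools.wraps(fn)
    def wrapper(*args):
        observed.setdefault(fn.__name__, set()).add(
            tuple(type(a) for a in args))
        return fn(*args)
    return wrapper

@record_types
def scale(xs, factor):
    return [x * factor for x in xs]

scale([1, 2, 3], 2)
scale([1.0], 0.5)
```

in a real setting the recorded tuples would be stored between runs, so the profiling pass only has to happen once.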
besides (as john points out, I think), there is a difference between
analyzing an actually dynamic language and an essentially static language
(such as the Python subset that Shed Skin accepts). it allows one to
make certain assumptions that make type inference easier.
that should prevent ambitious newbies to type theory from wasting their
time and effort.
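as a toy illustration of the kind of assumption such a subset permits (this is not Shed Skin's actual analysis): if a name may only ever hold values of one type, a checker can flag reassignments that change the type, and everything downstream stays monomorphic:

```python
import ast

def type_changing_names(src):
    # toy check in the spirit of a static Python subset: report names
    # assigned literals of more than one type (only constant
    # assignments are inspected, for simplicity)
    seen = {}
    conflicts = set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    t = type(node.value.value)
                    if seen.setdefault(target.id, t) is not t:
                        conflicts.add(target.id)
    return conflicts

# x changes from int to str and would be rejected; y stays int
conflicts = type_changing_names("x = 1\nx = 'hi'\ny = 2\ny = 3")
```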
yes, it's probably a waste of time to try and analyze large, actually
dynamic Python programs, but I don't think we should want this at
all. computer speeds make Python fast enough for many purposes, and
global type inference scalability would demand that we throw out many
nice Python features. a JIT compiler seems better here..
where I think Shed Skin and similar tools can shine is in compiling
pure Python extension modules and relatively small programs. having
worked on type inference for some time now, with modern techniques,
I see no reason why we can't compile statically typed Python programs
of up to several thousand lines. my analysis works pretty well
already (see ss-progs.tgz), and there are many things I can still
improve, besides adding profiling and memoization..
Read the related PEP, John. You will see that Guido's genius in this
case is that of a good project manager who knows that the community
works for him. The trade is that he supplies the syntax/interface and
the hackers around him fantasize about semantics and start
implementations. Not only are the annotations optional, but so is
their meaning. This has nothing to do with VB, and it doesn't even have
much to do with what existed before in language design.
I think it's more Pythonic to just profile a program to learn about
actual types..
mark dufour (Shed Skin author).