I thought about this in the context of reading Anton's latest post to
me, but I'm just throwing out an idea. It's certainly too fuzzy in my
mind at the moment to consider it in any way developed. I'm sure it's
just as problematic, if not more so, as any existing accounts of dynamic
types. Here it is, anyway, just because it's a different way of seeing
things.
(I'll first point out here, because it's about to become significant,
that contrary to what's been said several times, static type systems
assign types to expressions rather than variables.)
First of all, it's been pointed out several times that static typing
assumes a static program, and that it becomes difficult to describe if
the program itself is being modified as it is evaluated. The
interesting thing that occurred to me earlier today is that for
languages that are reducible to the lambda calculus at least, ALL
programs are from some perspective self-modifying. Specifically, no
single expression is ever evaluated more than once. When a statement is
evaluated several times, that simply means that there is actually a more
complex statement that, when it is evaluated, writes a new statement,
which consists of a copy of itself, plus something else. Thus it makes
little sense, from that perspective, to worry about the possibility that
an expression might have a different type each time it's evaluated, and
therefore violate static typing. It is never evaluated more than once.
What if I were to describe a dynamically typed system as one in which
expressions have types just as they do in a static type system, but in
which programs are seen as self-modifying? Rather than limiting forms
of recursion for well-typed programs to, say, the standard fixed-point
operator and then assigning some special-purpose rule for it, we simply
admit that the program is being modified as it is run. Type inference
(and therefore checking) would need to be performed incrementally. When
they do, the results of evaluating everything up to this point will have
changed the type environment, providing more information than was
previously known. The type environment for assigning types to an
expression could still be treated statically at the time that the
expression first comes into existence; but it would be different then
from what it was at the beginning of the program.
Of course, this is only a formal description of the meaning, and not a
description of any kind of a reasonable implementation. However, this
initially appears, on the surface, to take us some distance toward
unifying dynamic types and static types.
At least two interesting correlations just sort of "fall out".
First, Marshall has observed (and I agree) that static and dynamic types
form not different ways of looking at verifying correctness, but rather
different forms of programming in the first place. It appears to be
encouraging, then, that this perspective classifies static and dynamic
types based on how we define the program rather than how we define the
type checking.
Second, type theory is sometimes concerned with the distinction between
structural and nominal types and various hybrids. We've concerned
ourself a good bit in this thread with exactly what kinds of "tags" or
other runtime type notations are acceptable for defining a dynamically
typed language. In this view, Smalltalk (or some simplification thereof
wherein doesNotUnderstand is considered an error handling mechanism
rather than a first-class part of the language semantics) would
represent the dynamic version of structural types, whereas type
"tagging" systems with a user-declared finite set of type tags would be
the dynamic equivalent to nominal types. Just like in type theory,
dynamic type systems can also have hybrids between structural and
nominal types.
Obviously, that has a lot of problems... some of the same ones that I've
been complaining about. However, it has some interesting
characteristics as well. If an expression is assumed to be different
each time it is encountered, then type the type of "one" expression is
easy to evaluate in the initial environment in which it occurs (for some
well-defined order of evaluation).
Now, to Anton's response, with admittedly rather liberal skimming
(sorry!) to avoid my staying up all night thinking about this.
It's obviously going to be difficult to formally pin down something that
is fundamentally informal. It's fundamentally informal because if
reasoning about the static properties of terms in DT languages were
formalized, it would essentially be something like a formal type system.
We may be talking past each other, then, but I'm not particularly
interested in the informal aspects of what makes up a type. I agree
that formalizing the notion appears quite difficult, especially if my
conjecture above were to become part of it.
However, there are some pretty concrete things we can look at. One of
them, which as I've mentioned elsewhere is part of what led me to my
position, is to look at what a soft type inferencer does. It takes a
program in a dynamically typed language, and infers a static type scheme
for it (obviously, it also defines an appropriate type system for the
language.) This has been done in both Scheme and Erlang, for example.
How do you account for such a feat in the absence of something like
latent types?
There are undoubtedly any number of typing relations and systems of
rules which can be assigned to any program in a dynamically typed
language. I think the more interesting question is how these type
inference systems choose between the limitless possibilities in a way
that tends to be acceptable. I suspect that they do it by assuming
certain kinds of type systems, and then finding them. I doubt they are
going to find a type system in some language until the basics of that
system have been fed to the software as something to look for.
I'd be delighted to discover that I'm wrong. If nothing else, it would
mean that applying these programs to significant amounts of Scheme and
Erlang software would become an invaluable tool for doing research in
type theory.
I think that's a severe understatement.
Yes, I agree. I did not intend the connotations that ended up there.
In fact, I'd say that something like 90% (and yes, I just grabbed that
number out of thin air) of reasoning performed by programmers about
their programs could be performed by a static type system.
BTW, I notice you didn't answer any of the significant questions which I
posed to Vesa. So let me pose one directly to you: how should I rewrite
the first sentence in the preceding paragraph to avoid appealing to an
admittedly informal notion of type?
I'll try to answer your question. My answer is that I wouldn't try. I
would, instead, merely point out that the notion of type is informal,
and that it is not necessarily even meaningful. The two possibilities
that concern me are that either reasoning with types in this informal
sense is just a word, which gets arbitrarily applied to some things but
not others such that distinctions are made that have no real value, and
that any sufficient definition of type-based reasoning is going to be
broad enough to cover most reasoning that programmers ever do, once we
start applying enough rigor to try to distinguish between what counts as
a type and what doesn't. The fact that programmers think about types is
not disputed; my concern is that the types we think about may not
meaningfully exist.
Note, also, that I'm using the word
in the sense of static properties, not in the sense of tagged values.
In the static sense of types, as I just wrote to Chris Uppal, I can then
definitively assert at least the second of the two possibilities above.
Every possible property of program execution is a type once the problem
of how to statically check it has been solved.
For now, I'll stand on what I've written above. When I see if or how
that doesn't convince you, I can go further.
To be clear, I am certainly convinced that people talk and think about
types and also about other things regarding their programs which they do
not call types. I am concerned only with how accurate it is to make
that distinction, on something other than psychological grounds. If we
are addressing different problems, then so be it. If however, you
believe there is a type/non-type distinction that can be built on some
logical foundation, then I'd love to hash it out.
I agree that in the narrow context of considering to what extent a
dynamically typed language has a formal static type system, you can call
it untyped.
OK.
However, what that essentially says is that formal type
theory doesn't have the tools to deal with that language, and you can't
go much further than that. As long as that's what you mean by untyped,
I'm OK with it.
I can see that in a sense... yes, if a specific program has certain
properties, then it may be possible to extend the proof-generator (i.e.,
the type checker) to be able to prove those properties. However, I
remain inconvinced that the same is true of the LANGUAGE. Comments in
this thread indicate to me that even if I could write a type-checker
that checks correctness in a general-purpose programming language, and
that language was Scheme, there would be Scheme programmers who would
prefer not to use my Scheme+Types language because it would pester them
until they got everything perfect before they could run their code, and
Scheme would remain a separate dynamically typed (i.e., from a type
theory perspective, untyped) language despite my having achieved all of
the wildest dreams of the field of type theory. In that sense, I don't
consider this a limitation of type theory.
(Two notes: first, I don't actually believe that this is possible, since
the descriptions people would give for "correct" are nowhere near so
formal as the type theory that would try to check them; and second, I am
not so sure that I'd want to use this new language either, for anything
except applications in which bugs are truly life-threatening.)
Think of it like this: the more ambitious a language's type system is,
the fewer uncaptured static properties remain in the code of programs in
that language. However, there are plenty of languages with rather weak
static type systems. In those cases, code has more static properties
that aren't captured by the type system.
Yes, I agree with this.
I'm pointing out that in many
of these cases, those properties resemble types, to the point that it
can make sense to think of them and reason about them as such, applying
the same sort of reasoning that an automated type inferencer applies.
It often makes sense to apply the same sort of reasoning as would be
involved in static type inference. I also agree that there are
properties called types, which are frequently the same properties caught
by commonly occurring type systems, and which enough developers already
think of under the word "type" to make it worth trying to change that
EXCEPT when there is an explicit discussion going on that touches on
type theory. In that latter case, though, we need to be very careful
about defining some group of things as not-types across the scope of all
possible type systems. There are no such things a priori, and even when
you take into account the common requirement that a type system is
tractable, the set of things that are not-type because it's provably
impossible to check them in polynomial time is smaller than the
intuitive sense of that word.
Essentially, this use of "type" becomes problematic when it is used to
draw conclusions about the scope, nature, purpose, etc. of static types.
This answers a good number of the following concerns.
A complete formal definition of what I'm talking about may be impossible
in principle, because if you could completely formally define it, you'd
have a static type system.
That may not actually be the case. Lots of things may be formally
defined that would nevertheless be different from a static type system.
For example, if someone did provide a rigorous definition of some
category of type errors that could be recognized by some defined process
as such, then the set of language features that detect and react to
these errors could be defined as a type system. Chris Uppal just posted
such a thing in a different part of this thread, and I'm unsure at this
point whether it works or not. The problem with earlier attempts to
define type errors in this thread is that they either tend to include
all possible errors, or are distinguished by some nebulously defined
condition such as being commonly thought of in some way by some set of
human beings.
If that makes you throw up your hands, then all you're saying is that
you're unwilling to deal with a very real phenomenon that has obvious
connections to type theory, examples of which I've given above. That's
your choice, but at the same time, you have to give up exclusive claim
to any variation of the word "type".
I really think it's been a while since I claimed any such thing... and
even at the beginning, I only meant to challenge that use of type in
comparing static and dynamic type systems.
I am not so sure, though, about these obvious connections to type
theory. Those connections may be, for example, some combination of the
following rather uninteresting phenomena:
1. Type theory got the word "type" from somewhere a century ago, and the
word has continued to be used in its original sense as well, and it just
happens that despite the divergence of these uses, there remains some
common ground between the two different meanings.
or, 2. Many programmers approach type theory from the perspective of
having some intuitive definition of a type in mind, and they tend to mix
it in with a few of the ideas of type theory in their minds, and then go
off and use the word in other contexts.
Terms are used in a context, and it's perfectly reasonable to call
something a "latent type" or even a "dynamic type" in a certain context
and point out connections between those terms and their cousins (or if
you insist, their completely unrelated namesakes) static types.
Yes, I agree. I'm yet to be convinced that they are completely
unrelated namesakes. I am actually trying to reconstruct such a thing.
The problem is that, knowing how "type" is used in the static sense, I
am forced to reject attempts that share only basic superficial
similarities and yet miss the main point.
The definition of static type system is not in question. However,
realistically, as I pointed out above, you have to acknowledge that type
systems exist in, and are inextricably connected to, a larger, less
formal context.
No, but it certainly seems that the reason behind calling them a "type
system" is being placed very much in doubt. Simply saying static type
system is horribly wrong, and I do it only to avoid inventing words
wholesale... but I greatly regret the implication that it means "a type
system, which happens to be static."
In some senses, this is akin to describing a boat as "a floating
platform". That may be somewhat accurate as a rough definition for
someone who is familiar with platforms but not with boats. It's a
terrible failure, though, in that it implies that being a platform is
the important bit; where actually the important bit is that it floats!
In fact, it may turn out that boats have very little to do with common
platforms... for example, while some boats (barges) have many of the
same properties that platforms do, there are whole other classes of
boats, which are considerably more powerful than barges, that end up
looking very little like a platform at all. If you press hard enough, I
can be forced to admit that there is some kind of connection between
platforms and floating platforms, because after all one has to stand on
something while using a boat, right?
Well, the important part of the phrase "static type" isn't "type" but
"static." Without even having to undertake the dead-end task of
deciding who's got the right to the word "type", let's assume that I
should really be saying some other word, like boat. However, that word
doesn't really exist, so I am stuck with "static type". I just have to
be very careful about explaining that what I mean by type is completely
different from what others mean by type. There is some overlap, which
potentially arises from one of the uninteresting sources I've listed
above, or maybe... just maybe... arises from some actual similarity that
we haven't identified yet. But we still haven't identified it.
This is very much the situation I see ourselves in. I very much want to
agree with everyone's right to use the word "type", but then people keep
saying things that seem to me to be clear misunderstandings that the
important part is the "static", not the "type". When qualification is
not needed
Fundamentally, the important aspect of types, in the static sense, is
that they are used in a formal, syntactic, tractable method for proving
the absence of program behaviors. Even if it turns out that, in
practice, all type systems contain mechanisms for solving a certain
rigorously definable class of problems, that will not be part of the
definition of types, when the word is used in the static sense. So we
can search for a more formal similarity, or we can agree that it's
incidentally true that they sometimes solve the same problems but also
agree that it's pointless and misleading to ever talk about a "type
system" without clarifying the usage either explicitly or by context.
It remains incorrect to relax the degree of formality and talk about
static types in an intuitive and undefinable way, because than they lose
the entire property of being types at all. At that point, it's best to
drop "static", which is not true anyway, and switch over to the other
set of terminology where it makes sense to call them types.
There's a sense in which one can say that yes, the informal types I'm
referring to have an interesting degree of overlap with static types;
and also that static types do, loosely speaking, have the "purpose" of
formalizing the informal properties I'm talking about.
Static types have the property of proving ANY properties of a program
that may be proven. Certainly, research is pragmatically restricted to
specific problems that are (1) useful and (2) solvable. Aside from
that, I can't imagine any competent language designer building a type
system, but rejecting some useful and solvable feature for the sole
reason that he or she doesn't think of it as solving a type error.
But I hardly see why such an informal characterization should bother
you. It doesn't affect the definition of static type.
I hope I've explained how it does affect the definition of a static
type.
This is based on the assumption that all we're talking about is
"well-defined semantics".
Indeed. My statement there will not apply if we find some formal
definition of a dynamic type system that doesn't reduce to well-defined
semantics once you remove the intuitive bits from it.