What is Expressiveness in a Computer Language


Chris F Clark

Chris F Clark said:
Do you reject that there could be something more general than what a
type theorist discusses? Or do you reject calling such things a type?

Chris Smith said:
I think that the correspondence is partly in the wrong direction to
describe it that way. ....
What I can't agree to is that what you propose is actually more general.
It is more general in some ways, and more limited in others. As such,
the best you can say is that it is analogous to formal types in some
ways, but certainly not that it's a generalization of them.

Yes, I see your point.

Me:
I'm particularly interested if something unsound (and perhaps
ambiguous) could be called a type system.

Chris Smith:
Yes, although this is sort of a pedantic question in the first place, I
believe that an unsound type system would still generally be called a
type system in formal type theory. However, I have to qualify that with
a clarification of what is meant by unsound. We mean that it DOES
assign types to expressions, but that those types are either not
sufficient to prove what we want about program behavior, or they are not
consistent, so that a program that was well typed may reduce to a poorly
typed program during its execution. Obviously, such type systems don't
prove anything at all, but they probably still count as type systems.

Yes, and you see mine.

These informal systems, which may not prove what they claim to prove,
are my concept of a "type system". That's because at some point,
there is a person reasoning informally about the program. (There may
also be people reasoning formally about the program, but I am going to
neglect them.) It is to the people who are reasoning informally about
the program we wish to communicate that there is a type error. That
is, we want someone who is dealing with the program informally to
realize that there is an error, and that this error is somehow related
to some input not being kept within proper bounds (inside the domain
set) and that as a result the program will compute an unexpected and
incorrect result.

I stressed the informal aspect, because the intended client of the
program is almost universally *NOT* someone who is thinking rigorously
about some mathematical model, but is dealing with some "real world"
phenomena which they wish to model and may not have a completely
formed concrete representation of. Even the author of the program is
not likely to have a completely formed rigorous model in mind, only
some approximation to such a model.

I also stress the informality, because beyond a certain nearly trivial
level of complexity, people are not capable of dealing with truly
formal systems. As a compiler person, I often have to read sections
of language standards to try to discern what the standard is requiring
the compiler to do. Many language standards attempt to describe some
formal model. However, most are so imprecise and ambiguous when they
attempt to do so that the result is only slightly better than not
trying at all. Only when one understands the standard in the context
of typical practice can one begin to fathom what the standard is
trying to say and why its authors say it the way they do.

A typical example of this is the grammars of most programming
languages, which are expressed in a BNF variant. However, most of the
BNF descriptions omit important details and gloss over certain
inconsistencies. Moreover, most BNF descriptions are totally
unsuitable for formal translation (e.g. by an LL or LR parser
generator, or even an Earley or CYK one). Yet, when considered in the context of
typical implementations, the errors in the formal specification can
easily be overlooked and resolved to produce what the standard authors
intended.

For example, a few years ago I wrote a Verilog parser by
transliterating the grammar in the standard (IEEE 1364-1995) into the
notation used by my parser generator, Yacc++. It took a couple of
days to do so. The resulting parser essentially worked on the first
try. However, the reason it did so, is that I had a lot of outside
knowledge when I was making the transliteration. There were places
where I knew that this part of the grammar was meant for lexing and
that part for parsing, and I cut the grammar correctly into those two
parts, despite there being nothing in the standard which described
that dichotomy. There were other parts, like the embedded
preprocessor, that I knew were borrowed from C, so that I could
resolve issues the way a C compiler would, to guide the resolution
of ambiguities in the Verilog standard. There were other
parts where I could tell the standard was simply written wrong, that
is, where the BNF they had written did not specify something which
would make sense--for example, the endif on an if-then-else statement
was bound to the else, so that one would write an if-then without an
endif but an if-then-else with the endif--and I knew that was not the
correct syntax.
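The endif mistake can be illustrated schematically. The productions below are a hypothetical reconstruction for illustration only, not the actual text of IEEE 1364-1995:

```
(* roughly what the standard's BNF said: the endif is bound to the else *)
conditional ::= 'if' '(' expression ')' statement
             |  'if' '(' expression ')' statement 'else' statement 'endif'

(* what was plainly intended: the endif terminates the whole construct *)
conditional ::= 'if' '(' expression ')' statement 'endif'
             |  'if' '(' expression ')' statement 'else' statement 'endif'
```

A human reader with outside knowledge corrects the first form without even noticing; a parser generator fed the grammar verbatim faithfully implements the mistake.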

The point of the above paragraph is that the very knowledgeable people
writing the Verilog standard made mistakes in their formal
specification, and in fact these mistakes made it past several
layers of review and had been formally ratified. Given the correct
outside knowledge these mistakes are relatively trivial to ignore and
correct. However, from a completely formal perspective, the standard
is simply incorrect.

More importantly, chip designers use knowledge of the standard to
guide the way they write chip descriptions. Thus, at some level these
formal errors creep into chip designs. Fortunately, the chip
designers also look past the inconsistencies and "program" (design) in
a language which looks very like standard Verilog, but actually says
what they mean.

Moreover, the grammar is not the only place where the standard fails
to formalize Verilog properly. There are several parts of the type
model which suffer similar problems. Again, we have what the
standard specifies, and what is a useful (and expected) description of
the target model. There are very useful, informal models of how
the Verilog language works and what types are involved. There is also
an attempt in the standard to convey some of these aspects formally.
Occasionally, the formal description is right and illuminating; at
other times it is opaque; and at still other times it is inconsistent,
saying contradictory things in different parts of the standard.

Some of the more egregious errors were corrected in the 2001 revision.
However, at the same time, new features were added which were (as far
as I can tell) just as inconsistently specified. And this is not a
case of incompetence or unprofessionalism. The Verilog standard is
not alone in these problems--it's just my most recent experience, and
thus the one I am most familiar with at the moment.

I would expect the Python, Scheme, Haskell, and ML standards to be
more precise and less inconsistent, because of their communities.
However, I would be surprised if they were entirely unambiguous and
error-free. In general, I think most non-trivial attempts at
formalization flounder on our inability to deal with truly formal systems.

Therefore, I do not want to exclude from type systems things which are
informal and unsound. They are simply artifacts of human creation.

-Chris
 

David Hopwood

Scott said:
So you claim Java and Objective C are "simply bugs in the theory."

Java's type system was unsound for much of its life. I think that supports
the point that it's inadequate to simply "wish and hope" for soundness,
and that a proof should be insisted on.
 

Gabriel Dos Reis

[...]

| (Yes, I'm being silly. But the point is very serious. Even with less
| silly examples, whether a language or subset is "safe" entirely
| depends on what you define to be "safe", and these definitions tend to
| vary vastly across language communities.)

Yes, I know. I hoped we would escape sillyland for the sake of
productive conversation.

-- Gaby
 

Gabriel Dos Reis

(e-mail address removed)-sb.de writes:

| think that it is too relevant for the discussion at hand. Moreover,
| Harper talks about a relative concept of "C-safety".

Then, I believe you missed the entire point.

First point: "safety" is a *per-language* property. Each language
comes with its own notion of safety. ML is ML-safe; C is C-safe;
etc. I'm not being facetious; I think this is the core of the
confusion.

Safety is an internal consistency check on the formal definition of
a language. In a sense it is not interesting that a language is
safe, precisely because if it weren't, we'd change the language to
make sure it is! I regard safety as a tool for the language
designer, rather than a criterion with which we can compare
languages.


[...]

| Or are you trying to suggest that we should indeed consider C safe for
| the purpose of this discussion?

I'm essentially suggesting "silly arguments" (as noted by someone
in another message) be left out for the sake of productive
conversation. Apparently, I did not communicate that point well enough.

-- Gaby
 

Darren New

Gabriel said:
I would suggest you give more thoughts to the claims made in
http://www.seas.upenn.edu/~sweirich/types/archive/1999-2003/msg00298.html

I'm not sure I understand this. Looking at "Example 2", where C is
claimed to be "C-safe", he makes two points that I disagree with.

First, he says Unix misimplements the operations semantics of C. It
seems more like Unix simply changes the transition system such that
programs which access unmapped memory are terminal.

Second, he says that he hasn't seen a formal definition of C which
provides precise operational semantics. However, the language standard
itself specifies numerous "undefined" results from operations such as
accessing unallocated memory, or even pointing pointers to unallocated
memory. So unless he wants to modify the C standard to indicate what
should happen in such situations, I'm not sure why he thinks it's "safe"
by his definition. There is no one "q" that a "p" goes to based on the
language. He's confusing language with particular implementations. He
seems to be saying "C is safe, oh, except for all the parts of the
language that are undefined and therefore unsafe. But if you just
clearly defined every part of C, it would be safe." It would also seem
to be the case that one could not say that a program "p" is either
terminal or not, as that would rely on input, right? gets() is "safe" as
long as you don't read more than the buffer you allocated.

What's the difference between "safe" and "well-defined semantics"?
(Ignoring for the moment things like two threads modifying the same
memory at the same time and other such hard-to-define situations?)
 

Chris Smith

Chris F Clark said:
These informal systems, which may not prove what they claim to prove,
are my concept of a "type system".

Okay, that works. I'm not sure where it gets us, though, except for
gaining you the right to use "type system" in a completely uninteresting
sense to describe your mental processes. You still need to be careful,
since most statements from type theory about type systems tend to
implicitly assume the property of being sound.

I'm not exactly convinced that it's possible to separate that set of
reasoning people do about programs into types and not-types, though. If
you want to stretch the definition like this, then I'm afraid that you
end up with the entire realm of human thought counting as a type system.
I seem to recall that there are epistemologists who would be happy with
that conclusion (I don't recall who, and I'm not actually interested
enough to seek them out), but it doesn't lead to a very helpful sort of
characterization.
It is to the people who are reasoning informally about
the program we wish to communicate that there is a type error. That
is, we want someone who is dealing with the program informally to
realize that there is an error, and that this error is somehow related
to some input not being kept within proper bounds (inside the domain
set) and that as a result the program will compute an unexpected and
incorrect result.

I've lost the context for this part of the thread. We want this for
what purpose and in what situation?

I don't think we necessarily want this as the goal of a dynamic type
system, if that's what you mean. There is always a large (infinite? I
think so...) list of potential type systems under which the error would
not have been a type error. How does the dynamic type system know which
informally defined, potentially unsound, mental type system to check
against? If it doesn't know, then it can't possibly know that something
is a type error. If it does know, then the mental type system must be
formally defined (if nothing else, operationally by the type checker.)
I also stress the informality, because beyond a certain nearly trivial
level of complexity, people are not capable of dealing with truly
formal systems.

I think (hope, anyway) that you overstate the case here. We can deal
with formal systems. We simply make errors. When those errors are
found, they are corrected. Nevertheless, I don't suspect that a large
part of the established base of theorems of modern mathematics are
false, simply because they are sometimes rather involved reasoning in a
complex system.

Specifications are another thing, because they are often developed under
time pressure and then extremely difficult to change. I agree that we
should write better specs, but I think the main challenges are political
and pragmatic, rather than fundamental to human understanding.
Therefore, I do not want to exclude from type systems things which are
informal and unsound. They are simply artifacts of human creation.

Certainly, though, when it is realized that a type system is unsound, it
should be fixed as soon as possible. Otherwise, the type system doesn't
do much good. It's also true, I suppose, that as soon as a person
realizes that their mental thought process of types is unsound, they
would want to fix it. The difference, though, is that there is then no
formal definition to fix.
 

Matthias Blume

Gabriel Dos Reis said:
(e-mail address removed)-sb.de writes:

| think that it is too relevant for the discussion at hand. Moreover,
| Harper talks about a relative concept of "C-safety".

Then, I believe you missed the entire point.

First point: "safety" is a *per-language* property. Each language
comes with its own notion of safety. ML is ML-safe; C is C-safe;
etc. I'm not being facetious; I think this is the core of the
confusion.

Safety is an internal consistency check on the formal definition of
a language. In a sense it is not interesting that a language is
safe, precisely because if it weren't, we'd change the language to
make sure it is! I regard safety as a tool for the language
designer, rather than a criterion with which we can compare
languages.

I agree with Bob Harper about safety being language-specific and all
that. But, with all due respect, I think his characterization of C is
not accurate. In particular, the semantics of C (as specified by the
standard) is *not* just a mapping from memories to memories. Instead,
there are lots of situations that are quite clearly marked in C's
definition as "undefined" (or whatever the technical term may be). In
other words, C's specified transition system (to the extent that we
can actually call it "specified") has lots of places where programs
can "get stuck", i.e., where there is no transition to take. Most
actual implementations of C (including "Unix") are actually quite
generous by filling in some of the transitions, so that programs that
are officially "legal C" will run anyway.

Example:

The following program (IIRC) is not legal C, even though I expect it
to run without problem on almost any machine/OS, printing 0 to stdout:

#include <stdio.h>

int a[] = { 0, 1, 2 };

int main (void)
{
    int *p, *q;
    p = a;
    q = p-1; /** illegal **/
    printf("%d\n", q[1]);
    return 0;
}

Nevertheless, the line marked with the comment is illegal because it
creates a pointer that is not pointing into a memory object. Still, a
C compiler is not required to reject this program. It is allowed,
though, to make the program behave in unexpected ways. In particular,
you can't really blame the compiler if the program does not print 0,
or if it even crashes.

AFAIC, C is C-unsafe by Bob's reasoning.

---

Of course, C can be made safe quite easily:

Define a state "undefined" that is considered "safe" and add a
transition to "undefined" wherever necessary.
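That construction can be sketched concretely. The following is a toy illustration in Python of the general idea, not of C's actual semantics; the state space and transitions are invented for the example. A partial transition function models a language whose definition leaves some states with no successor, and wrapping it so that stuck states go to an absorbing "undefined" state makes the system total, hence "safe" in this sense.

```python
# A toy transition system: states are ints, step() is a partial function.
# States with no outgoing transition are "stuck" -- the analogue of C's
# undefined behavior, where the standard simply provides no next state.

def step(state):
    """Partial transition function: next state, or None if stuck."""
    transitions = {0: 1, 1: 2}   # state 2 has no transition: it is stuck
    return transitions.get(state)

UNDEFINED = "undefined"          # distinguished "safe" sink state

def safe_step(state):
    """Total transition function: stuck states go to UNDEFINED instead."""
    if state == UNDEFINED:
        return UNDEFINED         # UNDEFINED is absorbing
    nxt = step(state)
    return UNDEFINED if nxt is None else nxt

def run(state, steps):
    for _ in range(steps):
        state = safe_step(state)
    return state

print(run(0, 2))   # 2: normal execution
print(run(0, 5))   # undefined: execution got stuck and was absorbed
```

Under the wrapped semantics no program can "get stuck"; every transition exists, which is all the safety theorem asks for.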

Kind regards,
Matthias
 

John Thingstad

The "subset" characterization is not important for what I'm saying. The
fact that they are different languages is what's important. If you
agree about that, then you can at least understand which language I'm
referring to when I say "latently-typed language".

Besides, many dynamically-typed languages have no formal models, in
which case the untyped formal model I've referred to is just a
speculative construct. The language I'm referring to with
"latently-typed language" is the language that programmers are familiar
with, and work with.


That's a pretty apt comparison, and it probably explains how it is that
the software we all use, which relies so heavily on C, works as well as
it does.

But the comparison critiques the practice of operating without static
guarantees, it's not a critique of the terminology.

Anton

Actually, I have never developed a C/C++ program
without a bounds checker in the last 15 years.
It checks all memory references and on program shutdown
checks for memory leaks. What is it about you guys that makes you blind
to these facts? Allocation problems haven't really bugged me at all
since forever. Now debugging flaky templates, on the other hand...
But then debugging Lisp macros isn't much better.
 

Anton van Straaten

Chris said:
What makes static type systems interesting is not the fact that these
logical processes of reasoning exist; it is the fact that they are
formalized with definite axioms and rules of inference, performed
entirely on the program before execution, and designed to be entirely
evaluatable in finite time bounds according to some procedure or set of
rules, so that a type system may be checked and the same conclusion
reached regardless of who (or what) is doing the checking. All that,
and they still reach interesting conclusions about programs.

There's only one issue mentioned there that needs to be negotiated
w.r.t. latent types: the requirement that they are "performed entirely
on the program before execution". More below.
If
informal reasoning about types doesn't meet these criteria (and it
doesn't), then it simply doesn't make sense to call it a type system
(I'm using the static terminology here). It may make sense to discuss
some of the ways that programmer reasoning resembles types, if indeed
there are resemblances beyond just that they use the same basic rules of
logic. It may even make sense to try to identify a subset of programmer
reasoning that even more resembles... or perhaps even is... a type
system. It still doesn't make sense to call programmer reasoning in
general a type system.

I'm not trying to call programmer reasoning in general a type system.
I'd certainly agree that a programmer muddling his way through the
development of a program, perhaps using a debugger to find all his
problems empirically, may not be reasoning about anything that's
sufficiently close to types to call them that. But "latent" means what
it implies: someone who is more aware of types can do better at
developing the latent types.

As a starting point, let's assume we're dealing with a disciplined
programmer who (following the instructions found in books such as the
one at htdp.org) reasons about types, writes them in his comments, and
perhaps includes assertions (or contracts in the sense of "Contracts for
Higher Order Functions" [1]), to help check his types at runtime (and I'm
talking about real type-checking, not just tag checking).

When such a programmer writes a type signature as a comment associated
with a function definition, in many cases he's making a claim that's
provable in principle about the inputs and outputs of that function.

For example, if the type in question is "int -> int", then the provable
claim is that given an int, the function cannot return anything other
than an int. Voila, assuming we can actually prove our claim, then
we've satisfied the requirement in Pierce's definition of type system,
i.e. "proving the absence of certain program behaviors by classifying
phrases according to the kinds of values they compute."
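The discipline just described might look like the following sketch (in Python; the function and its checks are invented for illustration and are not from the thread). The latent type lives in a comment, and runtime assertions do real type checking at the boundaries, producing a witness if the informal proof is wrong.

```python
# successor : int -> int          <-- latent type, recorded as a comment
# The provable claim: given an int, successor cannot return anything
# other than an int. The assertions below check the domain and codomain
# at runtime; they are contracts, not mere tag checks on values.

def successor(n):
    assert isinstance(n, int), "domain violation: expected an int"
    result = n + 1
    assert isinstance(result, int), "codomain violation: expected an int"
    return result

print(successor(41))   # prints 42
```

Calling successor("x") trips the domain assertion immediately, which is exactly the runtime witness to a broken proof described below.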

The fact that the proof in question may not be automatically verified is
no more relevant than the fact that so many proofs in mathematics have
not been automatically verified. Besides, if the proof turns out to be
wrong, we at least have an automated mechanism for finding witnesses to
the error: runtime tag checks will generate an error. Such detection is
not guaranteed, but again, "proof" does not imply "automation".

What I've just described walks like a type and talks like a type, or at
least it appears to do so according to Pierce's definition. We can
classify many terms in a given dynamically-typed program on this basis
(although some languages may be better at supporting this than others).

So, on what grounds do you reject this as not being an example of a
type, or at the very least, something which has clear and obvious
connections to formal types, as opposed to simply being arbitrary
programmer reasoning?

Is it the lack of a formalized type system, perhaps? I assume you
recognize that it's not difficult to define such a system.

Regarding errors, we haven't proved that the function in question can
never be called with something other than an int, but we haven't claimed
to prove that, so there's no problem there. I've already described how
errors in our proofs can be detected.

Another possible objection is that I'm talking about local cases, and
not making any claims about extending it to an entire program (other
than to observe that it can be done). But let's take this a step at a
time: considered in isolation, if we can assign types to terms at the
local level in a program, do you agree that these are really types, in
the type theory sense?

If you were to agree, then we could move on to looking at what it means
to have many local situations in which we have fairly well-defined
types, which may all be defined within a common type system.

As an initial next step, we could simply rely on the good old universal
type everywhere else in the program - it ought to be possible to make
the program well-typed in that case, it would just mean that the
provable properties would be nowhere near as pervasive as with a
traditional statically-typed program. But the point is we would now
have put our types into a formal context.
In the same post here, you simultaneously suppose that there's something
inherently informal about the type reasoning in dynamic languages; and
then talk about the type system "in the static sense" of a dynamic
language. How is this possibly consistent?

The point about inherent informality is just that if you fully formalize
a static type system for a dynamic language, such that entire programs
can be statically typed, you're likely to end up with a static type
system with the same disadvantages that the dynamic language was trying
to avoid.

However, that doesn't preclude a type system being defined which can be
used to reason locally about dynamic programs. Some people actually do
this, albeit informally.
No. What we have is quite a bit of evidence about properties remaining
true in dynamically typed programs, which could have been verified by
static typing.

Using my example above, at least some of those properties would have
been verified by manual static typing. So those properties, at least,
seem to be types, at least w.r.t. the local fragment they're proved within.
I agree with that statement. However, I think the conversation
regarding static typing is also over when you decide that the answer is
to weaken static typing until the word applies to informal reasoning.
If the goal isn't to reach a formal understanding, then the word "static
type" shouldn't be used

Well, I'm trying to use the term "latent type", as a way to avoid the
ambiguity of "type" alone, to distinguish it from "static type", and to
get away from the obvious problems with "dynamic type".

But I'm much less interested in the term itself than in the issues
surrounding dealing with "real" types in dynamically-checked programs -
if someone had a better term, I'd be all in favor of it.
and when that is the goal, it still shouldn't
be applied to processes that aren't formalized until they manage to become
formalized.

This comes back to the phrase I highlighted at the beginning of this
message: "performed entirely on the program before execution".
Formalizing local reasoning about types is pretty trivial, in many
cases. It's just that there are other cases where that's less
straightforward.

So when well-typed program fragments are considered as part of a larger
untyped program, you're suggesting that this so radically changes the
picture, that we can no longer distinguish the types we identified as
being anything beyond programmer reasoning in general? All the hard
work we did figuring out the types, writing them down, writing
assertions for them, and testing the program to look for violations
evaporates into the epistemological black hole that exists outside the
boundaries of formal type theory?
Hopefully, this isn't perceived as too picky. I've already conceded
that we can use "type" in a way that's incompatible with all existing
research literature.

Not *all* research literature. There's literature on various notions of
dynamic type.
I would, however, like to retain some word which
actually has that meaning. Otherwise, we are just going to be
linguistically prevented from discussing it.

For now, you're probably stuck with "static type". But I think it was
never likely to be the case that type theory would succeed in absconding
with the only technically valid use of the English word "type".
Although not for want of trying!

Besides, I think it's worth considering why Bertrand Russell used the
term "types" in the first place. His work had deliberate connections to
the real world. Type theory in computer science unavoidably has similar
connections. You could switch to calling it Zoltar Theory, and you'd
still have to deal with the fact that a zoltar would have numerous
connections to the English word "type". In CS, we don't have the luxury
of using the classic mathematician's excuse when confronted with
inconvenient philosophical issues, of claiming that type theory is just
a formal symbolic game, with no meaning outside the formalism.

Anton

[1]
http://people.cs.uchicago.edu/~robby/pubs/papers/ho-contracts-techreport.pdf
 

Anton van Straaten

John said:
Actually, I have never developed a C/C++ program
without a bounds checker in the last 15 years.
It checks all memory references and on program shutdown
checks for memory leaks. What is it about you guys that makes you blind
to these facts?

You misunderstand -- for the purposes of the above comparison, a bounds
checker serves essentially the same purpose as "dynamic memory page
protection checks". The point is that it happens dynamically, i.e. at
runtime, and that there's a lack of static guarantees about memory
safety in C or C++. That's why, as I said, the comparison to latent vs.
static typing is an apt one.

Anton
 

Anton van Straaten

Joachim said:
Ah, finally that terminology starts to make sense to me. I have been
wondering whether there's any useful difference between "latent" and
"run-time" typing. (I tend to avoid the term "dynamic typing" because
it's overloaded with too many vague ideas.)

Right, that's the reason for using a different term. Runtime and
dynamic types are both usually associated with the idea of tagged
values, and don't address the types of program phrases.
This seems to apply to latent types as well.

That's what I meant, yes.
Actually the set of latent types seems to be the set of possible static
type schemes.
Um, well, a superset of these - static type schemes tend to be slightly
less expressive than what the programmer has in his head. (Most type schemes
cannot really express things like "the range of this index variable is
such-and-so", and the boundary to general assertions about the code is
quite blurry anyway.)

Yes, although this raises the type theory objection of how you know
something is a type if you haven't formalized it. For the purposes of
comparison to static types, I'm inclined to be conservative and stick to
things that have close correlates in traditional static types.
Would that be a profound difference, or is it just that annotating a
value with a full type expression would cause just too much runtime
overhead?

It's a profound difference. The issue is that it's not just the values
that need to be annotated with types, it's also other program terms. In
addition, during a single run of a program, all it can ever normally do
is record the types seen on the path followed during that run, which
doesn't get you to static types of terms. To figure out the static
types, you really need to do static analysis.

Of course, static types give an approximation of the actual types of
values that flow at runtime, and if you come at it from the other
direction, recording the types of values flowing through terms on
multiple runs, you can get an approximation to the static type. Some
systems use that kind of thing for method lookup optimization.
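That "other direction" can be sketched as a tiny runtime type recorder (a Python illustration of the general technique, not any particular system's implementation): observed argument types accumulate across calls, approximating a union type for each function.

```python
from collections import defaultdict
import functools

# function name -> set of observed argument-type tuples
observed = defaultdict(set)

def record_types(fn):
    """Record the types of the arguments seen at each call of fn."""
    @functools.wraps(fn)
    def wrapper(*args):
        observed[fn.__name__].add(tuple(type(a).__name__ for a in args))
        return fn(*args)
    return wrapper

@record_types
def double(x):
    return x + x

double(2)
double(3.5)
double("ab")

# The recorded set approximates a union type for double's argument:
print(sorted(observed["double"]))   # [('float',), ('int',), ('str',)]
```

As the text notes, this only ever sees the paths actually taken, so it approximates the static types from below; more runs can widen but never complete the picture.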
In your terminology:



Would it be possible to establish such a correspondence, would it be
common consensus that such a system should be called "tags" anymore, or
are there other, even more profound differences?

There's a non-1:1 correspondence, which I gave an example of in the
quoted message. For 1:1 correspondence, you'd need to associate types
with terms, which would need some static analysis. It might result in
an interesting program browsing and debugging tool...

Anton
 

Chris Smith

Anton van Straaten said:
I'm not trying to call programmer reasoning in general a type system.
I'd certainly agree that a programmer muddling his way through the
development of a program, perhaps using a debugger to find all his
problems empirically, may not be reasoning about anything that's
sufficiently close to types to call them that.

Let me be clear, then. I am far less agreeable than you that this
couldn't be called a type system. One of my goals here is to try to
understand, if we are going to work out what can and can't be called a
type system in human thought, how we can possibly draw a boundary
between kinds of logic and say that one of them is a type system while
the other is not. I haven't seen much of a convincing way to do it.
The only things that I can see rejecting out of hand are the empirical
things; confidence gained through testing or observation of the program
in a debugger. Everything else seems to qualify, and that's the
problem.

So let's go ahead and agree about everything up until...

[...]
As an initial next step, we could simply rely on the good old universal
type everywhere else in the program - it ought to be possible to make
the program well-typed in that case, it would just mean that the
provable properties would be nowhere near as pervasive as with a
traditional statically-typed program.

It would, in fact, mean that interesting provable properties would only
be provable if data can't flow outside of those individual small bits
where the programmer reasoned with types. Of course that's not true, or
the programmer wouldn't have written the rest of the program in the
first place. If we define the universal type as a variant type
(basically, tagged or something like it, so that the type can be tested
at runtime) then certain statements become provable, of the form "either
X or some error condition is raised because a value is inappropriate".
That is indeed a very useful property; but wouldn't it be easier to
simply prove it by appealing to the language semantics that say that "if
not X, the operation raises an exception," or to the existence of
assertions within the function that verify X, or some other such thing
(which must exist, or the statement wouldn't be true)?
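The "either X, or an error condition is raised because a value is
inappropriate" property can be sketched concretely. This is only an
illustration of the idea, assuming a tagged universal type; the type and
function names here are made up, not from any library:

```haskell
-- Hypothetical tagged universal type; constructor names are illustrative.
data Value = VInt Integer | VStr String
  deriving (Eq, Show)

-- Either the result is the sum (the "X" above), or an error condition
-- is signalled because a value carried an inappropriate tag.
addValues :: Value -> Value -> Either String Value
addValues (VInt a) (VInt b) = Right (VInt (a + b))
addValues _        _        = Left "addValues: inappropriate value"
```

Using Either makes the "either X or an error" disjunction explicit in the
result, rather than leaving it implicit in the language semantics.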
But the point is we would now
have put our types into a formal context.

Yes, but somewhat vacuously so...
The point about inherent informality is just that if you fully formalize
a static type system for a dynamic language, such that entire programs
can be statically typed, you're likely to end up with a static type
system with the same disadvantages that the dynamic language was trying
to avoid.

Okay, so you've forced the admission that you have a static type system
that isn't checked and doesn't prove anything interesting. If you
extended the static type system to cover the whole program, then you'd
have a statically typed language that lacks an implementation of the
type checker.

I don't see what we lose in generality by saying that the language lacks
a static type system, since it isn't checked, and doesn't prove anything
anyway. The remaining statement is that the programmer may use static
types to reason about the code. But, once again, the problem here is
that I don't see any meaning to that. It would basically mean the same
thing as that the programmer may use logic to reason about code. That
isn't particularly surprising to me. I suppose this is why I'm
searching for what is meant by saying that programmers reason about
types, and am having that conversation in several places throughout this
thread... because otherwise, it doesn't make sense to point out that
programmers reason with types.
Well, I'm trying to use the term "latent type", as a way to avoid the
ambiguity of "type" alone, to distinguish it from "static type", and to
get away from the obvious problems with "dynamic type".

I replied to your saying something to Marshall about the "type systems
(in the static sense) of dynamically-typed languages." That's what my
comments are addressing. If you somehow meant something other than the
static sense, then I suppose this was all a big misunderstanding.
But I'm much less interested in the term itself than in the issues
surrounding dealing with "real" types in dynamically-checked programs -
if someone had a better term, I'd be all in favor of it.

I will only repeat again that in static type systems, the "type" becomes
a degenerate concept when it is removed from the type system. It is
only the type system that makes something a type. Therefore, if you
want "real" types, then my first intuition is that you should avoid
looking in the static types of programming languages. At best, they
happen to coincide frequently with "real" types, but that's only a
pragmatic issue if it means anything at all.
So when well-typed program fragments are considered as part of a larger
untyped program, you're suggesting that this so radically changes the
picture, that we can no longer distinguish the types we identified as
being anything beyond programmer reasoning in general?

Is this so awfully surprising, when the type system itself is just
programmer reasoning? When there ceases to be anything that's provable
about the program, there also ceases to be anything meaningful about the
type system. You still have the reasoning, and that's all you ever had.
You haven't really lost anything, except a name for it. Since I doubt
the name had much meaning at all (as explained above), I don't count
that as a great loss. If you do, though, then this simply demonstrates
that you didn't really mean "type systems (in the static sense)".
Not *all* research literature. There's literature on various notions of
dynamic type.

I haven't seen it, but I'll take your word that some of it exists. I
wouldn't tend to read research literature on dynamic types, so that
could explain the discrepancy.
For now, you're probably stuck with "static type".

That's fine. I replied to your attempt to apply "static type" in a
sense where it doesn't make sense.
In CS, we don't have the luxury
of using the classic mathematician's excuse when confronted with
inconvenient philosophical issues, of claiming that type theory is just
a formal symbolic game, with no meaning outside the formalism.

As it turns out, my whole purpose here is to address the idea that type
theory for programming languages is limited to some subset of problems
which people happen to think of as type problems. If type theorists
forty years ago had limited themselves to considering what people
thought of as types back then, we'd be in a considerably worse position
today with regard to the expressive power of types in commonly used
programming languages. Hence, your implication that insisting on a
formal rather than intuitive definition is opposed to the real-world
usefulness of the field is actually rather ironic. The danger lies in
assuming that unsolved problems are unsolvable, and therefore defining
them out of the discussion until the very idea that someone would solve
that problem with these techniques starts to seem strange.
 
K

Ketil Malde

But I mean the value as the semantics of the program itself sees it.
Which mostly means the datum in memory.

I don't agree with that. Generally, a language specifies a virtual
machine and doesn't need to concern itself with things like "memory"
at all. Although languages like C try to; look at all the undefined
behavior you get when you make assumptions about memory layout, etc.

Memory representation is just an artifact of a particular
implementation of the language for a particular architecture.

-k
 
K

Ketil Malde

Chris Smith said:
I've since abandoned any attempt to be picky about use of the word
"type". That was a mistake on my part. I still think it's legitimate
to object to statements of the form "statically typed languages X, but
dynamically typed languages Y", in which it is implied that Y is
distinct from X. When I used the word "type" above, I was adopting the
working definition of a type from the dynamic sense.

Umm... what *is* the definition of a "dynamic type"? In this thread
it's been argued to be everything from tags and memory representation
to (almost?) arbitrary predicates.
That is, I'm considering whether statically typed languages may be
considered to also have dynamic types, and it's pretty clear to me
that they do.

Well, if you equate tags with dynamic types:

data Universal = Str String | Int Integer | Real Double
               | List [Universal] | ...

A pattern match failure can then be seen as a dynamic type error.
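A minimal, self-contained sketch of that failure mode (the function name
is illustrative): a function covering only one constructor of Universal
fails at runtime with a pattern match error when handed a value carrying
any other tag.

```haskell
-- Reduced version of the Universal type above; constructors act as
-- runtime tags.
data Universal = Str String | Int Integer | Real Double
  deriving (Eq, Show)

-- Covers only the Int tag; applying it to any other tag triggers a
-- pattern match failure at runtime -- the "dynamic type error".
double :: Universal -> Universal
double (Int n) = Int (2 * n)
```

Evaluating `double (Str "hi")` raises a non-exhaustive-patterns exception,
while `double (Int 21)` succeeds.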

Another question is how to treat a system (and I think C++'s RTTI
qualifies, and probably related languages like Java) where static type
information is available at runtime. Does this fit with the term
"dynamic typing" with "typing" in the same sense as "static typing"?

-k
 
K

Ketil Malde

So the compiler now attempts to prove theorems about the program, but
once it has done so it uses the results merely to optimize its runtime
behavior and then throws the results away.

Hmm... here's my compiler front end (for Haskell or other languages
that ..er.. define 'undefined'):

while (type error) {
    replace an incorrectly typed function with 'undefined'
}

Wouldn't the resulting program still be statically typed?

In practice I prefer to (and do) define troublesome functions as
'undefined' manually during development.
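That practice can be sketched as follows (parseConfig is a hypothetical
function, not from any library): the stub keeps the module statically well
typed, because 'undefined' inhabits every type, and the program only
crashes if the stub is actually forced.

```haskell
-- Hypothetical function whose real body is deferred; 'undefined' has
-- type 'forall a. a', so the module still type checks.
parseConfig :: String -> Either String Int
parseConfig = undefined  -- stub during development

-- Depends on parseConfig, yet runs fine on the branch that never
-- forces the stub.
describe :: String -> String
describe s
  | null s    = "empty input"                   -- stub never evaluated
  | otherwise = either id show (parseConfig s)  -- forcing the stub would crash
```

Laziness is doing the work here: the ill-typed (or unwritten) part of the
program only fails if execution actually reaches it.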

-k
 
K

Ketil Malde

Anton van Straaten said:
But a program as seen by the programmer has types: the programmer
performs (static) type inference when reasoning about the program, and
debugs those inferences when debugging the program, finally ending up
with a program which has a perfectly good type scheme.

I'd be tempted to go further, and say that the programmer performs
(informally, incompletely, and probably incorrectly) a proof of
correctness, and that type correctness is just a part of general
correctness.

-k
 
D

Dirk Thierbach

Maybe the paper "Linear types and non-size-increasing polynomial time
computation" by Martin Hofmann is interesting in this respect.
From the abstract:

We propose a linear type system with recursion operators for inductive
datatypes which ensures that all definable functions are polynomial
time computable. The system improves upon previous such systems in
that recursive definitions can be arbitrarily nested; in particular,
no predicativity or modality restrictions are made.

It not only ensures termination, but termination in polynomial time,
so you can use those types to carry information about this as well.
If the annotation marks not-provably-terminating subprograms, then it
calls attention to those subprograms. This is what we want, since it is
less safe/correct to use a nonterminating subprogram than a terminating
one (in some contexts).

That would certainly be nice, but it may be almost impossible to do in
practice. It's already hard enough to guarantee termination with the
extra information present in the type annotation. If this information
is not present, then the language probably has to be restricted so
severely to ensure termination that it is more or less useless.

- Dirk
 
A

Andreas Rossberg

Gabriel said:
(e-mail address removed)-sb.de writes:

| I don't think that it is too relevant for the discussion at hand. Moreover,
| Harper talks about a relative concept of "C-safety".

Then, I believe you missed the entire point.

I think the part of my reply you snipped addressed it well enough.

Anyway, I can't believe that we actually need to argue about the fact
that - for any *useful* and *practical* notion of safety - C is *not* a
safe language. I refrain from continuing the discussion along this line,
because *that* is *really* silly.

- Andreas
 
D

David Hopwood

Matthias said:
I agree with Bob Harper about safety being language-specific and all
that. But, with all due respect, I think his characterization of C is
not accurate. [...]
AFAIC, C is C-unsafe by Bob's reasoning.
Agreed.

Of course, C can be made safe quite easily:

Define a state "undefined" that is considered "safe" and add a
transition to "undefined" wherever necessary.

I wouldn't say that was "quite easy" at all.

C99 4 #2:
# If a "shall" or "shall not" requirement that appears outside of a constraint
# is violated, the behavior is undefined. Undefined behavior is otherwise
# indicated in this International Standard by the words "undefined behavior"
# *or by the omission of any explicit definition of behavior*. [...]

In other words, to fix C to be a safe language (compatible with Standard C89
or C99), you first have to resolve all the ambiguities in the standard where
the behaviour is *implicitly* undefined. There are a lot of them.
 
M

Matthias Blume

David Hopwood said:
Matthias said:
I agree with Bob Harper about safety being language-specific and all
that. But, with all due respect, I think his characterization of C is
not accurate. [...]
AFAIC, C is C-unsafe by Bob's reasoning.
Agreed.

Of course, C can be made safe quite easily:

Define a state "undefined" that is considered "safe" and add a
transition to "undefined" wherever necessary.

I wouldn't say that was "quite easy" at all.

C99 4 #2:
# If a "shall" or "shall not" requirement that appears outside of a constraint
# is violated, the behavior is undefined. Undefined behavior is otherwise
# indicated in this International Standard by the words "undefined behavior"
# *or by the omission of any explicit definition of behavior*. [...]

In other words, to fix C to be a safe language (compatible with Standard C89
or C99), you first have to resolve all the ambiguities in the standard where
the behaviour is *implicitly* undefined. There are a lot of them.

Yes, if you want to make the transition system completely explicit, it
won't be easy. I was thinking of a catch-all rule that says:
transition to "undefined" unless specified otherwise. (Note that I am
not actually advocating this approach to making a language "safe".
For practical purposes, C is unsafe. (And so is C++.))
 
