Re: "Strong typing vs. strong testing"

TheFlyingDutchman

I would agree that the third sentence is arguably wrong, simply
because there's no such thing (outside #error) as a mandate to stop
compiling.  However, my understanding was that the dispute was over
the second sentence, and that's certainly correct.

Why do you consider the term "compile error" a "mandate to stop
compiling"? What do you say to refer to the situation when you have a
statement in your program that the compiler finds is an error? And is
it really material whether the compiler flagged an error and stopped,
or flagged an error and looked for additional errors?
 
TheFlyingDutchman

        Message-ID:
        <0497e39d-6bd1-429d-a86f-f4c89babe...@u31g2000pru.googlegroups.com>
        From: TheFlyingDutchman <[email protected]>
        Newsgroups: comp.lang.lisp
        [...]
        in C I can have a function maximum(int a, int b) that will always
        work. Never blow up, and never give an invalid answer. If someone
        tries to call it incorrectly it is a compile error.
        [...]
______________________________________________________________________________
Don Geddis                  http://don.geddis.org/             
(e-mail address removed)

Thanks, Don.

rg-

Thanks from me as well, Don. I was worried that people would start to
believe that the original statement was what you said it was:

"I'm not even saying it's a flaw in the language. All I'm saying is
that the original claim -- that any error in a C program will be caught by
the compiler -- is false, and more specifically, that it can be
demonstrated to be false without appeal to unknown run-time input."
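
A minimal sketch of the kind of call this dispute turns on, assuming the "obvious simple maximum()" mentioned in the thread. The wrapper name maximum_of_doubles is hypothetical and exists only for illustration. A conforming C compiler is not required to reject the call (it may warn if extra warnings are enabled), and the arguments are converted to int before maximum() ever sees them:

```c
/* The "obvious simple maximum()" discussed in the thread. */
int maximum(int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical caller, for illustration only.  With the prototype in
 * scope, each double argument is implicitly converted to int (the
 * fractional part is discarded) before the call is made.  Values
 * outside the range of int would be undefined behaviour. */
int maximum_of_doubles(double x, double y)
{
    return maximum(x, y);
}
```

Called as maximum_of_doubles(2.5, 2.9), this returns 2. maximum() behaved correctly on the ints it received, yet the caller did not get the larger of its two values. Whether that counts as an "invalid answer" or a "compile error" is the definitional question being argued here.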
 
Pascal J. Bourguignon

Gene said:
The FA or TM dichotomy is more painful to contemplate than you say.
Making appropriate simplifications for input, any modern computer is a
FA with 2^(a few trillion) states. Consequently, the gestalt of
computer science seems to be to take it on faith that at some very
large number of states, the FA behavior makes a transition to TM
behavior for all possible practical purposes (and I mean all). So
what is it--really--that's trivial to analyze? And what is
impossible? I'm sorry this is drifting OT and will stop here.


Don't worry, this thread is becoming interesting at least.
 
Steven D'Aprano

So far as I know, that actually just means that the test suite is
insufficient.

+1 QOTW


Now can we (by which I mean *you*) stop cross-posting C talk to multiple
newsgroups that don't have anything to do with C?



Thank you.
 
Nick Keighley

Fact is:  almost all user data from the external world comes into
programs as strings.  No type system or compiler handles this fact all
that gracefully...

snobol?
 
Pascal J. Bourguignon

Seebs said:
I would agree that the third sentence is arguably wrong, simply
because there's no such thing (outside #error) as a mandate to stop
compiling. However, my understanding was that the dispute was over
the second sentence, and that's certainly correct.

The obvious simple maximum() in C will not raise an exception nor return
something which isn't an int in any program which is not on its face
invalid in the call. This is by definite contrast with several of the
interpreted languages,

This has nothing to do with the fact that these languages have
implementations using the interpreter pattern instead of a compiler.

As a matter of fact, most Common Lisp implementations simply do not have
an interpreter! (Which doesn't prevent them from having a REPL.)

where a function or subroutine like that cannot
specify that its argument must be some kind of integer.

This is correct, but this is not a characteristic of dynamic programming
languages. There are dynamic programming languages where you can
declare the type of the parameters, to allow for static checking.
Common Lisp is one example. There these declarations are advisory, so the
compiler is free to take them into account or not, depending on what it
can infer on its own. Any problem found this way is just a warning: the
program can always handle it at run-time.
 
Pascal J. Bourguignon

rustom said:
Some points that seem to be missed (or I've missed them?)

1. A dichotomy is being made between 'static' languages like C and
'dynamic' languages like python/lisp. This dichotomy was valid 30
years ago, not today. In Haskell for example

- static checking is stronger than in C/C++ -- it's very hard if not
impossible to core dump haskell except through memory exhaustion

- dynamic-ness is almost that of python/lisp -- one can write
significant haskell programs without type-declaring a single variable/
function

You're confounding the strength of the typing with the requirement that
the programmer declare the types, and with the time when the types
are checked.

http://en.wikipedia.org/wiki/Comparison_of_programming_languages#Type_systems


typing    checking  declarations  example
strong    static    explicit      Ada
strong    static    implicit      Haskell
weak      static    explicit      C
weak      static    implicit      ?
strong    dynamic   explicit      (*)
strong    dynamic   implicit      Common Lisp
weak      dynamic   explicit      Objective-C
weak      dynamic   implicit      JavaScript


(*) Usually languages provide explicit typing as an option, but can
deal with implicit typing, when they're dynamic.


There are also a few languages with no type checking, such as assembler
or Forth.


Much more mainstream, C# is almost as 'managed' as dynamic languages
and has efficiency comparable to C.

Nothing extraordinary here. Common Lisp is more efficient than C.
http://www.lrde.epita.fr/~didier/research/verna.06.ecoop.pdf
http://portal.acm.org/citation.cfm?id=1144168

Actually, it's hard to find a language that has no compiler generating
faster code than C...

2. The dichotomy above misses a more pervasive dichotomy -- hardware
vs software -- as real today as 30 years ago.

To see this let us lift the discussion from that of *languages* C vs
Python/Lisp to philosophies:
-- C-philosophy: the purpose of type-checking is to maximize (runtime)
efficiency
-- Lisp-philosophy: the purpose of type-checking is zero-errors (aka
seg-faults) via continuous checks at all levels.

If one is honest (and not polemical :) ) it would be admitted that both
sides are needed in different contexts.

Now Dijkstra pointed out (40 years ago) in A Discipline of Programming
that this unfortunate dilemma arises due to lack of hardware support. I am
unable to reproduce the elegance and succinctness of his language but
the argument is as follows:

Let us say that, in a typical execution profile, for every one
pathological instruction of the kind typified by the maximum
function there are a trillion 'normal' instructions. This is what he
calls a very-skew test -- an if-then-else that checks for this would go
the if-way one trillion times for each time it goes the else-way. It is
natural for a programmer to feel the pinch of these trillion checks and
(be inclined to) throw them away.

If however the check was put into hardware there would be no such
dilemma. If every arithmetic operation was always checked for overflow
*by hardware* even languages committed to efficiency like C could trap
on errors with no extra cost.
Likewise Lisp/python-like languages could easily be made more
efficient.

The difference arises from the fact that software costs per use whereas
hardware costs per installation -- a transistor, unlike an if, does
not cost any more if it's used once or a trillion times.

In short the problem is not C vs Lisp/Python but architectures like
Intel wherein:

1. an overflow bit harmlessly set by a compare operation is
indistinguishable from one set by a signed arithmetic operation --
almost certainly a problem

2. An INTO instruction (interrupt on overflow) must be inserted into
the software stream rather than raised as a hardware interrupt.

Hence the use of virtual machine: when your machine doesn't do what you
want, you have to write your own.

When Intel realizes that 99% of its users are running a VM, perhaps
they'll start to wonder what they're doing wrong...
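
The per-use cost of a software check can be sketched in C. This is only an illustration under stated assumptions: checked_add is a hypothetical helper name, and the test is a portable pre-check against INT_MAX and INT_MIN, not the INTO instruction or the overflow flag discussed above.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: add two ints, reporting overflow instead of
 * invoking undefined behaviour.  The comparisons run on every call,
 * even though the overflow branch is almost never taken. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;            /* a + b would overflow */
    }
    *sum = a + b;
    return true;
}
```

Every caller pays for the comparisons whether or not an overflow ever occurs, which is the "very-skew test" cost that the quoted argument would move into hardware.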
 
Pascal J. Bourguignon

Seebs said:
So far as I know, that actually just means that the test suite is
insufficient. :)

Based on my experience thus far, anyway, I am pretty sure it essentially
never happens that the tests and code are both correct; it is usually
the case either that the tests fail or that there are not enough tests.

It also shows that for languages such as C, you cannot limit the unit tests
to the types declared for the function; you should try all the
possible values of the language.

Which is basically the same as with dynamically typed programming
languages, only now some unit tests will fail early, when trying to
compile them, while others will give wrong results later.


                              static                        dynamic

compiler detects wrong type   fails at compile time         fails at run-time
                                                            (with exception
                                                            explaining this is
                                                            the wrong type)

compiler passes wrong type    wrong result                  fails at run-time
                              (the programmer spends        (with exception
                              hours finding the problem)    explaining this is
                                                            the wrong type)

compiler passes correct type  wrong result                  wrong result
                              (normal bug to be corrected)

compiler passes correct type  correct result                correct result
 
BartC

Pascal J. Bourguignon said:

It seems that to make Lisp fast, you have to introduce static typing. Which
is not much different to writing in C, other than the syntax.

Actually, it's hard to find a language that has no compiler generating
faster code than C...

But those implementers have to try very hard to beat C. Meanwhile C can be
plenty fast without doing anything special.

When Intel will realize that 99% of its users are running VM

Which one?
 
Rui Maciel

namekuseijin said:
that is a lie.

Compilation only makes sure that values provided at compilation-time
are of the right datatype.

What happens though is that in the real world, pretty much all
computation depends on user provided values at runtime. See where are
we heading?

You are confusing two completely different and independent concepts, namely a language's type
system and input validation. TheFlyingDutchman pointed out the typical problems associated with
weakly typed languages while you tried to contradict him by complaining about input sanity issues.
The thing is, input sanity issues are perfectly independent of a language's type system.
Therefore, arguing about the need to perform sanity checks on programs written in language X or Y
does nothing to tackle the issues related to passing a variable/object of the "wrong" type as a
parameter to some function.


Rui Maciel
 
Pascal J. Bourguignon

BartC said:
It seems that to make Lisp fast, you have to introduce static
typing. Which is not much different to writing in C, other than the
syntax.


But those implementers have to try very hard to beat C. Meanwhile C
can be plenty fast without doing anything special.


Which one?

Any implementation of a controlled environment is a virtual machine.
Sometimes it is explicitly defined, such as in clisp, Parrot or the JVM, but
more often it is implicit, such as in sbcl, or worse, developed in an
ad-hoc way in applications (e.g. written in C++).
 
Rui Maciel

George said:
That's true. But it is a situation where the conversion to SI units
loses precision and therefore probably shouldn't be done.

I don't care to check it ... the fact that the SI unit involves 12
decimal places whereas the imperial unit involves 3 tells me the
conversion probably shouldn't be done in a program that wants
accuracy.


Your comment is absurd for multiple reasons. As we are focusing on the computational aspect of
this issue, I will only say this: if we are dealing with a computer representation of numbers
then, as long as the numeric data type provides enough precision, it is perfectly irrelevant if a
decimal representation of a certain number involves 12 or 3 decimal places. The only precision
issues which affect a calculation are the ones arising from a) the ability to exactly represent a
certain number in a specific representation and b) the precision errors produced by arithmetic
operations.


Rui Maciel
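
The point about representation can be illustrated with a small C sketch. The inch-to-metre conversion (1 in = 0.0254 m by definition) and all function names are assumptions chosen for illustration; the thread does not say which units were under discussion. It also assumes double is an IEEE 754 64-bit type.

```c
/* Metres per inch.  Exact by definition of the international inch,
 * but 0.0254 is not exactly representable in binary floating point,
 * which is error source (a) above. */
static const double METRES_PER_INCH = 0.0254;

double inches_to_metres(double inches)
{
    return inches * METRES_PER_INCH;   /* rounding here is source (b) */
}

double metres_to_inches(double metres)
{
    return metres / METRES_PER_INCH;
}

/* Absolute error left after converting to SI and back again. */
double round_trip_error(double inches)
{
    double diff = metres_to_inches(inches_to_metres(inches)) - inches;
    return diff < 0 ? -diff : diff;
}
```

For a value given to three decimal places, such as 12.345, the round-trip error is many orders of magnitude smaller than the third decimal place. The number of decimal digits used to write the SI value has no bearing on it.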
 
BartC

Pascal J. Bourguignon said:
Any implementation of a controlled environment is a virtual machine.
Sometimes it is explicitly defined, such as in clisp, Parrot or the JVM, but
more often it is implicit, such as in sbcl, or worse, developed in an
ad-hoc way in applications (e.g. written in C++).

But if you had to implement a VM directly in hardware, which one (of the
several varieties) would you choose?

And having chosen one, how would that impact the performance of a language
with an incompatible VM?

Perhaps processors executing native code as it is now, aren't such a bad
idea.
 
Rui Maciel

Pascal said:

I don't know if you are intentionally trying to be deceitful or if you honestly didn't spend much
time thinking about this issue. To be brief I will only point out the following:


a) no language is inherently more or less efficient than any other language. The efficiency
aspect is only related to how those languages are implemented (i.e., the investments made in
optimizing the compilers/interpreters)
b) Just because someone invested enough effort to optimize a specific implementation of language X
to run, under a specific scenario, a benchmark faster than some other implementation of language Y
it doesn't mean that language X's implementation outperforms or even matches every implementation
of language Y under every conceivable scenario.


Regarding the links that you've provided, again I don't know if you intended to be dishonest or if
you simply didn't read them. The first link, entitled "Beating C in Scientific Computing
Applications: On the Behavior and Performance of LISP, Part 1", basically compares a highly
optimized implementation of lisp (quite literally the "current state of the art in COMMON-LISP
compiler technology") with a standard, run-of-the-mill C implementation by performing a very
specific benchmark. If that wasn't enough, the C implementation they adopted to represent C was
none other than GCC 4.0.3. As we all know, the 4.0 branch of GCC was still experimental and ran
notoriously worse than the 3.4 branch[1].

But even though you've ignored this, the article's authors haven't. They've stated the following
in their article:

<quote>
We must admit however that this point of view is not totally unjustified. Recent studies (Neuss,
2003; Quam, 2005) on various numerical computation algorithms find that LISP code compiled with
CMU-CL can run at 60% of the speed of equivalent C code.
</quote>

So, on what exactly do you base your claims?


Actually, it's hard to find a language that has no compiler generating
faster code than C...

Once again, I don't know if you are intentionally trying to be deceitful. If an undergraduate
student happens to write a C compiler for a compiler class which employs no optimization
whatsoever, that will not mean that every single C compiler is incapable of generating
efficient code.


Rui Maciel


[1] http://coyotegulch.com/reviews/gcc4/index.html
 
Seebs

Again, you can't have it both ways. Either a warning is a "compiler
error" according to the claim at issue (see below) or it is not. If it
is, then this is a false positive.

No, it isn't. It's a correctly identified type mismatch.

You keep moving the goal posts from the actual standard of a false positive
(the compiler warns that something is of the wrong type when it's not of
the wrong type) to a made-up standard (the compiler warns that something is
of the wrong type when it is indeed of the wrong type, but could be safely
converted to the right type).

It doesn't matter whether, in a given case, you *could* safely perform
the conversion. If you don't perform the conversion, and the compiler points
this out, that's not a false positive.

At this point I would like to quote a wise man who once said:
Red shifted?

Moving away fast enough that their color has visibly changed.

The truth of this claim hinges on the definitions of "work", "never blow
up", "invalid", "call incorrectly" and "compile error." Define these
however you like, the result will be that the claim is either false or
vacuous.

Not really. If you use the most obvious and natural meanings *for a
statically typed language*, it is obvious that it is true.

And indeed, significantly so. In the real world, programs written in
scripting languages with runtime typing are fairly likely to throw occasional
exceptions because something is of the wrong type. In a statically typed
language, the of-the-wrong-type is something which can, by definition, be
caught at compile time.

The fundamental thing you seem to be getting stuck on is that you're assuming
that if a conversion could be made, that it should be and it should be
automatic and silent. That, however, is at odds with discussion of a
statically typed language. There's a reason we have the option of converting
things from one type to another.

-s
 
Seebs

Why do you consider the term "compile error" a "mandate to stop
compiling"?

Because that's what people normally mean -- compilation failed.

What do you say to refer to the situation when you have a
statement in your program that the compiler finds is an error? And is
it really material whether the compiler flagged an error and stopped,
or flagged an error and looked for additional errors???

It might be, because someone might argue that if the compiler will generate
code for a bad construct, it hasn't really produced a "compiler error", just
a warning.

-s
 
Seebs

This has nothing to do with the fact that these languages have
implementations using the interpreter pattern instead of a compiler.

True, but it's a good enough statistical correlation to serve in many
cases.

-s
 
Seebs

                              static                        dynamic

compiler detects wrong type   fails at compile time         fails at run-time
                                                            (with exception
                                                            explaining this is
                                                            the wrong type)

Unless, of course, the "wrong type" happens to be compatible enough to
pass. In which case, it's unclear whether it is the "wrong type" or not.

compiler passes wrong type    wrong result                  fails at run-time
                              (the programmer spends        (with exception
                              hours finding the problem)    explaining this is
                                                            the wrong type)

I have no clue what exact scenario you're talking about here. I've never
seen a bug that could plausibly be described as "compiler passes wrong
type" which wasn't picked up quickly by running with more warnings enabled.

And on the other end of things, it is not always obvious or straightforward
to figure out *why* the dynamic language has ended up with something of the
wrong type, or what's wrong with that type.

-s
 
Seebs

Now can we (by which I mean *you*) stop cross-posting C talk to multiple
newsgroups that don't have anything to do with C?

Fair enough. The original thread does seem to have been crossposted in
an innovative way.

-s
 
