Re: "Strong typing vs. strong testing"

S

Squeamizh

 Squeamizh said:
<996bd4e6-37ff-4a55-8db5-6e7574fbd...@k22g2000prb.googlegroups.com>,
On Sep 27, 12:58 am, (e-mail address removed) (Pascal J. Bourguignon)
wrote:
<7df0eb06-9be1-4c9c-8057-e9fdb7f0b...@q16g2000prf.googlegroups.com
,
This might have been mentioned here before, but I just came
across
it: a
2003 essay by Bruce Eckel on how reliable systems can get
built in
dynamically-typed languages.  It echoes things we've all said
here, but
I think it's interesting because it describes a conversion
experience:
Eckel started out in the strong-typing camp and was won over.
   https://docs.google.com/View?id=dcsvntt2_25wpjvbbhk
-- Scott
If you are writing a function to determine the maximum of two
numbers
passed as arguents in a dynamic typed language, what is the
normal
procedure used by Eckel and others to handle someone passing in
invalid values - such as a file handle for one varible and an
array
for the other?
The normal procedure is to hit such a person over the head with a
stick
and shout "FOO".
Moreover, the functions returning the maximum may be able to work
on
non-numbers, as long as they're comparable.  What's more, there are
numbers that are NOT comparable by the operator you're thinking
about!.
So to implement your specifications, that function would have to be
implemented for example as:
(defmethod lessp ((x real) (y real)) (< x y))
(defmethod lessp ((x complex) (y complex))
  (or (< (real-part x) (real-part y))
      (and (= (real-part x) (real-part y))
           (< (imag-part x) (imag-part y)))))
(defun maximum (a b)
  (if (lessp a b) b a))
And then the client of that function could very well add methods:
(defmethod lessp ((x symbol) (y t)) (lessp (string x) y))
(defmethod lessp ((x t) (y symbol)) (lessp x (string y)))
(defmethod lessp ((x string) (y string)) (string< x y))
and call:
(maximum 'hello "WORLD") --> "WORLD"
and who are you to forbid it!?
--
__Pascal Bourguignon__                  
 http://www.informatimago.com/-Hidequotedtext-
- Show quoted text -
in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer. If someone
tries to call it incorrectly it is a compile error.
In a dynamic typed language maximum(a, b) can be called with
incorrect
datatypes. Even if I make it so it can handle many types as you did
above, it could still be inadvertantly called with a file handle for
a
parameter or some other type not provided for. So does Eckel and
others, when they are writing their dynamically typed code advocate
just letting the function blow up or give a bogus answer, or do they
check for valid types passed? If they are checking for valid types it
would seem that any benefits gained by not specifying type are lost
by
checking for type. And if they don't check for type it would seem
that
their code's error handling is poor.
that is a lie.
Compilation only makes sure that values provided at compilation-time
are of the right datatype.
What happens though is that in the real world, pretty much all
computation depends on user provided values at runtime.  See where are
we heading?
this works at compilation time without warnings:
int m=numbermax( 2, 6 );
this too:
int a, b, m;
scanf( "%d", &a );
scanf( "%d", &b );
m=numbermax( a, b );
no compiler issues, but will not work just as much as in python if
user provides "foo" and "bar" for a and b... fail.
What you do if you're feeling insecure and paranoid?  Just what
dynamically typed languages do:  add runtime checks.  Unit tests are
great to assert those.
Fact is:  almost all user data from the external words comes into
programs as strings.  No typesystem or compiler handles this fact all
that graceful...
I disagree with your conclusion.  Sure, the data was textual when it
was initially read by the program, but that should only be relevant to
the input processing code.  The data is likely converted to some
internal representation immediately after it is read and validated,
and in a sanely-designed program, it maintains this representation
throughout its life time.  If the structure of some data needs to
change during development, the compiler of a statically-typed language
will automatically tell you about any client code that was not updated
to account for the change.  Dynamically typed languages do not provide
this assurance.
This is a red herring.  You don't have to invoke run-time input to
demonstrate bugs in a statically typed language that are not caught by
the compiler.  For example:
[ron@mighty:~]$ cat foo.c
#include <stdio.h>
int maximum(int a, int b) {
  return (a > b ? a : b);
}
int foo(int x) { return 9223372036854775807+x; }
int main () {
  printf("%d\n", maximum(foo(1), 1));
  return 0;}
[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out
1
Even simple arithmetic is Turing-complete, so catching all type-related
errors at compile time would entail solving the halting problem.
rg
In short, static typing doesn't solve all conceivable problems.

More specifically, the claim made above:
in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer.

is false.  And it is not necessary to invoke the vagaries of run-time
input to demonstrate that it is false.

OK. You finished your post with a reference to the halting problem,
which does not help to bolster any practical argument. That is why I
summarized your post in the manner I did.

I agree that static typed languages do not prevent these types of
overflow errors.
 
S

Seebs

That the problem is "elsewhere in the program" ought to be small
comfort.

It is, perhaps, but it's also an important technical point: You CAN write
correct code for such a thing.
int maximum(int a, int b) { return a > b ? a : b; }
int main() {
long x = 8589934592;
printf("Max of %ld and 1 is %d\n", x, maximum(x,1));

You invoked implementation-defined behavior here by calling maximum() with
a value which was outside the range. The defined behavior is that the
arguments are converted to the given type, namely int. The conversion
is implementation-defined and could include yielding an implementation-defined
signal which aborts execution.

Again, the maximum() function is 100% correct -- your call of it is incorrect.
You didn't pass it the right sort of data. That's your problem.

(And no, the lack of a diagnostic doesn't necessarily prove anything; see
the gcc documentation for details of what it does when converting an out
of range value into a signed type, it may well have done exactly what it
is defined to do.)

-s
 
K

Keith Thompson

RG said:
That the problem is "elsewhere in the program" ought to be small
comfort.

I don't claim that it's comforting, merely that it's true.
But very well, try this instead:

[ron@mighty:~]$ cat foo.c
#include <stdio.h>

int maximum(int a, int b) { return a > b ? a : b; }

int main() {
long x = 8589934592;
printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
return 0;
}
[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out
Max of 8589934592 and 1 is 1

That exhibits a very similar problem.

8589934592 is 2**33.

Given the output you got, I presume your system has 32-bit int and
64-bit long. The call maximum(x, 1) implicitly converts the long
value 8589934592 to int. The result is implementation-defined,
but typically 0. So maximum() is called with arguments of 0 and 1,
as you could see by adding a printf call to maximum().

Even here, maximum() did exactly what was asked of it.

I'll grant you that having a conversion from a larger type to a smaller
type quietly discard high-order bits is unfriendly. But it matches the
behavior of most CPUs.

Here's another example:

#include <stdio.h>

int maximum(int a, int b) { return a > b ? a : b; }

int main(void) {
double x = 1.8;
printf("Max of %f and 1 is %d\n", x, maximum(x, 1));
return 0;
}

Output:

Max of 1.800000 and 1 is 1
 
I

Ian Collins

It is, perhaps, but it's also an important technical point: You CAN write
correct code for such a thing.



You invoked implementation-defined behavior here by calling maximum() with
a value which was outside the range. The defined behavior is that the
arguments are converted to the given type, namely int. The conversion
is implementation-defined and could include yielding an implementation-defined
signal which aborts execution.

Again, the maximum() function is 100% correct -- your call of it is incorrect.
You didn't pass it the right sort of data. That's your problem.

(And no, the lack of a diagnostic doesn't necessarily prove anything; see
the gcc documentation for details of what it does when converting an out
of range value into a signed type, it may well have done exactly what it
is defined to do.)

Note that the mistake can be diagnosed:

lint /tmp/u.c -m64 -errchk=all
(7) warning: passing 64-bit integer arg, expecting 32-bit integer:
maximum(arg 1)
 
R

RG

Keith Thompson said:
RG said:
That the problem is "elsewhere in the program" ought to be small
comfort.

I don't claim that it's comforting, merely that it's true.
But very well, try this instead:

[ron@mighty:~]$ cat foo.c
#include <stdio.h>

int maximum(int a, int b) { return a > b ? a : b; }

int main() {
long x = 8589934592;
printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
return 0;
}
[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out
Max of 8589934592 and 1 is 1

That exhibits a very similar problem.

8589934592 is 2**33.

Given the output you got, I presume your system has 32-bit int and
64-bit long. The call maximum(x, 1) implicitly converts the long
value 8589934592 to int. The result is implementation-defined,
but typically 0. So maximum() is called with arguments of 0 and 1,
as you could see by adding a printf call to maximum().

Even here, maximum() did exactly what was asked of it.

Of course. Computers always do only exactly what you ask of them. On
this view there is, by definition, no such thing as a bug, only
specifications that don't correspond to one's intentions.
Unfortunately, correspondence to intentions is the thing that actually
matters when writing code.
I'll grant you that having a conversion from a larger type to a smaller
type quietly discard high-order bits is unfriendly.

"Unfriendly" is not the adjective that I would choose to describe this
behavior.

There is a whole hierarchy of this sort of "unfriendly" behavior, some
of which can be caught at compile time using a corresponding hierarchy
of ever more sophisticated tools. But sooner or later if you are using
Turing-complete operations you will encounter the halting problem, at
which point your compile-time tools will fail. (c.f. the Collatz
problem)

I'm not saying one should not use compile-time tools, only that one
should not rely on them. "Compiling without errors" is not -- and
cannot ever be -- be a synonym for "bug-free."

rg
 
S

Seebs

Of course. Computers always do only exactly what you ask of them. On
this view there is, by definition, no such thing as a bug, only
specifications that don't correspond to one's intentions.

f00f.

That said... I think you're missing Keith's point.
Unfortunately, correspondence to intentions is the thing that actually
matters when writing code.

Yes. Nonetheless, the maximum() function does exactly what it is intended
to do *with the inputs it receives*. The failure is outside the function;
it did the right thing with the data actually passed to it, the problem
was a user misunderstanding as to what data were being passed to it.

So there's a bug -- there's code which does not do what it was intended
to do. However, that bug is in the caller, not in the maximum()
function.

This is an important distinction -- it means we can write a function
which performs that function reliably. Now we just need to figure out
how to call it with valid data... :)

-s
 
I

Ian Collins

I'm not saying one should not use compile-time tools, only that one
should not rely on them. "Compiling without errors" is not -- and
cannot ever be -- be a synonym for "bug-free."

We is why wee all have run time tools called unit tests, don't we?
 
L

Lie Ryan

It is, perhaps, but it's also an important technical point: You CAN write
correct code for such a thing.



You invoked implementation-defined behavior here by calling maximum() with
a value which was outside the range. The defined behavior is that the
arguments are converted to the given type, namely int. The conversion
is implementation-defined and could include yielding an implementation-defined
signal which aborts execution.

Again, the maximum() function is 100% correct -- your call of it is incorrect.
You didn't pass it the right sort of data. That's your problem.

That argument can be made for dynamic language as well. If you write in
dynamic language (e.g. python):

def maximum(a, b):
return a if a > b else b

The dynamic language's version of maximum() function is 100% correct --
if you passed an uncomparable object, instead of a number, your call of
it is incorrect; you just didn't pass the right sort of data. And that's
your problem as a caller.

In fact, since Python's integer is infinite precision (only bounded by
available memory); in practice, Python's version of maximum() has less
chance of producing erroneous result.

The /most/ correct version of maximum() function is probably one written
in Haskell as:

maximum :: Integer -> Integer -> Integer
maximum a b = if a > b then a else b

Integer in Haskell has infinite precision (like python's int, only
bounded by memory), but Haskell also have static type checking, so you
can't pass just any arbitrary objects.

But even then, it's still not 100% correct. If you pass a really large
values that exhaust the memory, the maximum() could still produce
unwanted result.

Second problem is that Haskell has Int, the bounded integer, and if you
have a calculation in Int that overflowed in some previous calculation,
then you can still get an incorrect result. In practice, the
type-agnostic language with *mandatory* infinite precision arithmetic
wins in terms of correctness. Any language which only has optional
infinite precision arithmetic can always produce erroneous result.

Anyone can dream of 100% correct program; but anyone who believes they
can write a 100% correct program is just a dreamer. In reality, we don't
usually need 100% correct program; we just need a program that runs
correctly enough most of the times that the 0.0000001% chance of
producing erroneous result becomes irrelevant.

In summary, in this particular case with maximum() function, static
checking does not help in producing the most correct code; if you need
to ensure the highest correctness, you must use a language with
*mandatory* infinite precision integers.
 
T

TheFlyingDutchman

More specifically, the claim made above:


is false.  And it is not necessary to invoke the vagaries of run-time
input to demonstrate that it is false.
I don't think you demonstrated it is false. Any values larger than an
int get truncated before they ever get to maximum. The problem does
not lie with the maximum function. It correctly returns the maximum of
whatever two integers it is provided. Calling it with values that are
larger than an int, that get converted to an int _before_ maximum is
called, is an issue outside of maximum.
 
K

Keith Thompson

RG said:
Of course. Computers always do only exactly what you ask of them. On
this view there is, by definition, no such thing as a bug, only
specifications that don't correspond to one's intentions.
Unfortunately, correspondence to intentions is the thing that actually
matters when writing code.

Of course there's such a thing as a bug.

This version of maximum:

int maximum(int a, int b) {
return a > b ? a : a;
}

has a bug. This version:

int maximum(int a, int b) {
return a > b ? a : b;
}

I would argue, does not. The fact that it might be included in a
buggy program does not mean that it is itself buggy.

[...]
I'm not saying one should not use compile-time tools, only that one
should not rely on them. "Compiling without errors" is not -- and
cannot ever be -- be a synonym for "bug-free."

Agreed. (Though C does make it notoriously easy to sneak buggy code
past the compiler.)
 
T

TheFlyingDutchman

That argument can be made for dynamic language as well. If you write in
dynamic language (e.g. python):

def maximum(a, b):
    return a if a > b else b

The dynamic language's version of maximum() function is 100% correct --
if you passed an uncomparable object, instead of a number, your call of
it is incorrect; you just didn't pass the right sort of data. And that's
your problem as a caller.

In fact, since Python's integer is infinite precision (only bounded by
available memory); in practice, Python's version of maximum() has less
chance of producing erroneous result.

"in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer. "

Dynamic typed languages like Python fail in this case on "Never blows
up".
 
R

RG

I'm not saying one should not use compile-time tools, only that one
should not rely on them. "Compiling without errors" is not -- and
cannot ever be -- be a synonym for "bug-free."

Agreed. (Though C does make it notoriously easy to sneak buggy code
past the compiler.)[/QUOTE]

Let's just leave it at that then.

rg
 
R

RG

Seebs said:
f00f.

That said... I think you're missing Keith's point.


Yes. Nonetheless, the maximum() function does exactly what it is intended
to do *with the inputs it receives*. The failure is outside the function;
it did the right thing with the data actually passed to it, the problem
was a user misunderstanding as to what data were being passed to it.

So there's a bug -- there's code which does not do what it was intended
to do. However, that bug is in the caller, not in the maximum()
function.

This is an important distinction -- it means we can write a function
which performs that function reliably. Now we just need to figure out
how to call it with valid data... :)

We lost some important context somewhere along the line:

Please take note of the second sentence.

One way or another, this claim is plainly false. The point I was trying
to make is not so much that the claim is false (someone else was already
doing that), but that it can be demonstrated to be false without having
to rely on any run-time input.

rg
 
N

Nick

Ian Collins said:
We is why wee all have run time tools called unit tests, don't we?

But you have to know a lot about the language to know that there's a
problem. You cannot sensibly test your max function on every
combination of (even int) input which it's designed for (and, of course,
it works for those).
 
I

Ian Collins

The /most/ correct version of maximum() function is probably one written
in Haskell as:

maximum :: Integer -> Integer -> Integer
maximum a b = if a> b then a else b

Integer in Haskell has infinite precision (like python's int, only
bounded by memory), but Haskell also have static type checking, so you
can't pass just any arbitrary objects.

But even then, it's still not 100% correct. If you pass a really large
values that exhaust the memory, the maximum() could still produce
unwanted result.

Second problem is that Haskell has Int, the bounded integer, and if you
have a calculation in Int that overflowed in some previous calculation,
then you can still get an incorrect result. In practice, the
type-agnostic language with *mandatory* infinite precision arithmetic
wins in terms of correctness. Any language which only has optional
infinite precision arithmetic can always produce erroneous result.

Anyone can dream of 100% correct program; but anyone who believes they
can write a 100% correct program is just a dreamer. In reality, we don't
usually need 100% correct program; we just need a program that runs
correctly enough most of the times that the 0.0000001% chance of
producing erroneous result becomes irrelevant.

In summary, in this particular case with maximum() function, static
checking does not help in producing the most correct code; if you need
to ensure the highest correctness, you must use a language with
*mandatory* infinite precision integers.

Or using the new suffix return syntax in C++0x. Something like

template <typename T0, typename T1>
[] maximum( T0 a, T1 b) { return a > b ? a : b; }

Where the return type is deduced at compile time.
 
T

TheFlyingDutchman

We lost some important context somewhere along the line:


Please take note of the second sentence.

One way or another, this claim is plainly false.  The point I was trying
to make is not so much that the claim is false (someone else was already
doing that), but that it can be demonstrated to be false without having
to rely on any run-time input.

The second sentence is not disproved by a cast from one datatype to
another (which changes the value) that happens before maximum() is
called.
 
P

Paul Rubin

in C I can have a function maximum(int a, int b) that will always
The second sentence is not disproved by a cast from one datatype to
another (which changes the value) that happens before maximum() is called.

int maximum(int a, int b);

int foo() {
int (*barf)() = maximum;
return barf(3);
}

This compiles fine for me. Where is the cast? Where is the error message?
Are you saying barf(3) doesn't call maximum?
 
P

Pascal J. Bourguignon

Ian Collins said:
The /most/ correct version of maximum() function is probably one written
in Haskell as:

maximum :: Integer -> Integer -> Integer
maximum a b = if a> b then a else b

Integer in Haskell has infinite precision (like python's int, only
bounded by memory), but Haskell also have static type checking, so you
can't pass just any arbitrary objects.

But even then, it's still not 100% correct. If you pass a really large
values that exhaust the memory, the maximum() could still produce
unwanted result.

Second problem is that Haskell has Int, the bounded integer, and if you
have a calculation in Int that overflowed in some previous calculation,
then you can still get an incorrect result. In practice, the
type-agnostic language with *mandatory* infinite precision arithmetic
wins in terms of correctness. Any language which only has optional
infinite precision arithmetic can always produce erroneous result.

Anyone can dream of 100% correct program; but anyone who believes they
can write a 100% correct program is just a dreamer. In reality, we don't
usually need 100% correct program; we just need a program that runs
correctly enough most of the times that the 0.0000001% chance of
producing erroneous result becomes irrelevant.

In summary, in this particular case with maximum() function, static
checking does not help in producing the most correct code; if you need
to ensure the highest correctness, you must use a language with
*mandatory* infinite precision integers.

Or using the new suffix return syntax in C++0x. Something like

template <typename T0, typename T1>
[] maximum( T0 a, T1 b) { return a > b ? a : b; }

Where the return type is deduced at compile time.

Indeed. This is generic programming. And it happens that in Lisp (and
I assume in languages such as Python), sinte types are not checked at
compilation time, all the functions you write are always generic
functions.

In particular, the property "arguments are not comparable" is not
something that can be determined at compilation time, since the program
may add a compare method for the given argument at run-time (if the
comparison operator used is a generic function).
 
R

RG

TheFlyingDutchman said:
The second sentence is not disproved by a cast from one datatype to
another (which changes the value) that happens before maximum() is
called.

You can't have it both ways. Either I am calling it incorrectly, in
which case I should get a compiler error, or I am calling it correctly,
and I should get the right answer. That I got neither does in fact
falsify the claim. The only way out of this is to say that
maximum(8589934592, 1) returning 1 is in fact "correct", in which case
we'll just have to agree to disagree.

rg
 
T

TheFlyingDutchman

    int maximum(int a, int b);

    int foo() {
      int (*barf)() = maximum;
      return barf(3);
    }

This compiles fine for me.  Where is the cast?  Where is the error message?
Are you saying barf(3) doesn't call maximum?

With Tiny C on my system, your code does not cause maximum to give an
incorrect value, or to blow up:

int maximum(int a, int b)
{
printf("entering maximum %d %d\n",a,b);
if ( a > b )
return a;
else
return b;
}

int foo()
{
int (*barf)() = maximum;
return barf(3);
}

int main (int argc, char *argv[])
{
printf("maximum is %d\n",foo());
}

------------- output -----------------------------------
entering maximum 3 4198400
maximum is 4198400
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,169
Messages
2,570,920
Members
47,463
Latest member
FinleyMoye

Latest Threads

Top