Re: "Strong typing vs. strong testing"

P

Paul Rubin

Raffael Cavallaro said:
I think you're mistaken. fib(0) = 0, fib(1) = 1, fib(2) = 1, fib(3) =
2 ... fib(11)= 89 ...

Whoops, you're right, I messed up my program while refactoring it. Sorry.
you like we can do it iteratively instead, which, as in haskel, takes
no perceptible time:

I couldn't tell whether the earlier version with the expt 5 was a
direct solution of the recurrence, or what.
2. the result of fib(1000) is going to overflow c integer types

Yes, it's a real shame that most cpu's these days don't have a hardware
trap for int overflow. That would allow much safer programming for the
large amount of code where the programmer expects to never overflow
but can be caught by surprise.
 
S

Steven D'Aprano

Actually, it's hard to find a language that has no compiler generating
faster code than C...

Perl. Python. Ruby. Applescript. Hypertalk. Tcl. RPL. Frink. Inform 7.
ActionScript. Dylan. Emerald. And hundreds more serious languages, to say
nothing of dozens of esoteric languages like Chef, Oook, Whitespace,
Shakespeare, and Intercal.

It's actually hard to think of languages with implementations that
*consistently* beat or match the gcc C compiler: C++ and Forth come to
mind immediately. Lua and Javascript sometimes manages to beat C. Scala,
Java, Pascal, Fortran, Ada often come close, where "close" means within a
factor of 5 times slower.

See http://shootout.alioth.debian.org/ for more benchmarks.

Of course, all benchmarks are flawed, if not broken. The very concept of
"which language is faster" is nonsense. High-level languages aren't
executed directly, and if they were, the parse/interpret/execute cycle is
*so slow* that they would all lose (Think of the speed of scripting
languages like bash.) That's why virtually all languages are compiled, to
byte-code that is executed in a virtual machine, or machine code that is
executed directly in a real machine. So the real questions are:

* which virtual machines are faster when performing a certain task?
* which compilers generate the best (virtual or real) machine code for a
given task?

and most interestingly:

* given the competing demands of ease of development and maintenance,
versus the demands of speed of execution, versus correctness of code,
which language is best for my specific needs?

Which is why we're not all writing in COBOL.
 
N

Nick Keighley

It also shows that for languages such as C, you cannot limit the unit tests
to the types declared for the function, but that you should try all the
possible values of the language.

I'm not sure what you mean by "all the possible values of the
language". But some meanings of that statement are plainly wrong! For
instance you don't test every possible pair of ints in a max()
function (maybe 2^64 distinct tests!)
Which basically, is the same as with dynamically typed programming
language, only now, some unit tests will fail early, when trying to
compile them while others will give wrong results later.

<snip>
 
N

Nick Keighley

On 2010-10-01, RG <[email protected]> wrote:
Those goal posts are sorta red shifted at this point. [...]
Red shifted?
Moving away fast enough that their color has visibly changed.

doppler shift for instance or one of them cosmological thingies when
space itself stretches

Any time something is true "by definition" that is an indication that
it's not a particularly useful fact.

I'm not sure I agree. On occaision knowing something is true-by-
definition is very useful!
 The whole concept of "type" is a
red herring.  It's like this: there are some properties of programs that
can be determined statically, and others that can't.  Some of the
properties that can't be determined statically matter in practice.  But
all of the properties that can be determined statically can also be
determined dynamically.

sometimes with quite a lot of effort. I know the halting problem gets
thrown around a lot but to execute every path may take an inordinate
amount of effort.
 The *only* advantage that static analysis has
is that *sometimes* it can determine *some* properties of a program
faster or with less effort than a dynamic approach would.
agreed


What I take issue with is the implication made by advocates of static
analysis that static analysis is somehow *inherently* superior to
dynamic analysis,

probably a fair point

that static analysis can provide some sort of
"guarantee" of reliability that actually has some sort of practical
meaning in the real world.  It doesn't.  The net effect of static
analysis in the real world is to make programmers complacent about
properties of programs that can only be determined at run time, to make
them think that compiling without errors means something,

who are these people? I don't think I've ever met any programmer that
claimed a clean compile implied a correct program. Ada people come
close with things like "when you finally get an Ada program to compile
you've got a good chance it's actually right". Part of their
justification is that you tend to have to think hard about your type
system up front. But given the fields Ada programmers typically
operate in I don't think they'd dare ship a program that merely
compiled.
and that if a
program compiles without errors then there is no need for run-time
checking of any sort.

again this is a massive straw man. Yes, people who claim this are
wrong. But they are much less prevalent than you claim.

Though the lack of enthusiasm I encounter when I suggest the liberal
use of assert() is a data point in your favour.
 *You* may not believe these things, but the vast
majority of programmers who use statically typed languages do believe
these things, even if only tacitly.

not the programmers I've encountered
 The result is a world where
software by and large is a horrific mess of stack overflows, null
pointer exceptions, core dumps, and security holes.

I'm not saying that static analysis is not useful.  It is.  What I'm
saying is that static analysis is nowhere near as useful as its
advocates like to imply that it is.  And it's better to forego static
analysis and *know* that you're flying without a net at run-time than to
use it and think that you're safe when you're really not.

obviously some of us disagree
 
G

guthrie

An interesting archive article on the topic of correctness, and the
layers thereof:

Program verification: the very idea;
Communications of the ACM
Volume 31 , Issue 9 (September 1988)
Pages: 1048 - 1063
Year of Publication: 1988
ISSN:0001-0782
"The notion of program verification appears to trade upon an
equivocation. Algorithms, as logical structures, are appropriate
subjects for deductive verification. Programs, as causal models of
those structures, are not. The success of program verification as a
generally applicable and completely reliable method for guaranteeing
program performance is not even a theoretical possibility."

And:
The proof of correctness wars;
Communications of the ACM archive
Volume 45 , Issue 8 (August 2002)
COLUMN: Practical programmer Pages: 19 - 21
Year of Publication: 2002
ISSN:0001-0782
"Whether mild or raging, wars about topics ranging from programming
languages to methods of indentation are healthy for our field"

One central point in the discussions is that the correctness of a
(running) program is different than just that of a piece of code - it
also depends on the compiler, OS, and underlying hardware layers.
 
A

Aleksej Saushev

RG said:
There are only two possibilities: either you have a finite-state
machine, or you have a Turning machine. (Well, OK, you could have a
pushdown automaton, but there are no programming languages that model a
PDA. Well, OK, there's Forth, but AFAIK there are no static type
checkers for Forth.
http://home.vrweb.de/stephan.becher/forth/

Besides, who uses Forth? ;-)


(JFYI, Forth has at least 2 stacks, thus it is equivalent to Turing Machine.)
 
S

salil

The /most/ correct version of maximum() function is probably one written
in Haskell as:

maximum :: Integer -> Integer -> Integer
maximum a b = if a > b then a else b

Integer in Haskell has infinite precision (like python's int, only
bounded by memory), but Haskell also have static type checking, so you
can't pass just any arbitrary objects.

But even then, it's still not 100% correct. If you pass a really large
values that exhaust the memory, the maximum() could still produce
unwanted result.

Second problem is that Haskell has Int, the bounded integer, and if you
have a calculation in Int that overflowed in some previous calculation,
then you can still get an incorrect result. In practice, the
type-agnostic language with *mandatory* infinite precision arithmetic
wins in terms of correctness. Any language which only has optional
infinite precision arithmetic can always produce erroneous result.


I have not programmed in Haskell that much, but I think Haskell
inferences type "Integer" (the infinite precision) by default and not
"Int" (finite precision) type for the integers. So, the programmer who
specifically mentions "Int" in the signature of the function, is
basically overriding this default behavior for specific reasons
relevant to the application, for example, for performance. I think
Haskell's way is the right. It is providing "safe behavior" as
default and at the same time treating programmer as adults, at least
in this case.

I think dynamic languages are attractive because they make programs
less verbose. But, statically typed languages with type inference
(Haskell, OCaML, Scala, F#) is a very good compromise because they
offer both type safety and succinctness. And when we need algorithms
that should work the same independent of types, Haskell has
typeclasses which are pretty intuitive, unlike the horrible C++
templates.
 
P

Pascal Costanza

I have not programmed in Haskell that much, but I think Haskell
inferences type "Integer" (the infinite precision) by default and not
"Int" (finite precision) type for the integers. So, the programmer who
specifically mentions "Int" in the signature of the function, is
basically overriding this default behavior for specific reasons
relevant to the application, for example, for performance. I think
Haskell's way is the right. It is providing "safe behavior" as
default and at the same time treating programmer as adults, at least
in this case.

I think dynamic languages are attractive because they make programs
less verbose. But, statically typed languages with type inference
(Haskell, OCaML, Scala, F#) is a very good compromise because they
offer both type safety and succinctness. And when we need algorithms
that should work the same independent of types, Haskell has
typeclasses which are pretty intuitive, unlike the horrible C++
templates.

Static typing still doesn't mesh well with certain kinds of reflection.


Pascal
 
K

Keith H Duggar

That the problem is "elsewhere in the program" ought to be small
comfort.  But very well, try this instead:

[ron@mighty:~]$ cat foo.c
#include <stdio.h>

int maximum(int a, int b) { return a > b ? a : b; }

int main() {
  long x = 8589934592;
  printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
  return 0;}

[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out
Max of 8589934592 and 1 is 1

$ gcc -Wconversion -Werror foo.c
cc1: warnings being treated as errors
foo.c: In function 'main':
foo.c:5: warning: passing argument 1 of 'maximum' with different width
due to prototype

It's called "learning to compile". And, yes, those warnings (and
nearly
every other one) should be enabled and treated as errors if a shop
wants
maximum protection. I only wish more (library) vendors did so.

KHD
 
R

RG

Keith H Duggar said:
That the problem is "elsewhere in the program" ought to be small
comfort.  But very well, try this instead:

[ron@mighty:~]$ cat foo.c
#include <stdio.h>

int maximum(int a, int b) { return a > b ? a : b; }

int main() {
  long x = 8589934592;
  printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
  return 0;}

[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out
Max of 8589934592 and 1 is 1

$ gcc -Wconversion -Werror foo.c
cc1: warnings being treated as errors
foo.c: In function 'main':
foo.c:5: warning: passing argument 1 of 'maximum' with different width
due to prototype

It's called "learning to compile".

It's called "being late for the game." This has already been raised and
addressed. Read the rest of the thread.

rg
 
P

Pascal J. Bourguignon

Keith H Duggar said:
That the problem is "elsewhere in the program" ought to be small
comfort.  But very well, try this instead:

[ron@mighty:~]$ cat foo.c
#include <stdio.h>

int maximum(int a, int b) { return a > b ? a : b; }

int main() {
  long x = 8589934592;
  printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
  return 0;}

[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out
Max of 8589934592 and 1 is 1

$ gcc -Wconversion -Werror foo.c
cc1: warnings being treated as errors
foo.c: In function 'main':
foo.c:5: warning: passing argument 1 of 'maximum' with different width
due to prototype

It's called "learning to compile". And, yes, those warnings (and
nearly
every other one) should be enabled and treated as errors if a shop
wants
maximum protection. I only wish more (library) vendors did so.

So you're wishing that they'd be active by default.
 
L

Lie Ryan

I wasn't speaking generally, just in the case of which of only two
choices RG's code should be referred to - "blowing up" or "giving an
invalid answer".

I can make any application never blows up. Just put this in the main()
of every application you write:

while True:
try:
... program goes here ...
except:
pass

Done. Program never halted execution, it never blows up. My program is
bug free, heck why do thousands of computer scientists can't figure out
that such a simple construct will fix all bugs once and for all?
I think error handling in personal computer and website software has
improved over the years but there is still some room for improvement
as you will still get error messages that don't tell you something you
can relay to tech support more than that an error occurred or that
some operation can't be performed.

I think the ideal error message tells the user exactly what's wrong and
how to fix the problem. A generic error message like "This program
encountered a problem, and need to close." is as useless as an error
message could be.
But I worked with programmers doing in-house software who were
incredibly turned off by exception handling in C++. I thought that
meant that they preferred to return and check error codes from
functions as they had done in C, and for some of them it did seem to
mean that.

Introduce them to RAII (Resource Acquisition Is Initialization), that's
a superior alternative to exception handling in C++.
But for others it seemed that they didn't want to
anticipate errors at all ("that file is always gonna be there!"). I
read a Java book by Deitel and Deitel and they pointed out what might
have lead to that attitude - the homework and test solutions in
college usually didn't require much if any error handling - the
student could assume files were present, data was all there and in the
format expected, user input was valid and complete, etc.

Personally, the amount of error message I'd design into the program
depends on the expected level of proficiency of the expected users. If
I'm developing for an in-house software, and most of the users of my
program are expected to be experts or have necessary training, I'd be
less inclined to include sophisticated error checking in favor of
simplicity (but I usually work in high-level language, where errors like
non-existent file will throw an exception instead of passing silently,
so take it with a grain of salt). If I'm designing software for a
general public though, I'd include multitudes of idiot checking.
 
L

Lie Ryan

I'm not sure I agree. On occaision knowing something is true-by-
definition is very useful!

Something that is true by definition is just as useful as saying: "my
program is correct, by definition, because my requirement is what my
code is doing". It's a circular argument, your program requirement, for
which the program is supposed to be tested against, is the code itself;
so whatever undesirable behavior the program might have is parts of the
requirement, so the program is, by definition, bug free and it's user
expectation that's wrong.
 
L

Lie Ryan

So, the programmer who
specifically mentions "Int" in the signature of the function, is
basically overriding this default behavior for specific reasons
relevant to the application, for example, for performance. I think
Haskell's way is the right.

I agree that in general, what Haskell did is correct, and a perfectly
reasonable design decision. In terms of *correctness* though, mandatory
infinite precision is slightly more correct, as you can't have a
clueless programmer writing Int, because he read somewhere that it's
faster, and blowing up the program. In practice, this never matters, as
we don't expect to see a lot of clueless programmers.
It is providing "safe behavior" as
default and at the same time treating programmer as adults, at least
in this case.

I think dynamic languages are attractive because they make programs
less verbose. But, statically typed languages with type inference
(Haskell, OCaML, Scala, F#) is a very good compromise because they
offer both type safety and succinctness. And when we need algorithms
that should work the same independent of types, Haskell has
typeclasses which are pretty intuitive, unlike the horrible C++
templates.

Agreed. Dynamic language and implicit static typing are two very good
compromises for writing succinct codes. Implicit static typing lends to
better type safety, while dynamic typing lends to better reflection and
meta-programming. Both while preserving code succinctness.
 
L

Lie Ryan

But if you had to implement a VM directly in hardware, which one (of the
several varieties) would you choose?

Virtual Machine in Hardware... isn't that a contradiction?
 
M

Martin Gregorie

Virtual Machine in Hardware... isn't that a contradiction?
Nope. Several mainframes did that.

Two that I knew well were both British - the ICL 1900 and 2900. The
Burroughs x700 series also used hardware virtualisation. Both Burroughs
and 2900 series of machines could even run different addressing and
virtual hardware architectures, e.g. word-addressing vs. byte addressing,
in each virtual machine.

The 1900's 'hardware' registers, PC, CCR, etc were the first few words
in each program's address space. That was late 60s tech, so scarcely a
new concept nowadays.

The ICL 2900 and, by extension Multics, since the 2900 hardware's process
protection scheme was swiped from Multics, used a different approach -
again hardware-implemented virtual per-process registers but in addition
'rings of protection' which meant the OS could affect your code but you
couldn't touch it and the OS in turn couldn't touch the hardware drivers
etc.
 
D

Dave Angel

Virtual Machine in Hardware... isn't that a contradiction?
Intel's processors aren't just hardware, they already have lots of
microcode, as has nearly every machine built in the last 40 years or
so. However, the hardware is heavily biased towards the "Intel x86
instruction" architecture. But there are at least 3 vm's already in the
Pentium processor: one emulates the 8086 PC model, real mode, another
is 32bit, and the third is 64 bit.

Years ago, somebody microcoded the UCSD p machine, which is probably the
logical forerunner to the byte code of both the jvm and the Python byte
code architecture. The machine was pretty quick, but it wasn't a
commercial success, perhaps in part because of thermal problems, but
primarily because it 'wasn't compatible."

DaveA
 
V

Vic Kelson

There's a large existing body of knowledge on dimensional analysis
(it's a very important tool for physics, for instance), and obviously
the answer is to do whatever it does.  Raising to any power is fine, I
think (but transcendental functions, for instance, are never fine,
because they are equivalent to summing things with different
dimensions, which is obvious if you think about the Taylor expansion of
a transcendental function).

--tim

Umm.. Not so. The terms in the Taylor series are made of repeated
derivatives with respect to the same variable as the function's
argument. That is, in the vicinity of the value a,

f(x) = f(a) + f'(a)*(x-a)/1! + f''(a)*(x-a)^2/2! + f'''(a)(x-a)^3/3!
+ ...

each of the derivatives has units that are the reciprocal of the units
of the (x-a)^n terms, e.g. the second term's "units" would be "units
of f per units of x" * "the units of x", which is simply "the units of
f". It had better be true that each of the terms in the series has the
same units. Otherwise, they could not be summed.

That said, I'm having a hard time thinking of a transcendental
function that doesn't take a dimensionless argument, e.g. what on
earth would be the units of ln(4.0 ft)?

--v
 
T

Torben Ægidius Mogensen

That said, I'm having a hard time thinking of a transcendental
function that doesn't take a dimensionless argument, e.g. what on
earth would be the units of ln(4.0 ft)?

Trigonometric functions do take arguments of particular units: radians
or (less often) degrees, with conversion needed if you use the "wrong"
unit.

Torben
 
B

Ben

Trigonometric functions do take arguments of particular units: radians
or (less often) degrees, with conversion needed if you use the "wrong"
unit.

        Torben

Angles aren't "true" units, as they are ratios of two lengths. They
are more of a "pseudo" unit.

Ben
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,169
Messages
2,570,919
Members
47,459
Latest member
Vida00R129

Latest Threads

Top