size of a sizeof(pointer)

Michael Wojcik

Could the distinction between a compiler and an interpreter be that when
they encounter program code, compilers translate it into another
language, while interpreters execute it? In other words, more or less,
compilers store away code for later execution while interpreters execute
it when they see it?

I'd call that a usable definition as well. We just recently had this
discussion in alt.folklore.computers, actually, with much debate about
where "tokenizing" BASIC implementations fit, for example.

I think this definition is largely parallel to the one I offered. An
interpreter could process an entire program as a unit, but it doesn't
need to; its job, put simply, is "what do I do next?". A compiler
does something very different: it transforms some unit of source that
is in some way (as defined by the language) sufficiently syntactically
complete into a target form. What makes a compiler a compiler is not
that it targets any particular form, but that it transforms a unit of
source data into a unit of target data (and since a compiler is not
an interpreter - though there are combined compiler-interpreters -
it probably stores the result for later use).

(In practice this definition would also have to note that by
convention "compiler" refers specifically to a tool in a software
build chain, so transformation tools for non-program data wouldn't
be included.)
 
Dan Pop

In said:
Doesn't IEEE have an official dictionary of computer terms? Can
someone who has a copy look up "compiler"?

I don't have access to their dictionary. But I think that most people
would agree that the compilation process consists in the translation from
a higher level language to a lower level language. I.e. if someone ever
manages to write a translator from binary executables to meaningful C
source code, no one would call this translator a compiler. Not even Aho
and his buddies ;-)

FWIW, the Dictionary of Computing edited by Oxford Science Publications
gives the following definition:

Compiler: a program that translates high-level language into absolute
code, or sometimes into assembly language. The input to the compiler
(the source code) is a description of an algorithm or program in
a problem-oriented language; its output (the object code) is an
equivalent description of the algorithm in a machine-oriented language.


Dan
 
CBFalconer

Malcolm said:
.... snip ...
Well every system I know uses fixed-size pointers. There is one
main exception to the rule that the size of the pointer represents
the size of the address space, and that's quite an important one,
old x86 compilers with their segmented architecture.

No exception at all. Such systems either restricted the size of
the data area to 64k (which is adequately handled by a 16 bit
pointer) or used a 32 bit pointer.
 
CBFalconer

Malcolm said:
Michael Wojcik said:
C programmers working on the AS/400 will find that expectation
[that pointer dereferences compile to single machine
instructions] is incorrect. In C on the AS/400, *nothing*
compiles to machine instructions, single or otherwise. It
compiles to a pseudoassembly language called "MI".
This really is the exception that proves the point. A platform
that disallows native machine language programs cannot really be
said to have a compiler. Nor is C the ideal language for such an
environment - you need something which does memory management for
you.

Nonsense. There are various environments that do precisely that.
Ones known to me include:

UCSD p-code system
My own pcd system
Verifone terminals
Java Bytecode machines

all of which port binary executable modules across widely
differing machine architectures. Only the Verifone system was
specifically designed to accommodate C.
 
Malcolm

Michael Wojcik said:
It is by some of us. And I'm well aware of how language works,
thanks.
Ho ho. So you understand how language works, do you?
I was simply hoping to point out that you were employing a
bogus argument through an unfortunately popular turn of phrase,
apparently under the misapprehension that its common misuse made it
logically valid. Now it appears that you are simply unable to
understand that it's a bogus argument.
The exception proves (tests) the rule. The exception to the rule that C
compilers don't provide bounds checking is not really a compiler (we can
argue about exactly how to define "compiler", the AS/400 is in the penumbra
of meaning, but as someone who understands how language works, thank you, you
will have appreciated that). The exception is an oddball, the rule holds.
See "mammals are viviparous" for another example.
Chapter and verse, please. (Oh, I realize this is standard C lore.
But the language does not *depend* on this trade-off, and it has
many admirable qualities which have nothing to do with it.)
See the other threads on bounds checking. There is no point using C if it
does not compile to efficient code, just as there is no point having a
sports car if you are driving it down a traffic-calmed street. However if
you already own a sports car you might not buy a hatchback for just one
journey.
Because registers are probably 32 bits, and there are probably not too many
of them, so you will thrash memory.
So your contention is that C is a sensible language to use only on
platforms where it can be used in a dangerous manner? Perhaps we
should take a little c.l.c poll. How many people here use C because
it lets you do unsafe things?
How many people use C because it compiles to efficient machine code, and
tolerate the lack of safety because putting in bounds checking would
compromise speed?
 
Malcolm

CBFalconer said:
Nonsense. There are various environments that do precisely that.
Ones known to me include:

UCSD p-code system
My own pcd system
Verifone terminals
Java Bytecode machines

all of which port binary executable modules across widely
differing machine architectures. Only the Verifone system was
specifically designed to accommodate C.
Well, let's say we have a C interpreter. Someone decides to make the system a
bit more efficient, and strip out comments before passing the source to the
interpreter. Is the comment stripper a "compiler"?
Someone decides to speed things up a little bit more and tokenize the C
keywords (i.e. we use non-ASCII characters to represent "if" and "for" etc).
Is the tokenizer a compiler?
See where this is leading? At some point we will arrive at a "binary
executable module" and you will say "this is a compiler". But extension of
the word to marginal cases is arbitrary.
Of the examples you give only one supports C, suggesting that C isn't a good
language for interpreted bytecodes.
Why is memory management a good idea in these environments? Consider a
linked list. In C the user sets up the pointers himself. This means that
every time you traverse the list it is necessary to check each pointer for
validity. You also have problems because the addresses of the pointers
themselves are taken, so you need some system to prevent the pointers being
corrupted. This adds up to quite a lot of overhead.
In a language that manages memory for you, linked lists can be provided as
built-in structures. The user has no way of accessing the pointers, so
internally they can be represented as raw addresses with perfect safety.
 
Joona I Palaste

Malcolm said:
Well, let's say we have a C interpreter. Someone decides to make the system a
bit more efficient, and strip out comments before passing the source to the
interpreter. Is the comment stripper a "compiler"?

Well, as much a compiler as the C preprocessor is. The resulting code
is also valid C source.
Someone decides to speed things up a little bit more and tokenize the C
keywords (i.e. we use non-ASCII characters to represent "if" and "for" etc).
Is the tokenizer a compiler?

Only a bit more than the comment stripper was.
See where this is leading? At some point we will arrive at a "binary
executable module" and you will say "this is a compiler". But extension of
the word to marginal cases is arbitrary.

I still think of javac (translating Java source code into bytecode) as
a compiler, even though you can't directly execute bytecode.
Look at it another way - isn't the distinction "only a translator that
outputs natively executable code can be called a compiler" also
arbitrary? Why single out natively executable code? The C compiling
process distinguishes between compiling and linking. Is the pure
compiler not a compiler, because unlinked object files aren't natively
executable?

(BTW, you forgot to ask if UNIX cat was a compiler. After all, it takes
in source code and outputs source code.)

--
/-- Joona Palaste ([email protected]) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"The obvious mathematical breakthrough would be development of an easy way to
factor large prime numbers."
- Bill Gates
 
pete

Malcolm said:
At some point we will arrive at a "binary
executable module" and you will say "this is a compiler".

It has to be a compiler before that point.
You can compile C files, which are not complete programs,
and the output is not executable.
 
Keith Thompson

Malcolm said:
Because registers are probably 32 bits, and there are probably not too many
of them, so you will thrash memory.
[...]

I know very little about the AS/400. I've never used one. But (and
this is just a wild guess), if a system has 16-byte pointers, perhaps
it would be designed so it can deal with them efficiently, either by
having 16-byte registers, or by having a large number of registers and
the ability to use several of them together, or by designing it so
that thrashing memory isn't a problem, or by doing some other very
clever thing that I haven't thought of, or by having a user community
that doesn't care that much about blazing speed.

Malcolm, unless you're much more familiar with the AS/400 than I am
(and more familiar with it than you've let on so far in this thread),
you might consider refraining from detailed speculation about it.
Discussions of AS/400 architecture are marginally topical here,
because they can illuminate the range of systems on which C can be
implemented and the assumptions (like word-sized pointers) that may
not be universally valid. But I suggest such discussions should be
based on actual knowledge of the architecture.

As for exceptions proving rules, the actual origin of the saying isn't
particularly relevant. One common use of it is that an exception
actually strengthens a rule -- for example, that finding a black swan
makes the statement that "all swans are white" more valid. This is,
of course, nonsense. When you used a variant of the saying upthread,
you gave the impression, probably unintentionally, that you were using
it in that common nonsensical way. I suggest we drop the discussion
of the origin of the saying (or take it to alt.usage.english) and
instead talk about what you actually meant.

What I think you actually meant is that because the C translation
system on the AS/400 doesn't generate machine code it's not a true
compiler. The issue is the meaning of the word "compiler". The C99
standard doesn't define the word compiler (it only uses it briefly in
one footnote), and I think the C90 standard is similar. Most of the
participants in this thread seem to think that a translator that
generates something other than machine code can legitimately be called
a compiler. There's a quote from Aho, Sethi, and Ullman that agrees
with that point of view. It might be time for you to drop the
argument unless you can actually prove that you're right and the rest
of us are wrong. This might also be a better topic for
alt.usage.english.

As far as the standard is concerned, the C translation system on the
AS/400, whatever we call it, is part of the implementation, along with
the library, the OS, and whatever else is needed to make C programs
run.
 
Dan Pop

In said:
No exception at all. Such systems either restricted the size of
the data area to 64k (which is adequately handled by a 16 bit
pointer) or used a 32 bit pointer.

Engage your brain, Chuck! The 8086 machines have a 20-bit address space
and their pointers have either 16 or 32 bits. In neither case does the
pointer size reflect the size of the address space.

Dan
 
Flash Gordon

Well, let's say we have a C interpreter. Someone decides to make the
system a bit more efficient, and strip out comments before passing the
source to the interpreter. Is the comment stripper a "compiler"?
Someone decides to speed things up a little bit more and tokenize the
C keywords (i.e. we use non-ASCII characters to represent "if" and
"for" etc). Is the tokenizer a compiler?
See where this is leading? At some point we will arrive at a "binary
executable module" and you will say "this is a compiler". But
extension of the word to marginal cases is arbitrary.

I would suggest that a C compiler is a program that translates the
source code, producing the diagnostics required by the standard, such
that if the source was valid then, once linked to produce something the
C standard says should be executable, the result can be executed by the
target environment. Whether the output is byte code, assembler or
anything else is irrelevant.
Of the examples you give only one supports C, suggesting that C isn't
a good language for interpreted bytecodes.

Pascal is a compiled language and I've used a UCSD Pascal system.
Why is memory management a good idea in these environments? Consider a
linked list. In C the user sets up the pointers himself. This means
that every time you traverse the list it is necessary to check each
pointer for validity.

If you are talking about the byte code interpreter (or whatever) then
you are wrong. The C standard allows your environment to crash or do
anything else it happens to do if you use an invalid pointer.
You also have problems because the addresses of
the pointers themselves are taken, so you need some system to prevent
the pointers being corrupted. This adds up to quite a lot of overhead.

The byte code interpreter (or whatever) does NOT have to do that. It can
crash if your executable gets it wrong.
In a language that manages memory for you, linked lists can be
provided as built-in structures. The user has no way of accessing the
pointers, so internally they can be represented as raw addresses with
perfect safety.

For perfect safety it still has to do array bounds checking etc.;
otherwise you might have run off the end of an array and corrupted your
pointer, and if you don't then check your pointer who knows what will
happen?

A C implementation that uses some form of interpreter can do as little
checking as a C implementation that compiles to native machine code and
still be no less safe. However, systems like UCSD can provide the
ability to distribute binaries without knowing anything about the target
environment apart from it having a suitable interpreter.

Also, a lot of modern processors run microcode programs that interpret
the machine code, so even if you compile to machine code you could well
still have your program running under an interpreter.
 
Joona I Palaste

Well, as much a compiler as the C preprocessor is. The resulting code
is also valid C source.
Only a bit more than the comment stripper was.
I still think of javac (translating Java source code into bytecode) as
a compiler, even though you can't directly execute bytecode.
Look at it another way - isn't the distinction "only a translator that
outputs natively executable code can be called a compiler" also
arbitrary? Why single out natively executable code? The C compiling
process distinguishes between compiling and linking. Is the pure
compiler not a compiler, because unlinked object files aren't natively
executable?
(BTW, you forgot to ask if UNIX cat was a compiler. After all, it takes
in source code and outputs source code.)

Actually, a good definition for a compiler might be a translator for
translating a higher-level program into a lower-level one. If the
languages are of the same level it's a (pre)processor. If their levels
are the other way around it's a decompiler.

--
/-- Joona Palaste ([email protected]) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"The large yellow ships hung in the sky in exactly the same way that bricks
don't."
- Douglas Adams
 
Malcolm

Keith Thompson said:
I know very little about the AS/400. I've never used one. But (and
this is just a wild guess), if a system has 16-byte pointers, perhaps
it would be designed so it can deal with them efficiently, either by
having 16-byte registers, or by having a large number of registers and
the ability to use several of them together, or by designing it so
that thrashing memory isn't a problem, or by doing some other very
clever thing that I haven't thought of, or by having a user community
that doesn't care that much about blazing speed.
Obviously I don't know exactly how the AS/400 works, but I do know that if
pointers are 16 bytes then they will be slow. Since the pointer contains
bounds information as well as a validity bit, you need at least three
registers to process it (address, bounds, validity) even if the registers
are 128 bits wide. Unless of course the various pieces of information can be
combined in one instruction, but then the bytecode is supposed to be
platform independent, so using something that can only be implemented
efficiently on special hardware is cheating.
As for exceptions proving rules, the actual origin of the saying isn't
particularly relevant. One common use of it is that an exception
actually strengthens a rule -- for example, that finding a black swan
makes the statement that "all swans are white" more valid. This is,
of course, nonsense. When you used a variant of the saying
upthread, you gave the impression, probably unintentionally, that you
were using it in that common nonsensical way. I suggest we drop the
discussion of the origin of the saying (or take it to alt.usage.english)
and instead talk about what you actually meant.
I'd agree. The problem is that some people's grasp of proverbs is so bad
that you've got to explain everything. The AS/400 is an exception to the
rule that no implementation stores bounds information with pointers, but it
is such an oddball, not even compiling to machine code, that it doesn't
really invalidate the point that I was trying to make.
[ "compiler" ]
If you look at the literature you'll find "compiler" defined as something
that outputs machine code, or as something that translates a language into
another representation. You can use the latter definition if you wish, but
be aware that you are moving from the core definition of the term.
 
Malcolm

Flash Gordon said:
The byte code interpreter (or whatever) does NOT have to do that. It can
crash if your executable gets it wrong.
If a release program crashes on any input whatsoever then it is bugged. This
includes interpreters, decompressors, video games, or whatever.
 
Gordon Burditt

The byte code interpreter (or whatever) does NOT have to do that. It can
crash if your executable gets it wrong.
If a release program crashes on any input whatsoever then it is bugged. This
includes interpreters, decompressors, video games, or whatever.

Some released programs are *SUPPOSED* to crash under appropriate
circumstances (like when you explicitly ask them to). The main reason
for this is apparently to permit sending this info back to the
maintainer so he can fix it. Yes, these features ARE sometimes left
in production and released code.

For example, C has the abort() function.
GDB (the GNU debugger) has the "maintenance dump-me" command.
UNIX permits sending SIGQUIT to programs (unless the programs specifically
catch that signal) to cause a core dump, using a keystroke.

I do recall hearing some discussion about emulators for certain CPUs being
written so carefully that, if you asked one to emulate a particular model
of CPU, it precisely emulated the bugs in the actual hardware. Anyone for
emulating the Pentium math bug on a non-buggy Pentium? Some of them also
emulated lockups (which sounds to me like a crash, as the only way out was
RESET or power-cycle) just like the real thing.

Gordon l. Burditt
 
Flash Gordon

If a release program crashes on any input whatsoever then it is
bugged. This includes interpreters, decompressors, video games, or
whatever.

Not always. As far as the C standard is concerned there is nothing wrong
with an implementation containing an interpreter that crashes when a
program invokes Undefined Behaviour.

IMHO if that interpreter is something like a Virtual Machine that gets
invoked specifically to run a program and terminates when that program
terminates, then there is nothing wrong with it crashing on Undefined
Behaviour of the program being run, and this is no worse than the
program crashing if it was not running under a VM.
 
Nick Landsberg

Flash said:
Not always. As far as the C standard is concerned there is nothing wrong
with an implementation containing an interpreter that crashes when a
program invokes Undefined Behaviour.

IMHO if that interpreter is something like a Virtual Machine that gets
invoked specifically to run a program and terminates when that program
terminates, then there is nothing wrong with it crashing on Undefined
Behaviour of the program being run, and this is no worse than the
program crashing if it was not running under a VM.

While the standard is (rightly) unconcerned, as a possible
user of the VM which crashes, I am very concerned.
It has nothing to do with the language, per se, but
is a QOI issue. Given two VMs, one which crashed,
and another which issued a diagnostic, which would
you choose?
 
Flash Gordon

Flash Gordon wrote:


While the standard is (rightly) unconcerned, as a possible
user of the VM which crashes, I am very concerned.
It has nothing to do with the language, per se, but
is a QOI issue. Given two VMs, one which crashed,
and another which issued a diagnostic, which would
you choose?

That depends on the situation and the other aspects of the VMs. If the
one that crashed on Undefined Behaviour was significantly less resource
hungry then, having eliminated UB from the code, I might use it in
situations where no one would see the messages produced by the other
one. If it also allowed me to hook into things like segmentation
violations but the other did not let me trap any instances of Undefined
Behaviour, then I would be even more likely to use it.

In short, it depends.
 
Malcolm

Flash Gordon said:
That depends on the situation and the other aspects of the VMs. If
the one that crashed on Undefined Behaviour was significantly less
resource hungry then, having eliminated UB from the code, I might
use it in situations where no one would see the messages
produced by the other one.
It depends why you are using a VM. If the reason is portability of the
binary, then you might accept crashing. However if the reason is safety,
then undefined behaviour introduces potential security risks.
If it also allowed me to hook into things like segmentation
violations but the other did not let me trap any instances of Undefined
Behaviour, then I would be even more likely to use it.
The problem is that the interpreter is the program that crashes. So if you
look at the core dump, or whatever a crashed program produces on your
system, you see where the interpreter failed to check for an out-of-bounds
array access, for example, but not where that array was in the bytecode.
Of course a decent interpreter would produce a better diagnostic than an
unspecified "illegal memory access".
 
Michael Wojcik

Right. I've been tied up with work and have not had time to return to
this squabble, which is OT for c.l.c and fairly pointless anyway. I'll
respond to some specific points in this message if I feel I need to do
so, but on the whole I'd like to propose the following:

1. Most C implementations do not check pointer validity.
2. Some, however, do; the three I named for the AS/400 are examples.
3. I feel that (2) is a significant exception to (1).
4. You do not.
5. The origin and preferred usage of the phrase "the exception that
proves the rule" is disputed.
6. However, some people - including myself and some other c.l.c
readers - feel that when that phrase is used to mean "an
exception to a general rule demonstrates the validity of that
rule in other cases", it's somewhat lacking in rhetorical power.
Other people, of course, may feel differently.

Fair enough?

Ho ho. So you understand how language works, do you?

Well enough. I'm ABD in critical theory (and twentieth century
Anglophone prose). I've studied structural linguistics, socio-
linguistics, pragmatics, speech act theory, and poststructuralist
linguistics (as well as various fields that are arguably cognate).
I've read a good bit of philology. My fiance is a professor of
rhetoric in one of the top rhetoric departments in the US, so I spend
quite a lot of time discussing language with other folks familiar
with the subject. I don't need J. Random Poster to tell me that
etymology doesn't determine current use.
See the other threads on bounds checking. There is no point using C if it
does not compile to efficient code, just as there is no point having a
sports car if you are driving it down a traffic-calmed street.

I disagree that the only justification for using C is efficiency of
the resulting code. In my work, for example, I generally have to
produce code which can be ported to various other platforms by other
developers; I am limited in the demands I can put on them, and they
currently deal with C and COBOL. Given that choice, I'll take C.

And since portability is a huge concern in my work - a portation
problem in one component can stall the build "pipeline" for everyone
- the C matters that c.l.c deals with are the ones that I care about
the most.

An example: Chris Torek corrected my use of vsnprintf some months
back, which let me fix a nasty memory-corruption bug in Linux/390.
My error was not "restarting" va-processing with va_end and va_start
between calls to vsnprintf. That's a harmless error in many
implementations, but in some implementations va_arg is destructive
and trying to iterate through the argument list again invokes nasal
demons. That's why the standard forbids it - a point I missed in
my original code.
Because registers are probably 32 bits, and there are probably not too many
of them, so you will thrash memory.

You miss my point. Making *any* guess about how the AS/400 C
implementations work under the covers is taking a stab in the dark,
and likely to mislead you. It doesn't matter how large CPU registers
are in AS/400 C, because (at least in "classic" EPM C) function calls
didn't use them anyway. They weren't accessible to the implementation.
So C's use of 128-bit pointers (ie "native" MI pointers) didn't have
any noticeable adverse effect on performance for any normal program.


--
Michael Wojcik (e-mail address removed)

An intense imaginative activity accompanied by a psychological and moral
passivity is bound eventually to result in a curbing of the growth to
maturity and in consequent artistic repetitiveness and stultification.
-- D. S. Savage
 
