On Mon, 16 Aug 2004 06:48:19 GMT, Keith Thompson <[email protected]> wrote:
> Function pointers and object pointers are different things; in
> particular, you can't legally convert one to the other. In some
> implementations, object pointers and function pointers aren't even the
> same size. (An implementation could legally implement an object
> pointer as a raw machine address, and a function pointer as an integer
> index into a table of functions; I don't know of any that actually do
> this.)

I can give you one, though obscure and arguably obsolete: the classic
(but still used, in emulation) Tandem^WCompaq^WHP NonStop architecture,
aka TNS.
The (1970s proprietary) ISA originally defined two code segments --
one "user" (application) per process, one "system" (OS) shared -- each
with an entry point aka PEP table at the beginning containing 2
overhead (16-bit) words followed by up to 510 procedure addresses
(16-bit within segment). The (app) call instruction PCAL contained a
9-bit index, as did system call SCAL. BTW their OS was effectively a
nanokernel design, long before that term existed, so most of the OS
functionality was and still is actually in the "user code" segments of
various system processes, not in the "system code" segment.
Over time they added a second shared/OS "library" segment with a third
instruction LCAL, then an additional per-process "user library"
segment. At that point they changed the PCAL instruction to index into
a second per-segment *exit* table at the end of the segment, somewhat
confusingly called the external entry point (XEP) table. Each entry is
a 16-bit word containing 2 bits selecting the target segment plus
9 bits indexing into that segment's PEP table. Then finally they used
4 of the remaining bits of the XEP word to allow multiples of each
segment type (user, user library, system, system library).
And there was a separate indirect call instruction DPCL which now
takes that XEP word format; I forget if it originally took a "system"
bit (plus index) or was limited to user code. So to make a long story
not short enough, that XEP word format is the function pointer for C.
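
A rough C sketch of that XEP word and the PEP lookup behind it. Only
the field widths (2 + 9 + 4 bits) are stated above; the bit positions,
the numbering of the segment types, and all the names here are
assumptions made for illustration. It also assumes the 9-bit index
counts words from the start of the segment, so that indexes 0 and 1
land on the two overhead words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-8 PEP index, bits 9-10 segment type,
   bits 11-14 segment number, bit 15 unused. Positions are assumed. */
enum { XEP_TYPE_SHIFT = 9, XEP_SEGNO_SHIFT = 11 };

/* Assumed numbering of the four segment types. */
enum seg_type { SEG_USER_CODE, SEG_USER_LIB, SEG_SYSTEM_CODE, SEG_SYSTEM_LIB };

struct xep_fields {
    unsigned type;       /* 2 bits: which kind of segment */
    unsigned segno;      /* 4 bits: which segment of that kind */
    unsigned pep_index;  /* 9 bits: index into the target's PEP table */
};

/* Build a 16-bit XEP word from its three fields. */
uint16_t xep_pack(unsigned type, unsigned segno, unsigned pep_index)
{
    assert(type < 4u && segno < 16u && pep_index < 512u);
    return (uint16_t)((segno << XEP_SEGNO_SHIFT) |
                      (type << XEP_TYPE_SHIFT) | pep_index);
}

/* Split a 16-bit XEP word back into its fields. */
struct xep_fields xep_unpack(uint16_t word)
{
    struct xep_fields f;
    f.pep_index = word & 0x1FFu;
    f.type = (word >> XEP_TYPE_SHIFT) & 0x3u;
    f.segno = (word >> XEP_SEGNO_SHIFT) & 0xFu;
    return f;
}

/* Look up a procedure address in a segment's PEP table, which starts
   with 2 overhead words followed by n_procs entries. Returns the
   16-bit in-segment address, or -1 to model an index-out-of-range fault. */
long pep_lookup(const uint16_t *segment, unsigned n_procs, unsigned pep_index)
{
    if (pep_index < 2u || pep_index >= 2u + n_procs)
        return -1;
    return (long)segment[pep_index];
}
```

With this layout, xep_pack(SEG_USER_LIB, 3, 42) followed by
xep_unpack() gives back the same three fields, and an all-zero word
decodes to PEP index 0, which pep_lookup() rejects.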
The data segments are also separate, again originally one "user" per
process and one "system" shared but accessible only by privileged
(normally system) code. The original 16-bit (TNS1) memory model is
still available as an option, restricted to less than 64KB*; in the
32-bit (TNS2) memory models, data pointers are 32 bits (mostly, with
some further hacks; they can sometimes access code read-only, but not
for execution) and function pointers are 32 bits, with the upper 16 as
above and the lower 16 unused (zero). While the C
standard allows function and data pointers to be different sizes as
well as different representations, Tandem did only the latter; I'm not
sure why. Perhaps just convenience. As long as you don't have huge
numbers (perhaps arrays) of function pointers, which I've never seen
anyone do, the space wastage is minor.
So using a converted or punned function pointer for a data access may
get you data completely unrelated to the function but more likely a
fault; while similarly using a data pointer for a function call will
get you either a valid function, having nothing to do with the data,
or a fault (segment or index out of range). PEP index zero is never
used, so a zero XEP word can be the null function pointer.
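
The 32-bit function pointer representation described above can be
sketched in C roughly as follows. The function names are hypothetical,
and the pointer is modeled as a plain uint32_t rather than a real
function pointer, since portable C gives no way to inspect the latter.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a TNS2 C function pointer: XEP word in the upper 16 bits,
   lower 16 bits unused and zero. */
uint32_t fnptr_from_xep(uint16_t xep_word)
{
    return (uint32_t)xep_word << 16;
}

/* Recover the XEP word; the lower half is expected to be zero. */
uint16_t fnptr_to_xep(uint32_t fp)
{
    assert((fp & 0xFFFFu) == 0u);
    return (uint16_t)(fp >> 16);
}

/* PEP index zero is never used, so the all-zero value can serve as
   the null function pointer. */
int fnptr_is_null(uint32_t fp)
{
    return fp == 0u;
}

/* Space lost to the unused lower half in an array of n such pointers:
   2 bytes each. */
unsigned long wasted_bytes(unsigned long n)
{
    return n * 2ul;
}
```

So a table of 1000 function pointers wastes 2000 bytes, which is the
"minor" overhead mentioned above.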
* As I've previously posted, TNS1 actually has two data pointer forms
-- one for byte/char=8-bit, and one for everything else, which must be
word=2-byte aligned, so in that environment you can't just treat all
data pointers as void*. The *ISA* supports 64KW = 128KB. But the HLL
runtime reserves the upper 32KW, hence *in C* only 32KW = 64KB of data
less some overhead, including (as I've recently posted) RTL-sacred data
always at data address 0, which allows that to be the null data pointer.
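
To illustrate why the two TNS1 pointer forms can't be interchanged
freely, here is a minimal C sketch. It assumes (this is not stated
above) that a byte address is simply twice the corresponding word
address, so a 16-bit byte address can only reach the lower 32KW. All
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD_BYTES = 2 };

/* Size arithmetic from the footnote: 64KW = 128KB, 32KW = 64KB. */
unsigned long words_to_bytes(unsigned long words)
{
    return words * WORD_BYTES;
}

/* Convert a word address to a byte address (assumed: byte = 2 * word).
   Fails (returns 0) for the upper 32KW, which a 16-bit byte address
   cannot reach under this assumption. */
int word_to_byte(uint16_t word_addr, uint16_t *byte_addr)
{
    if (word_addr >= 0x8000u)
        return 0;
    *byte_addr = (uint16_t)(word_addr * WORD_BYTES);
    return 1;
}

/* Convert a byte address to a word address. Fails (returns 0) if the
   byte address is odd, i.e. not word-aligned. */
int byte_to_word(uint16_t byte_addr, uint16_t *word_addr)
{
    if (byte_addr % WORD_BYTES != 0)
        return 0;
    *word_addr = (uint16_t)(byte_addr / WORD_BYTES);
    return 1;
}
```

Under that assumption the same 16-bit value means two different
locations depending on which form it is, and an odd byte address has no
word-pointer equivalent at all.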
And in another oddity, the TNS stack grows upward, so the "local"
stack smashes all too common in C on most other machines are
harmless -- they just run off into unused memory. It is still possible
to clobber the stack if you overrun a buffer in the (or a) *caller*'s
frame, or in some cases a "global" aka static-duration one, but this
is usually harder to provoke or control. And even if you do, you flat
cannot execute code from data space; the most you can do is redirect
to existing code somewhere. Or crash the process; or, if you could
manage it in system code in the days of real-hardware TNS (less likely,
as the system code then wasn't in C and never used null-terminated
strings), maybe crash the CPU. (Current "TNS/R" systems emulate classic
TNS only in userspace, optionally, not system.)
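
A toy C model of the upward-growing stack may make the difference
clearer. It is not the real TNS frame format: the layout, sizes, and
names are all assumed for illustration, with the saved call information
sitting between the caller's buffer (below it) and the callee's buffer
(above it).

```c
#include <assert.h>
#include <string.h>

/* Toy stack, low to high addresses (assumed layout):
   [caller buffer: 8 words][saved call info: 1 word]
   [callee buffer: 8 words][unused memory ...] */
enum { MEM_WORDS = 64, BUF_LEN = 8 };
enum { CALLER_BUF = 0, SAVED = 8, CALLEE_BUF = 9 };
enum { MARKER = 0x7EED };

/* Clear the toy memory and plant the saved call information. */
void toy_init(int mem[MEM_WORDS])
{
    memset(mem, 0, MEM_WORDS * sizeof mem[0]);
    mem[SAVED] = MARKER;
}

/* Write n words upward from base, deliberately ignoring BUF_LEN to
   model a buffer overrun (bounded only by the toy memory itself). */
void toy_overrun(int mem[MEM_WORDS], int base, int n)
{
    int i;
    for (i = 0; i < n && base + i < MEM_WORDS; i++)
        mem[base + i] = 0x41;
}

/* Has the saved call information survived? */
int toy_marker_intact(const int mem[MEM_WORDS])
{
    return mem[SAVED] == MARKER;
}
```

Overrunning from CALLEE_BUF by any amount leaves the marker intact,
because the writes only move away from it; overrunning from CALLER_BUF
by more than BUF_LEN words overwrites it.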
Multics also had a code segment format that restricted in-calls to
(via) entries in a small, checked table, but I don't recall exactly
how the pointers worked, and in any case it no longer exists. I don't
believe it ever had a C, especially in view of C's historically strong
and originally exclusive connection with Unix.
- David.Thompson1 at worldnet.att.net