Sizes of pointers

James Kuyper

On Wed, 31 Jul 2013 19:30:50 +0200, Rosario1903 wrote:

I don't see the reason for distinguishing pointers from unsigned
integers [and all the portability concerns that distinction creates in
C programs]. Why not treat a pointer simply as a fixed-size unsigned
integer? For example, something like:

p32u32 *p;

where p is a pointer that can hold one 32-bit [unsigned] address and
points to an array of 32-bit unsigned values.

That would resolve all the undefined-behavior and portability problems
I see from the pointer "point of view". If people need 64-bit pointers
to u32, something like:

p64u32 *p;


Simply because there are machines and implementations where pointers
are *not* simple integers.

That doesn't quite answer the question. Why are such machines designed
that way? I know that they do exist, and write my programs accordingly,
but I don't know enough about hardware design to answer that question. I
know that some systems support the illusion of a simple linear address
space, even when the underlying reality is far more complicated. I don't
know what the issues are that prevent other systems from doing the same
- do you?
 

Rosario1903


I don't see the reason for distinguishing pointers from unsigned
integers [and all the portability concerns that distinction creates in
C programs]. Why not treat a pointer simply as a fixed-size unsigned
integer? For example, something like:

p32u32 *p;

where p is a pointer that can hold one 32-bit [unsigned] address and
points to an array of 32-bit unsigned values.

That would resolve all the undefined-behavior and portability problems
I see from the pointer "point of view". If people need 64-bit pointers
to u32, something like:

p64u32 *p;


Simply because there are machines and implementations where pointers
are *not* simple integers.

How do they design machines that are not simple, machines that don't
use math in general, and unsigned integer math in particular?
 

glen herrmannsfeldt

James Kuyper said:
On 08/01/2013 11:45 AM, Bart van Ingen Schenau wrote:
(snip)
Well, the designers of the DS9000 are notorious for ignoring the intent
of the standard; some have even claimed that they go out of their way to
violate the intent.

Some years ago I was wondering about the possibility of generating
JVM code as output of a C compiler. Seems to me that it has pretty
many of the same problems as the DS9000, at least with the assumption
that you want it to run reasonably fast. (You could, of course,
write an 8086 emulator in Java and run 8086 code, but I don't
count that.)

Pointers to structs are one of the complications. Consider a
struct as an object of the appropriate class. You can pass around
references to the object, extract and modify fields in the object,
but you can't do some other things that C expects. If you cast
the pointer to (unsigned char*) then the compiler has to figure
out how to allow you to do many of those things.

-- glen
 

glen herrmannsfeldt

James Kuyper said:
That doesn't quite answer the question. Why are such machines designed
that way? I know that they do exist, and write my programs accordingly,
but I don't know enough about hardware design to answer that question.

Well, consider protected mode on the 80286. (I believe still supported
by all later x86 processors.) You have a segment selector (16 bits)
and offset into the segment (16 bits).

Real (8086) mode you could think of as a funny way to address 1MB,
but protected mode allows for much bigger virtual addressing without
so much extra hardware. A segment selector selects a (64 bit)
segment descriptor (hidden from you by the OS). The descriptor
holds a segment origin and segment length (from 1 to 64K bytes).
You can do virtual memory swapping whole segments to/from disk.
The processor never has to consider instructions crossing page
boundaries, and the complications that causes. If a segment selector
can be loaded into a segment register, the segment is addressable.
If you subtract the ring bits and the global/local bit, that allowed for
a 29 bit virtual address space when most computers had 640K of RAM.
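
Roughly, in C terms (rings, limit faults, and the real table layout
ignored; all names made up):

#include <stdint.h>

/* Simplified sketch of 80286 protected-mode translation. */
typedef struct {
    uint32_t base;    /* segment origin (24 bits on the 286) */
    uint16_t limit;   /* maximum valid offset, so 1..64K byte segments */
} descriptor;

static descriptor gdt[8192];  /* 13 index bits -> up to 8192 descriptors */

uint32_t linear_address(uint16_t selector, uint16_t offset) {
    descriptor d = gdt[selector >> 3];  /* low 3 bits: ring + table flag */
    /* a real CPU faults if offset > d.limit */
    return d.base + offset;
}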
I know that some systems support the illusion of a simple linear
address space, even when the underlying reality is far more
complicated. I don't know what the issues are that prevent other
systems from doing the same
- do you?

There were (maybe still in museums) machines with tagged memory.
Each memory word had bits to indicate the type of data stored
in it.

I believe there have also been machines where memory protection
depended on the compiler not generating code that could access
where it shouldn't, and users could only run the output of such
compilers. Whether or not addressing is linear, there is no way
to address something other than the way supplied by the system.
(I am not sure about C pointers on such processors.)

The main reasons are to make processors easier to build, and to
make more efficient use of the available bits.

-- glen
 

Stephen Sprunk

I don't see the reason for distinguishing pointers from unsigned integers

What about systems with signed pointers, such as x86-64?
[and all the portability concerns that distinction creates in C
programs]. Why not treat a pointer simply as a fixed-size unsigned
integer? For example, something like:

p32u32 *p;

where p is a pointer that can hold one 32-bit [unsigned] address and
points to an array of 32-bit unsigned values.

That would resolve all the undefined-behavior and portability problems
I see from the pointer "point of view". If people need 64-bit pointers
to u32, something like:

p64u32 *p;

So, you're proposing hard-coding the size of various types, which means
the code would gratuitously break when ported to a system that doesn't
have pointers and/or integers of exactly the specified size? How does
that improve portability in any meaningful way? It seems to me that
would make portability problems worse, not better.

The vast majority of code has no need to know exactly how wide pointers
are or, provided it meets the range requirements, exactly how wide an
integer type is. C's imprecise type system allows code to be _more_
portable than systems that are excessively precise, which is a big part
of C's enduring success.
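
For what it's worth, C99's <stdint.h> already lets code pin down the
integer widths that actually matter while leaving pointer width to the
platform. A rough sketch (uint32_t and uintptr_t are optional types,
present where the hardware cooperates):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint32_t data[4] = {1, 2, 3, 4};  /* elements exactly 32 bits wide */
    uint32_t *p = data;       /* pointer width is the platform's business */
    uintptr_t bits = (uintptr_t)p;  /* unsigned type wide enough for p */
    printf("sizeof(p) = %zu, as integer = %" PRIuPTR "\n", sizeof p, bits);
    return 0;
}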

S
 

James Kuyper

What about systems with signed pointers, such as x86-64?

A question came up awhile ago about whether pointers could be
meaningfully related to signed integers. I don't write the kind of code
where it makes any difference, so I didn't know whether there were any
such implementations, much less one as widely used as that one. I wish
someone had mentioned that during that discussion.
[and all the portability concerns that distinction creates in C
programs]. Why not treat a pointer simply as a fixed-size unsigned
integer? For example, something like:

p32u32 *p;

where p is a pointer that can hold one 32-bit [unsigned] address and
points to an array of 32-bit unsigned values.

That would resolve all the undefined-behavior and portability problems
I see from the pointer "point of view". If people need 64-bit pointers
to u32, something like:

p64u32 *p;

So, you're proposing hard-coding the size of various types, which means
the code would gratuitously break when ported to a system that doesn't
have pointers and/or integers of exactly the specified size? How does
that improve portability in any meaningful way? It seems to me that
would make portability problems worse, not better.

Possibly he's implying that pointers must be stored in exactly 32 or 64
bits, regardless of the size of hardware addresses, with the C
implementation being responsible for trimming or padding the pointers as
needed to make that work. p32u32 would give access to no more than 4GB
of distinct memory locations, even on systems which have exabytes of
memory installed. On systems which have only 64KB of memory installed,
only the 16 low-order bits of each pointer would be meaningful; the rest
would be wasted. It would also need to be specified which pointer type
the unary & operator returns, or perhaps two different operators would
be needed, one for each pointer type.
It would be possible to mandate this; I don't see any advantages,
though, and I expect that adding such a mandate to the standard would be
a good way to kill that version of C, permanently.
 

Stephen Sprunk

That doesn't quite answer the question. Why are such machines
designed that way? I know that they do exist, and write my programs
accordingly, but I don't know enough about hardware design to answer
that question. I know that some systems support the illusion of a
simple linear address space, even when the underlying reality is far
more complicated. I don't know what the issues are that prevent other
systems from doing the same - do you?

You seem to be under the impression that a flat linear address space per
process is always the best answer. That may not be true.

The usual counter-example is the AS/400, which puts every object into a
different segment. That enables strict bounds-checking, and it also
plays into the OS security scheme; processes share a single address
space, but you don't want one process/user to modify data belonging to
another process/user willy-nilly.

x86's 16-bit protected mode was sort of like that under early versions
of Windows, but it didn't work very well because there were only a few
thousand segments available. Once the i386 introduced 32-bit offsets
and paging, segments became a vestigial CPU feature.

S
 

glen herrmannsfeldt

(snip on the properties of C pointers and addressing hardware)
You seem to be under the impression that a flat linear address space per
process is always the best answer. That may not be true.
The usual counter-example is the AS/400, which puts every object into a
different segment. That enables strict bounds-checking, and it also
plays into the OS security scheme; processes share a single address
space, but you don't want one process/user to modify data belonging to
another process/user willy-nilly.
x86's 16-bit protected mode was sort of like that under early versions
of Windows, but it didn't work very well because there were only a few
thousand segments available. Once the i386 introduced 32-bit offsets
and paging, segments became a vestigial CPU feature.

Well, there was one other problem and that is that they never
implemented a segment descriptor cache. Every segment selector
load requires loading a 64 bit descriptor. A cache would have sped
up processing in many cases.

With a good descriptor cache, 80386 protected mode, with 4GB segments,
might have delayed the need for 64 bit addressing. (That, and some
way around the 32 bit MMU.)

-- glen
 

James Kuyper

You seem to be under the impression that a flat linear address space per
process is always the best answer. That may not be true.

No, I'm saying that to address the issue raised by Rosario1903 requires
explaining why a flat linear address space per process is NOT the best
answer. I'm sure there are good reasons, and I have some idea what they
might be, but I'm not a hardware-oriented person, and am therefore not
the best person to explain those reasons.

I can, as a software-oriented person, say that a simple linear address
space per process simplifies the programming, so there had better be a
good reason for any platform that doesn't provide one.
 

Stephen Sprunk

A question came up awhile ago about whether pointers could be
meaningfully related to signed integers. I don't write the kind of
code where it makes any difference, so I didn't know whether there
were any such implementations, much less one as widely used as that
one. I wish someone had mentioned that during that discussion.

Sorry, I must have missed that discussion. It's my favorite x86-64
quirk, so I rarely pass on an opportunity to mention it.

One _can_ describe x86-64 pointers in unsigned terms, and some sources
go to great lengths to do so; it's just that an explanation of certain
kernel details is both shorter and easier to understand when presented
in signed terms.
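
Roughly, assuming 48 implemented address bits (newer parts implement
57): an address is canonical when it is a sign-extended 48-bit value,
which puts user space in the non-negative half and the kernel in the
negative half. A sketch of the check:

#include <stdint.h>
#include <stdbool.h>

/* Canonical iff bits 48..63 replicate bit 47. The cast and the right
   shift are implementation-defined in C, but arithmetic in practice. */
static bool is_canonical48(uint64_t addr) {
    int64_t sext = (int64_t)(addr << 16) >> 16;  /* sign-extend bit 47 */
    return (uint64_t)sext == addr;
}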

S
 

Stephen Sprunk

It's not that the output byte code is really a problem; rather, the
verifier is going to spit up on the unsafe transforms the compiled C
program would create, and refuse to load and execute the program.

Trying to translate C pointers into Java references is a lost cause.

OTOH, couldn't you represent pointers as an integer offset into a
suitably large array (vector?) of bytes representing the C virtual
machine's memory?

S
 

Rosario1903

What about systems with signed pointers, such as x86-64?

Unsigned arithmetic is the easiest mathematical model for a PC.
Possibly I don't remember this correctly, but the operations + - * /
on unsigned values give the same results as the signed ones on x86...
[and all the portability concerns that distinction creates in C
programs]. Why not treat a pointer simply as a fixed-size unsigned
integer? For example, something like:

p32u32 *p;

where p is a pointer that can hold one 32-bit [unsigned] address and
points to an array of 32-bit unsigned values.

That would resolve all the undefined-behavior and portability problems
I see from the pointer "point of view". If people need 64-bit pointers
to u32, something like:

p64u32 *p;
So, you're proposing hard-coding the size of various types, which means
the code would gratuitously break when ported to a system that doesn't
have pointers and/or integers of exactly the specified size? How does
that improve portability in any meaningful way? It seems to me that
would make portability problems worse, not better.

The vast majority of code has no need to know exactly how wide pointers
are or, provided it meets the range requirements, exactly how wide an
integer type is. C's imprecise type system allows code to be _more_
portable than systems that are excessively precise, which is a big part
of C's enduring success.
S
 

Rosario1903

No, I'm saying that to address the issue raised by Rosario1903 requires
explaining why a flat linear address space per process is NOT the best
answer.

For me the "flat linear address space" is a good answer, but I can't
really judge, because I haven't seen other systems...
 

Rosario1903

I don't see the reason for distinguishing pointers from unsigned integers

What about systems with signed pointers, such as x86-64?
[and all the portability concerns that distinction creates in C
programs]. Why not treat a pointer simply as a fixed-size unsigned
integer? For example, something like:

p32u32 *p;

where p is a pointer that can hold one 32-bit [unsigned] address and
points to an array of 32-bit unsigned values.

That would resolve all the undefined-behavior and portability problems
I see from the pointer "point of view". If people need 64-bit pointers
to u32, something like:

p64u32 *p;

So, you're proposing hard-coding the size of various types, which means
the code would gratuitously break when ported to a system that doesn't
have pointers and/or integers of exactly the specified size? How does
that improve portability in any meaningful way?

If I have a generic operation * : A × A → A with A ⊂ ℕ [A can be the
32-bit unsigned integers, A = 0..0xFFFFFFFF, or the 64-bit integers,
etc.], then: if the operation * is not the same across machines [* has
to map the same operands to the same results], or if the set A does
not contain the same elements across machines, the operation * cannot
be portable across the machines that want to use it.

And the same goes for every operation. That is what "portability"
means, the way I see it; anything else is an error and a source of UB,
for me.

We attach different meanings to the word "portability".
 

glen herrmannsfeldt

(snip)
(snip)
Well, it certainly makes programming easier if you assume that all
actors in the address space are trustworthy. Even if we could assume
that all program components are non-malicious, we still have to assume
they're at the mercy of pathologically malicious inputs. Of course
trying to make protection too fine grained leads down the road to
capabilities...
On the performance side, large flat address spaces have historically
imposed a performance penalty, because of the sizes of the resulting
pointers. Even now, there's an effort to allow x86_64 programs in
Linux that only have 32 bit pointers (see the x32 ABI project).
Non-flat addressing schemes can make it possible to use larger
pointers only for bulk data (16 bit x86 mixed model programming, as a
recent example).

Another consideration is address constants. For 8 bit processors
with 16 bit addressing, it was usual to allow 16 bit addresses in
instructions. Some added 8 bit relative addressing, in addition.

For 32 bit RISC with 32 bit instructions, you can't have a 32 bit
address in an instruction. There are ways to generate one with two
instructions, but you want to be able to address most things with
only one instruction.
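
MIPS, for example, uses a lui/ori pair. The arithmetic, sketched in C:

#include <stdint.h>

/* Build a 32-bit address constant in two RISC instructions:
   load the upper 16 bits, then OR in the lower 16. */
uint32_t build_address32(uint16_t hi, uint16_t lo) {
    uint32_t r = (uint32_t)hi << 16;  /* lui r, hi */
    return r | lo;                    /* ori r, r, lo */
}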

Base-displacement addressing for S/360 was one way to reduce the
bits needed in the instruction for addresses. (Especially for the
smaller models of S/360.) Also, S/360 uses a single flat address
space for all tasks. (Protection keys keep tasks apart.) S/370
with OS/VS2 used a single (flat) virtual address space, where
MVS (Multiple Virtual Storage) gives each task its own address
space.
There used to be considerably more variety in architectures, and much
of the world has migrated to fairly flat address spaces. OTOH, much
of that was driven by *nix (and to an extent Windows NT), which fairly
steadfastly refused to support any more complex addressing schemes.
And if your new processor won't run *nix well, you're starting with a
couple of strikes against you.

Except for parts too small to run unix. The Z80 descendants, with
a 24 bit address space, don't try to run unix but stay popular.

-- glen
 

glen herrmannsfeldt

Unsigned arithmetic is the easiest mathematical model for a PC.
Possibly I don't remember this correctly, but the operations + - * /
on unsigned values give the same results as the signed ones on x86...

VAX put user programs at the bottom of the 32 bit address space,
with the system space at the top. VAX was around a long time
before people could afford 4GB. I don't know that it was described
as signed, but the effect was the same.

Currently no-one can fill a 64 bit address space, so there are
tricks to avoid the overhead of working with it.

To avoid huge page tables, VAX and S/370 use a two level virtual
address system. (VAX has pagable page tables, S/370 has segments
and pages.) Continuing that, z/Architecture has five levels of tables.
It should take five table references to resolve a virtual address
(that isn't in the TLB) but there is a way to reduce that for
current sized systems.
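
A toy sketch of a two-level walk (the 10/10/12 bit split is
illustrative, not the actual VAX or S/370 layout):

#include <stdint.h>

/* Split a 32-bit virtual address into directory index, table index,
   and page offset; each TLB miss costs two table references here,
   five with five levels of tables. */
typedef struct { uint32_t frame; int present; } pte;

uint32_t translate(pte *dir[1024], uint32_t va) {
    uint32_t di  = va >> 22;            /* top 10 bits: directory index */
    uint32_t ti  = (va >> 12) & 0x3FF;  /* next 10 bits: table index */
    uint32_t off = va & 0xFFF;          /* low 12 bits: page offset */
    pte e = dir[di][ti];                /* assume e.present was checked */
    return (e.frame << 12) | off;
}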

For x86 processors, the bottom of the (physical) address space is
interrupt vectors, and execution starts at the top. (I believe
that is still true.) That conveniently allows RAM at the bottom,
and ROM at the top.

-- glen
 

Keith Thompson

Stephen Sprunk said:
Trying to translate C pointers into Java references is a lost cause.

OTOH, couldn't you represent pointers as an integer offset into a
suitably large array (vector?) of bytes representing the C virtual
machine's memory?

There's no real need to model the C abstract machine's memory as a
monolithic array of bytes. I think it would make more sense to model
each object (that's not a subobject of some other object) as a
separate byte array.

Robert Wessel points out a number of problems, most of which would
probably apply to this approach as well. I don't know enough about the
JVM to comment further (but I don't guarantee that that will stop me).
 

Keith Thompson

James Kuyper said:
I can, as a software-oriented person, say that a simple linear address
space per process simplifies the programming, so there had better be a
good reason for any platform that doesn't provide one.

I'm not sure how helpful that really is. Linearity within each
object is important, but as long as I can get a unique address for
each object (including each byte within each object), why should
I care how addresses of distinct objects relate to each other
(apart from "==" and "!=" working properly)?
 

Stephen Sprunk

Stephen Sprunk said:
It's not that the output byte code is really a problem; rather,
the verifier is going to spit up on the unsafe transforms the
compiled C program would create, and refuse to load and execute
the program.

Trying to translate C pointers into Java references is a lost
cause.

OTOH, couldn't you represent pointers as an integer offset into a
suitably large array (vector?) of bytes representing the C
[abstract] machine's memory?

There's no real need to model the C abstract machine's memory as a
monolithic array of bytes. I think it would make more sense to model
each object (that's not a subobject of some other object) as a
separate byte array.

The latter would work for portable code, but there's a lot of code out
there that assumes relative comparison or subtraction of pointers to
different objects is meaningful, i.e. a flat linear address space. I
see no need to gratuitously break such code.
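
A rough sketch of that monolithic model (all names made up): memory is
one byte array and a pointer is just an index into it, so even
cross-object comparison and subtraction stay meaningful:

#include <stddef.h>

/* All C objects live in a single byte array; a "pointer" is an index. */
static unsigned char heap[1u << 20];  /* 1 MiB of simulated C memory */
typedef size_t cptr;

unsigned char load_u8(cptr p)          { return heap[p]; }
void store_u8(cptr p, unsigned char v) { heap[p] = v; }
ptrdiff_t cptr_diff(cptr a, cptr b)    { return (ptrdiff_t)(a - b); }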

S
 

glen herrmannsfeldt

Keith Thompson said:
Stephen Sprunk <[email protected]> writes:

(snip, someone wrote)
I believe for standalone (not applet) programs you can disable
the verifier, but most likely you should generate code that can
be verified.

For example, you are not allowed to look at the two halves of
a long separately. If C code does that, you have to special case
it and use bit operations. As well as I understand it, the underlying
system can be big endian, little endian, or (on a 64 bit host)
put the whole thing in one location.

In most cases, I think you should be able to represent a pointer
as a reference to the object and an offset into the object (array).
Increment, decrement, and relational operations other than ==
and != only use the offset. Dereference uses the object reference
and current offset.
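
Something like this, in C terms (all names made up):

#include <stddef.h>

/* Fat-pointer model for a JVM-hosted C: an object reference plus a
   byte offset. Only == and != would look at obj; <, <=, ++ and --
   act on off alone. */
typedef struct {
    void  *obj;  /* stands in for a JVM object reference */
    size_t len;  /* object size in bytes, for bounds checks */
    size_t off;  /* current byte offset within the object */
} fatptr;

/* Load one byte, with the bounds check a verifier would insist on. */
int fat_load(fatptr p, unsigned char *out) {
    if (p.off >= p.len) return -1;  /* out of bounds */
    *out = ((const unsigned char *)p.obj)[p.off];
    return 0;
}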

You have to special-case (unsigned char *) so that memcpy(), or
any other reference to it, can do the right thing. For copying of
the same type, memcpy() can use instanceof and copy appropriately.
There's no real need to model the C abstract machine's memory as a
monolithic array of bytes. I think it would make more sense to model
each object (that's not a subobject of some other object) as a
separate byte array.

I agree, but it does get interesting. Scalar variables that could
possibly have a pointer point to them have to be arrays of length one.
(Arrays are objects, but primitive types aren't.)
Robert Wessel points out a number of problems, most of which would
probably apply to this approach as well. I don't know enough about the
JVM to comment further (but I don't guarantee that that will stop me).

It might be that not everything can be done. If the things that most
programs use can be made to work, and work in such a way that one can
reasonably call to/from Java, it might be usable.

-- glen
 
