Format of Pointers in Unix


Dan Pop

In said:
Pointers in C, no matter how they are represented by the
underlying hardware/architecture, always appear as a flat
address memory model.

In C, each object resides in its own flat address space. C defines no
metric for pointers to different objects. As a result of this, two
arbitrary (but well defined) pointer values can only be compared for
equality. The < and > operators (and their friends) require pointers to
the same object as operands. Otherwise, the program invokes undefined
behaviour, so "it works just fine on every implementation I know of" is
NOT a *valid* counterargument.
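To make Dan's distinction concrete, here is a minimal sketch (the helper names are hypothetical, not from the thread). Equality is used across unrelated objects; relational operators are used only on pointers into one array, including the one-past-the-end pointer. Something like `&x < &y` for two separate variables is deliberately absent, since that is the undefined case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Relational comparison (<, >, <=, >=) is defined only when both
   pointers point into (or one past the end of) the same array object. */
static bool precedes_in_array(const int *a, const int *b)
{
    return a < b;   /* caller must guarantee a and b share one array */
}

/* Equality (==, !=) is defined for any two valid pointers, even when
   they point to different objects. */
static bool same_address(const int *a, const int *b)
{
    return a == b;
}
```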

Dan
 

Dan Pop

In said:
And somewhere around the time of 05/16/2004 09:41, the world stopped and
listened as Stephen L. contributed the following to humanity:


Ah, the flat memory model. That's exactly what I was looking for.

It might be exactly what you were looking for, but it's not the right
answer.

Dan
 

Eric Sosman

CBFalconer said:
:

... snip ...



That is just NOT so. That fact is at the root of the ban on
subtracting pointers that do not point within the same object. It
is also the reason that most pointer comparisons can only be for
equal/not-equal.

The ban also arises from the properties of pointer
arithmetic as defined by C: Subtracting two pointers (when
permitted) gives the number of instances of the pointed-to
type that lie between them. This is the flip side of what
happens when you add an integer to a pointer: Adding one
to a `double*' advances by the sizeof one double, not by
one byte.
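A small sketch of the scaling Eric describes, assuming nothing beyond standard C; `byte_step` and `element_step` are made-up names for illustration. Both pointers stay inside one array, so each subtraction is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between two double pointers into the same array,
   computed through char pointers (always allowed within one object). */
static size_t byte_step(const double *from, const double *to)
{
    return (size_t)((const char *)to - (const char *)from);
}

/* Element distance: pointer subtraction counts doubles, not bytes. */
static ptrdiff_t element_step(const double *from, const double *to)
{
    return to - from;
}
```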

So, consider

typedef char Buffer[527];
char *p = malloc(528); /* assume success */
Buffer *b1 = (Buffer*)p;
Buffer *b2 = (Buffer*)(p+1);
ptrdiff_t d = b2 - b1; /* bzzzt! */

`b1' and `b2' point to memory locations only one byte
apart, and one is not divisible by 527. The subtraction
is undefined, because there's no way to define it and
still maintain the properties of pointer arithmetic.
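If what a program really wants in the Buffer example is the distance in bytes, converting both pointers back to `char *` first keeps the subtraction well defined inside the single malloc'ed block. A minimal sketch with hypothetical helper names; the `Buffer *` conversions rely on `Buffer` having an alignment of 1, which is the question Ralmin raises later in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef char Buffer[527];

/* Byte offset of b2 from b1, measured through char pointers.
   Valid as long as both point into the same malloc'ed block. */
static ptrdiff_t byte_offset(Buffer *b1, Buffer *b2)
{
    return (char *)b2 - (char *)b1;
}

/* Nonzero if the byte offset is a whole number of Buffers. This is
   necessary, not sufficient: b2 - b1 itself still requires both to
   point to elements of one array of Buffer. */
static int whole_buffers_apart(Buffer *b1, Buffer *b2)
{
    return byte_offset(b1, b2) % (ptrdiff_t)sizeof(Buffer) == 0;
}
```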
 

glen herrmannsfeldt

Barry Margolin wrote:

(snip)
He said "on an x86" and mentioned "Unix" in the Subject line, so he seems
to realize it's platform dependent.
It's probably not in the compiler manual -- the format of pointers (and
other data types) is typically dictated by the operating system's ABI.
Otherwise, you wouldn't be able to use applications compiled with a
different compiler than the system's libraries.

The Watcom compilers, and I believe now the OpenWatcom compilers, support
large-model 32-bit code, with a 16-bit selector and a 32-bit offset
for pointers. I don't know if any OS supports running them, though.

In that case, the compiler manual does describe them because it
describes the selection between small and large model.
However, since there's more than one vendor of Unix for x86 platforms,
and they're not required to be binary-compatible with each other, the
answer may be specific to the particular version of Unix the OP is
using. Since he didn't say, we can't even give a good answer for this
in comp.unix.programmer.

Crossposted to comp.lang.c, though...

I used to think that large model would be a good way to
get past the 4GB limit on 32 bit processors. (Many IA32
processors have a 36 bit physical address bus but only a 32 bit
MMU.) Now that Opteron and Athlon 64 prices are coming down, close
to $200 for the processor, one might just as well support that.

-- glen
 

Victor Wagner

: I used to think that large model would be a good way to
: get past the 4GB limit on 32 bit processors. (Many IA32

It is a rather terrible way. I remember the days of DOS programming; it was
a pain to deal with segments and selectors. It was also awfully
slow. Intersegment calls were several times slower in 286 protected mode
than in real mode, not to mention "short" calls.

: processors have a 36 bit physical address bus but only a 32 bit
: MMU.) Now that Opteron and Athlon 64 prices are coming down, close
: to $200 for the processor, one might just as well support that.

Now, Unixes already support a flat 64-bit model on those processors.
Several Linux distributions have been released, FreeBSD supports them too,
and 64-bit Solaris for x86-64 is on the way. So, better get rid of this
outdated 32-bit crap.
 

Stephen Sprunk

glen herrmannsfeldt said:
I used to think that large model would be a good way to
get past the 4GB limit on 32 bit processors. (Many IA32
processors have a 36 bit physical address bus but only a 32 bit
MMU.)

Isn't the linear address space still 32-bit even if you have a 48-bit
virtual address space and a 36-bit physical address space?
Now that Opteron and Athlon 64 prices are coming down, close
to $200 for the processor, one might just as well support that.

That's ideal, but there are a lot of existing CPUs out there that will be in
service for another decade. And, of course, Intel and AMD are still
shipping new 32-bit chips.

S
 

James Kanze

|> And Daniel Rudy said...

|> > Besides, I *DID* look in the compiler manual for cc and it doesn't
|> > say. Which is why I'm asking in the first place. Also, does Unix
|> > use the segmented or flat memory model. I'm asking because I don't
|> > know and the docs on my system don't really give a straight answer
|> > either way.

|> All unixes I've seen use a flat memory model for user space
|> applications.

That's because you're just a youngster. My first exposure to Unix was on
an 8086-based system which used segmented pointers.
 

Nils O. Selåsdal

And Dan Pop said...
In <[email protected]>


How many Unices for the 8086 and 286 have you seen?
None. Nor have I touched AIX, as someone else mentioned here.
I might have given the impression that *all* Unixes use
flat memory; that was not the intention, though ;)
 

glen herrmannsfeldt

Victor said:
: I used to think that large model would be a good way to
: get past the 4GB limit on 32 bit processors. (Many IA32

It is a rather terrible way. I remember the days of DOS programming; it was
a pain to deal with segments and selectors. It was also awfully
slow. Intersegment calls were several times slower in 286 protected mode
than in real mode, not to mention "short" calls.

I don't think it is quite that bad. It depends a lot on the
problems you are working on, but many that need more than 4GB
total don't need more than 4GB for any one object.

If processors cached segment descriptors, it wouldn't be so
slow. I have been told that some did, but have never seen
it in the documentation. In any case, it made possible what
otherwise might not have been possible.
: processors have a 36 bit physical address bus but only a 32 bit
: MMU.) Now that Opteron and Athlon 64 prices are coming down, close
: to $200 for the processor, one might just as well support that.
Now, Unixes already support a flat 64-bit model on those processors.
Several Linux distributions have been released, FreeBSD supports them too,
and 64-bit Solaris for x86-64 is on the way. So, better get rid of this
outdated 32-bit crap.

I probably agree now, but I didn't last year.

-- glen
 

glen herrmannsfeldt

Isn't the linear address space still 32-bit even if you have a 48-bit
virtual address space and a 36-bit physical address space?

Yes, the MMU is still 32 bits. With OS support, changing page
tables as needed, this could be overcome.
That's ideal, but there are a lot of existing CPUs out there that will be in
service for another decade. And, of course, Intel and AMD are still
shipping new 32-bit chips.

There are, but the number of applications that really need
more than 4GB is small enough, and in most cases those will
require more than 4GB of real memory. In that case I don't feel
so bad about requiring one to spend a little
more for a system to run it on.

-- glen
 

August Derleth

Don't forget SCO's XENIX; that was available for the 80286, IIRC.

And, I think, Microsoft made XENIX for the 8086. (One of my older UNIX
books mentions it, in a discussion of how far the *nix idea had gone by
the mid-80s.)

How did that work on a 16-bit chip without an MMU? (That is, how did it
differ from MS-DOS aside from the API/ABI and command shell being
different?)
 

Villy Kruse

Several mainstream OSes (including linux and freebsd, I think) use a
flat memory model.


The few Unix versions that did run on the 286 used segment:offset style
pointers, just like MS-DOS programs did. The only way to avoid this,
given the 286 hardware, was to use the small model; that is, pointers
were 16 bits and the size of a program therefore had to be less than 64K.

The ability to use 32-bit pointers on the 386 made it natural
to use a flat memory model, just like the numerous Unix versions that
ran on the Motorola 68k. That made porting a whole lot easier.

Followup-To: comp.unix.programmer

Villy
 

Stephen SM WONG

I ran SCO Xenix on an 80286 PC/AT, around 1988.
Xenix on the 80286 uses a segmented model, in which a code segment
can be no larger than 64 KB and a data segment can be no
larger than 64 KB, though a program can have more than
one segment. IIRC, Xenix supported virtual memory, i.e. it
swapped whole segments in and out of swap space. In fact,
I think that is the more correct usage of "swap", as opposed to
"page (partition)". Around the same time, there
were Sun 3s running on 68020/68030 (SunOS 4) that supported
X Windows with the Open Look Window Manager (olwm). In the lab
I worked in, we were amazed to run a Unix(-like) OS on a PC,
and equally amazed to run remote X Windows applications on
Sun 3s, displayed on a VGA Xenix workstation (X Windows),
through a coaxial 10Mbps LAN! That's an old story, though.

Stephen Wong @ Hong Kong
 

Ralmin

Eric Sosman said:
The ban also arises from the properties of pointer
arithmetic as defined by C: Subtracting two pointers (when
permitted) gives the number of instances of the pointed-to
type that lie between them. This is the flip side of what
happens when you add an integer to a pointer: Adding one
to a `double*' advances by the sizeof one double, not by
one byte.

So, consider

typedef char Buffer[527];
char *p = malloc(528); /* assume success */
Buffer *b1 = (Buffer*)p;
Buffer *b2 = (Buffer*)(p+1);

Is it possible for an array of char to have alignment requirements?
ptrdiff_t d = b2 - b1; /* bzzzt! */

`b1' and `b2' point to memory locations only one byte
apart, and one is not divisible by 527. The subtraction
is undefined, because there's no way to define it and
still maintain the properties of pointer arithmetic.

I would have expected the implicit division of 1 by 527 to always round down
to zero in practice. It appears to produce zero on most of my
implementations to hand (lcc-win32, microsoft vc++, borland bcc32), but
surprisingly not on cygwin gcc, where it produces a ptrdiff_t value
of -2086359825. Weird.

<OT> Any explanations? </OT>
 

Dan Pop

In said:
And, I think, Microsoft made XENIX for the 8086. (One of my older UNIX

Right, Xenix started its life as a Microsoft product.
How did that work on a 16-bit chip without an MMU? (That is, how did it
differ from MS-DOS aside from the API/ABI and command shell being
different?)

Multiuser/multitasking capabilities, and I think it supported
swapping (a whole process could be moved to the swap partition when the
system was short of memory).

Of course, since user processes executed in kernel mode, user code had
full control over the machine.

Dan
 

Daniel Rudy

And somewhere around the time of 05/17/2004 23:02, the world stopped and
listened as August Derleth contributed the following to humanity:
And, I think, Microsoft made XENIX for the 8086. (One of my older UNIX
books mentions it, in a discussion of how far the *nix idea had gone by
the mid-80s.)

How did that work on a 16-bit chip without an MMU? (That is, how did it
differ from MS-DOS aside from the API/ABI and command shell being
different?)

On the 8086, every segment starts on a 16-byte boundary (a 16-byte unit
is called a paragraph). This is why 3186:3DE0 = 35640 and 3187:3DD0
also = 35640 (all values hexadecimal). It was much easier to send the CPU
out into the weeds and crash those machines because there was *NO* memory
protection enforced by the hardware. AFAIK, there was no swap either,
because the paging mechanism doesn't exist on that ancient hardware.
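Daniel's numbers are hexadecimal. A sketch of the real-mode address calculation (linear = segment * 16 + offset) confirming that both pairs name the same byte; `linear_address` is a hypothetical helper, and the 20-bit mask models the 8086's wraparound at 1 MB.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real mode: the physical address is segment * 16 + offset.
   The 8086 has a 20-bit address bus, so the sum wraps at 1 MB. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```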
 

Tim Shoppa

and in
real mode any pointer with the same value of 16*segment+offset
points to the same byte, even if the pointers don't compare equal.
This is decidedly unlike a flat address space.

I agree with you, that's not flat.

But...

Lots of architectures with MMUs allow the kernel to map multiple addresses
in userland space to the same hardware address. Is this flat
or not flat?

Many designs without MMUs but with incomplete address decoding
will map two addresses to the same memory cell.

I suspect that "flat" vs "not flat" is sometimes a continuum rather
than a one-or-the-other choice.
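The quoted point about real mode can be made concrete: two segment:offset pairs can name the same byte while differing field by field. One way to restore a meaningful equality test is to normalize each pair so the offset lies in 0..15, similar in spirit to what some DOS compilers did for "huge" pointers. This is a sketch with a made-up struct, not any compiler's actual far-pointer representation, and it assumes addresses below 1 MB.

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode segment:offset pair (hypothetical stand-in for a far pointer). */
struct segoff {
    uint16_t seg;
    uint16_t off;
};

/* Field-by-field comparison, roughly what comparing far pointers
   as raw 32-bit values amounts to. */
static int raw_equal(struct segoff a, struct segoff b)
{
    return a.seg == b.seg && a.off == b.off;
}

/* Normalize so the offset is in 0..15. For addresses below 1 MB each
   byte has exactly one such form, so normalized pairs are equal
   exactly when they name the same byte. */
static struct segoff normalize(struct segoff p)
{
    uint32_t linear = ((uint32_t)p.seg << 4) + p.off;
    struct segoff n = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xFu) };
    return n;
}
```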

Tim.
 

Eric Sosman

Ralmin said:
Eric Sosman said:
The ban also arises from the properties of pointer
arithmetic as defined by C: Subtracting two pointers (when
permitted) gives the number of instances of the pointed-to
type that lie between them. This is the flip side of what
happens when you add an integer to a pointer: Adding one
to a `double*' advances by the sizeof one double, not by
one byte.

So, consider

typedef char Buffer[527];
char *p = malloc(528); /* assume success */
Buffer *b1 = (Buffer*)p;
Buffer *b2 = (Buffer*)(p+1);

Is it possible for an array of char to have alignment requirements?

I don't think so. It's easy to see that the alignment
requirement for any object must be a divisor of the object's
size (otherwise malloc'ed arrays wouldn't work right), so
in the illustration the alignment for `Buffer' would have
to be 1 or 17 or 31 or 527. Of these, only 1 (no alignment)
seems a reasonable choice.

This argument can probably be extended to show that an
array's alignment requirement cannot be stricter than that
of its elements. While I believe this to be true, I confess
that I wouldn't stake my life on being able to prove it with
perfect rigor.
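For what it's worth, C11 (published well after this thread) appears to settle Eric's conjecture: section 6.5.3.4 says `_Alignof` applied to an array type yields the alignment requirement of the element type. A minimal check, assuming a C11 compiler and `<stdalign.h>`; `buffer_alignment` is a hypothetical helper.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

typedef char Buffer[527];

/* Under C11, alignof of an array type is the alignment of its element
   type, so this is alignof(char), i.e. 1. Every object type's size is
   also a multiple of its alignment, matching the divisor argument. */
static size_t buffer_alignment(void)
{
    return alignof(Buffer);
}
```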
I would have expected the implicit division of 1 by 527 to always round down
to zero in practice.

Ordinary integer division does so, of course. But if
you tried to extend that behavior to pointer subtraction
you'd wind up with things like `b1 + (b2 - b1) != b2'.
One *could* define a language this way -- after all, we've
survived `3 * (5 / 3) != 5' -- but the usefulness would be
questionable. In any event, C isn't defined this way.
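Eric's round-trip argument can be checked with plain integers, modelling pointers as byte addresses and "pointer difference" as truncating division by the element size. This is a hypothetical model (`model_diff` and `model_add` are not real pointer operations); it only illustrates how a truncating definition would break `b1 + (b2 - b1) == b2`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "pointer difference": byte distance divided by the
   element size, truncated as ordinary integer division does. */
static ptrdiff_t model_diff(ptrdiff_t addr2, ptrdiff_t addr1, ptrdiff_t elem_size)
{
    return (addr2 - addr1) / elem_size;
}

/* The matching model of "pointer + integer": scale n by the element size. */
static ptrdiff_t model_add(ptrdiff_t addr, ptrdiff_t n, ptrdiff_t elem_size)
{
    return addr + n * elem_size;
}
```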
It appears to produce zero on most of my
implementations to hand (lcc-win32, microsoft vc++, borland bcc32), but
surprisingly not on cygwin gcc, where it produces a ptrdiff_t value
of -2086359825. Weird.
<OT> Any explanations? </OT>

A splendid example of undefined behavior ;-) Or, just
possibly, using the wrong printf() conversion specifier to
display the ptrdiff_t value, and getting away with it on only
three of four implementations ...
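If Eric's second guess is the culprit, the portable fix is to give printf a specifier that matches `ptrdiff_t`: C99 added the `t` length modifier (`%td`); under C90 the usual idiom is to cast to `long` and use `%ld`. A sketch with hypothetical helper names, writing into a buffer so the result can be checked. It does not make the original `b2 - b1` defined; that subtraction is still undefined behavior whichever specifier prints it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* C99: the 't' length modifier matches ptrdiff_t exactly. */
static int format_ptrdiff(char *buf, size_t size, ptrdiff_t d)
{
    return snprintf(buf, size, "%td", d);
}

/* Cast-to-long idiom; the same "%ld" plus cast also works with
   C90's printf/sprintf, which lack %td. */
static int format_ptrdiff_long(char *buf, size_t size, ptrdiff_t d)
{
    return snprintf(buf, size, "%ld", (long)d);
}
```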
 
