Format of Pointers in Unix


Daniel Rudy

Hello,

On a x86 machine, what is the format of a pointer in C? I know for a
fact that the x86 p-mode uses a /selector:offset/ notation where the
selector is defined in either the GDT or LDT. Does that carry over into
the pointer, or does Unix use the flat memory model?
 

Emmanuel Delahaye

On a x86 machine, what is the format of a pointer in C?

<CLC>
The C language doesn't define "the format of a pointer". The answer is
platform dependent. Read your compiler manual.
</CLC>
 

Barry Margolin

Emmanuel Delahaye said:
<CLC>
The C language doesn't define "the format of a pointer". The answer is
platform dependent. Read your compiler manual.
</CLC>

He said "on a x86" and mentioned "Unix" in the Subject line, so he seems
to realize it's platform dependent.

It's probably not in the compiler manual -- the format of pointers (and
other data types) is typically dictated by the operating system's ABI.
Otherwise, you wouldn't be able to use applications compiled with a
different compiler than the system's libraries.

However, since there's more than one vendor of Unix for x86 platforms,
and they're not required to be binary-compatible with each other, the
answer may be specific to the particular version of Unix the OP is
using. Since he didn't say, we can't even give a good answer for this
in comp.unix.programmer.
 

Daniel Rudy

And somewhere around the time of 05/16/2004 07:34, the world stopped and
listened as Barry Margolin contributed the following to humanity:

Emmanuel, way back when in the early 90's, I was programming under DOS
using Pascal and Assembler using a DOS extender. In the 386P-Mode, the
pointer format is 16 bit selector with a 32 bit offset for a 48 bit
pointer. In 286P-Mode, both selector and offset are 16 bits. The
selectors are indices into either the Global Descriptor Table or the
Local Descriptor Table. This is dictated by the x86 hardware. I have NO
idea what the format is on a X86 machine with PAX enabled.

Besides, I *DID* look in the compiler manual for cc and it doesn't say.
Which is why I'm asking in the first place. Also, does Unix use the
segmented or flat memory model? I'm asking because I don't know and the
docs on my system don't really give a straight answer either way.
He said "on a x86" and mentioned "Unix" in the Subject line, so he seems
to realize it's platform dependent.

It's probably not in the compiler manual -- the format of pointers (and
other data types) is typically dictated by the operating system's ABI.
Otherwise, you wouldn't be able to use applications compiled with a
different compiler than the system's libraries.

However, since there's more than one vendor of Unix for x86 platforms,
and they're not required to be binary-compatible with each other, the
answer may be specific to the particular version of Unix the OP is
using. Since he didn't say, we can't even give a good answer for this
in comp.unix.programmer.

Ah, forgot about that, my bad :) I'm using FreeBSD 4.9-RELEASE.
 

Stephen L.

Daniel said:
And somewhere around the time of 05/16/2004 07:34, the world stopped and
listened as Barry Margolin contributed the following to humanity:


Emmanuel, way back when in the early 90's, I was programming under DOS
using Pascal and Assembler using a DOS extender. In the 386P-Mode, the
pointer format is 16 bit selector with a 32 bit offset for a 48 bit
pointer. In 286P-Mode, both selector and offset are 16 bits. The
selectors are indices into either the Global Descriptor Table or the
Local Descriptor Table. This is dictated by the x86 hardware. I have NO
idea what the format is on a X86 machine with PAX enabled.

Besides, I *DID* look in the compiler manual for cc and it doesn't say.
Which is why I'm asking in the first place. Also, does Unix use the
segmented or flat memory model? I'm asking because I don't know and the
docs on my system don't really give a straight answer either way.


Ah, forgot about that, my bad :) I'm using FreeBSD 4.9-RELEASE.


Ahh... Now I see where you're coming from a little better.

Pointers in C, no matter how they are represented by the
underlying hardware/architecture, always appear as a flat
address memory model. Pointers in C appear the same way
and are independent of the operating system, i.e.,
Unix, Windows, Mac OS, etc. In other words, I would
code/use pointers identically on each operating system
(even though the pointer values themselves are different).

Do you _really_ need to know how a given architecture
represents a pointer? Can you post some C code where
this would be relevant, and possibly readers of this
group may be better able to answer your question?


Stephen
 

subnet

Daniel Rudy said:
On a x86 machine, what is the format of a pointer in C? I know for a
fact that the x86 p-mode uses a /selector:offset/ notation where the
selector is defined in either the GDT or LDT. Does that carry over into
the pointer, or does Unix use the flat memory model?

Several mainstream OSes (including Linux and FreeBSD, I think) use a
flat memory model. In Linux, look at
/usr/src/linux/arch/i386/kernel/head.S for the GDT map. Basically, it
uses 4 main "flat" segments (i.e., starting at 0x0 and ending at
0xffffffff) for kernel code, kernel data, user code, user data. Of
course, since segmentation in the x86 can't be disabled, the CPU
internally still accesses memory using a selector:offset pair, but
since the selector always refers to a 4GB segment, this results
in practice in a flat memory model. The four different
selectors are employed to enforce different memory access privileges
for kernel mode and user mode.

AFAIK, what you call a "pointer" in C is (on IA32) just a 4-byte
variable that holds the "offset" part in the selector:offset pair (the
selectors are usually preset by the OS, and user programs can't modify
them). Furthermore, keep in mind that all modern OSes use _virtual_
memory, so you'll probably find that two different pointers in two
different programs can hold the same value, but that's normal, since
they are virtual addresses.
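
A minimal sketch of how one might check this on a given system. The
function name show_pointer is made up for illustration, and everything
it prints is implementation-defined, so treat the output as an
observation about one compiler and OS, not as a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Print the size and value of a data pointer and return its size in bytes.
   The output is implementation-defined; the C standard promises none of it. */
size_t show_pointer(void *p)
{
    assert(p != NULL);
    printf("sizeof(void *) = %zu bytes\n", sizeof p);
    printf("pointer value  = %p\n", p);
    /* uintptr_t is optional in C99, but common Unix compilers provide it. */
    printf("as an integer  = 0x%jx\n", (uintmax_t)(uintptr_t)p);
    return sizeof p;
}
```

Calling show_pointer(&some_variable) on a flat-model IA32 system would
be expected to report 4 bytes and a single number with no visible
selector part; a 64-bit system would typically report 8 bytes.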

HTH
 

Gordon Burditt

Ah, forgot about that, my bad :) I'm using FreeBSD 4.9-RELEASE.
Ahh... Now I see where you're coming from a little better.

Pointers in C, no matter how they are represented by the
underlying hardware/architecture, always appear as a flat
address memory model.

This is *NOT* true. The address space appears flat locally, where
you are allowed to see it, such as inside the contents of a single
array. You are guaranteed that incrementing and decrementing pointers
will work and refer to the next and previous elements of the array.

If you want to see if it's flat globally, you have to invoke
undefined behavior. If you do that on, say, MS-DOS using Microsoft's
16-bit C compiler in "large model", you'll notice that pointers
consist of 16 bits of segment number and 16 bits of offset, and in
real mode any pointer with the same value of 16*segment+offset
points to the same byte, even if the pointers don't compare equal.
This is decidedly unlike a flat address space.
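
The aliasing described above can be sketched with a small model. The
struct far_ptr and the function names are hypothetical and only imitate
a 16-bit large-model pointer; this is not how any particular compiler
exposes far pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit "large model" pointer. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Real-mode address translation: 16 * segment + offset. */
uint32_t far_linear(struct far_ptr p)
{
    return (uint32_t)p.segment * 16u + p.offset;
}

/* Field-by-field comparison, as a naive pointer equality test would do. */
int far_equal(struct far_ptr a, struct far_ptr b)
{
    return a.segment == b.segment && a.offset == b.offset;
}

void far_demo(void)
{
    struct far_ptr a = { 0x1234, 0x0010 };
    struct far_ptr b = { 0x1235, 0x0000 };

    /* Both translate to linear address 0x12350 ... */
    assert(far_linear(a) == far_linear(b));
    /* ... yet the two representations do not compare equal. */
    assert(!far_equal(a, b));
}
```
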
Pointers in C appear the same way
and are independent of the operating system, i.e.,
Unix, Windows, Mac OS, etc.

If the operating system doesn't LET you use a particular mode, it's
awfully hard to use pointers in that mode. Large-model 32-bit compilers
aren't common, but I suspect they will become more common as the need
for 4GB+ arrays increases.
In other words, I would
code/use pointers identically on each operating system
(even though the pointer values themselves are different).

If you stick to defined behavior, you don't have to CARE whether
the address space is flat globally. Under the covers, it may
not be.
Do you _really_ need to know how a given architecture
represents a pointer?

If you stick to defined behavior, NO.
Can you post some C code where
this would be relevant, and possibly readers of this
group may be better able to answer your question?

Gordon L. Burditt
 

CBFalconer

Stephen L. said:
.... snip ...

Pointers in C, no matter how they are represented by the
underlying hardware/architecture, always appear as a flat
address memory model. Pointers in C appear the same way
and are independent of the operating system, i.e.,
Unix, Windows, Mac OS, etc.

That is just NOT so. That fact is at the root of the ban on
subtracting pointers that do not point within the same object. It is
also the reason that most pointer comparisons can only be for
equal/not-equal.
.... In other words, I would
code/use pointers identically on each operating system
(even though the pointer values themselves are different).

Yet this sentence is accurate, within the limits prescribed by the
C standard.
 

Nils O. Selåsdal

And Daniel Rudy said...
Besides, I *DID* look in the compiler manual for cc and it doesn't say.
Which is why I'm asking in the first place. Also, does Unix use the
segmented or flat memory model? I'm asking because I don't know and the
docs on my system don't really give a straight answer either way.

All unixes I've seen use a flat memory model for user space applications.
 

Stephen L.

CBFalconer said:
That is just NOT so. That fact is at the root of the ban on
subtracting pointers that do not point within the same object.

But that's a UNIVERSAL ban across all OS's, correct?

I don't believe I've ever seen a need to do that
(subtract pointers from two different objects)...
It is also the reason that most pointer comparisons can only be for
equal/not-equal.

I'm curious, is this legal -

{
    char str[ 256 ];
    char *p1 = str, *p2 = &str[ sizeof (str) ];

    while (p1 < p2) {
        *p1++ = '\0';
    }
}


The point I was making was that code snippet will
run and produce identical results anywhere - I don't
have to change it for x86 to Power PC, etc.

I think much of the confusion comes from inferior
C implementations on segmented architectures (I'm not
trying to start a religious war). I _do_ understand
many of these choices were made in the interest of
performance - but that was over a decade ago! Compiler
tweaks that allow the programmer to select this or
that address model are _really_ extensions to the language.
I should not need to worry about segments, etc. in my C code.
Really, the compiler should have been able to figure out how
to best represent my C code on the architecture. When I
have to start thinking about how the compiler poorly
supports my address space, I'm not programming
in the C language anymore...

I stand by my original statement above.

I believe there are those who state that C isn't really
that much of a high level language - but when you think
of the way C presents pointers to the programmer (as a
person who has programmed x86, 680x0, Sparc, etc.) you
develop a wonderful respect for that _simple_ layer
of abstraction :).


Stephen
 

Chris Torek

And Daniel Rudy said...
This helps illustrate why cross-posting is often a bad idea. :)

All unixes I've seen use a flat memory model for user space applications.

Unix (or more specifically POSIX, although there are a number of
standards one can use to define "Unix" as well) imposes some
constraints that make life extremely difficult for anyone who wants
to use a non-uniform / "non-flat" per-process memory layout.
Specifically, the mmap() interface and functions like dlsym() even
manage to rule out all but a weakened form of Harvard architectures
("separate I&D").

Older (V6 and/or V7) Unixes did in fact run on PDP-11/70s with
truly separate I&D spaces, so that:

char *p;
void (*q)();
p = (void *)0x1234;
q = (void (*)())p;

had "p" and "q" pointing to different *physical* memory, even though
the two addresses were clearly identical. (In effect, addresses
were 17, not 16, bits long, with the "topmost" bit being "0 for
read/write-access, 1 for execute-access". Hence *p referred to
what one might call "address 0x01234", while (*q)() referred to
"0x11234".) The idea that one can mmap() an executable file and
then call into it, however, pretty much rules this right out.
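
The "17-bit address" view of split I&D can be sketched as follows. The
enum and function names are invented for illustration; this only models
the arithmetic described above and is not PDP-11 code.

```c
#include <assert.h>
#include <stdint.h>

/* Which address space an access goes through (hypothetical names). */
enum space { DATA_SPACE = 0, INSTR_SPACE = 1 };

/* Form the effective 17-bit address: the space selects the topmost bit,
   and the 16-bit pointer value supplies the rest. */
uint32_t effective_address(enum space s, uint16_t addr)
{
    return ((uint32_t)s << 16) | addr;
}

void split_id_demo(void)
{
    /* The same 16-bit value names two different locations. */
    assert(effective_address(DATA_SPACE, 0x1234) == 0x01234);
    assert(effective_address(INSTR_SPACE, 0x1234) == 0x11234);
}
```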

The C language is much less restrictive than a typical Unix-like
system, however, so C can run on all kinds of machines that Unix
can never use. This is both good and bad: a less-restrictive system
can run on more hardware, but a more-restrictive system offers much
more "concreteness" to the programmer and is thus much easier to
write code for.
 

Erik Max Francis

Chris said:
Older (V6 and/or V7) Unixes did in fact run on PDP-11/70s with
truly separate I&D spaces, so that:

char *p;
void (*q)();
p = (void *)0x1234;
q = (void (*)())p;

had "p" and "q" pointing to different *physical* memory, even though
the two addresses were clearly identical.

This is illegal in Standard C, anyway: data pointers and function
pointers are not compatible. Unix extensions allow (require, in the
case of dlsym) this, but it is not standard conforming to convert data
pointers to function pointers or vice versa.
 

CBFalconer

Stephen L. said:
CBFalconer said:
That is just NOT so. That fact is at the root of the ban on
subtracting pointers that do not point within the same object.

But that's a UNIVERSAL ban across all OS's, correct?

I don't believe I've ever seen a need to do that
(subtract pointers from two different objects)...
It is also the reason that most pointer comparisons can only be for
equal/not-equal.

I'm curious, is this legal -

{
    char str[ 256 ];
    char *p1 = str, *p2 = &str[ sizeof (str) ];

    while (p1 < p2) {
        *p1++ = '\0';
    }
}

Basic principle - if you have to ask OS questions, the code isn't
portable and doesn't belong here in c.l.c. And the above is
perfectly legal - there is a special dispensation for pointers to
one item past the end of an array in the standard.
 

Daniel Rudy

And somewhere around the time of 05/16/2004 09:41, the world stopped and
listened as Stephen L. contributed the following to humanity:
Ahh... Now I see where you're coming from a little better.

Pointers in C, no matter how they are represented by the
underlying hardware/architecture, always appear as a flat
address memory model. Pointers in C appear the same way
and are independent of the operating system, i.e.,
Unix, Windows, Mac OS, etc. In other words, I would
code/use pointers identically on each operating system
(even though the pointer values themselves are different).

Ah, the flat memory model. That's exactly what I was looking for. Thanks.
Do you _really_ need to know how a given architecture
represents a pointer? Can you post some C code where
this would be relevant, and possibly readers of this
group may be better able to answer your question?

Actually, it has to do with pointer arithmetic and certain pointer
tricks that I like to do under Pascal. I'm still learning C, and so far
it doesn't look like I will need to pull some of those little
tricks like I did with my Pascal compiler to get around certain
limitations of the language. Also, considering the environment that I
come from, I had to be very mindful of the format of the pointers in
real mode and protected mode because I coded in x86 Assembler (still do
to some extent for ROM-type code), which gets very close to the CPU
architecture. So part of it is just my own paranoia getting to me.
 

Andrey Tarasevich

Barry said:
He said "on a x86" and mentioned "Unix" in the Subject line, so he seems
to realize it's platform dependent.
...

Strictly speaking, it is implementation dependent. If some
implementation cares to use its own pointer format (not necessarily
compatible with the format imposed by the hardware), it is free to do
so. The same applies to all other formats of internal representation,
such as, for example, integral and floating-point formats.

And the OP didn't mention any concrete compilers.

Best regards,
Andrey Tarasevich.
 

Daniel Rudy

And somewhere around the time of 05/16/2004 17:28, the world stopped and
listened as Chris Torek contributed the following to humanity:
This helps illustrate why cross-posting is often a bad idea. :)

My apologies about that, but because of the nature of my question, I
believe that it would be best to invite individuals from both groups as
my question deals with C programming on the Unix platform, specifically
FreeBSD on IA32 hardware.
All unixes I've seen use a flat memory model for user space applications.


Unix (or more specifically POSIX, although there are a number of
standards one can use to define "Unix" as well) imposes some
constraints that make life extremely difficult for anyone who wants
to use a non-uniform / "non-flat" per-process memory layout.
Specifically, the mmap() interface and functions like dlsym() even
manage to rule out all but a weakened form of Harvard architectures
("separate I&D").

Older (V6 and/or V7) Unixes did in fact run on PDP-11/70s with
truly separate I&D spaces, so that:

char *p;
void (*q)();
p = (void *)0x1234;
q = (void (*)())p;

had "p" and "q" pointing to different *physical* memory, even though
the two addresses were clearly identical. (In effect, addresses
were 17, not 16, bits long, with the "topmost" bit being "0 for
read/write-access, 1 for execute-access". Hence *p referred to
what one might call "address 0x01234", while (*q)() referred to
"0x11234".) The idea that one can mmap() an executable file and
then call into it, however, pretty much rules this right out.

The C language is much less restrictive than a typical Unix-like
system, however, so C can run on all kinds of machines that Unix
can never use. This is both good and bad: a less-restrictive system
can run on more hardware, but a more-restrictive system offers much
more "concreteness" to the programmer and is thus much easier to
write code for.

Now here's an interesting piece of code that I wrote earlier today
complete with the compile and run:

strata:/home/dcrudy/c 1027 $$$ ->cat ptr_test1.c
/*

Pointer Test Code. Not in Book

*/

#include <stdio.h>

int thing_var; /* This is the actual thing */
int *thing_ptr; /* This is a pointer to thing */
int **thing_ptr_2; /* This is a pointer to pointer thing_ptr */

main()
{
/* Initial value of thing_var */
thing_var = 4;
printf("Value of thing_var is %d\n\n", thing_var);

/* Load pointer with address of variable thing_var */
thing_ptr = &thing_var;
printf("Address of thing_var is %x\n", &thing_var);
printf("Value of pointer is %x\n\n", thing_ptr);

/* Assign value to thing using pointer */
*thing_ptr = 5;
printf("New value of thing_var is %d\n\n", thing_var);

/* Load pointer with address of pointer thing_ptr */
thing_ptr_2 = &thing_ptr;
printf("Address of thing_ptr is %x\n", &thing_ptr);
printf("Contents of thing_ptr_2 %x\n\n", thing_ptr_2);

/* Change thing_var by referencing thing_ptr_2 */
**thing_ptr_2 = 6;
printf("Value of thing_var is %d\n\n", thing_var);

/* Lets do a pointer programming error... */
thing_ptr_2 = 7;
printf("Address of thing_ptr is %x\n", &thing_ptr);
printf("Contents of thing_ptr_2 %x\n\n", thing_ptr_2);

/* This will probably seg_fault the program. */
printf("Contents of thing via our blown ptr %d\n", **thing_ptr_2);

}

strata:/home/dcrudy/c 1031 $$$ ->cc -g -optr_test1 ptr_test1.c
ptr_test1.c: In function `main':
ptr_test1.c:38: warning: assignment makes pointer from integer without a
cast
strata:/home/dcrudy/c 1032 $$$ ->./ptr_test1
Value of thing_var is 4

Address of thing_var is 8049834
Value of pointer is 8049834

New value of thing_var is 5

Address of thing_ptr is 8049830
Contents of thing_ptr_2 8049830

Value of thing_var is 6

Address of thing_ptr is 8049830
Contents of thing_ptr_2 7

Memory fault (core dumped)
strata:/home/dcrudy/c 1033 $$$ ->gdb -se ptr_test1 -c ptr_test1.core
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read
called at
/usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c
line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at
/usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c
line 933 in fill_symbuf

Core was generated by `ptr_test1'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 0x80485b1 in main () at ptr_test1.c:43
43 printf("Contents of thing via our blown ptr %d\n",
**thing_ptr_2);
(gdb) quit
strata:/home/dcrudy/c 1034 $$$ ->

***

As you can see, the program cored because it did not like the address
that I assigned to thing_ptr_2, which would suggest a segmented memory
model? I am having a problem with this because if it were a true flat
memory model, then I would be able to read data from other processes
that I have nothing to do with. There was a mention in the kernel options
about allowing a program to modify its own LDT, which implies that
FreeBSD does use a segmented memory model of some sort.
 

Ralmin

Stephen L. said:
But that's a UNIVERSAL ban across all OS's, correct?

It's undefined behaviour according to the C standard, and that applies to
all C implementations. Some OSes may go ahead and try to compute a value
anyway, but any value produced is never going to be meaningful unless you're
writing very platform-specific code.
I'm curious, is this legal -

{
    char str[ 256 ];
    char *p1 = str, *p2 = &str[ sizeof (str) ];

    while (p1 < p2) {
        *p1++ = '\0';
    }
}

I believe it is legal. The initialisation of p2 could be written more simply
as

p2 = str + 256;

This pointer value is "one past the end" of the str array, which behaves
according to specific rules in the C standard. You are never allowed to
dereference the value, but you are allowed to
- add a non-positive integer to it.
- subtract a non-negative integer from it.
- perform a subtraction with another pointer into the
str array (either way around).
- perform relational comparisons between it and other
pointers into the str array. (As you do here.)

Once the while loop is over, the p1 and p2 pointer variables both have the
same "one past the end" value.
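
The permitted operations listed above can be sketched in one small
function. The names clear_buffer and one_past_end_demo are made up for
illustration, and the sketch assumes buf really points to an array of
at least len bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Zero a buffer using a one-past-the-end pointer as the loop limit and
   return the number of bytes covered. */
ptrdiff_t clear_buffer(char *buf, size_t len)
{
    char *p = buf;
    char *end = buf + len;  /* one past the end: may be formed, never dereferenced */

    while (p < end)         /* relational comparison within the same array */
        *p++ = '\0';

    return end - buf;       /* subtraction within the same array is defined */
}

void one_past_end_demo(void)
{
    char str[256];

    assert(clear_buffer(str, sizeof str) == 256);
    assert(str[0] == '\0' && str[255] == '\0');
}
```

Nothing in this sketch depends on whether the underlying address space
is flat, because every pointer stays inside str or one past its end.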
 

Måns Rullgård

OT for c.l.c, followup set.

Daniel Rudy said:
Now here's an interesting piece of code that I wrote earlier today
complete with the compile and run:
[...]

/* Lets do a pointer programming error... */
thing_ptr_2 = 7;
printf("Address of thing_ptr is %x\n", &thing_ptr);
printf("Contents of thing_ptr_2 %x\n\n", thing_ptr_2);

/* This will probably seg_fault the program. */
printf("Contents of thing via our blown ptr %d\n", **thing_ptr_2);
[...]

Memory fault (core dumped)
[...]

As you can see, the program cored because it did not like the address
that I assigned to thing_ptr_2, which would suggest a segmented memory
model?

Unix uses virtual memory. On every memory access the CPU will look up
the physical address corresponding to the virtual address being
accessed. This mapping has been set up by the OS for each process.
For efficiency the mapping is done in pages of typically 4k or 8k
bytes. If there is no mapping for the virtual address your program
tries to access, the CPU will signal an error, and the OS will kill
the process. This is what happened in your case.
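
A rough sketch of the page arithmetic involved, assuming a POSIX system
where sysconf(_SC_PAGESIZE) reports the page size. The helper names
split_address and page_demo are invented for illustration. With 4096-byte
pages, the address 7 from the test program lies in page 0, which
typical Unix systems leave unmapped, while 0x8049834 lies in page
0x8049 at offset 0x834.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Split a virtual address into a page number and an offset in that page. */
void split_address(uintptr_t addr, uintptr_t page_size,
                   uintptr_t *page, uintptr_t *offset)
{
    assert(page_size > 0);
    *page = addr / page_size;
    *offset = addr % page_size;
}

/* Report where a local variable lives in terms of this system's pages. */
void page_demo(void)
{
    long page_size = sysconf(_SC_PAGESIZE);  /* POSIX; returns -1 on error */
    int local = 0;
    uintptr_t page, offset;

    assert(page_size > 0);
    split_address((uintptr_t)&local, (uintptr_t)page_size, &page, &offset);
    printf("page size %ld, page %ju, offset %ju\n",
           page_size, (uintmax_t)page, (uintmax_t)offset);
}
```

Whether a given page is mapped is decided by the OS per process; the
arithmetic above only shows which page an address belongs to.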
 

Keith Thompson

Stephen L. said:
But that's a UNIVERSAL ban across all OS's, correct?

The C standard says that subtracting pointers that don't point within
the same object (or just past the end of an object) invokes undefined
behavior. The standard makes no reference to OSs in this context. So
yes, I guess you could call it a universal ban across all OSs.

[...]
The point I was making was that code snippet will
run and produce identical results anywhere - I don't
have to change it for x86 to Power PC, etc.

Right. (The code snippet, which I snipped, operates on pointer values
that point within a single object or just past the end of it.)

[...]
I stand by my original statement above.

Sorry, but your original statement (that C pointers "always appear as
a flat address memory model") is incorrect. Your code snippet shows
an effectively flat memory model within a single object, but the C
standard is specifically designed to allow either flat or non-flat
memory models beyond single objects.

(Some old compilers implement extensions to support multiple memory
models; that's a separate issue.)
 
