Sizes of pointers

  • Thread starter James Harris (es)
  • Start date
glen herrmannsfeldt

(snip on CPU and I/O speeds, then I wrote)
Double (or more) buffering certainly helps on sequential accesses, and
most modern OSes do some approximation of it automatically for
stream-based accesses, so long as the program's access pattern appears
to be sequential. So there's no real reason to use AIO (or any other
program-visible explicit double buffering) for sequential accesses.

There are two ways to do sequential I/O for OS/360 (and they are
still there in z/OS), which differ in how much the OS does for you.

For BSAM (Basic Sequential Access Method), you pretty much do it
AIO-style, with READ and WAIT macros. For double buffering, you start
with two READs, then loop processing one while the other is read,
and with the appropriate WAIT at the end (or start) of the loop.

For QSAM (Queued Sequential Access Method) the OS does most of the
work of keeping track of buffers, and, I believe, properly handles
more than two. QSAM also does blocking and deblocking for you.
Purely random accesses can make little use of prefetching, of course,
and that's where you find the real use case for AIO.

For OS/360, and random access, you have BDAM, and the records are
unblocked. (You can put anything in the records you want, but
they are unblocked as seen by the system.) That is, the records,
of whatever size you want from 1 byte to the size of a disk track,
are physical blocks on the disk. (For people used to unix, think
of blocks on tape.) The disk hardware can then find and supply
the appropriate disk block.

-- glen
 
Tim Rentsch

Stephen Sprunk said:
Having to explain why an unsigned value must be sign-extended isn't
nearly as simple or understandable as saying it's a signed value.

If pointers were unsigned, they'd be zero-extended.

Some sources go to great lengths to call it something other than sign
extension and then define that made-up term in unsigned terminology. A
rose by any other name is still a rose, though, and such explanations
are neither as simple nor as understandable as the signed one.


It applies to both Windows and Linux, and experts from both groups
collaborated with AMD on such details when the architecture was being
defined. The ISA was literally designed to run those OSes, so how those
OSes use the ISA is informative of the intent.

True, someone could design an OS that didn't define the negative half as
being kernel space, in which case sign extension would no longer be so
important, but sign extension would still happen because it's a part of
the instruction set definition. Change a fundamental detail like that
and the result cannot be called AMD64/x86-64 anymore.


If I extract a subset of bits from an object and look at the result as
an unsigned value, that says nothing about whether the original object
was signed or unsigned.

OTOH, dictating that an object must be sign-extended when converted says
a lot about whether those objects are signed.

This is all just circular reasoning - it's a sign bit
because what's being done is sign extension, and it's
sign extension because what's being "extended" is a
sign bit. It's just as consistent to say the smaller
address space is mapped to the larger address space
by replicating the high order bit. The addresses
themselves have no intrinsic signedness - only which
point of view one takes might ascribe one.
 
Keith Thompson

Rosario1903 said:
On Fri, 2 Aug 2013 16:35:35 +0000 (UTC), glen herrmannsfeldt wrote:

i try to send this message not thru aoe, which seems to have banned me...


yes it seems here that all operation +-* are the same between unsigned
and int

Not exactly. An implementation that uses a 2's-complement
representation for signed integers *can* use the same instructions for
signed vs. unsigned addition, subtraction, and multiplication, because
the behavior of signed overflow is undefined.

what changes is the division, operation "/"

but someone says it is not used in pointer arithmetic
[i would use & or % to see if one address is short aligned, int
aligned, double aligned etc]

Would you really? And what do you think would happen if you tried to
apply the "%" operator to a pointer value?

[...]
some C code without macro...

Thank you for not using your one-letter macros. I'm still not going to
try to figure out what your program does.

[snip]
 
Keith Thompson

Tim Rentsch said:
This is all just circular reasoning - it's a sign bit
because what's being done is sign extension, and it's
sign extension because what's being "extended" is a
sign bit. It's just as consistent to say the smaller
address space is mapped to the larger address space
by replicating the high order bit. The addresses
themselves have no intrinsic signedness - only which
point of view one takes might ascribe one.

The point is that sign-extension is a simple way to describe the
replication of the high order bit. I don't think anyone is arguing that
pointers on such a system *must* be described as (or as behaving like)
signed integers, merely that they can reasonably be described that way.
 
Lew Pitcher

Rosario1903 said:
On Fri, 2 Aug 2013 16:35:35 +0000 (UTC), glen herrmannsfeldt wrote:
yes it seems here that all operation +-* are the same between unsigned
and int

Not exactly. An implementation that uses a 2's-complement
representation for signed integers *can* use the same instructions for
signed vs. unsigned addition, subtraction, and multiplication, because
the behavior of signed overflow is undefined. ...

what changes is the division, operation "/"

but someone says it is not used in pointer arithmetic
[i would use & or % to see if one address is short aligned, int
aligned, double aligned etc]

Would you really? And what do you think would happen if you tried to
apply the "%" operator to a pointer value?

i would know what happen here: the result for example address%8
would be
1 if mem is char aligned
2 if mem is short and char aligned
4 if mem is int, short, char aligned
0 if mem is double aligned etc

Perhaps. Perhaps not. Keith has purpose in asking his question.

First off, only a subset of mathematical and logical operations are valid
when applied to a pointer, and modulo (%) is not one of those
operations. Your example, address % 8, would not be legal C.

Having said that, your conclusions (above) are also faulty. Assuming that,
by sleight-of-programmer's-hand, you could take the modulo of a pointer, you
would find that
short int are not always aligned on 2-byte boundaries
int are not always aligned on 4-byte boundaries
float are not always aligned on 8-byte boundaries

Alignment issues depend on the implementation and the platform. For
instance, on some platforms, 1-byte alignment is legal (but, not
necessarily efficient) for /all/ integral variables. For some platforms,
short and int must be 4-byte aligned. It depends on the hardware and the
compiler.

BTW, you can't take modulo of a pointer, but you /can/ convert a pointer to
an integer, and take modulo of that value. There's no guarantee that the
conversion will make much sense; the result is "implementation defined",
according to the C standard.

HTH
 
Keith Thompson

Rosario1903 said:
Rosario1903 said:
On Fri, 2 Aug 2013 16:35:35 +0000 (UTC), glen herrmannsfeldt wrote:
yes it seems here that all operation +-* are the same between unsigned
and int

Not exactly. An implementation that uses a 2's-complement
representation for signed integers *can* use the same instructions for
signed vs. unsigned addition, subtraction, and multiplication, because
the behavior of signed overflow is undefined. ...

what changes is the division, operation "/"

but someone says it is not used in pointer arithmetic
[i would use & or % to see if one address is short aligned, int
aligned, double aligned etc]

Would you really? And what do you think would happen if you tried to
apply the "%" operator to a pointer value?

i would know what happen here: the result for example address%8
would be
1 if mem is char aligned
2 if mem is short and char aligned
4 if mem is int, short, char aligned
0 if mem is double aligned etc

You are mistaken. The result would be an error message from the
compiler. C doesn't define the "%" operator for pointers.

If you're talking about something other than C, you're in the wrong
place.
 
Tim Rentsch

Malcolm McLean said:
You can emulate any Turing-equivalent language with any other. So
you can write a 6502 (or whatever) processor emulator in Java, take
a C to 6502 compiler, and you've got a conforming C system. But
it's a horrible solution. You can't easily pass information from
the rest of your system into the little 6502 emulator. You can't
take advantage of a lot of the modern processor's instructions, and
you're going through layer after layer of indirection, so it'll
probably run at about the speed of a 1980s vintage BBC computer, on
hardware with a hundred times faster processor.

Performance matters. A Java Virtual machine is bad enough, a
virtual machine within a virtual machine - well yes, if you're
actually writing a Beeb emulator so you can play "Planetoid". But
not as a general solution.

You're confusing abstract machines and implementations.
Obviously a Java implementation could be done that would run
C-to-JVM-compiled code as fast as native C code, by JIT
compiling into equivalent native code. There is nothing
in the JVM definition that would prevent this. If anything,
using a single memory to "fake" a C address space might help
the effort.

To put this another way, your thesis is disproved by the
existence of high performance implementations of the 80x86
architecture - a truly awful abstract machine, but still
capable of high performance implementation.
 
Tim Rentsch

Robert Wessel said:
Stephen Sprunk said:
On 02-Aug-13 11:35, glen herrmannsfeldt wrote:
30, Rosario1903 wrote:
i not see the reasons for distinguish pointers from unsigned

What about systems with signed pointers, such as x86-64?

the unsigned is the easiest mathematical model for one pc possibly
i not remember well but operation+-*/ on unsigned have the same
result if one see it for signed in x86...

VAX put user programs at the bottom of the 32 bit address space,
with the system space at the top. VAX was around a long time
before people could afford 4GB. I don't know that it was
described as signed, but the effect was the same.

Currently no-one can fill a 64 bit address space, so there are
tricks to avoid the overhead of working with it.

To avoid huge page tables, VAX and S/370 use a two level virtual
address system. (VAX has pagable page tables, S/370 has segments
and pages.) Continuing that, z/Architecture has five levels of
tables. It should take five table references to resolve a
virtual address (that isn't in the TLB) but there is a way to
reduce that for current sized systems.

x86 is similar, [snip stuff about page tables]

Most importantly, though, all pointers must be sign-extended,
rather than zero-extended, when stored in a 64-bit register. You
could view the result as unsigned, but that is counter-intuitive
and results in an address space with an enormous hole in the
middle. OTOH, if you view them as signed, the result is a single
block of memory centered on zero, with user space as positive and
kernel space as negative. Sign extension also has important
implications for code that must work in both x86 and x86-64
modes, e.g. an OS kernel--not coincidentally the only code that
should be working with negative pointers anyway. [snip unrelated]

IMO it is more natural to think of kernel memory and user memory
as occupying separate address spaces rather than being part of
one combined positive/negative blob; having a hole between them
helps rather than hurts. If you want to think kernel memory as
"negative" and user memory as "positive", and contiguous so a
pointer being decremented in user space falls into kernel space,
you are certainly welcome to do that, but don't insist that
others have to share your perceptual bias.

I think it's reasonable for the application to consider its address
space to be distinct from the kernel. OTOH, the OS guys really want
the kernel's address space to contain at least one user address
space at any given time.

As I mentioned in another posting, just because the address
spaces are (conceptually) separate, there is no reason a
single address representation couldn't allow addressing
memory in either space. There are lots of different ways
this might be done; using a particular bit to distinguish
"user space" and "kernel space" is one obvious way.
 
Tim Rentsch

glen herrmannsfeldt said:
[all snipped]

I read through this posting but am not sure what point you
were trying to make. It sounds like you agree with my
basic point that whether pointers on x86-64 are signed is
more a matter of subjective perception than objective
reality. I understand and agree with your view that
thinking of pointers as signed is convenient in some
ways. If we do indeed have agreement on these two
basic points then I am satisfied with the discussion.
 
Tim Rentsch

Stephen Sprunk said:
The presumed invalidity of an appeal to authority comes from the
tendency for the authority to be a false (or at least disputed) one.

OTOH, if I were to invent a new ISA/ABI and declared that my pointers
were unsigned, then it would be completely valid for you to cite me in a
debate over signedness. That is what I'm doing here: the people who
actually designed the ISA/ABI said that pointers are signed, and they
_do_ have the authority to say that. As the undisputed inventors, they
are correct by definition.

I don't share this opinion. If the inventors said that some values
were red and other values were green, that doesn't mean the values
necessarily acquire the property of being colored. Certainly it is
what they said (at least I will assume so for the sake of the
discussion), but that's not the property I'm interested in when
talking about signedness.

The reason appeal to authority is not valid, or more accurately not
relevant, for what I'm saying is that the property I'm concerned
with is not legislated by any authority.

There are only two relevant ways to extend a value when stored
into a larger register: sign extension and zero extension. In
fact, elsewhere in the documentation you will find dozens of
examples saying that when a 32-bit value is stored into a 64-bit
register, it is sign-extended unless a special zero-extension
instruction is used. The docs even tell you not to use the
zero-extension instruction for pointers; you are supposed to use
the default, which is sign extension--but it's not called that,
because that would be heresy. "Everybody knows" that pointers
can't be signed, right?

Are you now saying the documentation does NOT describe the
transformation as sign extension? So we have a different authority
saying something different? For the property I've been talking
about, it doesn't matter whether the instruction is called
Sign Extend, Arithmetic Shift, Replicate High bit, Adjust
Pointer Size, or anything else -- what matters how things
behave, not what they are called.
I acknowledged that there are two views. That one of them is simpler,
more elegant and more self-consistent is objectively true.

Is that so? Just what physical equipment do you have in mind to
measure the simplicity, elegance, and self-consistency of a point
of view, to back up your claim?

You can argue that doesn't automatically make it the correct one,
but that ignores Occam's Razor--and a valid appeal to authority.

Your idea of objective truth and mine apparently are not identical.
 
Tim Rentsch

Keith Thompson said:
The point is that sign-extension is a simple way to describe the
replication of the high order bit. I don't think anyone is arguing
that pointers on such a system *must* be described as (or as
behaving like) signed integers, merely that they can reasonably be
described that way.

I'm happy to agree that it's reasonable to describe how x86-64
pointers work using sign extension, etc, if others are willing to
agree that it's equally reasonable to describe how those pointers
work without using sign extension, etc. In short, I think what
you're saying is pretty much the same as what I'm saying... but
I'm not sure that other people in the discussion agree on that.
 
glen herrmannsfeldt

Tim Rentsch said:
I'm happy to agree that it's reasonable to describe how x86-64
pointers work using sign extension, etc, if others are willing to
agree that it's equally reasonable to describe how those pointers
work without using sign extension, etc. In short, I think what
you're saying is pretty much the same as what I'm saying... but
I'm not sure that other people in the discussion agree on that.

I might even say more reasonable, but less convenient.

Reminds me of the ways PL/I and (newer) Fortran pass arrays.
(being an addressing question, it might be applicable).

Both languages allow one to declare arrays giving lower and
upper bounds for dimensions. For PL/I, when passed to a called
procedure, the bounds information is also passed, both lower
and upper bound.

With Fortran assumed shape, where dimension information is
passed to a called routine, only the extent is passed.
(As seen in the called routine, the origin is 1, by default,
or can be specified other than 1.)

The former seems more natural, but the latter is sometimes more
convenient, especially when rewriting old routines.

They are both reasonable, but I don't know that you can say equally
reasonable. (As with addressing, it is difficult to measure
reasonableness.)

-- glen
 
glen herrmannsfeldt

Tim Rentsch said:
I don't share this opinion. If the inventors said that some values
were red and other values were green, that doesn't mean the values
necessarily acquire the property of being colored. Certainly it is
what they said (at least I will assume so for the sake of the
discussion), but that's not the property I'm interested in when
talking about signedness.

OK, how about something else that people like to disagree on,
and that is bit numbering. While endianness of many processors
is fixed by design, on most the numbering of bits within bytes
is not physically part of the hardware. It is, however, often
part of the documentation for the hardware.

Some people might like a different numbering than that in the
documentation, but doing so will result in confusion. That said,
using different numbering is still confusing when interfacing
between different systems. (I once knew someone using an Intel UART
in an Apple II, and, after wiring it up, noticed the different
bit numbering conventions.)

Now, the authority could require a specific notation as part of
the license or other agreement. (SPARC, for example, requires their
trademark to be in bold font.)

The reason appeal to authority is not valid, or more accurately not
relevant, for what I'm saying is that the property I'm concerned
with is not legislated by any authority.

If the authority isn't reasonable, then people won't follow it.
(Colored bits might not be reasonable.) If it is reasonable and
consistent, though, it will often be followed.

By the way, the index registers on the IBM 704 and successors
subtract from the base address, and arrays on such machines are
usually implemented in decreasing address order. Note that unlike
the signedness convention this one has physical meaning.

-- glen
 
James Kuyper

You are mistaken. The result would be an error message from the
compiler. C doesn't define the "%" operator for pointers.

It could be both - after diagnosing the code, the implementation is free
to translate it into a program that, if executed, does precisely what he
expects it to do. The implementation is also free to translate it into a
program that plays "Killing Me Softly With His Song". Both possible
results are, of course, equally relevant to the issue he's talking about.
 
glen herrmannsfeldt

(snip, I wrote)
2314s were 2400rpm, I don't think anything in the IBM mainframe world
was 3600 RPM until the 3330. The main reason that didn't increase for
a long time was that the rotational delay was (for many years) already
much smaller than the seek time, so further decreasing the rotational
delay would have had minimal effect on performance (the 3330, IIRC,
had a 24ms average seek, but an 8.3ms average rotational latency).

I thought 2314's were 3600, but that is remembering from a long
time ago. According to:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.3058&rep=rep1&type=pdf&ei=T8AEUtDBMeKQyQH1qIDYDA&usg=AFQjCNF7VPXvj_amFEyNtSpfiRNIXj9gQg&bvm=bv.50500085,d.aWc

it is 3600, but I don't have any official references.

It seems that the 2305, designed for S/360, but more commonly used
for S/370, is 10000RPM.

I believe the 2301 is 3600RPM.

OK, now I found a reference that says 2400RPM for 2314.

-- glen
 
Lew Pitcher

in other words you claim
that mathematics is not the reason all runs right,
that mathematics is not smarter than each one of us

No. YOU claim that I claimed the above. I said no such thing. Reread what I
wrote, and learn.
so what other model for aligned pointers?

I don't understand your question. Perhaps you need to rephrase it.

If you are referring to how the C compiler and runtime require pointers (to
various data types) to be aligned, that's something that you have to check
against the specific compiler & runtime. There is no general rule for
alignment that applies to *all* C compilers and *all* C runtimes.

If you are referring to how the C code of an application program can
establish proper alignment of pointers, then there's a simple answer: A C
application should not attempt to *explicitly* determine alignment.
Instead, it should depend on the C language to do that for it. malloc()ed
memory is guaranteed (by the C standard) to be allocated such that it suits
any system/compiler/runtime-imposed alignment requirements. Structures are
guaranteed (by the C standard) to properly enforce alignment rules between
elements. Unions do not (IIRC) "enforce" alignment, but do /expect/ that
all alternative elements "behave" as far as alignment goes. Thus, for
single elements, you malloc() and map the results either to a pointer to a
simple data item (such as an int or char or float) or to a pointer to a
structure or to a pointer to a union.

OTOH, if you are building your own memory allocator, and want to enforce the
alignment requirements of /your specific platform/, then you have diverted
into the area of "implementation specific" or "undefined" behaviour, and
are no longer writing C standard-compliant C code. That's a risk you have
to evaluate and take (or not) as you see fit.

As for the mathematics of pointer arithmetic, their definition in the C
language has nothing to do with me. I make no claims on the validity or
appropriateness of the restrictions of pointer arithmetic in C. That's what
the C standard defines. If you don't like the rules, complain (and give
alternatives) to the ISO C standards body. Be prepared to defend your
requirements and proposal.
 
James Kuyper

On 08/09/2013 11:35 AM, Lew Pitcher wrote:
....
OTOH, if you are building your own memory allocator, and want to enforce the
alignment requirements of /your specific platform/, then you have diverted
into the area of "implementation specific" or "undefined" behaviour, and
are no longer writing C standard-compliant C code. That's a risk you have
to evaluate and take (or not) as you see fit.

That was true in C99, but C2011 has added max_align_t, _Alignof() and
_Alignas(), which means it's now possible to accommodate
implementation-specific alignments with portable C code. Well, "now"
meaning as soon as you can find a compiler that implements those features.
 
Lew Pitcher

On 08/09/2013 11:35 AM, Lew Pitcher wrote:
...

That was true in C99, but C2011 has added max_align_t, _Alignof() and
_Alignas(), which means it's now possible to accommodate
implementation-specific alignments with portable C code. Well, "now"
meaning as soon as you can find a compiler that implements those features.

Thanks, James

I learn something new every day. I'll have to get me a copy of the 2011
standard and read it asap :)

Luck be with you
 
Malcolm McLean

On Friday 09 August 2013 02:50, in comp.lang.c, (e-mail address removed)

If you are referring to how the C code of an application program can
establish proper alignment of pointers, then there's a simple answer: A C
application should not attempt to *explicitly* determine alignment.
Instead, it should depend on the C language to do that for it.

There is an efficiency issue.

Say we're storing binary "objects" in a hash table or similar. Most of
the time these objects are going to be integers, pointers, reals, only rarely
char arrays. But we won't want to complicate the interface with a rule
that types have to be specified explicitly.
By detecting alignment, we can treat the chars specially, and use fast
instructions to manipulate the bulk of the usage cases.
 
