Sizes of pointers

  • Thread starter James Harris (es)

glen herrmannsfeldt

You could sort an array of pointers, and use binary search to
find a pointer in that array.

For the question "does the pointer p point into the array q which
has a size of n bytes", it would be sufficient if the difference
between unrelated pointers did yield an unspecified (but not
undefined) result. Take the difference n, check that it is
between 0 and the size of the array, then compare p
and &array [n].

And C allows for machines that don't have a pointer compare operation.
If you really need one, put your pointers into a struct, add
another field that would be the distance from the beginning of memory
if all allocated regions were allocated sequentially. You might even
add in a little padding between them. Using those, then, you can
do any comparisons that you might otherwise do using the appropriate
pointers.
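
A minimal C sketch of that idea (the struct, the allocator wrapper, and the padding amount are all made up for illustration, not any real system's interface):

#include <stddef.h>
#include <stdlib.h>

struct ordered_ptr {
    void   *p;       /* the real pointer */
    size_t  linear;  /* offset as if all regions were laid out end to end */
};

size_t next_linear = 0;              /* running total of allocated bytes */

/* Allocate a region and assign it the next linear offset. */
struct ordered_ptr op_alloc(size_t size)
{
    struct ordered_ptr op;
    op.p = malloc(size);
    op.linear = next_linear;
    next_linear += size + 16;        /* a little padding between regions */
    return op;
}

/* A pointer into the middle of a region keeps the same ordering. */
struct ordered_ptr op_index(struct ordered_ptr base, size_t i)
{
    struct ordered_ptr op = { (char *)base.p + i, base.linear + i };
    return op;
}

/* Comparison uses the linear field, so it works even where the
   hardware has no pointer-compare operation. */
int op_less(struct ordered_ptr a, struct ordered_ptr b)
{
    return a.linear < b.linear;
}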

(As previously noted, the JVM has no operation other than equality and
inequality on object references. There might be others, too.
Protected-mode x86 in large model normally compares only the offset;
in huge model, it should compare both selector and offset.)

-- glen
 

glen herrmannsfeldt

(snip)
Copying is unfortunately required by some system calls. Every time a system
call supplies a buffer for the OS to write into, the OS has to copy data. In
that case the design of the system call mandates at least one copy
operation.

For OS/360, it is possible to do I/O without making a copy of
the data. (Well, the data has to already be contiguous.)

In the days of small memory systems, avoiding excess copying
was pretty important.

(snip)
I can answer best in connection with x86. On that, when the register which is
the root of the page table structures (CR3) is reloaded, only the entries in
the TLB are documented to be flushed. It could take ages to flush the data
caches to memory so leaving them alone is a good thing. I cannot think of a
reason to flush normal data at the same time as the TLB.

For a virtually addressed cache, the addresses might be reused in a
different address space. For a real (physically) addressed cache, you have to
flush on a page fault, as with any I/O operation.
OS designers can choose whether x86_32 page table entries are cached in both
TLB and data cache or just in TLB by setting Page Cache Disable (PCD) bits
in, IIRC, the page directory. That made sense when the normal data cache was
very small as it helped avoid populating the data cache with entries which
were also cached in the TLB.

-- glen
 

glen herrmannsfeldt

(snip)
While PCIDs a la x86 are fairly new, ASIDs are not. S/370 used the
starting address of the segment table* as part of what it put into the
TLB: a translation in the TLB had to match not only the virtual
address, but the segment table origin (STO) of the tables that
originally produced that translation also had to match the current
STO. Since each address space would have a different segment table
you had an effective ASID (although it wasn't called that).
And just to be clear, I'm not claiming that S/370 was first with
ASIDs, but it's the first I know of.
*S/370 paging used a very conventional** two level page table; the
upper level was called the segment table, although there's no real
relation to the more common concept of segments. But the S/370 segment
table is very similar to an x86 Page Directory.

In college, I had to write a paper for a CS class. Among other
things, it compared the two level paging for S/370 and two
level paging for VAX. (Page tables in two different address spaces.)

Now with 64 bit processors, there might be five levels.
I believe both z/ and x86-64 have five, with ways to speed things
up until real memory gets that big.
**In modern terms - obviously little about paging was "conventional"
in 1972. Or in the late sixties when the 360/67 prototyped the later
370 paging facility.

-- glen
 

glen herrmannsfeldt

Stephen Sprunk said:
I'll grant that 36-bit machines have fallen out of favor, and I'm not
sure how common ones complement and signed magnitude are these days, but
16-bit (and 8-bit) machines are still common, as are 24-bit DSPs.

As far as I know, Unisys still sells ones complement machines.

The last sign magnitude binary fixed point machines that I know of
are the IBM 7090 series. (Maybe 704, 709, 7090, 7094.)

One way to do sign magnitude is to convert to ones complement,
add, then convert back.
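
To make that concrete, a small C sketch, assuming a made-up 16-bit word
with bit 15 as the sign and 15 magnitude bits (like real ones complement
hardware, this can produce a negative zero):

#include <stdint.h>
#include <stdio.h>

/* Sign magnitude -> ones complement: a negative value is the bitwise
   complement of the corresponding positive value. */
uint16_t sm_to_oc(uint16_t sm)
{
    return (sm & 0x8000) ? (uint16_t)~(sm & 0x7FFF) : sm;
}

/* Ones complement -> sign magnitude (the inverse of the above). */
uint16_t oc_to_sm(uint16_t oc)
{
    return (oc & 0x8000) ? (uint16_t)(0x8000 | (~oc & 0x7FFF)) : oc;
}

/* Ones complement add with end-around carry. */
uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    if (sum > 0xFFFF)                 /* carry out of bit 15 ... */
        sum = (sum + 1) & 0xFFFF;     /* ... wraps around into bit 0 */
    return (uint16_t)sum;
}

/* Sign magnitude add done by converting, adding, converting back. */
uint16_t sm_add(uint16_t a, uint16_t b)
{
    return oc_to_sm(oc_add(sm_to_oc(a), sm_to_oc(b)));
}

int main(void)
{
    uint16_t r = sm_add(5, 0x8000 | 3);   /* +5 plus -3 */
    printf("sign=%u magnitude=%u\n", (r >> 15) & 1u, r & 0x7FFFu);
    return 0;
}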

S/360 through current z/Architecture support decimal (BCD) in
sign magnitude form.

And, of course, most floating point is sign magnitude.
The industry is less diverse than it used to be, but it's
still far more diverse than Rosario1903 believes it to be.
All the world is _not_ an x86.

-- glen
 

glen herrmannsfeldt

(snip)
We could probably come up with dozens of such oddities that _could_ be
programmer-adjustable but typically aren't. I can't think of anywhere
the standard requires that, though; at most, it gives implementations a
list of options to choose from. Is that enough of an improvement over
the status quo, where things are simply left undefined?

Well, note that C pretty much requires binary arithmetic, but Fortran
allows any base greater than one. Still, ternary computers are
pretty rare.

-- glen
 

glen herrmannsfeldt

(snip, I wrote)
Right. And almost from the get-go, Java had to relax its
rules for floating-point to make x86 implementations practical.
(See the "strictfp" keyword.)

Still, it was a reasonable thing to do at the time, where it would
not have been when C was standardized. (Especially not when C started.)
As for self-obsolescence -- Well, data sets get bigger all
the time, and "Big Data" is today's trendy term. Yet Java
*cannot* have an array with more than 2^31 elements, not even
on a 64-bit system with forty-two terabytes of RAM.

Presumably that can be fixed at some point, but it won't be
quite as easy as if they had done it in the first place.
But slowing down all array accesses by allowing for long
subscripts would have made it unnecessarily slow in the
first place.

Since Java requires casts on narrowing operations, it should
not be possible to allocate or index an array with long now.
Adding it won't break existing programs.
I'm not saying that Java's choices were uniformly better
or worse than C's; Java certainly gains in definiteness, even
if it loses in portability. The point is that there's a trade-
off: more exactness with more rigidity, more flexibility with
more vagueness -- it's almost Heisenbergian.
The extreme exactness Rosario favors would, I believe, have
made C so rigid that there would have been little interest in
the language. C's success is due in large measure to the ease
with which the language could be implemented on a variety of
machines; had that *not* been easy, who'd have wanted C? It
would have been a high-level assembler for the PDP-11, nothing
more.

-- glen
 

glen herrmannsfeldt

James Kuyper said:
On 08/06/2013 05:41 AM, James Harris wrote:
(snip)
He's talking about the hypothetical case where C's floating point
semantics had been specified so precisely that only PDP-11 floating
point qualified. This is not just about limits, but also about
specification of the handling of special cases. Take a look at Annex F
for an example of the kinds of special cases whose handling could be
specified. I know nothing about how PDP11 floating point differs from
IEEE, but I presume Eric would not have mentioned it unless there were
some such differences. Imagine the case where Annex F matches PDP11, in
a way that makes it incompatible with IEEE, and imagine that conformance
to Annex F was mandatory, rather than depending upon whether the
implementation chooses to pre-#define a macro like __STDC_IEC_559__. How
popular would C be, with such a specification?

As well as I know it, the VAX floating point format came from
some model of PDP-11. The funny mixed-endian form, to be
more specific.

-- glen
 

glen herrmannsfeldt

(snip)
Oh, no, I was thinking about representing floating point
numbers using integers - most likely the traditional exponent
and mantissa - and manipulating those using the machine's normal
registers just like had to be done before FP hardware emerged
from the test tube.

The first floating point hardware that I know about is the
IBM 704 where Fortran originated. That was around 1956, way
before C.
A given floating point representation would look and behave in
exactly the same way on all machines so the results would be
repeatable. Choosing representation and operations for speed
is what I had in mind.

There are formats that are more efficient to implement in
byte oriented software. One that I know of puts the exponent
in one byte, and the sign as the most significant bit of the
significand, in place of the hidden one. After one extracts the
sign, one can then set that bit and treat the significand as
having an explicit (non-hidden) leading one.
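
A sketch in C of decoding such a layout (a made-up example format, not
the exact one described above): one exponent byte, then a 24-bit
significand whose top bit carries the sign in place of the hidden one.

#include <stdint.h>
#include <stdio.h>

struct swfloat {
    uint8_t  exponent;      /* biased exponent; bias of 128 assumed here */
    uint32_t significand;   /* 24 bits used; bit 23 holds the sign */
};

/* Extract the sign, then set that bit so the hidden one becomes explicit. */
void sw_decode(struct swfloat f, int *sign, int *exp, uint32_t *frac)
{
    *sign = (int)((f.significand >> 23) & 1);
    *frac = (f.significand & 0x00FFFFFF) | (1u << 23);
    *exp  = (int)f.exponent - 128;
}

int main(void)
{
    /* -1.5 * 2^1 = -3.0: exponent 129, fraction .5, sign bit set */
    struct swfloat f = { 129, (1u << 23) | (1u << 22) };
    int s, e;
    uint32_t m;
    sw_decode(f, &s, &e, &m);
    printf("sign=%d exponent=%d significand=0x%06x\n", s, e, (unsigned)m);
    return 0;
}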

-- glen
 

glen herrmannsfeldt

(snip)
Don't remind me. As I remember it, Intel already had their 80-bit
representation and applied pressure to get it included in the 754 spec.
Sure, it helps precision but I wish they had either stuck to
powers of two or mandated similarly slightly-larger
representations for all other sizes. As it stands it is
an anomaly ratified by the standards body to give one
manufacturer an edge over the others as your example
illustrates. Its inclusion in the standard may have been
purely for partisan commercial interests.

As well as I know it, the 8087 was developed simultaneously with
the standard, with Kahan being a big part of it. Presumably at the
time they didn't know how much of a problem it would be.

Consistent use of extended precision is probably better, but
that isn't what happens in real programs.
The Java intention of write once, run anywhere is a good goal
as long as the programmer gets involved and knows the limits
of such a promise. For example, even if the JVM executed
instructions in an identical way a program which needed a
gigabyte of memory wouldn't "run anywhere" on a machine with
128M of RAM. And a program which needed a network connection
wouldn't work the same way on a stand-alone machine.
And a program which needed a high resolution timer wouldn't
work the same way on a machine with a coarser timer.

Well, those are pretty much the limitations that are stated
with, for example, the S/360 architecture. S/360 was maybe one of the
earlier write once, run anywhere systems (on any sized S/360), and it
had similar limitations.

It is interesting to see what happens when running old OS
under modern emulation. Some systems depend on things taking
a certain minimum time, and fail at full speed emulation.

-- glen
 

Eric Sosman

(snip, I wrote)


Still, it was a reasonable thing to do at the time, where it would
not have been when C was standardized. (Especially not when C started.)

My point was that even a Really Serious Attempt at a "fully-
portable" language wasn't fully successful: Java had to let a few
of the barbarians through the gates, or it would have died right
away. I consider this an illustration of how far beyond the
realm of possibility Rosario's notions are.
Presumably that can be fixed at some point, but it won't be
quite as easy as if they had done it in the first place.
But slowing down all array accesses by allowing for long
subscripts would have made it unnecessarily slow in the
first place.

Since Java requires casts on narrowing operations, it should
not be possible to allocate or index an array with long now.
Adding it won't break existing programs.

What happens to the array's `.length' attribute? It's a
32-bit `int', so it can't count above 0x7fffffff. Change it
to a `long', and existing code like

int mid = array.length / 2;

... would fail to compile. Worse, code like

for (int i = 0; i < array.length; ++i)
    array[i] = i;

... would still compile, but would throw an exception when used
on a "big" array (ArrayIndexOutOfBoundsException when the subscript
wrapped around and went negative).

It's true that no existing program uses any "big" arrays
(they can't be created, as you observe). But that doesn't mean
that the changes to support "big" arrays would leave existing
code unscathed.

(My personal bet: Java will eventually wither, and there
will be a Java-like-but-not-Java successor, perhaps Sumatra.
And we will get to watch the Second System Effect yet again ...)
 

James Harris

Stephen Sprunk said:
If you need to move more data than will fit in the registers, e.g.
read() or write(), copying to/from a buffer seems mandatory, aside from
completely changing the I/O model a la mmap() or sendfile().

The 'problem' with traditional models is that they have the disk, say, write
to kernel space and then have the kernel write the same data to user space,
often with different alignment. That happens any time the app reserves a
buffer (such as with malloc, or an automatic, or in the data space) and has a
library function write into it. The copy between kernel and user space can be
avoided with a different IO model. It needs to be done right, taking into
account whether the app is going to just read or to read and write, and
whether the app is reading randomly or sequentially. For example, the kernel
can arrange for the disk controller to write data directly into memory that
the app can read, and then tell the app where to look. That avoids the extra
kernel-to-user copy.
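
A minimal sketch of that kind of model in C, using POSIX mmap() (the
filename is just a stand-in and error handling is abbreviated): the kernel
maps the file's pages into the process and the app reads them in place, so
there is no read()-style copy into a user-supplied buffer.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0) { perror("open/fstat"); return 1; }

    /* Map the whole file read-only; the app reads the pages in place. */
    const unsigned char *p = mmap(NULL, st.st_size, PROT_READ,
                                  MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];                  /* touch the data directly */
    printf("checksum %ld\n", sum);

    munmap((void *)p, st.st_size);
    close(fd);
    return 0;
}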

James
 

James Harris

...
We could probably come up with dozens of such oddities that _could_ be
programmer-adjustable but typically aren't. I can't think of anywhere
the standard requires that, though; at most, it gives implementations a
list of options to choose from. Is that enough of an improvement over
the status quo, where things are simply left undefined?

Yes. I wasn't talking about the C standard specifically. I just meant that
for some programs there would be value in being able to rely on or choose
certain behaviour regardless of the programming language. Not all of us can
remember that integer division and remainder can generate different results
on different machines.

Some languages allow the choice by using different operators, for example. I
think that's a good solution. The slight problem with that is that the one
the programmer uses by default may be the one that's not directly supported
by the CPU, i.e. it could be the slower one.

I often find it hard to remember the differences. These pages can help. The
second shows how the problem is addressed in different languages.

http://codewiki.wikispaces.com/mod+and+rem
http://en.wikipedia.org/wiki/Modulo_operation
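
To make the difference concrete in C (floor_mod below is just a sketch of
the flooring version; C's own % is the truncating one, paired with division
that rounds toward zero):

#include <stdio.h>

/* Flooring modulo: the result takes the sign of the divisor. */
int floor_mod(int a, int b)
{
    int r = a % b;                       /* C99: truncating remainder */
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;
    return r;
}

int main(void)
{
    printf("-7 / 3 = %d, -7 %% 3 = %d\n", -7 / 3, -7 % 3);   /* -2 and -1 */
    printf("floor_mod(-7, 3) = %d\n", floor_mod(-7, 3));     /* 2 */
    return 0;
}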

James
 

glen herrmannsfeldt

(snip, someone wrote)
The 'problem' with traditional models is that they have the disk, say, write
to kernel space and then have the kernel write the same data to user space,
often with different alignment. That happens any time the app reserves a
buffer (such as with malloc, or an automatic, or in the data space) and has a
library function write into it. The copy between kernel and user space can be
avoided with a different IO model. It needs to be done right, taking into
account whether the app is going to just read or to read and write, and
whether the app is reading randomly or sequentially. For example, the kernel
can arrange for the disk controller to write data directly into memory that
the app can read, and then tell the app where to look. That avoids the extra
kernel-to-user copy.

What IBM, at least, calls Locate mode I/O. PL/I has options to
allow for it. In the unix model, I am not sure that it can be done,
but it should allow for one fewer copy than otherwise would be done.

-- glen
 

glen herrmannsfeldt

(snip, someone wrote)
(snip, then I wrote)
Z defines a five level scheme (Region First Table, Region Second,
Region Third, Segment and Page Table - the last two being slightly
wider versions of what was in the pre-64 bit architecture), but the
control register pointing to the beginning of the page table can
specify any of the first four levels.

I remembered that there was a way to shorten it when not so many
were needed, but didn't remember how they did it.
x86 in 64 bit mode uses a fixed four level scheme (PML4, Page
Directory Pointer Table, Page Directory and Page Table), and is
currently limited to 48 bit linear addresses. Presumably another
level or two will be added when necessary.

Unless we have something completely different by then.
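
For concreteness, a little C sketch of how a 48-bit linear address splits
across those four levels (the struct and field names are just for
illustration): nine index bits per level plus a 12-bit page offset.

#include <stdint.h>
#include <stdio.h>

struct va_indices {
    unsigned pml4, pdpt, pd, pt, offset;
};

struct va_indices split48(uint64_t va)
{
    struct va_indices ix;
    ix.offset = (unsigned)(va & 0xFFF);          /* bits 11..0  : byte in page */
    ix.pt     = (unsigned)((va >> 12) & 0x1FF);  /* bits 20..12 : Page Table */
    ix.pd     = (unsigned)((va >> 21) & 0x1FF);  /* bits 29..21 : Page Directory */
    ix.pdpt   = (unsigned)((va >> 30) & 0x1FF);  /* bits 38..30 : PDP Table */
    ix.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* bits 47..39 : PML4 */
    return ix;
}

int main(void)
{
    struct va_indices ix = split48(0x00007f1234567000ULL);
    printf("pml4=%u pdpt=%u pd=%u pt=%u offset=0x%x\n",
           ix.pml4, ix.pdpt, ix.pd, ix.pt, ix.offset);
    return 0;
}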

-- glen
 

glen herrmannsfeldt

(snip, someone wrote)
My point was that even a Really Serious Attempt at a "fully-
portable" language wasn't fully successful: Java had to let a few
of the barbarians through the gates, or it would have died right
away. I consider this an illustration of how far beyond the
realm of possibility Rosario's notions are.

But extended precision is part of the IEEE standard...
What happens to the array's `.length' attribute? It's a
32-bit `int', so it can't count above 0x7fffffff. Change it
to a `long', and existing code like
int mid = array.length / 2;
... would fail to compile. Worse, code like
for (int i = 0; i < array.length; ++i)
    array[i] = i;

... would still compile, but would throw an exception when used
on a "big" array (ArrayIndexOutOfRangeException when the subscript
wrapped around and went negative).
It's true that no existing program uses any "big" arrays
(they can't be created, as you observe). But that doesn't mean
that the changes to support "big" arrays would leave existing
code unscathed.

One way is to have something like longlength in the case of
big arrays. Maybe a little ugly.

If you consider bigarray as a subclass of array, I think it works.
As a subclass, the constructor will select based on the type of
the expression given to the new operator. Seems like that means that
array variables need to be somehow different, though.
(My personal bet: Java will eventually wither, and there
will be a Java-like-but-not-Java successor, perhaps Sumatra.
And we will get to watch the Second System Effect yet again ...)

-- glen
 

Keith Thompson

James Kuyper said:
I'm not sure how helpful that really is. Linearity within each
object is important, but as long as I can get a unique address for
each object (including each byte within each object), why should I
care how addresses of distinct objects relate to each other (apart
from "==" and "!=" working properly)?

You could sort an array of pointers, and use binary search to find a
pointer in that array.

For the question "does the pointer p point into the array q which has
a size of n bytes", it would be sufficient if the difference between
unrelated pointers did yield an unspecified (but not undefined)
result. Take the difference n, check that it is between 0 and the
size of the array, then compare p and &array [n].

If the difference between unrelated pointers is unspecified, how could
you be sure that it isn't a value between 0 and the size of the array?

Suppose relational and additive operators on unrelated pointers yielded
an unspecified result rather than having undefined behavior. Then you
could do something like this (which is pretty much what Christian
suggested):

#include <stdbool.h>
#include <stddef.h>

/* returns true iff ptr points to an element of the array or just
   past the end of it. */
bool pointer_in_range(const void *ptr, const void *base, size_t size) {
    const char *cptr = ptr;
    const char *cbase = base;
    size_t index = cptr - cbase;
    return index >= 0 && index <= size && cptr == cbase + index;
}

Relational and arithmetic operators could yield garbage results for
pointers to distinct objects, but not for pointers to the same object --
and "==" is always reliable. It's possible for "index > = 0 && index <=
size" to yield a false positive; the "&& ptr == base + index" test
checks for that.
 

Thomas Jahns

The first Java implementations on x86 architectures ran
into a problem: x86' floating-point implementation used extended
precision for intermediate results, only producing 32- or 64-bit
F-P results at the end of a series of calculations. Original
Java, though, demanded a 32- or 64-bit rounded result for every
intermediate step (which is what SPARC's implementation did).
The upshot was that to achieve the results mandated by original
Java, a Java-on-x86 implementation would have had to insert an
extra rounding step after every F-P operation, with a significant
speed penalty (perhaps a pipeline stall).

I'm not sure that's accurate because the x86 FPU has control word bits which
can be set explicitly to change the width of in-register precision. This would
only cause problems for code constantly mixing precisions. Not great but also
not catastrophic.

Regards, Thomas
 

James Harris

Thomas Jahns said:
I'm not sure that's accurate because the x86 FPU has control word bits which
can be set explicitly to change the width of in-register precision. This would
only cause problems for code constantly mixing precisions. Not great but also
not catastrophic.

I've written for the x86 for many years but never, in assembly, had a need
to write floating point code. However, checking the manual from the days of
the Pentium (doc number 243190) it says as follows.

---
7.3.4.2. PRECISION CONTROL FIELD

The precision-control (PC) field (bits 8 and 9 of the FPU control word)
determines the precision (64, 53, or 24 bits) of floating-point calculations
made by the FPU (see Table 7-4). The default precision is extended
precision, which uses the full 64-bit significand available with the
extended-real format of the FPU data registers. This setting is best suited
for most applications, because it allows applications to take full advantage
of the precision of the extended-real format.

The double precision and single precision settings, reduce the size of the
significand to 53 bits and 24 bits, respectively. These settings are
provided to support the IEEE standard and to allow exact replication of
calculations which were done using the lower precision data types. Using
these settings nullifies the advantages of the extended-real format's 64-bit
significand length. When reduced precision is specified, the rounding of the
significand value clears the unused bits on the right to zeros.

The precision-control bits only affect the results of the following
floating-point instructions: FADD, FADDP, FSUB, FSUBP, FSUBR, FSUBRP, FMUL,
FMULP, FDIV, FDIVP, FDIVR, FDIVRP, and FSQRT.
---

I'm not sure how well that accords with either view but it is interesting
nonetheless.
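
For what it's worth, here is a small sketch of flipping that PC field from
C, assuming glibc's <fpu_control.h> on x86. It only affects x87
instructions (the ones the manual lists), not SSE, so whether it changes the
result of the division below depends on which unit the compiler targets.

#include <fpu_control.h>
#include <stdio.h>

int main(void)
{
    fpu_control_t cw;

    _FPU_GETCW(cw);                            /* read the FPU control word */
    cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  /* PC field: 53-bit significand */
    _FPU_SETCW(cw);                            /* write it back (FLDCW) */

    /* x87 arithmetic now rounds each result to double precision, though
       the exponent range is still that of the extended-real format. */
    volatile double x = 1.0, y = 3.0;
    printf("%.20g\n", x / y);
    return 0;
}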

James
 

Stephen Sprunk

The 'problem' with traditional models is that they have the disk,
say, write to kernel space and then have the kernel write the same data
to user space, often with different alignment. That happens any time the
app reserves a buffer (such as with malloc, or an automatic, or in the
data space) and has a library function write into it.

That's the read()/write() model.
The copy between kernel and user space can be avoided with a
different IO model. It needs to be done right, taking into account
whether the app is going to just read or to read and write, and
whether the app is reading randomly or sequentially. For example, the
kernel can arrange for the disk controller to write data directly into
memory that the app can read, and then tell the app where to look.
That avoids the extra kernel-to-user copy.

That's the mmap() model.

S
 

Phil Carmody

James Harris said:
I've checked the strictfp issue and can see why the keyword was added but it
looks like a kludge added in order to accommodate the eccentric FP width of
Intel FPUs. Once Intel had had their hardware described by the IEEE they
were in a position to exercise undue influence on the market.

From your comment it sounds like there is no fast way for the x87 and its
descendants to round the top of stack to 32 or 64 bits.

Nobody interested in numeric computing should be using Java anyway.

/Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing's Sign/ by Kahan
/How Java's Floating-Point Hurts Everyone Everywhere/ by Kahan and Darcy

Phil
 
