Another String reversal question


Siemel Naran

Francis Glassborow said:

I think that I probably have an unusually extensive reading of relevant
literature but cannot recall any such statement being made by good, bad
or indifferent authors. Perhaps you could jog my memory with a few
references.

While I would give 'use bool to report success/failure' as a very strong
guideline, it is not an option when writing code that needs to be
compatible with all versions of C and with C++, so I continue to have strong
doubts that any convention such as the one you claim exists in the wider
C & C++ programming community.

Note that zero denotes success for return from main, and it also does so
for comparison functions passed to qsort and bsearch but it does not do
so for comparison functions passed to the various C++ sort and search
functions.

For the record, that convention is something I came up with, and a few
people use it. But I don't think anyone widely read throughout the land
has said anything like it.
 

Dhruv

I do not think you are thinking clearly about what happens on hardware
that has address registers, and particularly with OSs that do proper
memory management.

I do not know what address registers are. The only hardware that I'm
partially familiar with is the current 386 and Pentium architecture, so
whatever I think is in terms of that hardware. It would be nice if you
could explain what happens on architectures with 'address registers', and
what they actually are?

Regards,
-Dhruv.
 

Francis Glassborow

Dhruv said:
I do not know what address registers are. The only hardware that I'm
partially familiar with is the current 386 and Pentium architecture, so
whatever I think is in terms of that hardware. It would be nice if you
could explain what happens on architectures with 'address registers', and
what they actually are?

This is not the place for a lesson on hardware architecture. However
even the most elementary experience of X86 architectures requires some
knowledge that the standard register set includes various special purpose
registers (SI, DI, SP and BP are examples that I recall from the days
when I had to dabble at that low a level). However the important issue
is that protected memory systems check values that purport to be
addresses and take action if the value is not available to the process.
Doing anything that uses an invalid pointer value (i.e. rvalue or
address in this context) is outside the domain of the C++ Standard and
so is undefined by the Standard.

The point at issue is that address registers in a protected memory
system are sometimes designed to trap when an out of range value is
loaded into them and there is nothing that a C++ programmer can do in
such circumstances.
 

Francis Glassborow

Siemel said:
For the record, that convention is something I came up with, and a few
people use it. But I don't think anyone widely read throughout the land
has said anything like it.

Oh you mean "Siemel Naran's" convention. That is very different from
'the convention' which would claim some form of general acceptance. In
addition treating personal coding conventions as if they were universal
is very dangerous particularly when that information is handed out to
the inexperienced.
 

Dave Harris

In particular, what's wrong with computing p+2? What's the rationale
behind not allowing it?

p+1 is allowed because x is treated as an array of length 1, and p+1
is the one-past-the-end value, which is fine. p+2 is not fine. It may
point to memory which is not owned by the program, and C++ is allowed
to trap as soon as such an address is evaluated.

AFAIK, delete is not allowed to modify the pointer passed to it,
so just accessing the value stored in the pointer should not be a
problem?

Even if p has the same bit pattern, the memory which that bit pattern
refers to no longer belongs to the program. So again C++ is allowed
to trap on load.

-- Dave Harris, Nottingham, UK
 

Ron Natalie

[mod note: setting your newsreader to post with a width of less than 80 columns
will avoid this type of formatting problem.]

Dhruv said:
It would be nice if you
could explain what happens on architectures with 'address registers', and
what they actually are?

This whole "one-past-the-end" issue came from a very real environment
that was in popular use at the time the C standard was originally being
hammered out: the 80x86 segmented (pre-386) architectures. While there
were others as well, like your beloved Pentium, it was very popular in
its day. The issue was that these architectures could only index 16
bits' worth of address space at a time; a separate register held the
rest of the address. Without specifically mandating the one-past-the-end
guarantee, it was quite possible that an object/array could be placed at
the end of a 64K segment. The one-past-the-end address could then roll
over to have a value LESS than an address inside the array (if the
behavior were left undefined, as other addresses outside the bounds of
arrays are even today).

Maybe you can assume a flat address space today, but who knows what the
future will bring. Perhaps some day we'll end up with huge cross-network
addresses (such an architecture did exist in the past: the Apollo
Domain). It may once again be advantageous to work with local pointers
that are subsets of the global address space, and the one-past-the-end
guarantee will come into play again.

Another issue is that, while by and large most architectures only trap
invalid accesses when you actually read or write through pointers to
invalid memory, nothing precludes a machine from trapping when you
manipulate an invalid pointer value. Again you will need the
one-past-the-end guarantee. This might happen someday when security and
reliability become more important than the current cavalier
Microsoft-induced attitude.

Believe me, it can happen, and it is best to make your application clean
against reliance on undefined behavior. There was lots of heartburn when
apps were ported to the first 64-bit architectures in common use in
micros (Alpha) because people assumed things like the size of pointers
and the various other types. We also had fun because, while most
machines trap only computations involving invalid floating-point
numbers, the Alpha (which was fine by the language) trapped even on
loading invalid floats into the FP registers. Our bad for assuming we
could put (possibly) garbage values into float variables. We had to
clean that up.

-Ron
 

Dhruv

On Sun, 21 Dec 2003 06:40:17 -0500, John Potter wrote:

[...]
I don't think C++ is rational. There is no rationale. :)

There goes the standard ;-)

[...]
I think you are confusing operator delete with the delete expression.
The delete expression does anything the compiler generates including
assigning an invalid value to the pointer or adding the value of the
pointer expression to a list of addresses which cause email to your
boss in a program that produces that rvalue.

See 5.3.5/4. The delete expression invalidates the pointer. The only
thing that may be done with a pointer holding an invalid value is to
assign a new value. You are not allowed to look at the old value.

So, how would you account for something like this:

delete ((int*)0); ?

Regards,
-Dhruv.
 

Jeff Schwab

John said:
Yes.

| 5.7/4 For the purposes of these operators, a pointer to a
| non-array object behaves the same as a pointer to the first element of
| an array of length one with the type of the object as its element type.

It is always valid to add one to a dereferenceable pointer value.

John


Sweet!!! I guess this implies that algorithms meant to work on
collections actually can work on individual elements just as easily.
Good to know.

#include <algorithm>   // std::equal
#include <iostream>

int main( )
{
    char const* s = "a";
    char const c = 'a';

    // The single char c is treated as an array of length one: [&c, &c + 1).
    std::cout << std::equal( s, s + 1, &c ) << '\n';
}
 

Francis Glassborow

Dhruv said:
So, how would you account for something like this:

delete ((int*)0); ?

Your point being? That is actually pointless code: ((int*)0) is a null
pointer (guaranteed to be a valid pointer value), and C++ guarantees
that supplying it in a delete expression results in nothing happening.
 

Dhruv

Your point being? That is actually pointless code: ((int*)0) is a null
pointer (guaranteed to be a valid pointer value), and C++ guarantees
that supplying it in a delete expression results in nothing happening.

OK, no point really. It was because John Potter mentioned that the
delete expression is allowed to modify the pointer passed to it, so I
was just wondering how it would modify a constant passed to it. But now
that you've mentioned that C++ guarantees a no-op, there's nothing wrong
with it.

Regards,
-Dhruv.
 

Dhruv


Dhruv said:
It would be nice if you
could explain what happens on architectures with 'address registers', and
what they actually are?
[...]

The issue was that these architectures could only index 16 bits' worth
of address space at a time; a separate register held the rest of the
address. Without specifically mandating the one-past-the-end guarantee,
it was quite possible that an object/array could be placed at the end of
a 64K segment. The one-past-the-end address could then roll over to have
a value LESS than an address inside the array (if the behavior were left
undefined, as other addresses outside the bounds of arrays are even
today).

So, I guess adding 1 to the end of the array would mean that the higher
16-bits in the register would get incremented by 1 and the lower 16-bits
would roll back to 0?

[...]
Another issue is that, while by and large most architectures only trap
invalid accesses when you actually read or write through pointers to
invalid memory, nothing precludes a machine from trapping when you
manipulate an invalid pointer value. Again you will need the
one-past-the-end guarantee. This might happen someday when security and
reliability become more important than the current cavalier
Microsoft-induced attitude.

So, isn't one past the end of the array also some memory that the running
process might not be owning? I have probably misunderstood something here.

Assume this is the array:
[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ]
 ^                                   ^  ^
 |                                   |  |
First element                      Last |
                                        |
                                  One past last.

Now, do you mean to say that the one past last element of the array is
also owned by the process?


Regards,
-Dhruv.
 

Dhruv

Sweet!!! I guess this implies that algorithms meant to work on
collections actually can work on individual elements just as easily.
Good to know.

#include <algorithm>   // std::equal
#include <iostream>

int main( )
{
    char const* s = "a";
    char const c = 'a';

    // The single char c is treated as an array of length one: [&c, &c + 1).
    std::cout << std::equal( s, s + 1, &c ) << '\n';
}

Getting a bit picky, I guess s+1 is not the actual end of the array.
However s+2 is right? Or am I mistaken?

Regards,
-Dhruv.
 

Francis Glassborow

Dhruv said:
So, isn't one past the end of the array also some memory that the running
process might not be owning? I have probably misunderstood something here.

Assume this is the array:
[ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ]
 ^                                   ^  ^
 |                                   |  |
First element                      Last |
                                        |
                                  One past last.

Now, do you mean to say that the one past last element of the array is
also owned by the process?

The address of one past the last must be owned by the process. Note that
only requires a single extra address in the address range. One before
the start is an entirely different issue which is why it is not
supported.
 

Dave Harris

For functions returning an integer, return of zero means no error,
and return of any other number (positive or negative) means error.
Are you sure you want to return 1?

In other conventions, a negative number means an error, and 0 or
positive means success. This is useful when a successful result
needs to include something like a file handle, which is an index into
some table. In still others, 0 means false which means failure.

In this case I don't see a need for any result at all, because
StringReverse cannot fail.

-- Dave Harris, Nottingham, UK
 

John Potter

So, isn't one past the end of the array also some memory that the running
process might not be owning? I have probably misunderstood something here.

Yes. The standard requires that the last byte of every address space
owned by the process be unused and available for use as a
one-past-the-end of whatever comes in front of it. The standard
requires that one-past-the-end of any object be a valid address. It
need not be dereferenceable nor the beginning of space large enough
to hold an object of the same type. Why or how the implementation
does it is not important to a programmer. It is a guarantee.

John
 

Jeff Schwab

Dhruv said:

Sweet!!! I guess this implies that algorithms meant to work on
collections actually can work on individual elements just as easily.
Good to know.

#include <algorithm>   // std::equal
#include <iostream>

int main( )
{
    char const* s = "a";
    char const c = 'a';

    // The single char c is treated as an array of length one: [&c, &c + 1).
    std::cout << std::equal( s, s + 1, &c ) << '\n';
}


Getting a bit picky, I guess s+1 is not the actual end of the array.
However s+2 is right? Or am I mistaken?

You're right that s + 2 is the end of the array, but in this case, I
wasn't looking for the end of the array. I was only comparing one
character.

-Jeff
 

kanze

This whole "one-past-the-end" issue came from a very real environment
that was in popular use at the time the C standard was originally
being hammered out. It was the 80x86 segmented (pre-386)
architectures.
[...]

Another issue is that, while by and large most architectures only trap
invalid accesses when you actually read or write through pointers to
invalid memory, nothing precludes a machine from trapping when you
manipulate an invalid pointer value. Again you will need the
one-past-the-end guarantee. This might happen someday when security
and reliability become more important than the current cavalier
Microsoft-induced attitude.

Why do you say pre-386? The only large application I did on an 80386
used 48 bit segmented pointers. Every malloc returned a different
segment, and segment selectors were invalidated when the memory was
returned by free. Loading an invalid segment selector into a segment
register provoked a hardware trap. And the code generated by the
compiler to read an address from memory was the instruction LES, LFS or
LGS. So something like:

free( p ) ;
if ( p == NULL )

really did trap. (On the other hand, there was no problem with 2 past
the end, or one before the beginning, as long as one didn't
dereference.)
Believe me, it can happen, and it is best to make your application clean
against reliance on undefined behavior. There was lots of heartburn when
apps were ported to the first 64-bit architectures in common use in
micros (Alpha) because people assumed things like the size of pointers
and the various other types. We also had fun because, while most
machines trap only computations involving invalid floating-point
numbers, the Alpha (which was fine by the language) trapped even on
loading invalid floats into the FP registers. Our bad for assuming we
could put (possibly) garbage values into float variables. We had to
clean that up.

Interesting. I first encountered that problem on Interdata 8/32, in
1977. (Hardly new:).) And consider yourself lucky you got a trap; in
our case, the hardware would normalize floating point values on a load.
One of the programmers had "optimized" his copy loops to use double* for
text data, in order to pass through the loop fewer times. The 8/32 used
IBM format floating point; most of the time, there was no problem, but
if the second byte in the double happened to be a control character
(with the top four bits 0), all of the bytes in the mantissa were
shifted left 4 bits, and the exponent was decremented. Obviously, the
character values after that were somewhat unexpected. But legitimate
character data would never cause a trap.

That's what's fun about undefined behavior. You never know what it
might do:).
 

kanze

Francis Glassborow said:
I think that I probably have an unusually extensive reading of
relevant literature but cannot recall any such statement being made by
good, bad or indifferent authors. Perhaps you could jog my memory with
a few references.

With regards to functions returning int, there are a few conventions
established by the C standard: for the functions in ctype, for example,
0 means failure, as does NULL (a type of zero) for fopen. On the other
hand, many of the IO functions will return the number of characters
transmitted (with 0 as a valid value), or a negative value for failure.
While I would give 'use bool to report success/failure' as a very
strong guideline, it is not an option when writing code that needs to
be compatible with all versions of C and with C++, so I continue to have
strong doubts that any convention such as the one you claim exists in
the wider C & C++ programming community.

I rather prefer something along the lines of "enum ReturnCode { ok,
error } ;". I'd prefer it even more if there were no implicit
conversion to bool. While true == success seems more reasonable than
the reverse to me as well, I've had to deal with code which used the
other convention. Whereas "return ok", or "if ( func() == ok )" is
perfectly clear in every case.
Note that zero denotes success for return from main, and it also does so
for comparison functions passed to qsort and bsearch but it does not do
so for comparison functions passed to the various C++ sort and search
functions.

There is no error return value for the comparison functions passed to
qsort and bsearch.
 

Dhruv

On Mon, 22 Dec 2003 09:11:56 -0500, Francis Glassborow wrote:

[...]
The address of one past the last must be owned by the process. Note that
only requires a single extra address in the address range. One before
the start is an entirely different issue which is why it is not
supported.

Then how are reverse iterators supposed to work?

Regards,
-Dhruv.
 

Dave Harris

So, I guess adding 1 to the end of the array would mean that the
higher 16-bits in the register would get incremented by 1 and
the lower 16-bits would roll back to 0?

With some hardware, the higher 16 bits would be unchanged. Pointer
arithmetic only affected the low 16-bits; the CPU might not even
have been capable of 32-bit arithmetic directly.

Thus (char *) 0x1001ffff + 1 = (char *) 0x10010000 for these pointers,
and p > p+1.

(For some compilers, only the low 16 bits took part in pointer
comparisons, but this wasn't ratified by the standard. It meant
(char *)0x10014444 == (char *) 0x20024444, and you had to be
sure you had the right kind of NULL. Unpleasant.)

So, isn't one past the end of the array also some memory that the
running process might not be owning?

It could be, but the C++ standard requires that the compiler make it
work. This may mean allocating one more byte than necessary. In
practice for alignment reasons you may need several bytes, but there
doesn't have to be a whole object there. In that respect your diagram
is misleading.

To allow one before the beginning, we would need sizeof(object) bytes,
which is more onerous because an object can be arbitrarily large.

-- Dave Harris, Nottingham, UK
 
