trim whitespace

Nick Keighley

You're still not understanding.

The mere FACT of accessing random garbage memory is undefined behavior.

it might be unmapped memory, or unwritable memory, or code.

<snip>
 
Nick Keighley

I don't know about you, but I can always find a way to guarantee my code
will never be stuck in an infinite loop.

but your code can end up only partially trimming a string that was
actually trimmable. I think you're running afoul of the Halting
Problem.
Despite what you believe, I did.

no, not really.
 
Nick Keighley

Assume an implementation where a pointer consists of two parts: a segment
identifier and an offset.

On allocation, the OS reserves a virtual memory region - which may well
not have any actual memory behind it.  On read or write, the OS detects a
fault (no memory page mapped), sorts out how to handle it, digs up a
usable page of memory, maps it in and away you go, reading or writing.

Then you free the memory, telling the OS that the virtual memory region
is now invalid, that it does not exist.

What happens now, if you try to - as you say - "try and trim the
garbage"?  You read a byte, the system faults, the OS's memory manager
kicks in to load the associated page of memory, but there _is no_
associated page of memory.  You are trying to access memory which _does
not exist_.  The app - if you're lucky - is summarily killed by the OS
for trying to poke its nose into memory it doesn't own.

In C terms, by calling "free", you disposed of the object in question,
but then attempted to examine the object after the fact.  C's answer to
this is "undefined behaviour", where *any* outcome is perfectly
acceptable: crashing, "working", trashing memory belonging to other
processes, setting your CPU on fire, causing your kitten to tie you up
and flog you with damp tea bags, it's _all_ perfectly acceptable as far
as C is concerned.

Once that pointer has become invalid, there is simply _no manner_ in
which you can use it, for any meaningful operation, other than assigning
a new value to it - which does not help one bit in trying to do what
you're seeking to accomplish.

one fun possibility is preemptive multiple processes without memory
protection. If two processes start modifying the same memory at the
same time I think an infinite loop is quite possible.
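
The point about freed pointers can be sketched in code. This is only an illustration, assuming a hypothetical helper named release_string() (nothing with that name appears in the thread): once free() has been called, the one thing the standard still lets you do with the pointer is give it a new value, so the helper does exactly that.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the thread: frees the string that
 * *owner points to and then overwrites the stale pointer. After
 * free(), assigning a new value is the only defined operation left. */
void release_string(char **owner)
{
    assert(owner != NULL);

    free(*owner);    /* free(NULL) is defined to do nothing */
    *owner = NULL;   /* replace the now-invalid pointer value */
}
```

Setting the pointer to NULL does not make "trimming the garbage" possible. It only turns a later mistake into a null pointer that the caller can test for, and it leaves any other copies of the old pointer just as invalid as before.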
 
Nick Keighley

Standards conformance has its place, but how can you protect yourself if
you don't explore UB and learn how things really work?

but it will change if

- you change machines
- you change compilers
- you upgrade your compiler (I've seen code break on a compiler
upgrade)
- you change an optimisation flag
 
Nick Keighley

ho! ho!
On most CPUs (especially modern ones), memmove() cannot be
implemented in one assembly instruction.

older machines could nearly do it though. There was usually some setup
first. Loading pointers and counts into appropriate registers.
 Even if it can, that
instruction must perform a pass over the data; it's going to be O(N),
where N is the number of bytes moved.  There's nothing magical about
microcode.

true. I tended to find the explicit loop was quicker than the single
instruction.
memmove() is likely to be faster than an equivalent C loop, but
only by a constant factor, and probably a fairly small one.

I've seen some fairly rubbish memmove() implementations as well (well
memcpy() actually)
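
To make the "it must be a pass" argument concrete, here is a rough byte-at-a-time equivalent of memmove(). The name byte_move() is made up for this sketch and it is not how any particular library implements memmove(); real implementations usually copy in larger units. Whatever the unit, every byte has to be visited once, so the cost is O(N).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not a library implementation: copies n bytes
 * from src to dst and tolerates overlap, like memmove().
 * Caveat: comparing d and s with < or > is only defined by ISO C when
 * both point into the same object; a real memmove() relies on
 * knowledge of its platform for this step. */
void *byte_move(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if (d < s) {
        /* Destination starts first: copy front to back. */
        while (n--)
            *d++ = *s++;
    } else if (d > s) {
        /* Destination starts later: copy back to front so the
         * source bytes are read before they are overwritten. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dst;
}
```

Each loop runs exactly n times, which is the single pass over the data being discussed. A faster memmove() changes the constant, not the shape of the curve.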
 
Nick Keighley

He's not.  You made the assertion that memmove was "probably" not
equivalent to a "pass" -- even though it's trivially obvious that
it MUST be in order for its operation to be carried out.  Keith
correctly pointed out that this somewhat topical assertion was wrong.
His post is not a rant, and is in no way off topic; questions of
what memmove() is or isn't are pretty topical.  Programmers who
wish to make good use of C should know that, while usually efficient,
memmove() is not a magical tool for bypassing performance
considerations.  In particular, this magical "one assembly
instruction" is both implausible (in general) and misleading -- you
seem to think that performance is purely a function of number
of instructions, but again, in real-world cases, user-provided C
loops which compile into complicated loops in assembler have been
known to outperform dedicated hardware functions by a large margin.

what about blit chips? Could they beat ordinary assembly? You'd think
doing it completely in hardware *could* be quicker
 
Nick Keighley

I think you're talking to a wall here.  Kelly's in all the world's a
VAX mode -- he's assuming a flat memory model where you can do whatever
you want to pointers, as long as you "avoid overflow", which is totally
incoherent -- but since he dismisses any platform which doesn't conform
to his /a priori/ assumptions as broken or uninteresting, it hardly
matters.

can he name a machine that meets his requirement? M68000?
 
Ian Collins

Define symbols and (words)
exam ......... temporary char *
hast ......... temporary char * to alpha !isspace
hath ......... temporary char **
keep ......... temporary char * to omega !isspace
trimlen ...... trimmed string length
ts ........... temporary char * to string
ts ........... temporary string []
tu ........... temporary size_t
xu ........... fail safe size_t

Why don't you just give your variables meaningful names, rather than having
to annotate them? It's bad enough in old C having to scroll to the top
to find declarations, without having to go even further to see what they do.
 
Willem

Nick Keighley wrote:
) what about blit chips? Could they beat ordinary assembly? You'd think
) doing it completely in hardware *could* be quicker

Yes.
By a constant factor.
The quickest way to do something is not having to do it.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Nick Keighley

Unfortunately, errno gets clobbered by any function that sees fit to
put its error there.

so? Before you carry out an operation that might affect errno, zero
it. Afterwards, test whether it is still zero. No standard library
function is permitted to set errno to zero.
 It's more of a portability issue, where
different functions on different systems like to put their error codes
there.  Does a malloc failure set errno to ENOMEM?  On some it does
and others it does not.

so what? Just assume it does.
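
The zero-then-test idiom looks roughly like this. It is a sketch only, using strtol() because the standard documents that it sets errno to ERANGE on overflow; parse_long() is a hypothetical wrapper name, and its return convention (0, -1, or the errno value) is an assumption made for the example. Whether malloc() sets errno on failure is not something ISO C promises, so the same test is not portable there.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper, not from the thread.
 * Returns 0 and stores the value in *out on success,
 * -1 if text contains no digits at all,
 * or the errno value (e.g. ERANGE) reported by strtol(). */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;
    int saved_errno;

    assert(text != NULL && out != NULL);

    errno = 0;                       /* clear before the call */
    value = strtol(text, &end, 10);
    saved_errno = errno;             /* save before anything else runs */

    if (end == text)
        return -1;                   /* nothing was converted */
    if (saved_errno != 0)
        return saved_errno;          /* strtol() reported an error */

    *out = value;
    return 0;
}
```

Copying errno into a local variable straight after the call is what guards against the clobbering complaint: any later library call is free to change errno, but none may reset it to zero.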
 
Nick Keighley

Functions should NEVER crash no matter
how bad the input.

you've set yourself impossibly high standards.

void nasty (void)
{
char *s;
s = (char*)1;
trim (&s);
}

now some languages might never crash, but C isn't one of them!
 
Ben Bacarisse

John Kelly said:
You are thinking of C strings. There are no "bounds" to a pointer.

There are, both in theory and in practice.

When I originally asked you to say what the contract was between the
caller and the callee, this was one of the things that you could have
specified: the code works only on such-and-such type machines. In
effect you did specify that when you said the contract is defined by
what the code does. I.e. you wrote trim for machines where it works
and, presumably, you don't care about ones where it won't.

<snip>
 
Ben Bacarisse

Seebs said:
I think you're talking to a wall here. Kelly's in all the world's a
VAX mode -- he's assuming a flat memory model where you can do whatever
you want to pointers, as long as you "avoid overflow", which is totally
incoherent -- but since he dismisses any platform which doesn't conform
to his /a priori/ assumptions as broken or uninteresting, it hardly
matters.

Basically, he figures that since the results he's seen so far didn't
include infinite loops, it can never happen.

The world of 64-bit integers also introduces another problem: almost
infinite loops (if you'll permit such a monstrous phrase). I don't see
the point in preventing unbounded loops when waiting for a loop to hit
PTRDIFF_MAX might take a year or two.
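
The "almost infinite" loop can be put in numbers. The sketch below assumes a 64-bit ptrdiff_t (so the limit is taken as INT64_MAX) and an iteration rate chosen by the caller; years_to_count() is a hypothetical name, and the result depends entirely on the rate you assume.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the thread: estimates how many years
 * a loop needs to perform `iterations` steps at a given rate.
 * Uses a 365.25-day year; the conversion to double loses a little
 * precision, which does not matter for an estimate. */
double years_to_count(uint64_t iterations, double iterations_per_second)
{
    const double seconds_per_year = 365.25 * 24.0 * 60.0 * 60.0;

    assert(iterations_per_second > 0.0);

    return (double)iterations / iterations_per_second / seconds_per_year;
}
```

For instance, years_to_count((uint64_t)INT64_MAX, 1e9) models a loop that manages a billion iterations per second, which is an assumed figure rather than a measurement. Either way, a fail-safe counter that only trips at that limit is of little practical use.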
 
Ben Bacarisse

Ian Collins said:
Define symbols and (words)
exam ......... temporary char *
hast ......... temporary char * to alpha !isspace
hath ......... temporary char **
keep ......... temporary char * to omega !isspace
trimlen ...... trimmed string length
ts ........... temporary char * to string
ts ........... temporary string []
tu ........... temporary size_t
xu ........... fail safe size_t

Why don't you just give your variables meaningful names, rather than having
to annotate them? It's bad enough in old C having to scroll to the
top to find declarations, without having to go even further to see
what they do.

"temporary char *" is not much help. The simplest code inspection tells
me it's a char * and the fact that it is local to the function makes it
temporary. It does not help that the types are not exact (exam is, for
example, unsigned char *) and ts seems to have two types.
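
For comparison, here is what a trim routine might look like with names that need no legend. This is not John Kelly's code and does not share its interface (his trim() is called as trim(&s)); trim_in_place() is a hypothetical function, and its contract is an assumption: the caller must pass a valid, modifiable, NUL-terminated string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the code under review: removes leading and
 * trailing whitespace from text in place and returns the new length.
 * Contract (assumed): text is a valid, writable, NUL-terminated string. */
size_t trim_in_place(char *text)
{
    const char *first_kept;
    const char *past_last_kept;
    size_t trimmed_length;

    assert(text != NULL);

    /* Skip leading whitespace; isspace('\0') is false, so this stops
     * at the terminator of an all-blank string. */
    first_kept = text;
    while (isspace((unsigned char)*first_kept))
        first_kept++;

    /* Walk back from the terminator over trailing whitespace. */
    past_last_kept = first_kept + strlen(first_kept);
    while (past_last_kept > first_kept &&
           isspace((unsigned char)past_last_kept[-1]))
        past_last_kept--;

    /* The ranges may overlap, so memmove() rather than memcpy(). */
    trimmed_length = (size_t)(past_last_kept - first_kept);
    memmove(text, first_kept, trimmed_length);
    text[trimmed_length] = '\0';
    return trimmed_length;
}
```

The declarations now say what each pointer is for, and the types match how the pointers are used. The cast to unsigned char keeps isspace() defined for characters with negative values on platforms where plain char is signed.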
 
Keith Thompson

John Kelly said:
John Kelly said:
[...]
and one write pass?

Probably, memmove() is not equivalent to a "pass." If it's one assembly
instruction, zoom!

On most CPUs (especially modern ones), memmove() cannot be
implemented in one assembly instruction. Even if it can, that
instruction must perform a pass over the data; it's going to be O(N),
where N is the number of bytes moved. There's nothing magical about
microcode.

memmove() is likely to be faster than an equivalent C loop, but
only by a constant factor, and probably a fairly small one.

Why are you indulging in off topic rants?

I'm curious -- what led you to think that the above was either off topic
or a rant? That's a serious question. Or were you making a joke?
 
John Kelly

John Kelly said:
[...]
and one write pass?

Probably, memmove() is not equivalent to a "pass." If it's one assembly
instruction, zoom!

On most CPUs (especially modern ones), memmove() cannot be
implemented in one assembly instruction. Even if it can, that
instruction must perform a pass over the data; it's going to be O(N),
where N is the number of bytes moved. There's nothing magical about
microcode.

memmove() is likely to be faster than an equivalent C loop, but
only by a constant factor, and probably a fairly small one.

Why are you indulging in off topic rants?

I'm curious -- what led you to think that the above was either off topic

You seem to have a stringent view of topicality.

or a rant?

A constant factor is significant when scaled to large use. It's easy to
overlook that.
 
John Kelly

There are, both in theory and in practice.

I meant, a pointer can wrap to 0, and if your data has no terminating
\0, the pointer won't stop you either. You have a theoretical infinite
loop.

When I originally asked you to say what the contract was between the
caller and the callee, this was one of the things that you could have
specified: the code works only on such-and-such type machines. In
effect you did specify that when you said the contract is defined by
what the code does. I.e. you wrote trim for machines where it works

Aside from the ptrdiff_t issue, how can it fail?

and, presumably, you don't care about ones where it won't.

Segfaulting is a caller's error, not a defect in trim(). What other
failure scenario is there?
 
Keith Thompson

John Kelly said:
You are thinking of C strings. There are no "bounds" to a pointer.
You can increment a pointer until it wraps back to 0, and repeat that
loop forever.

Not necessarily. Just constructing or reading a pointer outside
the bounds of any object invokes undefined behavior, whether you
dereference it or not.
If you try to read that memory, you may segfault and terminate. But
that is beside the point. My code copes with an abstract machine where
one process owns all the memory, from 0 to the highest address.

Do you have an abstract machine like that sitting on your desk?
That may not be a real world scenario, but again, that is beside the
point. My code copes with abstract near impossibilities. That's the
way a programmer should think.

You don't care about conforming to the standard, and you don't care
about real-world implementations. What's left?
 
