trim whitespace

Kelsey Bjarnason

[snips]

Aside from the ptrdiff_t issue, how can it fail?



Segfaulting is a caller's error, not a defect in trim(). What other
failure scenario is there?

Err... wasn't one of the key features of your trim() that if fed an
invalid pointer, it would simply march along, trimming memory?

That could very well lead to reading memory you don't own, causing a
segfault - or worse - yet be perfectly consistent with the described
operation of the function - and thus, not a caller error at all.
 
John Kelly

John Kelly wrote:
I'd call that user error. They should pass a pointer to a string. If
you attempt to prevent an infinite loop by choosing an upper bound, then
you have the impossibility of success for a programmer trying to trim a
string where that constraint isn't met.

That's why I use ptrdiff_t or SIZE_MAX for the limit. Strings larger
than that are not likely candidates for the purpose of the function.

We'll likely just disagree about it; I think the responsibility should
be with the programmer using the function, so that they can trim any
string using well-defined operations, such as pointer arithmetic without
overflow.

I would not drive across a bridge where the responsibility to prevent
its collapse is on the users. Not all programmers make good engineers.

Without regarding security, the possibility that 'memmove' might be
slower than one's own single read and write passes, and prevention of an
infinite loop because of a programmer error, here is another version
which calls 'memmove' multiple times and uses an arbitrary window size
of one's choice (up to the maximum value that can fit inside a
'size_t'). It'll keep trying to trim forever if no null terminator is
found, but it should allow one to trim a string many times greater than
'SIZE_MAX'.

Nice effort. But I wonder if the different types for
size_t window = 0;
int found_left = 0;

are a potential hidden defect.
 
Keith Thompson

John Kelly said:
That's why I use ptrdiff_t or SIZE_MAX for the limit. Strings larger
than that are not likely candidates for the purpose of the function.
[...]

What makes you think that using ptrdiff_t or SIZE_MAX for the limit
will prevent an infinite loop?

Hint: it doesn't. An infinite loop is a possible consequence of
undefined behavior.
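
A minimal sketch of the point, assuming a hypothetical helper `bounded_len` (not from the thread, similar in spirit to POSIX `strnlen`): a limit only protects the scan when it is no larger than the object being read. With a limit of `SIZE_MAX`, the counter is bounded but the reads are not, and once a read leaves the object the behavior is undefined, so nothing about termination is guaranteed.

```c
#include <stddef.h>

/* Hypothetical helper: counts bytes before the first '\0',
 * reading at most `limit` bytes starting at `s`.
 * Safe only if `limit` does not exceed the size of the object
 * `s` points into; passing SIZE_MAX gives no such assurance. */
static size_t bounded_len(const char *s, size_t limit)
{
    size_t n = 0;
    while (n < limit && s[n] != '\0')
        n++;
    return n; /* equals `limit` when no terminator was seen */
}
```

Called as `bounded_len(buf, sizeof buf)`, the scan stays inside `buf` whether or not a terminator is present; the caller still has to supply a truthful size.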
 
Seebs

What makes you think that using ptrdiff_t or SIZE_MAX for the limit
will prevent an infinite loop?
Hint: it doesn't. An infinite loop is a possible consequence of
undefined behavior.

I think he's assuming that either it's okay to mess with a given address,
or doing so causes crashes. If the real world were that simple, life would
be SO much easier.

-s
 
Geoff

C was around in 1956?

Why do you choose that date for the invention of GIGO?

It just shows the prescience of those sage old programmers, they foresaw the
evolution of C. ;)

You guys need to lighten up. :)
 
Nick Keighley

Why do you choose that date for the invention of GIGO?

Wikipedia has it that the first user was an instructor on a particular
IBM machine. The machine came out in 1956 (again according to
Wikipedia). I was going to say that I was pretty sure Grace Hopper used
the term.

It's very old.

"And history became Myth, and Myth became Legend"
It just shows the prescience of those sage old programmers, they foresaw the
evolution of C. ;)

You guys need to lighten up. :)

who was unlight?
 
John Kelly

I consider it a mistake to define any new functions with errno as their
error-reporting mechanism, unless you are the kernel or libc implementor.

Look at some higher-level libraries and how they handle error codes. Most of
them don't touch errno. zlib, BerkeleyDB, and OpenSSL all have error codes,
but they leave errno alone.

Even getaddrinfo(), as a fresh addition to libc, didn't mess with errno. It
returns an error code and provides a separate gai_strerror() to map the error
code to a string.
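
As a rough illustration of that pattern applied to a trim function, here is a sketch that returns a status code and offers a separate mapping function, leaving errno alone. All names (`trim_status`, `trim_buf`, `trim_strerror`) and the extra `size` parameter are assumptions made for this example; they are not the interface of the trim() discussed in the thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, returned instead of setting errno. */
enum trim_status { TRIM_OK = 0, TRIM_ENULL, TRIM_ENOTERM };

/* Maps a status code to text, in the spirit of gai_strerror(). */
static const char *trim_strerror(enum trim_status st)
{
    switch (st) {
    case TRIM_OK:      return "success";
    case TRIM_ENULL:   return "null pointer argument";
    case TRIM_ENOTERM: return "no terminator within buffer size";
    }
    return "unknown error";
}

/* Trims leading and trailing whitespace in place.
 * `size` is the caller-supplied size of the buffer at `s`. */
static enum trim_status trim_buf(char *s, size_t size)
{
    if (s == NULL)
        return TRIM_ENULL;

    char *end = memchr(s, '\0', size);   /* never reads past `size` bytes */
    if (end == NULL)
        return TRIM_ENOTERM;

    char *start = s;
    while (start < end && isspace((unsigned char)*start))
        start++;
    while (end > start && isspace((unsigned char)end[-1]))
        end--;

    size_t len = (size_t)(end - start);
    memmove(s, start, len);              /* regions may overlap */
    s[len] = '\0';
    return TRIM_OK;
}
```

A caller would check the return value and, on failure, print `trim_strerror(st)`, much as one does with `getaddrinfo()` and `gai_strerror()`.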

Probably good advice for something popular.

But I don't see many potential users around here. Mostly loud critics.
 
Kenny McCormack

....
To which our friend John innocently remarked:
Probably good advice for something popular.

But I don't see many potential users around here. Mostly loud critics.

You're just now figuring that out, eh?

No, you certainly won't find any users here, and you won't find any help
either. But you have to learn to enjoy CLC on its own terms. I assume
you can and will do so.

--
One of the best lines I've heard lately:

Obama could cure cancer tomorrow, and the Republicans would be
complaining that he had ruined the pharmaceutical business.

(Heard on Stephanie Miller, but the sad thing is that there is an awful lot
of direct truth in it. We've constructed an economy in which eliminating
cancer would be a horrible disaster. There are many other such examples.)
 
BruceS

I think he's assuming that either it's okay to mess with a given address,
or doing so causes crashes.  If the real world were that simple, life would
be SO much easier.

OK, it seems we've gone far into silly land with this, but isn't he
saying it could cause an infinite loop because trim() could go through
all of memory, wrapping from the end back to the beginning, without
ever encountering a null terminator? Ignoring all the other problems
with this idea, is it actually possible to have no 0 anywhere in
memory? I would think that as soon as any string is defined, or any
of a number of other situations occur, there's going to be one and the
trim() will eventually encounter it and terminate. Maybe I've
misunderstood his "infinite loop" situation.
 
John Kelly

OK, it seems we've gone far into silly land with this
Wheee!


but isn't he
saying it could cause an infinite loop because trim() could go through
all of memory, wrapping from the end back to the beginning, without
ever encountering a null terminator? Ignoring all the other problems
with this idea, is it actually possible to have no 0 anywhere in
memory?

Many things are possible.

I would think that as soon as any string is defined, or any
of a number of other situations occur, there's going to be one and the
trim() will eventually encounter it and terminate. Maybe I've
misunderstood his "infinite loop" situation.

You got it. It's likely there will be a \0 somewhere in the address
space of ordinary machines running ordinary operating systems.

But I'm programming a theoretical abstract machine. Who knows what the
contents of its memory might be?
 
John Kelly

Read John Donne's invective against zero.

Null terminated strings are not proper objects. They make program
correctness dependent on what data is given to the program.

The result of that design decision is our 30+ year legacy of buffer
overflows and security holes.
 
Seebs

Probably good advice for something popular.
But I don't see many potential users around here.

Perhaps this is because, given information about how something would have
to work for people to potentially use it, you disregard it as irrelevant?

-s
 
Keith Thompson

John Kelly said:
Null terminated strings are not proper objects. They make program
correctness dependent on what data is given to the program.

The result of that design decision is our 30+ year legacy of buffer
overflows and security holes.

How would you avoid that problem? There are some good arguments in
favor of counted strings, where you store an integer length followed
by the characters of the string, but what if the length is stored
incorrectly? What happens if you call a function that expects a
pointer to a counted string, passing a pointer to something that
isn't a counted string?
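
To make the question concrete, here is a small sketch of a counted string, using a pointer-plus-length variant rather than a length stored directly in front of the characters. The type `struct cstr` and the function `cstr_trim` are hypothetical names for this illustration. Note that the trim trusts `len` completely: if the stored length is wrong, it reads outside the object just as a scan over an unterminated C string would.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: `len` bytes at `data`, no terminator. */
struct cstr {
    size_t len;
    char  *data;
};

/* Trims whitespace by narrowing the view; no bytes are moved.
 * Correct only if s->len really is the number of valid bytes
 * at s->data. Nothing here can verify that. */
static void cstr_trim(struct cstr *s)
{
    while (s->len > 0 && isspace((unsigned char)s->data[0])) {
        s->data++;
        s->len--;
    }
    while (s->len > 0 && isspace((unsigned char)s->data[s->len - 1]))
        s->len--;
}
```

The loop bounds no longer depend on the contents of memory, only on `len`, which moves the trust from the data to whoever filled in the count.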

Abstract data types can (try to) avoid such problems, by allowing
data to be manipulated only through a defined interface, but such
data types have to be built on top of some underlying primitives,
and it's very hard to make those primitives bullet-proof.

I do not argue that C's null-terminated strings were the best
solution, merely that it's not as easy as you might assume to come
up with a substantially better solution.

How would you have done it?
 
John Kelly

How would you avoid that problem? There are some good arguments in
favor of counted strings, where you store an integer length followed
by the characters of the string, but what if the length is stored
incorrectly? What happens if you call a function that expects a
pointer to a counted string, passing a pointer to something that
isn't a counted string?

Abstract data types can (try to) avoid such problems, by allowing
data to be manipulated only through a defined interface, but such
data types have to be built on top of some underlying primitives,
and it's very hard to make those primitives bullet-proof.

I do not argue that C's null-terminated strings were the best
solution, merely that it's not as easy as you might assume to come
up with a substantially better solution.

How would you have done it?

I would have developed Go.
 
Seebs

I would have developed Go.

Thirty years ago?

I dunno, I think you might have to leave that to people who are capable
of responding to a challenge with something other than "the world is
wrong to inconvenience me".

-s
 
Ben Bacarisse

BruceS said:
... Ignoring all the other problems
with this idea, is it actually possible to have no 0 anywhere in
memory?

Well, either argc must == 0 or argv[0] must be a null-terminated string
giving the program name or argv[0][0] must be a null character. Of
course, there is no guarantee that this zero can be found by repeated
manipulation of an unrelated pointer.

Furthermore, only hosted implementations must obey these rules.

<snip>
 
Keith Thompson

Ben Bacarisse said:
BruceS said:
... Ignoring all the other problems
with this idea, is it actually possible to have no 0 anywhere in
memory?

Well, either argc must == 0 or argv[0] must be a null-terminated string
giving the program name or argv[0][0] must be a null character. Of
course, there is no guarantee that this zero can be found by repeated
manipulation of an unrelated pointer.

Ah, but all-bits-zero is only *a* representation of 0, not
necessarily the *only* representation of 0.

Suppose argc == 0, but the system uses a 1's-complement
representation and the value stored in argc is represented as
all-bits-1.

errno is 0 at program startup, but the same thing applies.

Realistically, of course, there's almost certain to be a zero byte
*somewhere* in memory -- not that it matters.
 
Nick Keighley

I would not drive across a bridge where the responsibility to prevent
its collapse is on the users.  

raw C isn't a suitable interface into a computer for naive users. If
you validate the user input properly you won't get non-strings passed
to your trim function. C is good for writing tight, close to the
machine code. Perhaps you should be writing in something like Python
that protects you from yourself. And it's likely that the internals of
Python are written in C.

Your approach would be to deny the bridge builders the use of welding
torches because they might burn themselves.
Not all programmers make good engineers.

and maybe such programmers shouldn't be using C...
 
