Why doesn't strrstr() exist?

  • Thread starter Christopher Benson-Manica

websnarf

Old said:
You keep going on about how "C is slow" and "it would be easy
to make it faster and safer". Now you claim that you have a
library that does make C "faster and safer".

In other messages, you've explained that by "safer", you mean
being less prone to buffer overflows and undefined behaviour.

The only way a C-like language can avoid buffer overflows
is to include a runtime bounds check.

Ok. This is where the problem is. Because you people are all bipolar,
you 1) say the only way to mitigate buffer overflow problems is by
removing them completely, and thus change the language (so it's a false
argument with a built-in response), and 2) ignore my suggestion of
presenting safe paths as a means of *directing* the ways people program
more safely by default (without removing the unsafe paths, just making
them less compelling, or unnecessary.)

Look, I am still talking about C here. I am not talking about
guarantees of no buffer overflow. I am talking about reducing their
incidence dramatically.

See how I did that? I saw two endpoints, where neither is perfect, so
I drew a line in between and picked what I thought was a good point
somewhere on that line?
Please explain how -adding- a runtime bounds check to some
code, makes it faster than the exact same code but without
the check.

Study Bstrlib for a while. Try to figure how is it *POSSIBLE* that I
have kicked the living crap out of C's performance, even though I have
safety checks crawling all over it, while presenting at least as much
functionality. It's a special technique I use that I'm thinking of
patenting; it's obvious and there's lots of prior art -- but that's
never stopped the patent office from issuing them before. (I'll give
you a hint, I didn't just duplicate all the C library functions, and
add in length parameters and bounds checks.)

Here's another idea you can investigate -- why don't you take Bstrlib,
and strip out all the safety checks. Then rerun the benchmarks and
tell me how much more performance you get (you'll need a fairly
accurate timer, btw.)
 
websnarf

Keith said:
(e-mail address removed) wrote: [...]
If it's low-level, by definition it gives you
access to unprotected access to dangerous features of the machine
you're writing for.

So how does gets() or strtok() fit in this? Neither provides any low
level functionality that isn't available in better ways through
alternate means that are clearly safer (without being slower.)

I wouldn't put strtok() in the same category as gets(). strtok() is
ugly, but if it operates on a local copy of the string you want to
tokenize *and* if you're careful about not using it on two strings
simultaneously, it can be used safely.

You also missed the part where strtok is also laughably slow for the
most typical case of not modifying the second argument, and otherwise
really redundant with functions like strcspn, and strspn.
[...] If I were designing a new
library I wouldn't include strtok(), but it's not dangerous enough to
require dropping it from the standard.

It's one of the very few functions in C that is not reentrant. How
about at least adding gcc's strtok_r()?
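
To make the strspn/strcspn point concrete, here is a minimal sketch of a tokenizer with no hidden static state. The name next_token and its signature are hypothetical, chosen only for illustration; the sketch leaves the input string unmodified and keeps the scan position in a caller-owned cursor, so two strings can be tokenized at the same time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any standard library.
 * Finds the next token starting at *cursor without modifying the string
 * and without static state. On success returns a pointer to the token,
 * stores its length in *len, and advances *cursor past it.
 * Returns NULL when no tokens remain. */
static const char *next_token(const char **cursor, const char *delims,
                              size_t *len)
{
    /* Skip any leading delimiters. */
    const char *start = *cursor + strspn(*cursor, delims);

    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }

    /* The token runs up to the next delimiter or the end of the string. */
    *len = strcspn(start, delims);
    *cursor = start + *len;
    return start;
}
```

Because the token is reported as a pointer plus a length, the caller decides whether to copy it; nothing is written into the source string.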
[...]
[...] One of
the most dangerous features of C is that it has pointers, which is a
concept only one layer of abstraction removed from the concept of
machine addresses. Most of the "safer" high level languages provide
little or no access to machine addresses; that's part of what makes
them safer.

Ada has pointers.

Ada has pointers (it calls them access types), but it doesn't have
pointer arithmetic, at least not in the core language -- and you can
do a lot more in Ada without explicit use of pointers than you can in
C.

Interesting that you point this out. Bstrlib has "special pointers"
and delivers greater functionality for strings without resorting to
pointer arithmetic. Of course, using Bstrlib doesn't change C into a
different language.
[...] If one were to design a safer version of C (trying desperately to
keep this topical), one might want to consider providing built-in
features for some of the things that C uses pointers for, such as
passing arguments by reference and array indexing.

References are good -- they're one of the things C99 should have picked up
from C++ (and not the arbitrary positioned declarations, which are
really just there for changing the order of constructors, which C
doesn't have.) Just more evidence that the ISO/ANSI C committee are
just irresponsible with respect to safety (refs are guaranteed to be
pointing to something.)

I don't exactly know what you want to do about array indexing. Simply
throwing away syntaxes like 1[arr], I would agree with (since it
doesn't add any value to the language.)

Bounds checking is too expensive on *every* array, and would take away
from certain deductive bounds checking. Perhaps you could have a
"boundschecked" keyword, that you could apply as an attribute to an
array, and the compiler could then put in checks for those arrays. But
then you have to decide what to do if the check fails. So we have
something a little more sophisticated:

int errfn (int idx, int x[100]);
int boundschecked(errfn) x[100]; /* x[-1] => errfn(-1,x) */

So you can just exit(-1) or do whatever you want in your user defined
error function, or in fact give an interpretation for what you think
x[-1] should mean and return it. And we could have even more useful
things like:

int wrapped x[100]; /* index is taken modulo 100, into 0..99 */
int saturated x[100]; /* index is clamped to the range 0..99 */

Just a thought.
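
The keywords above are a proposal, not C. As a rough sketch of the same three behaviours in today's language, they can be emulated with ordinary functions. Every name here (wrap_index, saturate_index, checked_get, bounds_errfn, default_errfn) is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Hypothetical emulation of the proposed "wrapped", "saturated" and
 * "boundschecked" attributes using plain functions. */

/* Maps any int index into 0..n-1 by wrapping.
 * Assumes 0 < n <= INT_MAX. The % operator can yield a negative
 * result for a negative idx, so that case is corrected by adding n. */
static size_t wrap_index(int idx, size_t n)
{
    int m = idx % (int)n;
    return (size_t)(m < 0 ? m + (int)n : m);
}

/* Clamps any int index into 0..n-1. Assumes n > 0. */
static size_t saturate_index(int idx, size_t n)
{
    if (idx < 0)
        return 0;
    if ((size_t)idx >= n)
        return n - 1;
    return (size_t)idx;
}

/* User-supplied handler that decides what an out-of-range access means. */
typedef int (*bounds_errfn)(int idx, int *arr);

/* Example handler: reports failure by returning -1. */
static int default_errfn(int idx, int *arr)
{
    (void)idx;
    (void)arr;
    return -1;
}

/* Reads arr[idx] when idx is in range, otherwise defers to on_error. */
static int checked_get(int *arr, size_t n, int idx, bounds_errfn on_error)
{
    if (idx < 0 || (size_t)idx >= n)
        return on_error(idx, arr);
    return arr[idx];
}
```

The difference from the proposal is that the check is opt-in at each call site rather than attached to the array's declaration, so nothing stops a plain x[-1] elsewhere.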
 
Wojtek Lerch

Uh ... excuse me, but dividing by zero has well defined meaning in IEEE
754, and there's nothing intrinsically wrong with it (for most
numerators you'll get inf or -inf, or otherwise a NaN).

Sorry, I meant integer division by zero. Besides, standard C does not
require IEEE 754, does it?
Integer
overflow is also extremely well defined, and actually quite useful on
2s complement machines (you can do a range check with a subtract and
unsigned compare with one branch, rather than two branches.)

Extremely well defined by who? In standard C, it's undefined. For your
range check to be defined, you have to eliminate the possibility of an
overflow by using unsigned subtraction.
 
kuyper

It is, and you simply continue to propagate it.

Unless you were in my office watching me when I read your message, and
used a stopwatch to time how long it took me to think about it, there's
no way you can know whether my dismissal was immediate or took a
considerable amount of time. Your insistence that it was immediate,
despite my insistence to the contrary, constitutes a claim that I was
lying (and about an extremely trivial matter, which amounts to an
assertion that I am so habitual a liar that I would bother lying about
an unimportant matter). You can politely suggest that my dismissal of
that possibility was wrong, but there's no polite way you can suggest,
after I've claimed otherwise, that it was immediate. Do you have some
basis for rudely claiming that I'm lying?
[...] It's also not a dichotomy: low-level languages are inherently
unsafe, [...]

No. They may contain unsafe ways of using them. This says nothing
about the possibility of safe paths of usage.

Well, if you want to change the terms of discussion, you should warn
us. There are safe paths of usage for C, too (they don't involve any
use of gets(), except in contexts that are so rare and unlikely that
they don't constitute justification for continuing to retain gets() in
the standard). If a low-level language can be considered safe if
there's a safe way to use it, that pretty thoroughly undercuts your
argument that C is unsafe. Even gets() is safe: the safe way to use it is
"never".

Note: by this definition, a knife containing sharp poison-soaked pins
all along its handle is safe, because there are ways to use it safely.
Personally, I'd recommend a different definition of "safe".
Empty and irrelevant (and not really true; at least not relatively.)

You were the one who claimed the existence of a false dichotomy.
Dichotomies are by definition not fuzzy; especially the false ones -
the word derives from a Greek(?) word meaning "cut", referring to
making a sharp distinction between two different categories. The claims
that you were labelling "false dichotomies" are entirely consistent
with the fuzzy idea that higher level languages are safer than low
level languages.
So how does gets() or strtok() fit in this? Neither provides any low
level functionality that isn't available in better ways through
alternate means that are clearly safer (without being slower.)

I agree; it's the suggestion that fixing those holes will make C safe
that I'm arguing against. As long as C retains anything remotely
resembling a pointer; in other words, as long as C continues to
remotely resemble anything like the current version of C, it will be
less than perfectly safe (except in your modified sense, in which
anything that can be used safely is safe.)
[...] If it protected your access to those features, that
protection (regardless of what form it takes) would make it a
high-level language.

So you are saying C becomes a high level language as soon as you start
using something like Bstrlib (or Vstr, for example)? Are you saying

"high" -> "higher". As you say, it's all relative.
Notice that doesn't coincide with what you've said above. But it does
coincide with the false dichotomy. The low-levelledness in of itself
is not what makes it unsafe -- this just changes the severity of the
failures.

More severe failures -> more unsafe (all else being equal). If you
think a system where a particular error causes a compilation to fail,
with a clear error message pointing to where the problem may be found,
is just as safe as a system where the same error causes a nuclear bomb to
explode, that's a mighty peculiar way of assessing risk. On both
systems, the same error causes a failure; the only difference is the
severity of the failure.
Ada has pointers.

I know almost nothing about Ada. But I guarantee you that in the
unlikely event that Ada is perfectly safe, its pointers can't be exact
conceptual equivalents of C pointers.
Why? Because you assert that C represents the highest performing
language in existence?

No, I'm just saying that by comparison with most of the other
non-assembly languages, it has a reputation for speed, not slowness.
Your characterization of it as slow comes across as mighty peculiar.
It's well known that Fortran beats C for numerical applications.

It may be well known, and there once was a lot of truth to that claim,
but it's no longer universally true. It used to be that the Fortran
compilers represented many more decades of refinement than C compilers.
However, C's been around for many decades by now, and C compilers have
caught up with, and in some cases surpassed, the competing Fortran
compilers. On many platforms, including the one I'm currently using,
the Fortran compiler works by creating intermediate C code and passing
it to the C compiler.
Also,
if you take into account that assembly doesn't specify intrinsically
unsafe usages of buffers

Neither does C. Like assembler, it allows intrinsically unsafe usage.
Like assembler, C doesn't require unsafe usage.
God, what is wrong with you people? He makes an utterly unfounded
statement about portability that's not worth arguing about.

It's a statement founded in his own experience. Are you claiming he's
lying? If so, on what basis? I've personally seen C code ported to a
wider variety of platforms than he listed, so I've no reason to doubt
that he might have ported it to those particular ones. What's your
reason for doubting it?
... I make the
obvious stab to indicate that that argument should be nipped in the
bud, but you just latch onto it anyways.

Making code portable in C requires a lot of discipline, and in truth a
lot of testing (especially on numerics; it's just a lot harder than you
might think). It's discipline that in the real world basically nobody
has. Randy is asserting that C is portable because *HE* writes C code
that is portable. And that's ridiculous, and needs little comment on
it.

No, he's asserting that C code is portable, because he's successfully
ported it. That's precisely the single most relevant assertion he could
make. If you claimed a mountain was unclimbable, and I pointed out that
I've climbed it, would that assertion be irrelevant?

Also, he's far from being the only person with that experience. C is
one of the most widely portable languages there is. I've heard claims
that Java is more widely portable, and those claims might be true, but
even if they are, they don't change the fact that C can be very
portable.
 
Keith Thompson

Keith said:
(e-mail address removed) wrote: [...]
If it's low-level, by definition it gives you
access to unprotected access to dangerous features of the machine
you're writing for.

So how does gets() or strtok() fit in this? Neither provides any low
level functionality that isn't available in better ways through
alternate means that are clearly safer (without being slower.)

I wouldn't put strtok() in the same category as gets(). strtok() is
ugly, but if it operates on a local copy of the string you want to
tokenize *and* if you're careful about not using it on two strings
simultaneously, it can be used safely.

You also missed the part where strtok is also laughably slow for the
most typical case of not modifying the second argument, and otherwise
really redundant with functions like strcspn, and strspn.

It may well be laughably slow and/or redundant; that wasn't my point.
My point is that the reasons for removing gets() from the standard
(that it can't be used safely) don't apply to strtok(). I wouldn't
particularly object to removing strtok(), perhaps replacing it with
something like strtok_r(). It just isn't much of a concern to me
personally.

[...]
I don't exactly know what you want to do about array indexing. Simply
throwing away syntaxes like 1[arr], I would agree with (since it
doesn't add any value to the language.)

My objection to the way C defines array indexing is that it's nothing
more than a thin syntactic wrapper around pointer arithmetic. The
expression x[y] is *by definition* equivalent to *(x+y).
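
A small sketch of what that definition implies in practice: because x[y] means *(x+y) and addition commutes, arr[i], i[arr] and *(arr + i) all designate the same object. The helper name same_element is hypothetical.

```c
/* Returns 1 when the three spellings refer to the same element.
 * The caller must pass an index that is valid for arr. */
static int same_element(int *arr, int i)
{
    return &arr[i] == &i[arr]   /* subscript operands are interchangeable */
        && &arr[i] == arr + i;  /* subscripting is pointer arithmetic */
}
```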

If I were designing C from scratch, arrays would be first-class
objects, there would be no decay from arrays to pointers, and the
indexing operator would be defined directly, not in terms of pointer
arithmetic. (1[arr] would go away as a side effect.)

This doesn't by itself imply bounds checking, but it would make it
easier to add it in a consistent way.

There's a correspondence (though not a perfect one) between control
flow constructs and data types. A loop is like an array. A block is
like a struct. An if-then-else is like a union (what some languages
call a variant record). And a pointer is like a goto. C has done a
reasonably good job of making gotos unnecessary; I would like it to
have done a better job of making pointers unnecessary.

Of course, it's far too late to do this, since it would break too much
existing code. Conceivably you could add a new array-like construct,
but then the language would have two ways to do the same thing, which
would probably be worse than the present situation.

And of course, sufficiently competent programmers can write good and
safe code even in a pointer-dependent language like C -- just as it's
possible to write good structured code with gotos.

Until I invent my time machine, I think we're just going to have to
leave C arrays the way they are.
 
websnarf

Unless you were in my office watching me when I read your message, and
used a stopwatch to time how long it took me to think about it, there's
no way you can know whether my dismissal was immediate or took a
considerable amount of time.

Oh I see, I'm all wrong because I used the word immediately in there
(referring to where in the post it was, and the fact that you don't
even acknowledge my position at all.)
[...] Your insistence that it was immediate,

There are many other words I used in there other than "immediate".
Interesting that you are obsessing over that one. So if I ask you if
you've stopped beating your wife with a broomstick, will you scream at
me for suggesting that you own a broomstick?
[...] It's also not a dichotomy: low-level languages are inherently
unsafe, [...]

No. They may contain unsafe ways of using them. This says nothing
about the possibility of safe paths of usage.

Well, if you want to change the terms of discussion, you should warn
us.

I have not presented a position to the contrary of this. I'm not
changing anything, from my point of view. It's only your false
dichotomies that are preventing you from seeing that this is what I am
talking about, and that I am not actually talking about anything else.
These other tangents have not been introduced by me.
[...] There are safe paths of usage for C,

They are not obvious, and the mountains of CERT advisories suggest that
they generally are not travelled.
[...] too (they don't involve any
use of gets(), except in contexts that are so rare and unlikely that
they don't constitute justification for continuing to retain gets() in
the standard). If a low-level language can be considered safe if
there's a safe way to use it, that pretty thoroughly undercuts your
argument that C is unsafe.

Not if those paths are hidden or highly non-obvious, or require a
complete build-up from scratch. The safety, measured in actually
resultant code which is "safe" (basically correctly implemented,
without unintended side-effects), will be primarily influenced by the
most often taken paths by the programmer to solve their problems. If
the most obvious libraries have land mines all over them, then
programmers are going to step on those landmines at some rate.

For most rational people, given that the problems are being highlighted
by the mainstream press on a weekly basis, this would lead to a very
obvious question -- is it possible to present an interface where the most
likely to be used paths are not nearly as dangerous as what C presents?
And if so, what are the minimum trade offs? In the case of Bstrlib,
as a substitute for C's string library, the answer is "yes you can make
such a thing, and the trade offs are none."
[...] Even gets() is safe: the safe way to use it is "never".

Tell that to the ANSI/ISO C committee. According to their own
documentation they claim that gets() can be used under some unspecified
environmental assumptions.
Note: by this definition, a knife containing sharp poison-soaked pins
all along its handle is safe, because there are ways to use it safely.
Personally, I'd recommend a different definition of "safe".

Really? Because I prefer the one that will ultimately lead to safer
code in real world production.
You were the one who claimed the existence of a false dichotomy.
Dichotomies are by definition not fuzzy; especially the false ones -
the word derives from a Greek(?) word meaning "cut", referring to
making a sharp distinction between two different categories. The claims
that you were labelling "false dichotomies" are entirely consistent
with the fuzzy idea that higher level languages are safer than low
level languages.

Reread the definition. That's not what it means.
I agree; it's the suggestion that fixing those holes will make C safe
that I'm arguing against. As long as C retains anything remotely
resembling a pointer; in other words, as long as C continues to
remotely resemble anything like the current version of C, it will be
less than perfectly safe (except in your modified sense, in which
anything that can be used safely is safe.)

And so who was arguing for perfect safety again? Please find the
applicable quote.
[...] If it protected your access to those features, that
protection (regardless of what form it takes) would make it a
high-level language.

So you are saying C becomes a high level language as soon as you start
using something like Bstrlib (or Vstr, for example)? Are you saying

"high" -> "higher". As you say, it's all relative.

At least in the case of Bstrlib, this is a ridiculous notion. No
low-level path is removed or obscured. Nothing is abstracted to any
degree in which the representation isn't known exactly. No
functionality or capability is given up, theoretical or otherwise.
Using Bstrlib, you remain at exactly the same low-leveledness as
without it, because you can do all of the exact same things you did
before. You just happen to also have the option of doing things safer
and faster.

Saying Bstrlib makes C more high level, is like saying AMD's 64 bit
instruction set makes the x86 more RISC-like because they added more
registers.
More severe failures -> more unsafe (all else being equal). If you
think a system where a particular error causes a compilation to fail,
with a clear error message pointing to where the problem may be found,
is just as safe as a system where the same error causes a nuclear bomb to
explode, that's a mighty peculiar way of assessing risk. On both
systems, the same error causes a failure; the only difference is the
severity of the failure.

Ok, but this is irrelevant. You can't change the specification of C to
make it decrease the severity of UB. You can only act on the
probability of failure occurrences. Furthermore, severe errors are
*NOT* confined to low-level languages. Java has race conditions, which
are arbitrarily bad, and nobody considers Java a low-level language.
I know almost nothing about Ada. But I guarantee you that in the
unlikely event that Ada is perfectly safe,

No real language is perfectly safe. You would have to take loops out.
What the hell are you talking about? Ada is fairly safe; rather than
detailing something I only have modest familiarity with, let me just
point out that the US military up until recently used Ada exclusively
basically because it is a safe language (and they do not consider Java
any safer.)
[...] its pointers can't be exact conceptual equivalents of C pointers.

You mean Ada is not C? That is correct.
No, I'm just saying that by comparison with most of the other
non-assembly languages, it has a reputation for speed, not slowness.

Right, as I've posted before, C has lots of "FAKE SPEED" that makes
people think it's a fast language. It's like how sporty commuter cars
have spoilers or impressive air intake grills, and look really amazingly
aerodynamic. It's utter nonsense, and has nothing to do with anything
about the car's performance -- but it looks cool. Same thing with
C.
Your characterization of it as slow comes across as mighty peculiar.

I'm sure it does to you, if you've bought into the silly notion that C
is a fast language.
It may be well known, and there once was a lot of truth to that claim,
but it's no longer universally true.

Uhh ... excuse me, but it will remain true for as long as "restrict" is
not widely deployed on enough C compilers to compel programmers to
use it.
[...] It used to be that the Fortran
compilers represented many more decades of refinement than C compilers.

The state of the art C and Fortran compilers from Intel (which win
basically every benchmark there is) use a common backend (i.e., they
both compile to the same intermediate language, before being optimized
then translated to assembly). The C language can only keep up with the
Fortran on linear algebra stuff, if restrict is used (in which case the
two become equivalent) or some unsafe switch like "assume no aliasing"
is set for the C compiler.
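
For readers who have not used it, here is a minimal sketch of what restrict looks like on a typical linear algebra kernel. The function name axpy and its parameter order are assumptions made for this illustration. The qualifier is a promise from the programmer that x and y do not overlap, which is the guarantee Fortran gives for its array arguments by default; whether a given compiler exploits it is not something this sketch can show.

```c
#include <stddef.h>

/* Computes y[i] += a * x[i] for i in 0..n-1 (requires C99 or later).
 * restrict tells the compiler that x and y never alias, so a store to
 * y[i] cannot change any x[j] and the loop may be reordered or
 * vectorized. Calling this with overlapping arrays is undefined. */
static void axpy(size_t n, double a,
                 const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Note that the promise is unchecked: restrict moves the aliasing question from the compiler to the caller rather than making it go away.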
However, C's been around for many decades by now, and C compilers have
caught up with, and in some cases surpassed, the competing Fortran
compilers.

Not on numerical stuff. Compiler technology has nothing to do with it.
The language is simply a barrier.
[...] On many platforms, including the one I'm currently using,
the fortran compiler works by creating intermediate C code and passing
it to the C compiler.

Oh you mean like the Absoft compiler? Go look at the polyhedron
benchmark site to see how worthless that is as a strategy for compiling
Fortran. Only marketroids from Apple, with a specific intention of
deceiving people on benchmarks would ever use such a kind of compiler.
Neither does C.

Excuse me, but gets() is in the C specification, and *MUST* lead to
unsafe usage of buffers. Most C string functions require implicit
assumptions about buffer lengths that are unchecked by any mechanism.
C includes necessarily non-reentrant functions like strtok(). Assembly
presents you with no such weaknesses from its baseline specification.
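
To illustrate the buffer-length point: gets() has no way to learn the size of its destination, whereas fgets() takes the capacity as an argument. Below is a minimal sketch of a bounded line reader built on fgets(); the name read_line is hypothetical and the truncation policy (long lines come back in pieces) is just one possible choice.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads up to cap-1 characters of one line from
 * `in` into buf, never writing past buf[cap-1]. Requires cap > 0.
 * Strips the trailing newline if one was read.
 * Returns 1 on success, 0 on end of file or read error.
 * A line longer than cap-1 characters is delivered across several calls. */
static int read_line(FILE *in, char *buf, size_t cap)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return 0;

    /* strcspn returns the index of the first '\n', or the string length
     * if there is none, so this assignment is always in bounds. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```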
[...] Like assembler, it allows intrinsically unsafe usage.
Like assembler, C doesn't require unsafe usage.

Unlike C, assembler does not *promote* unsafe usage. You can only
program "unsafely" in assembler, if you create the unsafe scenario,
specification, and semantics from the ground up.
 
websnarf

Wojtek said:
Sorry, I meant integer division by zero. Besides, standard C does not
require IEEE 754, does it?

I have no idea, I don't use platforms that are not IEEE 754 compliant.
I'm just latching onto what you think is or is not a good idea.
Remember IEEE 754 is a specification as well, and unlike the C
specification, it's generally completely adhered to, and is not
littered with UB.
Extremely well defined by who? In standard C, it's undefined.

I said "on 2s complement machines". Please read all the words.
 
W

Wojtek Lerch

I have no idea, I don't use platforms that are not IEEE 754 compliant.

A lot of people never use platforms that are not Intel. But when you're
teaching a programming language, it's important to distinguish between
promises made by the language standard and those made by some other
standards or hardware or software vendors.
I'm just latching onto what you think is or is not a good idea.
Remember IEEE 754 is a specification as well, and unlike the C
specification, it's generally completely adhered to, and is not
littered with UB.

Perhaps. But, like I said, I meant integer division by zero.
I said "on 2s complement machines". Please read all the words.

I know what you said. But in standard C, signed integer overflow is
undefined behaviour, no matter whether a machine is 2s complement or not.
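
A minimal sketch of what that means for portable code, assuming two hypothetical helpers, checked_add and checked_div (neither is a standard function): since the standard leaves signed overflow and integer division by zero undefined, the test has to happen before the operation, using only comparisons that cannot themselves overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *sum and returns true only when
   the mathematical result fits in an int. The comparisons are arranged
   so that INT_MAX - b and INT_MIN - b never overflow themselves. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}

/* Hypothetical helper: stores a / b in *quot and returns true only when
   the division is defined. A zero divisor is undefined, and so is
   INT_MIN / -1 when the quotient is not representable. */
bool checked_div(int a, int b, int *quot)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *quot = a / b;
    return true;
}
```

The checks cost a few comparisons per operation, but they behave the same on every conforming implementation, 2s complement or not.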
 
D

Dave Thompson

On the other hand, I don't think it would be unreasonable for the Standard
to officially declare gets() as obsolescent in the "Future library
directions" chapter.
Though only 1 of 5 'obsolescent's in C90 actually went away in C99.

<G>

- David.Thompson1 at worldnet.att.net
 
D

Dave Thompson

Just to correct the misinformation, there's no reason a low level I/O
driver library couldn't be written in Ada. The language was designed
for embedded systems. Ada can do all the unsafe low-level stuff C can
do; it just isn't the default.

It's certainly possible in the language, which was designed for the
gamut from embedded to data-processing -- it even has some things
arguably lower than C like rep specs. But it might not be convenient
in a particular situation, for example if the driver interface is
defined in C using features not easily translated automatically to Ada
so you may have to manually redo or even recode the bindings on every
release, which might be quite frequent. While it would still be
possible it might not offer enough benefit to justify the cost.

- David.Thompson1 at worldnet.att.net
 
F

Flash Gordon

Wojtek said:
A lot of people never use platforms that are not Intel. But when you're
teaching a programming language, it's important to distinguish between
promises made by the language standard and those made by some other
standards or hardware or software vendors.

Especially as there are processors in use today with *no* floating
point hardware that are programmed in C. I know because I used to work
on one in C. I've no idea whether they implemented IEEE 754 in software,
but I hope that what they implemented used all the hacks possible (that
don't break compliance with the C standard) to get it running as fast as
possible.

IEEE 754 is probably only adhered to by a subset of C implementations.
Perhaps. But, like I said, I meant integer division by zero.



I know what you said. But in standard C, signed integer overflow is
undefined behaviour, no matter whether a machine is 2s complement or not.

I would also point out that there is 2s complement hardware in use today
that can be told to *limit* on overflow instead of wrapping. I don't
know off the top of my head whether the C compiler I was using can use
the hardware in that way, but it is definitely possible and could be
extremely useful.
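
The "limit on overflow" behaviour can also be sketched in portable C without relying on hardware support. This is only an illustration, assuming a hypothetical function named saturating_add; a compiler targeting hardware with a saturating mode could do the same thing in a single instruction.

```c
#include <limits.h>

/* Hypothetical helper: adds two ints and clamps the result to
   [INT_MIN, INT_MAX] instead of overflowing. The range checks are
   done before the addition, so no undefined behavior occurs. */
int saturating_add(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;   /* would overflow upward: clamp high */
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;   /* would overflow downward: clamp low */
    return a + b;
}
```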
 
K

kuyper

Wojtek Lerch wrote: ....

I have no idea, I don't use platforms that are not IEEE 754 compliant.

C is very intentionally not so restricted.
I'm just latching onto what you think is or is not a good idea.
Remember IEEE 754 is a specification as well, and unlike the C
specification, it's generally completely adhered to, and is not
littered with UB.
From what I've read in this newsgroup, obscure violations of IEEE 754
are actually pretty common. When you consider that the C standard is a
lot more complicated than IEEE 754, the degree to which compilers
comply with C90 is pretty high. Compliance with the new features of C99
isn't as good as I'd like, but they represent only a small fraction of
the entire language.
I said "on 2s complement machines". Please read all the words.

Given the way you constructed your sentence, the phrase "on 2s
complement machines" only qualifies "actually quite useful". If you'd
intended it to apply to "extremely well defined" as well, you should
have constructed the sentence differently. A comma after "useful" would
be the simplest fix.
 
K

kuyper

Keith said:
....
I wouldn't put strtok() in the same category as gets(). strtok() is
ugly, but if it operates on a local copy of the string you want to
tokenize *and* if you're careful about not using it on two strings
simultaneously, it can be used safely. If I were designing a new
library I wouldn't include strtok(), but it's not dangerous enough to
require dropping it from the standard.

I certainly agree that deprecating->removing gets() is a higher
priority, but ensuring that strtok() is not used on two different
strings simultaneously is trickier than it sounds. A library that I'm
currently responsible for, but which was designed and written by
someone else, contained a function which calls strtok(). A user of the
library called that function while his own code was in the middle of
using strtok(). Very confusing! I removed all use of strtok() from the
library completely, which wasn't difficult.
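
A minimal sketch of one way to drop strtok() from library code, assuming a hypothetical function named next_token (it is not the code from the library described above): the scanning position is kept in a caller-supplied pointer instead of hidden static storage, so a library and its caller can tokenize different strings at the same time without interfering.

```c
#include <string.h>

/* Hypothetical strtok() replacement. Pass the string on the first call
   and NULL afterwards, as with strtok(), but the position is stored in
   *save, which belongs to the caller. Like strtok(), it overwrites
   delimiters in the input with '\0'. Returns NULL when no tokens remain. */
char *next_token(char *str, const char *delims, char **save)
{
    char *start = (str != NULL) ? str : *save;
    if (start == NULL)
        return NULL;

    start += strspn(start, delims);          /* skip leading delimiters */
    if (*start == '\0') {
        *save = NULL;
        return NULL;
    }

    char *end = start + strcspn(start, delims);  /* find end of token */
    if (*end != '\0') {
        *end = '\0';
        *save = end + 1;                     /* resume after the delimiter */
    } else {
        *save = NULL;                        /* token ran to end of string */
    }
    return start;
}
```

Two interleaved scans simply use two different save pointers, which is exactly the case that goes wrong with strtok().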
 
M

Michael Wojcik

A library that I'm
currently responsible for, but which was designed and written by
someone else, contained a function which calls strtok(). A user of the
library called that function while his own code was in the middle of
using strtok(). Very confusing! I removed all use of strtok() from the
library completely, which wasn't difficult.

For that reason, when I'm writing a library, I generally try to avoid
as much as possible the functions that the standard prohibits the
implementation itself from (observably) calling ("The implementation
shall behave as if no library function calls the XXX function"):

getenv
localeconv
mblen
mbtowc
rand
setlocale
signal
srand
strerror
strtok
tmpnam
wctomb

(I think that's the whole C99 list.) Sometimes one of these (eg
getenv) is difficult to work around, even impossible for portable
code if that functionality is required by the specifications for
the library I'm writing; but as you noted regarding strtok, others
are easy to dispense with.

Of course this list isn't comprehensive - there are other library
functions with side effects, and many outside the standard - but
it's a start.
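
As an illustration of dispensing with one entry on that list, here is a minimal sketch that replaces rand()/srand() with a generator whose state the caller owns. The names my_rng, my_rng_seed and my_rng_next are hypothetical; the constants are the ones used in the sample rand() shown in the C standard, and the generator is not meant to be statistically strong.

```c
/* Hypothetical generator state, owned by the caller rather than
   hidden inside the C library. */
struct my_rng {
    unsigned long next;
};

/* Seeds one generator without touching any global state. */
void my_rng_seed(struct my_rng *r, unsigned long seed)
{
    r->next = seed;
}

/* Returns a value in [0, 32767]. Unsigned arithmetic wraps by
   definition, so the multiplication cannot invoke undefined behavior. */
int my_rng_next(struct my_rng *r)
{
    r->next = r->next * 1103515245UL + 12345UL;
    return (int)((r->next / 65536UL) % 32768UL);
}
```

Because the library never calls srand() or rand(), it cannot disturb a sequence the application is relying on, and vice versa.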
 
A

Antoine Leca

En said:
It's one of the very few functions in C that is not reentrant.

I find the actual number of "few" to be far too high. For example, I can
count 12 occurrences of the "shall behave as if no library function calls"
phrase.
How about at least adding gcc's strtok_r()?

Last time I had a look at GCC there was no strtok_r() in the C compiler.
And a freestanding compiler is NOT the place I would look at to find it.


Antoine
 
K

Keith Thompson

Flash Gordon said:
I would also point out that there is 2s complement hardware in use
today that can be told to *limit* on overflow instead of wrapping. I
don't know off the top of my head whether the C compiler I was using
can use the hardware in that way, but it is definitely possible and
could be extremely useful.

That raises another interesting point. Even if we could assume that
all implementations use two's-complement, mandating the usual
wraparound on overflow would preclude the possibility of a checking
implementation that treats overflow as an error.
 
K

kuyper

Oh I see, I'm all wrong because I used the word immediately in there
(referring to where in the post it was, and the fact that you don't
even acknowledge my position at all.)

I don't acknowledge that your position is valid, because I don't
consider that to be the case. I explained why. What more
acknowledgement than that did I owe to a position I consider to be
wrong?
[...] Your insistence that it was immediate,

There are many other words I used in there other than "immediate".
Interesting that you are obsessing over that one. So if I ask you if
you've stopped beating your wife with a broomstick, will you scream at
me for suggesting that you own a broomstick?

No, because there's nothing insulting in that part of the suggestion.
If I had said in a previous message that I didn't own a broomstick, and
if your follow up message had referred to "your broomstick" rather than
"a broomstick", then your follow-up would have implied that I was
lying. I would have objected to that. Objecting to the rest of the
accusation would have been a priority, as being far more serious, but I
would have eventually gotten around to objecting to the implicit
accusation that I was lying.
[...] There are safe paths of usage for C,

They are not obvious, and the mountains of CERT advisories suggest that
they generally are not travelled.

Well, the safe paths of usage for assembler are equally inobvious, if
not more so, and the same can be said of any low-level language, which
is the point we've been trying to make to you. The mountains of CERT
advisories are at least equally a consequence of the popularity of C,
which means that there are more opportunities for errors to be made in
C.
[...] Even gets() is safe: the safe way to use is "never".

Tell that to the ANSI/ISO C committee. According to their own
documentation they claim that gets() can be used under some unspecified
environmental assumptions.

I'm in perfect agreement with your objections to gets(). It's the
suggestion that a few changes to the library (specifically, the
incorporation of your alternative library) would be sufficient to make
it safe that is bizarre.
Really? Because I prefer the one that will ultimately lead to safer
code in real world production.

"Has safe paths of usage" doesn't meet that requirement, because the
language can be arbitrarily difficult to use safely, and still possess
safe paths of usage.
Reread the definition. That's not what it means.

When I asked www.ask.com what the definition of the word "dichotomy"
was, the top ten web sites listed all used one or the other of the
following two definitions (with exactly identical wording!):

"Being twofold; a classification into two opposed parts or subclasses"

"Division into two; especially, the division of a class into two
subclasses opposed to each other by contradiction, as the division of
the term man into white and not white."

Note the use of the word "opposed", implying a sharp distinction with
no middle ground. The word "contradiction" reinforces that meaning. The
phrase "false dichotomy" directly addresses that sharpness; it says
that the distinction is not sharp, and that it's incorrect to treat it
as if it were.

....
And so who was arguing for perfect safety again? Please find the
applicable quote.

I've reviewed your previous posts, and I concede that you've not argued
for perfect safety; you merely seem unrealistically optimistic about
the possibilities for radical improvement in safety. Yes, improvement
is possible, and removing gets() is a small step in that direction; but
as long as programmers get angry at compiler vendors for failing to
support legacy code that uses gets(), removing it from the standard
won't have much real-world effect. Real, significant improvements in
safety will only be achieved by removing access to the ability to use
low-level C constructs. And then it won't be C any more.

....
... Furthermore, severe errors are
*NOT* confined to low-level languages. ...

We never suggested that they were. We only pointed out that the
likelihood and severity of errors are higher in low level languages.

....
I know almost nothing about Ada. But I guarantee you that in the
unlikely event that Ada is perfectly safe,
....
... Ada is fairly safe; ....
[...] its pointers can't be exact conceptual equivalents of C pointers.

Which, I gather from other posters' messages, is in fact the case.
You mean Ada is not C? That is correct.

Well, I was being more general than that, but the more specific
statement you're attributing to me can be interpreted in a way that
makes it a special case of the more general statement I actually made.

The fact that Ada's pointers are quite different than C's pointers is
part of the reason it's possible for Ada to be safer. Any language that
was radically safer than C couldn't look sufficiently similar to C to
justify inheriting the name.

....
I'm sure it does to you, if you've bought into the silly notion that C
is a fast language.

Until you've identified a truly faster high-level language, I'll stick
with that notion. It's unambiguously faster than any of the languages
that are suitable for use on my current project; your options may be
different than mine.
Excuse me, but gets() is in the C specification, and *MUST* lead to
unsafe usage of buffers.

Only if it's actually used. I'm unaware of any part of the C standard
that specifies that you must use gets(). If we're using your "safe
usage path" criterion for safety, the fact that C provides safe
alternatives to gets() is sufficient to ensure that C is safe.
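
A minimal sketch of one such alternative, assuming a hypothetical wrapper named read_line built on the standard fgets(): unlike gets(), the caller states the buffer size, and an over-long line is truncated rather than written past the end of the buffer.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical gets() replacement. Reads at most size - 1 characters of
   one line from in, always null-terminates buf, and strips the trailing
   newline. If the line does not fit, the remainder is read and
   discarded so the next call starts on a fresh line. Returns buf, or
   NULL on end of file, read error, or an unusable size. */
char *read_line(char *buf, size_t size, FILE *in)
{
    if (size == 0 || size > INT_MAX || fgets(buf, (int)size, in) == NULL)
        return NULL;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';                 /* complete line: drop newline */
    } else {
        int c;
        while ((c = getc(in)) != EOF && c != '\n')
            ;                                /* discard the rest of the line */
    }
    return buf;
}
```

Silent truncation is a design choice made here for brevity; a caller that needs to know about it could compare the result against the expected length or have the wrapper report it separately.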
... Most C string functions require implicit
assumptions about buffer lengths that are unchecked by any mechanism.
C includes necessarily non-reentrant functions like strtok().

C standard libraries must contain strtok(). To the best of my
knowledge, the C standard doesn't require your program to actually call
strtok(). By your safety criteria, the fact that there is a safe usage
path for C (one that avoids calling strtok()) is sufficient to make C
safe. It doesn't make strtok() safe, but then I never claimed that it
was. C is safe, by your "safe usage path" criterion; by saner criteria,
it's a fairly dangerous language.
... Assembly
presents you with no such weaknesses from its baseline specification.

Calling gets() and strtok() in C is a misuse of C; writing code that
performs the equivalent functionality is possible in every assembly
language that I'm familiar with, and constitutes a misuse of assembly.
Of course, my familiarity with assembler is limited to three different
assembly languages for three radically different platforms. It could
very well be that in most modern assembly languages, it's impossible to
write gets() or strtok() - but I'd be very curious to see how that was
achieved.
[...] Like assembler, it allows intrinsically unsafe usage.
Like assembler, C doesn't require unsafe usage.

Unlike C, assembler does not *promote* unsafe usage. You can only
program "unsafely" in assembler, if you create the unsafe scenario,
specification, and semantics from the ground up.

Neither does the C language. I don't remember any part of the C
standard that promotes use of gets() and strtok(); they're merely
presented as things you can do. The fact that it's impossible, and
difficult, respectively, to avoid undefined behavior when calling those
routines is something that the standard fails to warn you about, but
the standard is not about telling you the right way to program, that's
what textbooks are for.
 
K

kuyper

Randy said:
(e-mail address removed) wrote
(in article
<[email protected]>): .... ....
Unportable? You have got to be kidding. I must be
hallucinating when I see my C source compiled and executing on
Windows, Linux, NetWare, OS X, Solaris, *bsd, and a host of
other UNIX-like platforms, on x86, x86-64, PPC, Sparc, etc.

I've been thinking about this, and there are at least two very different
concepts of portability that might be relevant here, and websnarf is
probably using a different one than you and I.

Most C code is unportable, for one reason or another, and I think
that's what websnarf is thinking of. However, paradoxically, that fact
is directly related to the fact that C is one of the best languages
available for writing code that needs to be portable, which is what I
was thinking of (and I presume you, as well).

The C standard specifies that the behavior of a great many programs is
either implementation-defined or undefined, and often doesn't specify
the behavior at all. The implementation-defined behavior can be chosen
in a manner that's optimal for each implementation. The fact that
construct X makes the behavior undefined allows a vendor to implement
the behavior of other constructs without having to worry about the
possibility that construct X might exist, which allows for more
efficient implementation. If the fact that the behavior of X was
undefined were a random occurrence, it would be pretty unlikely for it
to allow a significant performance improvement. However, it's not
random: in many cases the decision was made to make the behavior
undefined, for the express purpose of allowing more efficient
implementation. Finally, undefined and implementation-defined behavior
is the basis for implementation-specific extensions to C that make the
implementation more efficient and useful to those who want or need to
be able to write unportable C code. "wanting" is a lot more common than
actual "needing", but there are legitimate reasons for needing to write
unportable code.


The net result is that it's possible to produce a conforming
implementation of C that is efficient and useful enough to be
profitable, on a wider variety of platforms than would be possible if
the C standard imposed stricter requirements. As a result, you can
count on the presence of an implementation that will accept, translate,
and correctly execute your code on a wider variety of platforms than
is the case for most other languages.

Of course, that's only true if you're careful enough to avoid writing
unportable C code. I'll concede that it's tricky to write
widely-portable C code, but it's certainly not impossible, or even
unacceptably difficult, to do so.
 
