strcmp but with '\n' as the terminator

Richard Heathfield

Paul Hsieh wrote:

Well, I think the first thing is to realize that the C library is just
pure digital diarrhea, especially for strings.

Please do not present your opinions, however dearly held, as if they are
facts.

The implicit requirement to
scan for the end of the string implicit in most of the string library
belies its propensity for being slow, a haven for buffer overflows, and
generally just the wrong set of primitives for string manipulation.

And yet a goodly number of C programmers manage perfectly well with
null-terminated strings in their fast, well-written code. Don't assume your
own experience is universal, and don't blame the library for /your/ buffer
overflows. If you don't like C, and you clearly don't, then why not just
use something else instead?

<snip>
 
Malcolm

Paul Hsieh said:
Well, I think the first thing is to realize that the C library is just pure
digital diarrhea, especially for strings. The implicit requirement to scan for
the end of the string implicit in most of the string library belies its
propensity for being slow, a haven for buffer overflows, and generally
just the wrong set of primitives for string manipulation.

The functions are easy to implement, which is often important. Usually
performance in string manipulation isn't too important, since a string is
usually either input or output, and IO overhead is so large that a bit of
processing inefficiency isn't noticeable.
Finally, for better or for worse C has built in support for NUL-terminated
strings.
If your whole purpose is just to read the string off the disk as fast as
possible, then parse later, then I would recommend trying out my string
library.
I'm sure your string library is well-written, is an improvement over the C
string library, deserves to be accepted as an ANSI standard, etc.
However the problem is that it hasn't yet gained wide acceptance, so anyone
trying to understand how the code works first has to read and understand
your library documentation.
I won't say "don't use it", it might well be an advantage, particularly for
a large string-intensive project. However the issue isn't as clear-cut as you
seem to suggest.
 
Kevin Easton

Paul Hsieh said:
If your whole purpose is just to read the string off the disk as fast as
possible, then parse later, then I would recommend trying out my string
library (http://bstring.sourceforge.net) as it should lead to a much
simpler solution:

#include "bstrlib.h"
#include "bstraux.h"

bstring b = bread ((bNread) fread, fptr); /* Read the whole file */

bNread is defined thus in bstrlib.h:

typedef size_t (* bNread) (void *buff, size_t elsize, size_t nelem,
                           void *parm);

...and bread in part:

bstring bread (bNread readPtr, void * parm) {
    /* ... */
    l = readPtr ((void *) (buff->data + i), 1, n - i, parm);

When you make the bread() call in your example, bread has undefined
behaviour here, because it calls the fread function through an
incorrectly-typed function pointer. bread is passing a void *, but
fread is prototyped as accepting a FILE * as that formal parameter, and
the incorrectly-typed function pointer means there is no conversion.

To do this correctly, you must define an intermediate function:

#include <stdio.h>
#include "bstrlib.h"
#include "bstraux.h"

size_t bfread (void *buff, size_t elsize, size_t nelem, void *parm)
{
    return fread(buff, elsize, nelem, parm);
}

...and now the call:

bstring b = bread (bfread, fptr); /* Read the whole file */

is OK (the conversion from FILE * to void * happens at the call to
bread, coerced by the bread prototype, and the conversion from void *
back to FILE * occurs at the call to fread, coerced by the fread
prototype).
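The same adapter pattern can be shown without bstrlib at all. What follows is only a minimal sketch: `reader_fn`, `fread_adapter` and `read_some` are hypothetical names invented for illustration (they are not part of bstrlib), with `reader_fn` given the same shape as the `bNread` typedef quoted above.

```c
#include <stdio.h>

/* Hypothetical callback type with the same shape as bNread. */
typedef size_t (*reader_fn)(void *buff, size_t elsize, size_t nelem, void *parm);

/* Correctly typed adapter: the void * to FILE * conversion happens here,
 * at an ordinary call to fread with its prototype in scope. */
size_t fread_adapter(void *buff, size_t elsize, size_t nelem, void *parm)
{
    return fread(buff, elsize, nelem, (FILE *)parm);
}

/* Hypothetical consumer: reads up to cap - 1 bytes through the callback
 * and NUL-terminates the result. Returns the number of bytes stored. */
size_t read_some(reader_fn rd, void *parm, char *out, size_t cap)
{
    size_t n;

    if (cap == 0)
        return 0;
    n = rd(out, 1, cap - 1, parm);
    out[n] = '\0';
    return n;
}
```

A caller would then write `read_some(fread_adapter, fp, buf, sizeof buf)` with no cast on the function pointer, so the call goes through a pointer of the right type.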

Perhaps you should include this wrapper function for fread in your
library, since reading from FILE * is likely to be quite common?

I think you should also reflect on the fact that even the creator of
bstring can't seem to post simple examples using it that don't have
errors.

- Kevin.
 
Paul Hsieh

kevin@-nospam- said:
Paul Hsieh said:
If your whole purpose is just to read the string off the disk as fast as
possible, then parse later, then I would recommend trying out my string
library (http://bstring.sourceforge.net) as it should lead to a much
simpler solution:

#include "bstrlib.h"
#include "bstraux.h"

bstring b = bread ((bNread) fread, fptr); /* Read the whole file */

bNread is defined thus in bstrlib.h: [...]

Yes, I am aware of this issue; I have made a note of this in my documentation.
I am blatantly recommending that people break the ANSI rules for this. BTW,
can you name me at least one platform where this actually ends up being an
issue? I would like to make a note of it in my documentation, but I don't seem
to have ever encountered a platform which implemented pointers for one type
different from pointers of another type. In fact I am suspicious as to whether
such a platform exists.
[...] Perhaps you should include this wrapper function for fread in your
library, since reading from FILE * is likely to be quite common?

A possibility, but then it would force your program to link with file
manipulation functions (or at least fgetc and fread.)

I know this is difficult to understand, but bstring is a *STRING LIBRARY*. It
is *NOT* a file library, and makes absolutely no impositions on the
implementation of file streams whatsoever, while still being able to use them.

I.e., if someone decided that C's file functions are worthless in the same way
that I decided that C's string library functions are worthless, and used the
same philosophy that I did with bstring, then that library should be able to
work together with my library with no issue (even without having awareness of
the existence of bstrlib). I.e., neither of us would have to go through hoops
to interoperate, since we would have both exposed C-library semantic-compatible
mechanisms.

Same thing is true of regexp's or other parsing libraries -- bstring will work
well with them.
I think you should also reflect on the fact that even the creator of
bstring can't seem to post simple examples using it that don't have
errors.

Apparently that's par for the course for source code posted here. Obviously,
the original poster should download the documentation and read it before using
the bstring library, where this function prototype coercion ANSI problem is
explained.
 
Paul Hsieh

The functions are easy to implement, which is often important.

And very difficult to implement for speed. I've optimized strlen () by itself
several times based on various ideas I've had or which have been given to me.
I can beat the strlen performance of nearly every C library written by a huge
margin, and still it's an embarrassment as being necessarily O(n), when there is
no sensible reason not to be O(1).
[...] Usually
performance in string manipulation isn't too important, since a string is
usually either input or output, and IO overhead is so large that a bit of
processing inefficiency isn't noticeable.

You obviously haven't been introduced to the world of XML, HTML, or ASN1. You
probably have never considered how to implement a fast and space efficient
spell checker, text editor, or database either. Of course you could just
concede that C is the wrong tool for those jobs ...
Finally, for better or for worse C has built in support for NUL-terminated
strings.


I'm sure your string library is well-written, is an improvement over the C
string library, deserves to be accepted as an ANSI standard, etc.

Well, I don't quite see things that way. The C ANSI committee are the group of
people who did *NOT* deprecate gets() and added C++ namespace conflicting
complex numbers which are suitable for numerical computationalists, and
worthless to number theorists (i.e., no complex integers) when they had the
opportunity in the C99 Spec. Whether or not my library is endorsed or
considered by them ... I mean these people are totally irrational, what
motivation would or should I have to submit my library to the ANSI C committee?

And you don't have to *speculate* as to whether or not it's well written;
the source is fairly small, you can look at it yourself.
However the problem is that it hasn't yet gained wide acceptance, so anyone
trying to understand how the code works first has to read and understand
your library documentation.

The library is only 7 months old; there is a lot of competition from other
string libraries out there and apparently I'm not much of an advertiser. This
seems like a poor rationale for deciding whether or not an extension should be
added to the C standard -- and it clearly was not used for adding floating
point complex numbers.
I won't say "don't use it", it might well be an advantage, particularly for
a large string-intensive project. However the issue isn't as clear-cut as you
seem to suggest.

You cannot buffer overflow with my library unless you are trying really really
hard to do so. With the C library it's fairly difficult *NOT* to buffer
overflow. I.e., I would claim that using my library is suitable for *ANY*
amount of string manipulation, if for no other reason than to mitigate the
buffer overflow problem that goes hand in hand with the C library's string
functions.

Squashing this bug alone would probably save Microsoft alone millions in
development costs.
 
Richard Heathfield

James said:
Please give examples, because...

http://www.and.org/vstr/security.html#reason

...doesn't agree with you at all, and it does provide examples where very
competent people didn't manage it.

On the contrary, the page doesn't disagree with me at all. It advocates good
practice with respect to buffer management, and I certainly agree with
that. It also points out the limitations of fixed size buffers, and I agree
there too.

As for null-terminated strings, why, the page doesn't even mention the term.

I think you've misunderstood the intent of the author of that page. Why
don't you ask him what he really meant to say? ;-)
 
Malcolm

Paul Hsieh said:
You obviously haven't been introduced to the world of XML, HTML, or
ASN1. You probably have never considered how to implement a fast and
space efficient spell checker, text editor, or database either. Of course you
could just concede that C is the wrong tool for those jobs ...
I'm a games programmer so I don't generally do text-intensive apps.
However I use Internet Explorer. With a dial-up connection such as I have at
home it often takes about two seconds for a page to load. With such an
overhead, no amount of efficiency in the string library is going to make any
noticeable difference.
With a spell checker, the problem would seem to be searching the dictionary.
I don't see how avoiding NUL-terminated strings is going to make a vast
improvement.
I have written a text editor - it was an assignment. I stored the text as a
linked list of lines, and performance was fine. If you try to store
everything as one ASCIIZ string then you are admittedly asking for trouble.
I have been warned "never to bullshit your way into a database job" so I'll
withhold comment on this.
... I mean these people are totally irrational, what motivation would or
should I have to submit my library to the ANSI C committee?
Because then everyone would use your library, you would be famous, and you
could write a book "Notes on Using the Extended String Library" and make a
nice amount of money. As happened with the Standard Template Library.
the buffer overflow problem that goes hand in hand with the C library's
string functions.

Squashing this bug alone would probably save Microsoft alone millions in
development costs.
If you program in C you have to get used to the fact that arrays don't have
bounds checking.
It is also tempting to write code that uses fixed "big enough" buffers. I
will often do this for an internal tool. Since the program is meant for
internal use only, no-one is going to try to find weaknesses in it to
exploit.
In practise I haven't found string buffer overflows to be much of a problem.
 
Kevin Easton

Paul Hsieh said:
kevin@-nospam- said:
Paul Hsieh said:
If your whole purpose is just to read the string off the disk as fast as
possible, then parse later, then I would recommend trying out my string
library (http://bstring.sourceforge.net) as it should lead to a much
simpler solution:

#include "bstrlib.h"
#include "bstraux.h"

bstring b = bread ((bNread) fread, fptr); /* Read the whole file */

bNread is defined thus in bstrlib.h: [...]

Yes, I am aware of this issue; I have made a note of this in my documentation.
I am blatantly recommending that people break the ANSI rules for this. BTW,
can you name me at least one platform where this actually ends up being an
issue? I would like to make a note of it in my documentation, but I don't
seem to have ever encountered a platform which implemented pointers for one
type different from pointers of another type. In fact I am suspicious as
to whether such a platform exists.

The Data General Eclipse had different representations for word and byte
pointers - converting from one to the other required a shift and mask,
so accessing one as the other without conversion would lead to
unpredictable results.

Since you ask about any kind of pointers, not just pointers to object
types, there are several platforms where function pointers are larger
(in some cases, *much* larger) than object pointers - this is why
conversions between function pointer and object pointer types aren't
defined.

Anyway, once upon a time all the world was a VAX - I don't plan on
repeating the mistakes of the past. Why should people write this
erroneous code, when there is a simple, correct alternative?
[...] Perhaps you should include this wrapper function for fread in your
library, since reading from FILE * is likely to be quite common?

A possibility, but then it would force your program to link with file
manipulation functions (or at least fgetc and fread.)

OK - you'd like it to be portable to non-hosted implementations. Fair
enough. Since the requisite function is a one-liner anyway, you can
just include it in the documentation for people to copy and paste if
they need it.

- Kevin.
 
Chris Torek

The Data General Eclipse had different representations for word and byte
pointers - converting from one to the other required a shift and mask ...

Just a shift, actually -- the bits are carefully arranged with ring
and segment numbers offset by one bit, so that a one-bit shift
serves to convert one into the other. The "word" is a 16-bit word,
so a byte pointer has one extra low-order bit that must be introduced
or discarded as necessary. The top bit of a word pointer is a
special "indirect" bit that is not used in C at all (so it can be
discarded without loss of information).
Anyway, once upon a time all the world was a VAX - I don't plan on
repeating the mistakes of the past.

Indeed, we see this happening today with the introduction of 64-bit
architectures. All of the "ILP32 vs LP64" items that were posted
a short while ago are wonderful examples of assuming "all the
world's an i386 or other 32-bit, byte-oriented processor". The C
language proper does not assume this, and if you (the generic "you")
also avoid assuming it, your code will work on both ILP32 *and*
LP64 machines, with no source-level changes required.
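One small instance of not assuming ILP32 is to keep pointer differences and sizes in the types the language provides for them rather than in `int`. The sketch below is illustrative only; `index_of` is a hypothetical helper, not something from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Index of the first occurrence of c in s, or -1 if absent.
 * The result is a ptrdiff_t rather than an int, so it is not truncated
 * on an LP64 machine, where a pointer difference can exceed INT_MAX. */
ptrdiff_t index_of(const char *s, int c)
{
    const char *p = strchr(s, c);

    return p ? p - s : -1;
}
```

The same source compiles unchanged on ILP32 and LP64 because nothing in it depends on the width of `int`, `long` or a pointer.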

(As usual, those who do not learn from history are doomed to repeat
it. :) )
 
Richard Heathfield

James said:
Hmm, maybe I misunderstood then. It seemed like you were saying that
using the plain string.h functions is often a good solution to string
related problems in C.

No, they are a fairly decent basis for normal programming situations. We've
all encountered situations that they don't match up to, and we've all
written code to work around those situations. But if I've got a buffer
that's yay big, and a null-terminated string that's no bigger than yay big
minus one, and I need to copy the string into the buffer, strcpy works for
me every time.
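The precondition described here (the string is no longer than the buffer size minus one) can be made explicit in code. This is a minimal sketch, and `copy_if_fits` is a hypothetical helper name rather than a standard function.

```c
#include <string.h>

/* Copies src into dst only when it fits, terminator included.
 * Returns 1 on success, 0 if src plus its '\0' needs more than
 * dst_size bytes (in which case dst is left untouched). */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) >= dst_size)
        return 0;          /* would overflow */
    strcpy(dst, src);      /* safe: length checked above */
    return 1;
}
```

Once the length check has been done, strcpy itself is not the hazard; the hazard is calling it without knowing the check holds.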
It does at...

http://www.and.org/vstr/security.html#io

...but the words null-terminated aren't used.

Well, that would explain why I couldn't find them. :)

If the null-terminated string model is not appropriate for your data, then
obviously you have to use something else (and in fact my own "string"
library completely ignores null characters, treating '\0' as just another
value).
The problem is that he never seems to argue with what I say ;)

Perhaps he's killfiled you? <g,d&r>
 
Dave Vandervies

(e-mail address removed) says...

By what standards can you say any of that? Buffer overflows are the #1
occurring bug, and the vast majority of them occur in the C string library.

Accidents are the #1 cause of death for people under the age of 35 in
the United States[1][2], and the vast majority of them are motor vehicle
accidents[3]. Does that mean we should stop using motor vehicles?
As to being fast -- that's impossible unless the functions are absolutely
trivial. The C-library basically imposes an additional minimum O(n) on all
non-trivial string manipulations.

Can you give an example of a nontrivial string manipulation that doesn't
already have O(n) time? I strongly suspect that anything you can come up
with could be done with the C string library with no additional overhead.

C still exposes the best core
speed for someone willing to work around the compiler and pretty much the only
useful language with inline assembly language. So I am stuck with it.

Really? If you go right to assembly, you stop having to work around
the compiler (since there's no longer a compiler to work around),
and claiming that it doesn't give you inline assembly would quickly
(and rightly) be dismissed as quibbling over details.


dave

[1] http://www.cdc.gov/nchs/fastats/pdf/nvsr49_11tb1.pdf
This is a breakdown of causes of death by age, race, and sex.
The first set of detailed breakdowns is all races, both sexes, by age.
Accidents are highest up to age 34 and second-highest for 35-44.

[2] The US being the first country that Google returned results for

[3] http://www.cdc.gov/nchs/fastats/pdf/nvsr49_11.pdf
Perhaps not "vast majority", but enough to get the top ranking in
the subdivision of accidental deaths.
 
Kevin Easton

Dave Vandervies said:
Can you give an example of a nontrivial string manipulation that doesn't
already have O(n) time? I strongly suspect that anything you can come up
with could be done with the C string library with no additional overhead.

Concatenation of a string of length m with a string of length n is
O(n+m) using strcat, but O(m) if you use a string type that has its
length explicitly stored, rather than indicated by a sentinel.

In most other cases you will end up doing a copy of the string that you
need to find the length of, so the overall time complexity doesn't
change by avoiding the scan to find string length - but the constant
factors can often be reduced quite significantly (consider an operation
like search-and-replace).
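A string type with an explicitly stored length might look roughly like the sketch below. The `struct lstr` layout and `lstr_append` are assumptions made up for illustration; they are not the API of bstrlib or of any other library mentioned in the thread, and overflow of the size arithmetic is ignored for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: len bytes of data followed by a '\0'. */
struct lstr {
    char  *data;
    size_t len;   /* bytes in use, not counting the terminator */
    size_t cap;   /* bytes allocated */
};

/* Appends n bytes from src. The existing contents are never scanned for
 * a terminator; growth may copy them, but doubling keeps that amortized.
 * Returns 0 on success, -1 on allocation failure. */
int lstr_append(struct lstr *s, const char *src, size_t n)
{
    if (s->len + n + 1 > s->cap) {
        size_t ncap = (s->len + n + 1) * 2;
        char *p = realloc(s->data, ncap);

        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = ncap;
    }
    memcpy(s->data + s->len, src, n);   /* write directly at the known end */
    s->len += n;
    s->data[s->len] = '\0';             /* stay usable as a C string */
    return 0;
}
```

Starting from `struct lstr s = { NULL, 0, 0 };`, each append costs time proportional to the bytes appended rather than to the length already accumulated.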

- Kevin.
 
Kevin Easton

Mark McIntyre said:
Sure, but the difference between O(m+n) and O(m) is negligible for any
realistic n,m associated with strings.

Consider repeated concatenation of strings onto a destination - if we
concatenate 20 strings, each character in the original buffer is
inspected at least 20 times, each character of the second string at
least 19 times, ...
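To put a rough number on this, the sketch below counts how many characters a strcat-style append has to step over just to find the end of the destination. The function name and the equal-length assumption are illustrative only.

```c
#include <stddef.h>

/* Characters that must be stepped over to find the end of the destination
 * when k strings of n characters each are appended, one at a time, to an
 * initially empty buffer using a scan-for-terminator append. */
size_t strcat_scan_cost(size_t k, size_t n)
{
    size_t total = 0, dest_len = 0, i;

    for (i = 0; i < k; i++) {
        total += dest_len;  /* scan to the terminator */
        dest_len += n;      /* then n characters are copied */
    }
    return total;           /* equals n * k * (k - 1) / 2 */
}
```

For 20 strings of 100 characters this gives 19,000 characters scanned, against only 2,000 characters actually copied; with a stored length or end pointer the scanning term disappears.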

- Kevin.
 
Kevin Easton

Richard Heathfield said:
Kevin said:
Concatenation of a string of length m with a string of length n is
O(n+m) using strcat, but O(m) if you use a string type that has its
length explicitly stored, rather than indicated by a sentinel.

Using no additional overhead [1], remember how many bytes you've copied
using strcat, and offset the dest pointer by that many on the next copy.

[1] in comparison to the "store the length" method.

strcat doesn't tell you how many bytes you've copied. If you know the
length of the destination string to start with, you'd just use memcpy
anyway (which is what the stored-length method comes down to).
This can all be managed perfectly satisfactorily using C strings and
temporary variables.

Satisfactorily, yes. But it's undeniably more efficient to keep the
lengths of the pertinent strings around and use directed copies rather
than scan-for-sentinel copies, which means you end up not using a fair
number of the standard C string functions.

I'm not saying the difference is incredibly noticeable in all or even
many cases, but sometimes it is.

- Kevin.
 
James Antill

This can be trivially avoided by using sprintf instead of strcat :)

Errm, what?

Say you have...

List *scan = NULL;
char buf[4096]; /* we "know" this is long enough */

buf[0] = 0;
scan = beg;
while (scan)
{
    strcat(buf, scan->data);
    scan = scan->next;
}

...how does sprintf() help? Ok, so you can do something like...

ptr = buf;
while (scan)
{
    ptr += sprintf(ptr, "%s", scan->data); /* assume sprintf() has an ISO
                                            * return value */
    scan = scan->next;
}

...but then you might as well just do...

ptr = buf;
while (scan)
{
    size_t len = strlen(scan->data);

    memcpy(ptr, scan->data, len);
    ptr += len;

    scan = scan->next;
}
*ptr = '\0'; /* memcpy copied no terminator, so add one */

...and after you do that more than once you realize that you want...

char *my_stpcpy(char *dst, const char *src)
{
    size_t len = strlen(src);

    memcpy(dst, src, len + 1); /* + 1 copies the terminating '\0' too */
    dst += len;

    return (dst);
}

ptr = buf;
while (scan)
{
    ptr = my_stpcpy(ptr, scan->data);
    scan = scan->next;
}

...at which point you've just _reinvented the wheel_ for about the
millionth time, creating your own clumsy string API. All because the C
library string APIs are deficient ... which is pretty much what was
argued.
 
James Antill

Kevin said:
Concatenation of a string of length m with a string of length n is
O(n+m) using strcat, but O(m) if you use a string type that has its
length explicitly stored, rather than indicated by a sentinel.

Using no additional overhead [1], remember how many bytes you've copied
using strcat, and offset the dest pointer by that many on the next copy.

So you create a local stpcpy(), strconcat() etc. or variants thereof
that take a pointer to a length and the beginning of the C-style string.
Then you just have to deal with all the problems of using a string API
that limits the length of data:

But this says nothing about how good or bad the C-library string API is.

memcpy: Useful, but requires the programmer to keep track of metadata for
dst.
memmove: Same as memcpy.
strcpy: Most commonly used for buffer overflows; as with all the str*
functions that create data, the two inputs cannot be the same.
strncpy: Most broken interface ever
strcat: O(n)
strncat: O(n) Plus dst must be a valid NUL-terminated C-style string
memcmp: Useful, but requires the programmer to keep track of metadata for
both arguments and properly merge them (you can "fix" having to
merge the metadata by using strncpy() but I wouldn't recommend
this).
strcmp: Useful, assuming you have valid C-style strings.
strcoll: Same as strcmp
strncmp: Same as memcmp
strxfrm: Can be used as a non-broken strncpy() if you don't mind confusing
everyone (and you don't use LC_COLLATE).
memchr: Same as memcpy
strchr, strcspn, strpbrk, strrchr, strspn, strstr, strlen: Same as strcmp
strtok: Often used badly, destroys its input ... sometimes even horribly
abused as a side band parameter to functions.
memset: Same as memcpy
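
One concrete reason strncpy draws this kind of complaint: when the source has at least as many characters as the count passed in, strncpy fills the destination and writes no terminating '\0'. A bounded copy that always terminates has to be written by hand. The helper below is a sketch with a hypothetical name, `copy_truncate`, not a standard function.

```c
#include <string.h>

/* Bounded copy that always NUL-terminates, truncating if necessary.
 * Returns the number of characters stored, not counting the '\0'. */
size_t copy_truncate(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return 0;               /* no room even for a terminator */
    n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;       /* leave one byte for the '\0' */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 4-byte destination and the source "abcdef", strncpy(dst, src, 4) leaves the bytes 'a', 'b', 'c', 'd' and no terminator, whereas the helper above stores "abc" followed by '\0'.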

So there are some useful functions for dealing with C style strings that
exist, but as I've said the only sane way to create those strings is to
abuse strxfrm() or write your own using memcpy()/memmove().
And then after you've created those functions so you can move data to
limited sized buffers without going insane, you still have all the
problems of having limited size buffers...

http://www.and.org/vstr/security.html#alloc
[1] in comparison to the "store the length" method.

_4 bytes of metadata_
and if you want to dynamically allocate the string this is probably less
than 25% of a zero length (1 byte long[2]) string.
And if you aren't dynamically allocating the string, you are almost
certainly going to have the fixed size buffer greater than 16 bytes long,
so you again have less than 25% overhead.

But yeh it's not impossible to do that, you might only need to create
one or two extra functions and it's possible you won't have any security
problems because of it. I might even put Richard on the list of people
that can do all of that, however that's a very short list.

[2] Malloc implementations I've seen require at least 16 bytes of overhead
per object, so you get 16 + 4 + 1 vs. 16 + 1
 
Richard Heathfield

James said:
On Tue, 22 Jul 2003 07:38:33 +0000, Richard Heathfield wrote:
So you create a local stpcpy(), strconcat() etc. or variants thereof

strconcat is out, since it invades implementation namespace.
that take a pointer to a length and the beginning of the C-style string.

Why bother? Just remember the length in an auto variable. Much of the time,
this is sufficient.
Then you just have to deal with all the problems of using a string API
that limits the length of data:

Have I misunderstood you? I'm not aware of any imposed limit on string
length in the C string model.
But this says nothing about how good or bad the C-library string API is.

memcpy: Useful, but requires the programmer to keep track of metadata for
dst.
Right.

memmove: Same as memcpy.

Right, in this regard at least!
strcpy: Most commonly used for buffer overflows,

That's a little unfair on strcpy. If the programmer is careful (as all
programmers should be), strcpy is perfectly safe.
as with all the str*
functions to create data the two inputs cannot be the same.

Why would you want to copy a string onto itself?

So there are some useful functions for dealing with C style strings that
exist, but as I've said the only sane way to create those strings is to
abuse strxfrm() or write your own using memcpy()/memmove().

strcpy still works fine for me.
And then after you've created those functions so you can move data to
limited sized buffers without going insane, you still have all the
problems of having limited size buffers...

So don't use limited size buffers.

But yeh it's not impossible to do that, you might only need to create
one or two extra functions and it's possible you won't have any security
problems because of it. I might even put Richard on the list of people
that can do all of that, however that's a very short list.

If the list is indeed so short, the programming industry needs to be very
very worried. It's not difficult to get this right.
 
Kevin Easton

Mark McIntyre said:
Well, firstly I still contend that this is relatively speaking
insignificant except in critical sections of code (eg tight loops, but
erm what are you doing manipulating strings in tight loops? :) ), and
secondly I contend that in such sections, strcat is a poor choice
anyway, memcpy is probably more appropriate.

For some programs, string manipulation is the meat of their job. Anyway
- you've hit the nail on the head - using memcpy _is_ probably more
appropriate, and using it in the best way involves keeping the length of
strings around (or a pointer to the end, which amounts to the same
thing). The question is why builtin C strings use a sentinel method
rather than a length/end-pointer method to indicate their extent[%] - are
there any downsides to the latter?

- Kevin.

[%] Obviously the horse has not only well and truly bolted, but gone on
to live a long and happy life roaming the countryside and long since
died peacefully. So the question is merely of academic interest at this
point.
 
Richard Heathfield

James said:
That's a little optimistic, there are very few cases where you couldn't
just as easily use memcpy() ... that aren't errors.

I disagree (although of course that might just mean that I have less
experience of fighting malware than you do). I find strcpy to have
expressive power, which is why I prefer it to memcpy when strings are
involved.
I've seen code like...

strcpy(s1, s1 + 1);

Um, yes, I've seen code like that too. My LART had memmove written on it (on
the bit just surrounding the sticky-out nail), in large letters. Once the
blood had stopped flowing out quite so freely.
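
For reference, strcpy has undefined behaviour when source and destination overlap, which is why memmove is the fix for code like the line quoted above. A minimal sketch follows; `drop_first_char` is a hypothetical name chosen for illustration.

```c
#include <string.h>

/* Removes the first character of s in place. Source and destination
 * overlap, so memmove is required; strcpy(s, s + 1) is undefined. */
void drop_first_char(char *s)
{
    size_t len = strlen(s);

    if (len > 0)
        memmove(s, s + 1, len);  /* len bytes: the tail plus its '\0' */
}
```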
As for the rest of the industry, they seem to be desperately trying to
change language, once every 5 years ... which seems like buying a Ford
because the car stereo in your Mercedes doesn't play tapes, but they're
having fun I guess :).

Crazy world. 'Twas ever thus.
 
