

Default User

Keith said:
Bill Pursell said:
I'm confused....if there is no string type in C, but strstr
requires a string as input, then what exactly is it that
strstr is expecting as input?
[...]

strstr() takes two arguments, of course.

Each argument must be a char* (a type, enforced at compile time) and
each argument must point to a string (a data format, not enforced at
compile time). If you try to give it something other than a char*,
your program won't compile.

To nitpick, you can also give it a void*, or any other pointer cast to
char* or void*.



Brian
 

Keith Thompson

Clem Clarke said:
And for those people that suggest that C does have a string type, I
would suggest that it doesn't, compared with Pascal, PL/I, COBOL or
even Assembler. It has an array of characters, which is quite
different from a string.
[...]

Those people who suggest that C has a string type simply need to read
the standard, or any decent textbook. C doesn't have a string type,
and I'm not aware that there's any real controversy on that point.
It's understandable that someone not familiar with the language might
not know that.
 

Ed Jensen

Kenneth Brody said:
It's "dangerous" in that a newbie doesn't see the difference between:

char token[6] = "123456";
and
char token[] = "123456";

After all, they only see 6 chars in both.

And, as far as the compiler is concerned, both can be passed to
strstr() without issue.

However, a language that prevents a newbie from doing anything that
you can call "dangerous" is also not going to allow an experienced
programmer from doing anything beyond newbie-stuff.

With power comes responsibility.

It seems to me that the responsible thing to do is to use "safe"
strings unless performance is absolutely critical, otherwise you're
just putting your users at risk.
 

websnarf

Kenneth Brody said:
It's "dangerous" in that a newbie doesn't see the difference between:
char token[6] = "123456";
and
char token[] = "123456";
After all, they only see 6 chars in both.
And, as far as the compiler is concerned, both can be passed to
strstr() without issue.
However, a language that prevents a newbie from doing anything that
you can call "dangerous" is also not going to allow an experienced
programmer from doing anything beyond newbie-stuff.
With power comes responsibility.

It seems to me that the responsible thing to do is to use "safe"
strings unless performance is absolutely critical, otherwise you're
just putting your users at risk.

Uhh ... are you making a false choice here -- I mean a *REALLY REALLY*
false choice? Length delimited strings are both safer *AND* faster.
Truly the only advantage of using C's raw char * strings is where you
want to minimize your code size footprint. Otherwise, using an
alternative is almost always better.
 

Kelsey Bjarnason

[snips]

It's "dangerous" in that a newbie doesn't see the difference between:

char token[6] = "123456";
and
char token[] = "123456";

After all, they only see 6 chars in both.

As you said, "newbie". Someone not properly competent to be releasing
software unto the world - nor likely to do so, one would hope.
And, as far as the compiler is concerned, both can be passed to strstr()
without issue.

If so, then there's no problem and the issue is irrelevant.

Whoops... there *is* a problem, because they cannot both be passed to
strstr without issue; one of them distinctly does have issues, because -
as noted previously - you cannot chuck incorrect data about and expect to
get correct results.
With power comes responsibility.

Indeed; as a programmer you have the power to corrupt data, crash systems,
wreak all kinds of havoc; if you haven't got enough responsibility to get
something as basic as your data types right, you have no business
releasing your software on an unsuspecting world.
 

Kelsey Bjarnason

[snips]

Uhh ... are you making a false choice here -- I mean a *REALLY REALLY*
false choice? Length delimited strings are both safer *AND* faster.

Are they? Doesn't that depend on how they're implemented? Take a simple
example: looping over the characters in a string. Here's a couple
different ways:

size_t idx;
for ( idx = 0; idx < string.length(); idx++ )
...

size_t idx;
for ( idx = 0; idx < string.length; idx++ )
...

char *ptr = string;
while( *ptr++ )
...

Depending on what's in "..." the first may actually require a
re-evaluation of length() at each iteration. Now, length() may do nothing
but return a value from inside a structure, but it's still got the
overhead of a function call on top of the comparison.

The second strikes me as likely to be the fastest of the lot... but if the
length member is exposed this way, it also strikes me as risking it being
changed outside the scope of the proper string operators - thus making the
"safe" part of "safe strings" somewhat questionable.

The last method relies solely on the data in the string and has the
overhead of incrementing a pointer, rather than calling a function: while
neither is guaranteed to be faster or slower than the other, experience
suggests that pointer increments tend to be faster than function calls.

The only particular dangers with the last option are what happens if
"string" is freed inside the loop, or altered to be shorter than where ptr
is currently pointing? Certainly possible, but I can't say I've ever seen
the latter happen... and the former is one of those things you test for
anyhow if there's a risk of it happening.

It seems to me there are cases where "safe strings" can certainly be
faster than conventional strings, but it seems as well there are cases
where the speed gains are very much in doubt.
 

Kelsey Bjarnason

[snips]

I can show you literally dozens of posts -- written in this forum by
experts respected by all -- that contain buffer overrun exploits.

In released production code, or in something tossed off here on the spur
of the moment with minimal testing and no code review?
There is no debate as to whether the construct is dangerous or not.
The fact that it has caused billions of dollars of damage means that
it *is* _by definition_ dangerous.

That doesn't make the construct dangerous, it makes improper use of it
dangerous. A hammer is not particularly dangerous in and of itself, but
used incorrectly - say by dropping it off a tall building - can be fatal.
Does this mean the hammer is improperly designed and it should have been
made out of soft foam? Or does it mean that any tool, in the hands of
people failing to apply due care and diligence, can be dangerous?
I believe that those who say that C strings are not dangerous are not
clear thinkers, or are simply so in love with C that they refuse to
look at the wart on her nose and say that it is anything less than
breathtakingly beautiful.

Or they simply wish to point the blame where it belongs. Replacing all
the tools in your toolbox with replacements made by Nerf might make
the world a little safer... but it doesn't actually buy anything. It
places the blame on the tool, instead of the person wielding the tool.
 

Kelsey Bjarnason

[snips]

Cars also kill people from time to time. There are some things about
cars that are clearly unsafe. If the brakes are bad and you hit
someone in a car you borrowed, it was not the driver but the car.

As Kenneth Brody said, "With power comes responsibility." The driver has
gained the power to get about quickly, to transport goods efficiently, to
be sure, but he has also gained the power to kill, maim and otherwise
cause damage. The responsibility is to ensure he has the skill to avoid
this as much as is humanly possible... but it is *also* his responsibility
to ensure the tools he uses are maintained correctly - such as checking the
brakes before relying on them.

Incorrect or unsafe use of a tool does not make the tool unsafe, it simply
means the user of it is being irresponsible.
C strings are the bad brakes in the C language.

Except they're not. They never wear out. They never fail to do exactly
the same thing, exactly the same way, whether you use them once or a
million times. They need no maintenance, no care, no looking after.
An acetylene torch is a dangerous thing. You can do lots of wonderful
things with it. Maybe, if it was safer, it would be a lot harder to
get the work done. But when someone comes along and wants to learn
how to weld, I think it prudent to tell the welding student that the
yellow (and sometimes blue) thing that comes out of the end of the
torch is dangerous. If we tell them that it is completely safe, then
we are lying. Not only that, but they are far more liable to damage
of some sort than if properly instructed.

Exactly. The worker needs to learn the _correct use_ of the tools. The
tools, in and of themselves, are not the problem; it is the incorrect use
of them that is the problem.

You're making the point for us: it's not the C strings that are the
problem, but the people who use them carelessly. Yes, isn't that what
people have been telling you?
To tell anyone that C strings are safe is a fabrication because the
evidence proves that they are very, very dangerous.

Few things are "safe" in an absolute sense. Things are safe when used
correctly. That's true of water, of oxygen, of welding torches and of
C-style strings.
Clearly, C strings are a problem because of all the damage they cause.

By this argument, you might as well say it's the end users who are the
problem: after all, they're the ones who cause the damage, by running the
programs, entering the data. If we got rid of the users, the problems
would simply cease to exist.

It's a silly argument. The tools are there, to be used. You can use them
correctly and safely, or incorrectly and dangerously. Blaming the tool
because the programmer can't use it properly is pointless.
 

Kelsey Bjarnason

[snips]

That's false, because good programmers have done it. It's not common,
but it happens.

Damned rarely... particularly in released versions of production code.
Now, Lawrence Kirby was not only familiar with the problem of buffer
overrun with scanf() or fscanf() combined with %s, but he often
counseled against it. Certainly, this is not some polished work that
he did and neither was it his original work, but rather a correction
of someone else's work. But if such an atrocity can escape from the
fingers of one Lawrence Kirby, then it can escape from the fingers of
us all.

This, too, is a bit silly. The OP was obviously a newbie, learning his
way about the language. While it would be a good idea to instill good
habits - such as not using scanf this way - in a newbie, this needs to be
balanced against the readiness of the recipient to have such information
imparted.

One must walk before one can run; this person was obviously at the stage
of having learned to walk but not having quite mastered it yet.
Explaining to them the finer points of running - speed versus stamina,
training methods for each and their pros and cons - is pointless; they're
not at that stage yet.

I'm willing to bet if you examined Kirby's released production code, you'd
be hard pressed to find such constructs - but then, he's past the stage of
simply toddling.
 

pete

user923005 said:
Now, Lawrence Kirby was not only familiar with the problem of buffer
overrun with scanf() or fscanf() combined with %s, but he often
counseled against it. Certainly, this is not some polished work that
he did and neither was it his original work, but rather a correction
of someone else's work. But if such an atrocity can escape from the
fingers of one Lawrence Kirby, then it can escape from the fingers of
us all.

You will find other scintillating stars of c.l.c also guilty in this
regard.

A bulletproof input routine using fscanf:

http://www.mindspring.com/~pfilandr/C/fscanf_input/fscanf_input.c
 

user923005

[snips]

Cars also kill people from time to time. There are some things about
cars that are clearly unsafe. If the brakes are bad and you hit
someone in a car you borrowed, it was not the driver but the car.

As Kenneth Brody said, "With power comes responsibility." The driver has
gained the power to get about quickly, to transport goods efficiently, to
be sure, but he has also gained the power to kill, maim and otherwise
cause damage. The responsibility is to ensure he has the skill to avoid
this as much as is humanly possible... but it is *also* his responsibility
to ensure the tools he uses are maintained correctly - such as checking the
brakes before relying on them.

Incorrect or unsafe use of a tool does not make the tool unsafe, it simply
means the user of it is being irresponsible.

A tool that by design is unsafe to use is a badly designed tool. If
there were not literally hundreds of exploits in production code that
have actually been used for evil purposes, then these "C strings are
safe" arguments would make sense. Expert coders who are trying their
hardest still cause errors in this regard.

So now, let the first person who has written at least 10,000 lines of
code and who has also never had even one bug in his code raise his
hand.

What? No hands raised? Then perhaps we should assume that the users
of the tools are not perfect and (in fact) prone to mistakes.

These mistakes do happen, on a frequent basis. These mistakes also
cause billions of dollars in damage. This is not a theoretical
argument. It is a statement of fact, using what we actually observe.
The claim is that responsible coders will not introduce mistakes that
cause damage because of the way C strings are designed.
This claim is clearly and obviously wrong, because these mistakes are
made by senior level programmers.
Now, you can say it was the programmer's fault and that is fine. But
the fact that it happens again and again shows that even competent,
well meaning, conscientious programmers make mistakes which are
avoidable given a real string type that knows how to protect itself.
 

Richard Heathfield

user923005 said:

A tool that by design is unsafe to use is a badly designed tool.

Agreed. Strings are not tools. They are, perhaps, wood. The unsafe tools
are gets(), scanf("%s", ...), and the like. Let us write decent, safe tools for
handling our raw materials, by all means. Nobody is suggesting
otherwise.
 

CBFalconer

user923005 said:
.... snip ...

Now, you can say it was the programmer's fault and that is fine.
But the fact that it happens again and again shows that even
competent, well meaning, conscientious programmers make mistakes
which are avoidable given a real string type that knows how to
protect itself.

All you are saying is that the programmer should choose the
language to fit the project. Pascal, Ada, and others come to mind.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of the structure of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 

Kelsey Bjarnason

[snips]

A tool that by design is unsafe to use is a badly designed tool.

So let's get rid of hammers, axes, knives, screwdrivers, cutting torches,
most saws, drills, forks, pencils, pens, computers, TVs, cars, toasters...

Every single one of those is a tool. Every single one, used improperly,
is dangerous - even potentially fatal. Most *cannot* be designed to be
safe if used improperly; therefore, by your argument, they are badly
designed and need to be replaced... except... if they cannot be designed
to meet your requirements, all that's left is to abandon them completely.

This, to you, makes sense, does it?
If there were not literally hundreds of exploits in production code
that have actually been used for evil purposes

And tens of thousands of people killed in car crashes, by knives,
thousands more killed by assorted workshop tools, more still killed by TVs
and radios and toasters...
, then these "C strings are safe" arguments would make sense. Expert
coders who are trying their hardest still cause errors in this regard.

First, a string isn't a tool; it's a piece of data. The tools in this
case are the functions which operate upon strings. Those tools, like any,
require a certain attention to detail, a certain skill.

Part of that skill, part of that attention, is simply knowing how to use
them correctly in the first place. Another part, at least for "critical"
code, involves items such as code reviews, use of unit testing, use of
scattershot and other randomized testing tools and so forth.

I'm willing to bet you will find *damned* few instances of
C-string-related exploits which have passed a proper testing and
examination regimen.
So now, let the first person who has written at least 10,000 lines of
code and who has also never had even one bug in his code raise his hand.

Umm... er... okay. Mine. 35,000+ lines in a single project... over
200,000 units shipped. While there were a few design issues that came up
- as in "it'd be nice if it could do this" - not a *single* actual bug
report.
What? No hands raised?

Mine is.
Then perhaps we should assume that the users of
the tools are not perfect and (in fact) prone to mistakes.

A tool cannot be prone to anything; it is inert, sitting there waiting for
you to use it. *How* you use it determines the risk. If you choose to
walk into a crowded mall with a high powered rifle, put on a blindfold and
empty the clip in random directions, it is not the fault of the tool that
someone gets hurt.
These mistakes do happen, on a frequent basis.

No, they don't - at least, not to seasoned developers using proper
development, testing and verification strategies.

It is *precisely* when one becomes blase about the use of the tools - I'm
not talking strcpy and the like, I'm talking the entire language, and even
more generally, the entire world of software development, regardless of
language or platform - that dangers arise.

There's an old adage, it goes something like this: "If builders built
buildings the way programmers write programs, the first woodpecker to
come along would destroy civilization."

There's a certain truth to that. I have met far too many developers over
the years who simply _don't care_. They rely on functions to "just work"
and don't bother checking return codes. They assume allocations succeed.
They assume they have enough space to copy a string or write a block of
data.

There is *nothing* you can do about such people other than identify them
and either train them properly or try to prevent them ever developing
software. No language, no tool in existence will ever stop them writing
bad software, software full of holes, full of risks. They are simply
incapable - through lack of training, lack of concern, whatever - of
producing quality software.

But that's the whole point: there is *nothing* you can do, from a language
perspective, to stop them. Nothing. They will always manage to do
something wrong. If it's not a failed string copy, it's a division where
they don't check for division by zero. Or something, anything. Lacking
the skills or the concern to do it right, they do it wrong, and the only
way to prevent it is to prevent them writing software - not to try to
create some magical language where it is impossible to write bad code.

Professional developers, writing production code, use professional
methods. They use well-structured designs, for starters. They examine
and handle errors and allocation failures. They make sure buffers are
large enough.

However, a professional developer *also* knows he is human and can make
mistakes, so he'll use other things to help him. Compilers with maximal
warning levels, and he'll treat warnings as fatal errors. He'll run lint
and similar tools. He'll do coverage analysis. He'll use random
injection tests to validate that a piece of code doesn't break when handed
data too large or too small or with improper values. Where possible,
he'll submit the code to review.

Check that failing code you go on about. See how much of it has gone
through all those levels of design, testing and validation and *still*
managed to produce fatal errors.
These mistakes also
cause billions of dollars in damage. This is not a theoretical
argument. It is a statement of fact, using what we actually observe.

Really? Tell you what. How about you provide a single example where the
tools - say the C string handling functions - actually caused billions in
damages... when used correctly.
The claim is that responsible coders will not introduce mistakes that
cause damage because of the way C strings are designed.

No, the claim is that professional developers apply professionalism to
development and take steps to ensure that this sort of thing doesn't
happen in programs properly designed and used within the limits of those
designs. The claim, further, is that professional programmers apply
professionalism to testing and verifying that their code actually does
work as intended, despite being given garbage data.

There's another old saying, GIGO - Garbage In, Garbage Out. However,
while trite and well known, it is actually the mark of a poor programmer.
The saying suggests that it is okay - even expected - that if bad data is
given to a program, producing flawed results is acceptable.

Ask any professional programmer what his routines do when they encounter a
value out of range, for example. The response will almost invariably be
something like "bail out, reporting an error."

There's a reason for it; if something is outside the realm of what the
code is designed to handle, then something has gone fatally wrong: the
program is being used incorrectly, or outside the scope of the design it
was built to, or a device - or user - is giving it invalid input, or some
other routine has failed in some manner.

Garbage in does not mean garbage out, not if you're a pro; it means
_errors_ out. Not crashes, not overflows, not random writes to random
bits of memory - errors. As in "I don't know what to do with this, so
tell someone about it and let them sort it out."

Again, it is precisely the lack of concern and diligence that causes these
problems - not the tools. Short of making a "perfect" - and perfectly
unusable - language, you cannot stop poor developers writing poor code.
The best you can accomplish is making it difficult for good programmers to
write good code - and a good programmer doesn't need that hand-holding in
the first place. He already has the tools necessary to detect and deal
with these problems, and he already *uses* those tools.
 

CBFalconer

pete said:
.... snip ...

A bulletproof input routine using fscanf:

http://www.mindspring.com/~pfilandr/C/fscanf_input/fscanf_input.c

It would have been simpler to publish it right here.
However, according to N869:

%s    Matches a sequence of non-white-space characters. (footnote 228)

The following is quoted from the reference and modified to reduce
whitespace:
#include <stdio.h>

#define LENGTH 40
#define str(x) # x
#define xstr(x) str(x)

int main(void)
{
    int rc;
    char array[LENGTH + 1];

    puts("The LENGTH macro is " xstr(LENGTH) ".");
    do {
        fputs("Enter any line of text to continue,\n"
              "or just hit the Enter key to quit:", stdout);
        fflush(stdout);
        rc = fscanf(stdin, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);
        if (!feof(stdin))
            getc(stdin);
        if (rc == 0)
            array[0] = '\0';
        if (rc == EOF)
            puts("rc equals EOF");
        else
            printf("rc is %d. Your string is:%s\n\n", rc, array);
    } while (rc == 1);
    return 0;
}

which doesn't seem to match the specification here. How come?

--
Please do not top-post. Your answer belongs after (or intermixed
with) the quoted material to which you reply, after snipping all
irrelevant material. See the following links:

<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)
 

websnarf


Uhh ... this is bulletproof? You have to hand-synchronize LENGTH
with the +1 in the array declaration ... using some global macro
namespace for str() ... and this thing doesn't exactly roll off the
tongue, does it? Here is the Bstrlib way of doing things:

bstring b = bgets ((bNgetc) getc, stdin, '\n');

Or:

bstring b = bSecureInput (LENGTH, '\n', (bNgetc) getc, stdin);

if the truncation semantics are that important to you. It's one line,
and there is no confusion, ambiguity or danger. It's also more
powerful, as you can easily implement this on top of sockets or other
interesting input streams.
 

websnarf

user923005 wrote:

... snip ...


All you are saying is that the programmer should choose the
language to fit the project. Pascal, Ada, and others come to mind.

No he is not saying that at all. Where do you even read that as an
implication?
 

Ed Jensen

Uhh ... are you making a false choice here -- I mean a *REALLY REALLY*
false choice? Length delimited strings are both safer *AND* faster.
Truly the only advantage of using C's raw char * strings is where you
want to minimize your code size footprint. Otherwise, using an
alternative is almost always better.

You are, of course, correct.
 

Charlton Wilbur

KB> Exactly. The worker needs to learn the _correct use_ of the
KB> tools. The tools, in and of themselves, are not the problem;
KB> it is the incorrect use of them that is the problem.

Beyond that, continuing the analogy, when you need a worker to weld
things together but can't trust that he will use the acetylene torch
correctly, you build a custom tool. But you don't eliminate the
acetylene torch from your toolset entirely, because there will be
situations in which the acetylene torch in the hands of a competent
welder is the only tool that can get a particular job done under a
certain set of constraints.

Likewise, when you need a programmer to manipulate strings, but can't
trust that he will use C strings correctly, you build a custom tool.
But you don't eliminate C strings from your toolset entirely, because
there will be situations in which C strings in the hands of a
competent programmer are the only tool that can get a particular job
done under a certain set of constraints.

What makes this argument even more ludicrous is that there *are*
custom C string libraries that solve all of these problems. Look at
CFString in Apple's Core Foundation library, for instance. If you
want safer strings in C, you've got them!

Charlton
 

Kenneth Brody

Kelsey said:
[snips]

It's "dangerous" in that a newbie doesn't see the difference between:

char token[6] = "123456";
and
char token[] = "123456";

After all, they only see 6 chars in both.

As you said, "newbie". Someone not properly competent to be releasing
software unto the world - nor likely to do so, one would hope.
And, as far as the compiler is concerned, both can be passed to strstr()
without issue.

If so, then there's no problem and the issue is irrelevant.

Whoops... there *is* a problem, because they cannot both be passed to
strstr without issue; one of them distinctly does have issues, because -
as noted previously - you cannot chuck incorrect data about and expect to
get correct results.

Note my qualification of "as far as the compiler is concerned".
Indeed; as a programmer you have the power to corrupt data, crash systems,
wreak all kinds of havoc; if you haven't got enough responsibility to get
something as basic as your data types right, you have no business
releasing your software on an unsuspecting world.

Well, the point here is that they are both character arrays. The
data in the arrays is different, of course.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 
