Text-Based Windows Library

user923005

Clem Clarke said:


C strings are perfectly safe. Some people, however, should not be let
anywhere near a text editor.

I think that this is a dangerous oversimplification, because the
language allows:
gets(str);
and
scanf() with %s
and memcpy which does not know how long the objects are
and strcpy which does not know how long the objects are
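
To make the list above concrete, here is a minimal sketch of bounded counterparts to those calls. It is not from the original post: the helper names read_line, copy_bounded and first_word are made up for illustration, and the sketch assumes a C99 library (for snprintf).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf (capacity cap) and strip the trailing newline.
   Returns 1 on success, 0 on end-of-file or error. Unlike gets(),
   fgets() is told the buffer size and never writes past it. */
int read_line(FILE *in, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    if (fgets(buf, (int)cap, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Copy src into dst (capacity cap), always terminating the result.
   Returns 1 if the whole string fit, 0 if it had to be truncated. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    int n = snprintf(dst, cap, "%s", src);
    return n >= 0 && (size_t)n < cap;
}

/* Extract the first whitespace-delimited word, at most 15 characters.
   The field width keeps %s from overrunning the 16-byte buffer. */
int first_word(const char *text, char word[16])
{
    return sscanf(text, "%15s", word) == 1;
}
```

In each case the caller still has to pass the right capacity, so this narrows the problem rather than removing it.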

The billions of dollars of damage caused by computer virus attacks is
largely due to buffer overrun.

Let's not pretend that there isn't any problem. If we do not tread
carefully in C, we can do stupendous damage.

On the other hand, let's also not pretend that we can have 'safe and
fast'. We can have 'safe or fast'. Either I check the physical target
storage address of every character I transfer by stupendous effort or
I don't.

String handling in C is definitely the greatest weakness of the
language. And so we must be keenly aware of it and tread very, very,
very carefully.

IMO-YMMV
[snip]
 
Keith Thompson

user923005 said:
On the other hand, let's also not pretend that we can have 'safe and
fast'. We can have 'safe or fast'. Either I check the physical target
storage address of every character I transfer by stupendous effort or
I don't.

But it is possible to have "safe and nearly as fast as the unsafe
version". Straying slightly off-topic, in a language that has bounds
checking built into the language definition, a sufficiently clever
compiler can often eliminate many, probably most, of the checks.

For example, consider something like this:

char *s = "hello, world";
int len = strlen(s);
int i;
for (i = 0; i < len; i ++) {
    printf("s[%d] = '%c'\n", i, s[i]);
}

Theoretically, the expression s[i] requires a bounds check on each
evaluation; if i < 0 or i >= strlen(s), the check fails. But a clever
compiler can perform data flow analysis and determine that the check
will never fail, and thus can be eliminated.

In other words, a lot of checks can be done at compilation time, *if*
there's enough information there for the compiler to work with.

Since C wasn't designed with this kind of thing in mind, it can be
difficult to provide this kind of information in the source.
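
As a rough sketch of the idea, the two functions below spell out by hand what such a compiler would do automatically. The names checked_at, count_char_checked and count_char_hoisted are hypothetical, and C itself performs no such checks; the first version only simulates them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* What a bounds-checking language conceptually does on every s[i]:
   abort if the index is outside the object. */
char checked_at(const char *s, size_t len, size_t i)
{
    assert(s != NULL);
    if (i >= len)
        abort();
    return s[i];
}

/* Naive translation: one bounds check per loop iteration. */
size_t count_char_checked(const char *s, size_t len, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (checked_at(s, len, i) == c)
            n++;
    return n;
}

/* After data flow analysis: the loop condition i < len already proves
   that the check can never fail, so it is dropped entirely. */
size_t count_char_hoisted(const char *s, size_t len, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            n++;
    return n;
}
```

Both functions return the same result for any valid input; only the second avoids the per-character test.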
 
Richard Heathfield

user923005 said:
I think that this is a dangerous oversimplification, because the
language allows:
gets(str);
and
scanf() with %s

This is a problem with gets and scanf("%s"), not a problem with C
strings. That is why I do not use gets or scanf("%s").
and memcpy which does not know how long the objects are

Yes, it does, because you tell it. Of course, if the programmer tells it
the wrong information, that's a problem, but it's not a memcpy problem.
It's a programmer problem.
and strcpy which does not know how long the objects are

That doesn't matter, since the programmer takes responsibility for
ensuring that he or she doesn't ask strcpy to do something unsafe. Of
course, some people should not be let anywhere near a text editor...
The billions of dollars of damage caused by computer virus attacks is
largely due to buffer overrun.

This is not a C string issue, though. This is a programmer issue.
Let's not pretend that there isn't any problem. If we do not tread
carefully in C, we can do stupendous damage.

That is a good reason to tread carefully. It is not a good reason to
reject C strings as being inherently unsafe.

On the other hand, let's also not pretend that we can have 'safe and
fast'. We can have 'safe or fast'. Either I check the physical target
storage address of every character I transfer by stupendous effort or
I don't.

I have yet to see a fast string. :)

String handling in C is definitely the greatest weakness of the
language.

Personally, I disagree, but I certainly respect your opinion.
And so we must be keenly aware of it and tread very, very,
very carefully.

And those who do tread carefully will still occasionally foul up, but
that's what testing's for, right? As for those who do not tread
carefully, I wouldn't want them programming in *any* language, using
*any* string model.

A bit, yes. I think it's just a question of perspective, actually. We
both see the same problems, but we think about them in different ways.
 
user923005

The C language strings are clearly a danger. The literal billions of
dollars in damage caused by careless use of strings is a clear
indication of that fact. It is possible to design strings that know
how big they are and protect themselves from over-writes. Other
languages have this feature. Of course the cost is that these strings
are not fast like C strings.

Here is a C language feature that is dangerous which has to do with
strings:

static const char token[6] = "123456";

....

if (strstr(string, token) != NULL) puts("found it.");

The fact that it is so insidious made the C++ people choose this as
something that simply had to be fixed and could not be passed on as
allowed in the new language.
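
A minimal sketch of why this bites, and of two ways around it. The names token6, token_str and find_bytes are made up for illustration; the point is that a 6-element array initialised with six characters has no room for the terminating '\0', so it is not a string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exactly six characters and no terminator: valid C (rejected by C++),
   but not a string, so it must not be handed to strstr() or strlen(). */
static const char token6[6] = "123456";

/* Letting the compiler size the array leaves room for the '\0'. */
static const char token_str[] = "123456";

/* Search hay (hay_len bytes) for needle (needle_len bytes) without
   relying on terminators. Returns a pointer to the first match or NULL. */
const char *find_bytes(const char *hay, size_t hay_len,
                       const char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    return NULL;
}
```

Either declare the token without an explicit size and keep using strstr(), or keep the fixed-size array and search it with an explicit length, for example find_bytes(string, strlen(string), token6, sizeof token6).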
 
Barry

You just said a few minutes ago anyone can be a C programmer in
a month. While you are not in error here, your argument isn't
consistent.
 
Richard Heathfield

user923005 said:

The C language strings are clearly a danger.

Well, obviously it's clear to you, but it's not so clear to me.
The literal billions of
dollars in damage caused by careless use of strings is a clear
indication of that fact.

To me, it indicates, rather, that carelessness is clearly a danger.

Here is a C language feature that is dangerous which has to do with
strings:

static const char token[6] = "123456";

I see nothing dangerous about this.
if (strstr(string, token) != NULL) puts("found it.");

But this is just carelessness on the part of the programmer, since
'token' - whilst an array of char - is not a string, so he has no
business passing it to a function that expects one.
The fact that it is so insidious made the C++ people choose this as
something that simply had to be fixed and could not be passed on as
allowed in the new language.

Insidious? MMMV.
 
Ed Jensen

Richard Heathfield said:
and memcpy which does not know how long the objects are

Yes, it does, because you tell it. Of course, if the programmer tells it
the wrong information, that's a problem, but it's not a memcpy problem.
It's a programmer problem.
[SNIP]
and strcpy which does not know how long the objects are

That doesn't matter, since the programmer takes responsibility for
ensuring that he or she doesn't ask strcpy to do something unsafe. Of
course, some people should not be let anywhere near a text editor...
[SNIP]
The billions of dollars of damage caused by computer virus attacks is
largely due to buffer overrun.

This is not a C string issue, though. This is a programmer issue.

[SNIP]

Maybe we should write old style (K&R) C code too. You know, without
unnecessary things like function prototypes.

After all, if you screw up your parameters and the compiler fails to
tell you, "it's just a programmer problem" and "that's what testing is
for".

We wouldn't want the compiler making life easier for the developer and
safer for the end user. No, oh no. That's a programmer issue!
 
Richard Heathfield

Ed Jensen said:

Maybe we should write old style (K&R) C code too. You know, without
unnecessary things like function prototypes.

After all, if you screw up your parameters and the compiler fails to
tell you, "it's just a programmer problem" and "that's what testing is
for".

But function prototypes do not increase the execution time of the code.
They're a translation feature. If we can make the programmer's life
easier by making the translator spot more errors, that's great! If the
compiler can spot bounds violations at compilation time, that's
fabulous! Nothing but good news, surely. But let's not put the
bounds-checker into the C runtime - ***please***.

If I write a loop that I know won't violate bounds, what's the point in
the computer agreeing with me over and over again?

And if I write a loop that I know *will* violate bounds but which will
cause a (highly non-portable) behaviour that I happen to want, then I
don't want the compiler telling me I can't.

And if I write a loop without realising that it violates bounds, it's my
own stupid fault.
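
One way to get a translation-time check of this kind in today's C, sketched under the assumption of a C11 compiler. The macro COPY_LITERAL and the struct record are hypothetical names; the check costs nothing at run time because _Static_assert is evaluated entirely by the compiler.

```c
#include <assert.h>
#include <string.h>

/* Copy a string literal into a true array (not a pointer), refusing to
   compile if the literal, including its '\0', does not fit. */
#define COPY_LITERAL(dst, lit)                                        \
    do {                                                              \
        _Static_assert(sizeof(lit) <= sizeof(dst),                    \
                       "string literal does not fit in destination"); \
        memcpy((dst), (lit), sizeof(lit));                            \
    } while (0)

struct record {
    char name[8];
};

void record_init(struct record *r)
{
    assert(r != NULL);
    COPY_LITERAL(r->name, "hello");   /* 6 bytes into 8: accepted */
    /* COPY_LITERAL(r->name, "hello, world"); would be rejected at
       compile time, since 13 bytes do not fit in 8. */
}
```

The limitation is the one already noted in the thread: this only works where the sizes are visible to the compiler, which rules out anything reached through a plain char pointer.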
 
websnarf

Some 20 years ago, it became clear that C strings were not as safe, nor
as fast, as strings in PL/I, Assembler or Pascal. [...] I have spent some
years studying this problem and have developed some User friendly C macros
that solve the problem. [...]

These macros just look like they encode classic tricks that any
competent C programmer could or would duplicate by hand. The set of
functions you propose is also extremely small (fgets(), strftime(),
strtok() anyone?). So it seems hard to see this as an effective
programming aid. Data truncation or runtime stops are also kinds of
errors; while usually better than overrunning the buffer, this leaves
the problem of dynamic strings unsolved in the C language. The real
reason why Pascal strings are still better than C is because storage
for the strings is automatically managed.

Look, if you want to compare safety and speed, you should compare it
to my library:

http://bstring.sf.net/

I set a pretty high bar for both speed and safety. It's also extremely
easy to use, it's portable, and it works directly with ordinary C
string-based libraries without onerous conversion penalties, if the
need arises. This library also performs automatic dynamic memory
management so that buffer overflows are basically eliminated.
 
Ed Jensen

Richard Heathfield said:
But function prototypes do not increase the execution time of the code.
They're a translation feature. If we can make the programmer's life
easier by making the translator spot more errors, that's great! If the
compiler can spot bounds violations at compilation time, that's
fabulous! Nothing but good news, surely. But let's not put the
bounds-checker into the C runtime - ***please***.

No matter how much some people might advocate it, nobody is going to
take regular C strings away from C developers. On the other hand,
having access to macros or a library that implements "safe" strings
might be a tradeoff some developers want to make.

Hell, it's very likely a tradeoff most end users would want the
developer to make, unless we're talking about low level, performance
critical code.

Therefore, given that C developers lose nothing (they can keep using
regular C strings), and only incur the overhead of "safe" strings if
they choose it, I really don't see what the debate is all about.
 
Clem Clarke

Great. Thank you, I will have a look at them.

Do they keep a length byte somewhere for speed like PL/I or Pascal, or
do they use strlen (or equivalent)?


Cheers,

Clem Clarke
 
Clem Clarke

Ed Jensen wrote:

I agree with you Ed. No-one would dare to try and take away normal strings!

However, having an easy-to-use set of routines that make strings much
safer, and that even make strings faster, seems like a GOOD thing to me.


Cheers,

Clem Clarke


,-._|\ Clement V. Clarke - Author Jol, EASYJCL, EASYPANEL, 370TO486
/ Oz \ Web: http://www.ozemail.com.au/~oscarptyltd
\_,--.x/ 16/38 Kings Park Road, West Perth, AUSTRALIA, 6005.
v Tel (61)-8-9324-1119, Mob 0401-054-155.
Email: (e-mail address removed)
 
Clem Clarke

Re Overheads.

And I have been thinking about making a more simplified version (maybe
using C++) that doesn't allow for "fixed" PL/I or 370 Assembler fields.
This would reduce the overhead to a quick binary compare and swap the
maximum lengths. But you would still get the huge speed advantage of
not having to search for (or check for) the binary zero.

Having "fixed" length strings creates a bit of code bloat on Borland 5.x
32 bit compilers. The earlier 16 bit Borland compilers used to optimize
away all the code that wasn't needed.

I am all for speed and efficiency, which is why I mentioned this at all.
Every version of Windows gets bigger and slower, and I am quite
convinced that C strings are partly to blame.

Even IBM is using C in z/OS, and it isn't nearly as fast as it could be,
even though IBM has added a special instruction to go searching for
things like binary zeros.

And on the mainframes, there was always an instruction to move a
bunch of bytes quickly, in one hit (MVC and later MVCL). So I like to
use them where possible. Although on 64 bit machines you can move 8
bytes pretty quickly by loading and storing a register...


Create only joy and happiness in your life and in others,

Clem Clarke


 
Clem Clarke

Hi Brian,

(This is definitely off topic...)

I could easily just ignore this totally, but it is an excellent
opportunity to talk about philosophy and judgment.

You have all heard the saying "Judge not" or similar.

What I didn't realise was that judging stops you receiving. It puts
walls around you and stops things coming in.

So, it might be money (too much money is evil...). It might be love (I
am unlovable). It might be ideas (only birds can fly or if we were
meant to fly, we would have wings).

It might be thoughts about programming languages (PL/I is better than C).

It could be anything....

So, I am doing my darndest to remove all judgments and allow all sorts
of ideas etc to flow to me, from this wonderfully abundant universe.

If you haven't seen the movie "What The Bleep Do we Know", or "The
Secret", they are two of many movies that will give you an idea of what
I am talking about. www.whatthebleep.com and
http://www.thesecret.tv/home.html


And you might like to go to a site
http://www.accessconsciousness.com/about.asp to learn more.

Enjoy life,

Clem

"I am only one, but I am one. I cannot do everything, but I can do
something. And I will not let what I cannot do interfere with what I can
do." ~Edward Everett Hale
 
CBFalconer

Clem said:
And I have been thinking about making a more simplified version (maybe
using C++) that doesn't allow for "fixed" PL/I or 370 Assembler fields.
This would reduce the overhead to a quick binary compare and swap the
maximum lengths. But you would still get the huge speed advantage of
not having to search for (or check for) the binary zero.

Top-posting has greatly decreased your readership. Your answer
belongs after (or intermixed with) the quoted material to which you
reply, after snipping all irrelevant material. See the following
links:

--
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)
 
Default User

Clem said:
Hi Brian,

(This is definitely off topic...)

I could easily just ignore this totally, but it is an excellent
opportunity to talk about philosophy and judgment.

I think I already plonked you with my work account. At any rate, I'm
definitely plonking you at home.




Brian
 
Martin Ambuhl

Clem said:
Hi Brian,

(This is definitely off topic...)

I could easily just ignore this totally, but it is an excellent
opportunity to talk about philosophy and judgment.

[followed by random crap that is definitely not about philosophy or
judgment, as well as being not topical in comp.lang.c]

Bye. I'm sure that by now you've convinced many that you have nothing
worthwhile to say. At least I won't be seeing any of your crap until
you do what trolls usually do to avoid killfiles.
 
websnarf

Great. Thank you, I will have a look at them.

Do they keep a length byte somewhere for speed like PL/I or
Pascal, or do they use strlen (or equivalent)?

It has a length field, but it's not hidden. It's an explicit part of
the structure and properly defines the correct length of the
"bstring". '\0' is not considered a real terminator except where
semantic equivalency with standard C strings has been ensured (the
situations are intuitive, and explained in the documentation).

Besides the obvious gargantuan performance advantage over strcat,
simple things like loop unrolling for string comparison can be
performed without having to check for terminators first, for example.
For copying, the faster memcpy() and memmove() functions (which can
use system based block copy enhancements) can be used in place of
strcpy(). Functionality improves by allowing you to make arbitrary
reference based sub-strings in a safe way, the library never truncates
results so long as there is available memory, it never buffer
overflows if you are just using the API, it allows you to treat any
binary block of data as a string and it comes with lots of example and
utility code to show you how to use it.
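
To illustrate the general technique only, here is a minimal sketch of a string that carries its own length and grows on demand. This is not the bstring library's API: struct lstr, lstr_init, lstr_append and lstr_free are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct lstr {
    char  *data;  /* kept '\0'-terminated so it can be passed to C APIs */
    size_t len;   /* number of characters, not counting the terminator */
    size_t cap;   /* allocated bytes, including the terminator */
};

/* Initialise s with a copy of the C string init. Returns 1 on success. */
int lstr_init(struct lstr *s, const char *init)
{
    assert(s != NULL && init != NULL);
    size_t n = strlen(init);
    s->data = malloc(n + 1);
    if (s->data == NULL)
        return 0;
    memcpy(s->data, init, n + 1);
    s->len = n;
    s->cap = n + 1;
    return 1;
}

/* Append n bytes from src, growing the buffer as needed. The known
   length means no scan for '\0' and a plain memcpy(). src must not
   point into s itself. Returns 1 on success, 0 on failure. */
int lstr_append(struct lstr *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    if (n > SIZE_MAX - s->len - 1)          /* size would overflow */
        return 0;
    size_t need = s->len + n + 1;
    if (need > s->cap) {
        size_t ncap = s->cap * 2 > need ? s->cap * 2 : need;
        char *p = realloc(s->data, ncap);
        if (p == NULL)
            return 0;                        /* s is left unchanged */
        s->data = p;
        s->cap = ncap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 1;
}

void lstr_free(struct lstr *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Compared with strcat(), the append never rescans the destination to find its end and never writes past the allocation; the price is the extra bookkeeping fields and a possible realloc().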

It's also extremely easy to understand and use. It feels as easy as
Python, Java, Pascal or whatever in terms of string use, while
retaining all the power and representational functionality of C.
 
Al Balmer

So, I am doing my darndest to remove all judgments and allow all sorts
of ideas etc to flow to me, from this wonderfully abundant universe.

You've succeeded. If you had any judgment, you wouldn't be posting
this crap here.
 
