Zero terminated strings

jacob navia · Jul 31, 2009

jameskuyper said:
Lew said:

On July 31, 2009 13:17, in comp.lang.c, jacob navia ([email protected])
wrote:

[snip]

(2) I am not selling anything there. lcc-win (and the source code
of the string library that it contains) are distributed free of charge.

You can read?

In case you don't, here it is again in BIG LETTERS (so that you can take
your glasses off if you want)

FREE OF CHARGE!

From your website:

License:

This software is not freeware, it is copyrighted by Jacob Navia. It's
free for non-commercial use, if you use it professionally you have to
have to buy a licence.

Professional use is:
* Related to business (e.g you use it in a corporation)
* If you sell your software.

Let me repeat the key phrase: "If you use it professionally, you have to buy
a licence". That doesn't look free to me. That looks like a paid-for
product, no different than MS Visual C++ (which you only buy a licence
for). That makes you a vendor, sir.

It could be just a language issue. He's certainly got something (his
compiler, for commercial use) up for sale. However, when he says "I'm
not selling anything", it could be that what he means is that he
hasn't had any commercial users who were willing to pay for it.

I am not different from Cygwin, that asks for thousands of dollars for
the "right" to use their stuff if you do not disclose your source code
i.e. for professional users

and not different from redhat that asks for US$ 20 000 for technical
help from a gcc professional (yearly technical support license)
(support contracts for professional users)

and not different from Suse, that sells their stuff too.

and not different from all other products that live from professional
users that allow them to keep expenses down!

AND SO WHAT?

Does my license make zero terminated strings any better?

YOU HAVE NO ARGUMENTS Mr. kuyper, that is why you HAVE
to get personal and to start personal attacks.

You have nothing ELSE to argue technically.

Chris M. Thomasson · Jul 31, 2009

Richard Heathfield said:
jacob navia said:

Lew Pitcher wrote:

[snip]

What the OP complains about (his direct complaint) is the result of
a failure to validate, and that can happen in any language.

Yes bugs can happen in any language.
Right.

Specially in C using zero terminated strings that are a badly
designed data structure.

They have advantages and disadvantages. They are simple, they are
supported, and they are fast

[...]

I am not exactly sure how using zero-terminated strings could be faster than
using a string abstraction which always knows its length:
______________________________________________________________
#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[12];

    /* faster? */
    memcpy(buf, "Hello World", 12);

    /* slower? */
    strcpy(buf, "Hello World");

    return 0;
}
______________________________________________________________

if used properly.

Properly in the sense of something like:
______________________________________________________________
#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[12];

    /* faster? */
    memcpy(buf, "Hello World", 12);

    /* perhaps just as fast? */
    strncpy(buf, "Hello World", 12);

    return 0;
}
______________________________________________________________

?

jacob navia · Jul 31, 2009

Eric said:
jacob said:

Lew Pitcher wrote:

[snip]

What the OP complains about (his direct complaint) is the result of a
failure to validate, and that can happen in any language.

Yes bugs can happen in any language.

Specially in C using zero terminated strings that are a badly designed
data structure.

Normal software developers and the associated community should take care
of modifying the corresponding data structures to avoid error prone
constructs that have proved error-prone since decades!

What you refuse to acknowledge is that a change in the language is
needed.

And what *you* refuse to acknowledge is that a change in the
language will not solve the problems, whatever they are.

If you do not know what the problems are ("whatever they are")
how can you argue then?

Please take the time to think this over.

Have you been paying attention to the rapidity of adoption of
the C99 Standard, now approaching its tenth birthday? How much
existing C code did that Standard invalidate? Very little, I think:
People who'd been using `restrict' and `inline' as identifiers may
have been discommoded a bit, but were probably a tiny minority.
Similarly, people who'd already written their own functions named
cacos() and cabs() and so on were no doubt disappointed when C99
expropriated the names; this was probably a larger group than the
first, but still a small one. Yet C99's adoption rate has been
"Slow as molasses running uphill in January," as an old saying goes.

But WHY would anyone want to adopt that standard since it brought
no significant improvement to anything important in the language?

You are wrong. It is not because of the little problems that 0.000001%
of the users had with some name clashes that the standard wasn't warmly
welcomed but BECAUSE it did not bring any new improvement!

Look at 1989 when the standard brought function prototypes. THAT WAS
an improvement and everyone welcomed it.

I put it to you, Jacob, that a hypothetical Cxx Standard that
invalidated nearly every existing C program would never be adopted
by anybody at all.

Why "invalidate" any existing code?

I just proposed that counted strings could be accepted in the
language TOO!!!

[snip ridiculous argument against something nobody proposes]

Perhaps the notion is that Cxx would not eliminate zero-terminated
strings, but would offer something else as an alternative, something
that could be used in newly-written code. Fine, but does it solve
the problem that prompted you to start this thread?
Yes.

I don't see how.

By using counted strings, the scanner would read all characters without
stopping at some zero

First, it would do nothing at all for the existing code that already
mangled the existing data.

Sure. A new standard would be for new code OBVIOUSLY.

Second, code is much more often written
to operate with other existing code than written ab origine and in
isolation, which means that there must be a bridge of some kind to
allow the new and old representations to coexist, and in fact to
interoperate.

That is why my sample implementation provides conversion functions

The safety (if achieved) of the new strings disappears
as soon as you push them over into the old-string realm and pull them
back again; they may have been polluted or otherwise mistreated while
abroad.

Yes, but interoperability is important, as you said.

Finally, we have it on the authority of one J.N. that nobody
does new development in C anyhow.

This is completely untrue. I have never said anything like that
and all my work since 10 years goes against what you put in my
mouth.

Your suggestions might lead to a more robust language (or might not,
but let's just suppose). But that doesn't solve any existing problems,
not until somebody can actually use your MRL and use it without
compromising the robustness it supposedly offers.

That can only happen if the C language evolves. If the
standard changes

You're not offering
a solution; you're offering -- well, I said it earlier: A complaint.

I have proposed a first implementation that is a proof of
concept. I do not have, all alone, the power to go beyond what I am
doing now.

But at least I try to do something positive about it. What do YOU
do?

Not even a complaint.

jacob navia · Jul 31, 2009

Chris said:
I am not exactly sure how using zero-terminated strings could be faster
than using a string abstraction which always knows its length:

This argument has been brought up countless times and even if the
zero terminated string people try to offer a lot of smoke
around it, it is OBVIOUS to all.

strcat is a MUCH faster operation if you do NOT seek the
terminating zero.

strlen is instantaneous.

strcpy can use block moves like memcpy.

etc etc.
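
The three claims above can be made concrete with a minimal sketch of a counted string in standard C. The type `cstr` and the functions `cstr_init`, `cstr_len` and `cstr_append` are hypothetical names chosen for illustration; this is not the lcc-win string library or any other library discussed in this thread. The point is only that the length lookup is a field read and that appending needs no scan for the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type, for illustration only. */
typedef struct {
    size_t len;   /* bytes in use, not counting the trailing '\0' */
    size_t cap;   /* total size of data[] in bytes */
    char  *data;  /* caller-owned storage */
} cstr;

/* Wrap a caller-owned buffer as an empty counted string. */
static void cstr_init(cstr *s, char *buf, size_t cap)
{
    s->len = 0;
    s->cap = cap;
    s->data = buf;
    if (cap > 0)
        buf[0] = '\0';
}

/* Constant time: the length is stored, not searched for. */
static size_t cstr_len(const cstr *s)
{
    return s->len;
}

/* Append n bytes from src. Returns 0 on success, -1 if they do not fit.
   No scan for a terminator is needed: the end is at data + len. */
static int cstr_append(cstr *s, const char *src, size_t n)
{
    if (s->cap == 0 || n > s->cap - 1 - s->len)
        return -1;
    memcpy(s->data + s->len, src, n);  /* block copy, as with memcpy */
    s->len += n;
    s->data[s->len] = '\0';            /* keep it usable as a C string too */
    return 0;
}
```

Keeping a trailing '\0' after the counted bytes is a design choice in this sketch, not a requirement: it lets the buffer be handed to existing functions that expect zero-terminated strings, at the cost of one byte.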

Paul Hsieh · Jul 31, 2009

jacob said:
jacob said:

Lew Pitcher wrote:
[snip]

What the OP complains about (his direct complaint) is the result of a
failure to validate, and that can happen in any language.

Yes bugs can happen in any language.

I believe misinterpreted NUL termination is unique to the C language.

And what *you* refuse to acknowledge is that a change in the
language will not solve the problems, whatever they are.

And that's why programmers still use the original K&R function
declaration style to this very day ... NOT!

Have you been paying attention to the rapidity of adoption of
the C99 Standard, now approaching its tenth birthday? How much
existing C code did that Standard invalidate? Very little, I think:
People who'd been using `restrict' and `inline' as identifiers may
have been discommoded a bit, but were probably a tiny minority.
Similarly, people who'd already written their own functions named
cacos() and cabs() and so on were no doubt disappointed when C99
expropriated the names; this was probably a larger group than the
first, but still a small one. Yet C99's adoption rate has been
"Slow as molasses running uphill in January," as an old saying goes.

I put it to you, Jacob, that a hypothetical Cxx Standard that
invalidated nearly every existing C program would never be adopted
by anybody at all.

Really? Why do you think that? The Microsoft compiler already
complains at you if you use half of the C library (not that I agree
with their approach, but there's a clear itch being scratched here).
The gcc linker yells at you for using gets(). If the language
actually went in the direction developers with real issues are trying
to go maybe the standard *WOULD* be adopted! Has it not occurred to
you that the reason people didn't adopt C99 is because C99 is in and of
itself worthless? Is it just impossible to believe that the ANSI C
committee are just a bunch of failures who have received their report
card in the form of lack of adoption of their standard?

[...] It would be a non-starter, dead on arrival, an
object of derision.

Yeah, just like C++ or HTML 2, 3, 4 ...

The failure of adoption has to do with the content of the standard. If
the standard actually had real value in it, I am sure it would be
taken more seriously.

[...] People faced with a choice between spending
their development budget on writing new programs and enhancing old
ones or spending it on retrofitting their existing programs to a new
Standard for no discernible benefit[*] will not spend money on the
latter; they'll keep on using their C99 or C90 compilers, and will
spend a vastly smaller sum on coping with whatever problems ensue.

[*] No, none. Changing a program, for whatever reason, involves
cost and risk. If it's a program that's believed to be working as
intended, has no known bugs that can't be tolerated, and is in all
ways behaving satisfactorily, it will be left alone. Nobody is going
to check the code out, go through the exercise of adapting it to a
new string scheme, check the changed code back in again, and spend
still more money trying to prove nothing got broken in the process,
just so they can have a warm fuzzy feeling about their strings. I
deal with a third party who's still using an ancient compiler that's
so far beyond its end-of-life that we won't offer support at any
price, simply because the program they build with it is working and
they don't want to bother porting it forward to this millennium. I
believe the people in that organization are in no way unusual, but
are simply allocating their resources to more important matters.

Oh that's right. Because every developer in the universe works for
Lockheed Martin. Web browsers and DNS servers are not 40 years old and
many are open source. The dynamics are not as you suggest.

Perhaps the notion is that Cxx would not eliminate zero-terminated
strings, but would offer something else as an alternative, something
that could be used in newly-written code. Fine, but does it solve
the problem that prompted you to start this thread?

I believe it does.

[...] I don't see how.
First, it would do nothing at all for the existing code that already
mangled the existing data.

Except that such code would be known to be "legacy". And if you have
a competitor that has written a better maintained alternative, then
there you go. You don't see a lot of Windows 98 virus announcements
these days do you? So long as there is pressure to upgrade, that
might come *BECAUSE* you can identify code as legacy versus modern,
then this problem gets addressed by market forces.

[...] Second, code is much more often written
to operate with other existing code than written ab origine and in
isolation, which means that there must be a bridge of some kind to
allow the new and old representations to coexist, and in fact to
interoperate. The safety (if achieved) of the new strings disappears
as soon as you push them over into the old-string realm and pull them
back again; they may have been polluted or otherwise mistreated while
abroad.

Not according to the RFC that Lew Pitcher posted (with the full
grammar for some reason). A modern library would allow you to cleanse
string characters more naturally and easily than the pain you endure
trying to implement anything on NUL terminated strings.

[...] Finally, we have it on the authority of one J.N. that nobody
does new development in C anyhow.

And do you think maybe that might have something to do with the state
of the standard? If you give people complex numbers when they are
asking for safer coding libraries with easier or better memory
management is it any wonder that people are eager to look for
alternatives? restrict and inline are clever, but now that I see how
they are used, it's clear that they were just short sighted ways for
some system vendors to try to win some benchmark that matters to
nobody. Compare this to the language innovation and ideas going into
Python, Java, Lua, C++, Ruby, Haskell, Erlang, etc.

Your suggestions might lead to a more robust language (or might not,
but let's just suppose). But that doesn't solve any existing problems,
not until somebody can actually use your MRL and use it without
compromising the robustness it supposedly offers. You're not offering
a solution; you're offering -- well, I said it earlier: A complaint.

Well you offer incorrect analysis which is even less.

Flash Gordon · Jul 31, 2009

jacob said:
jameskuyper said:

Lew said:

On July 31, 2009 13:17, in comp.lang.c, jacob navia ([email protected])
wrote:

[snip]
(2) I am not selling anything there. lcc-win (and the source code
of the string library that it contains) are distributed free of charge.

You can read?

In case you don't, here it is again in BIG LETTERS (so that you can take
your glasses off if you want)

FREE OF CHARGE!
From your website:

License:

This software is not freeware, it is copyrighted by Jacob Navia. It's
free for non-commercial use, if you use it professionally you have to
have to buy a licence.

Professional use is:
* Related to business (e.g you use it in a corporation)
* If you sell your software.

Let me repeat the key phrase: "If you use it professionally, you have to buy
a licence". That doesn't look free to me. That looks like a paid-for
product, no different than MS Visual C++ (which you only buy a licence
for). That makes you a vendor, sir.

It could be just a language issue. He's certainly got something (his
compiler, for commercial use) up for sale. However, when he says "I'm
not selling anything", it could be that what he means is that he
hasn't had any commercial users who were willing to pay for it.

I am not different from Cygwin, that asks for thousands of dollars for
the "right" to use their stuff if you do not disclose your source code
i.e. for professional users

Yes you are. I can sell software I've compiled using Cygwin as long as I
don't link it to the cygwin dll (by using their mingw stuff, for
example) OR if I'm willing to let customers have the code under GPL
(which some companies do). I can't do this with your compiler.

and not different from redhat that asks for US$ 20 000 for technical
help from a gcc professional (yearly technical support license)
(support contracts for professional users)

Yes it is, I can and do use gcc on a RedHat distribution without paying
them that amount, and yes this IS for commercial closed source software.
In fact, I can use Fedora and the gcc that comes with it completely for
free!

and not different from Suse, that sells their stuff too.

Same applies as for RedHat.

and not different from all other products that live from professional
users that allow them to keep expenses down!

AND SO WHAT?

The SO WHAT, is that for most of us here using your compiler would NOT
be free, because we would have to buy a license from you. So your
repeated claim that it IS free is disingenuous at best!

There is NOTHING wrong with you selling your compiler, the problem is
you claiming it is free to people who would have to pay to use it or be
in breach of your license terms.

Does my license make zero terminated strings any better?

YOU HAVE NO ARGUMENTS Mr. kuyper, that is why you HAVE
to get personal and to start personal attacks.

You have nothing ELSE to argue technically.

Presenting Paul's library or one of the others that really IS free
(including to commercial users, and since your library REQUIRES your
compiler, it is effectively not free for commercial use because a
licence for your compiler needs to be bought to use it) would have been
a better option if you wanted to suggest a free alternative.

Personally I don't think adding operator overloading to C would be a
good thing, for reasons I've stated before. Whether some string library
that does not rely on it would be a useful addition to the standard C
library is another matter. Selecting one which has a free (even for
closed-source commercial use) implementation written in standard C would
have a big advantage, because an individual could add it to their
implementation (by simply compiling it and linking to it) without having
to wait for vendors to catch up! Also vendors could catch up quickly by
simply incorporating that free library!

Oh, and I have seen equivalent attacks which used methods other than an
embedded string termination character. For a laugh, many years ago, I
worked out some attacks on the BBC Micro which worked by embedding
control characters in file names, so when you did a directory listing
what you saw was not actually quite what was on the disk! I can't
remember the details of what I did (and I never released it) since it
was too many years ago. However, I have seen more recent examples used
in real attacks, such as the old double extension trick.

bartc · Jul 31, 2009

Eric Sosman said:
jacob navia wrote: ....
I put it to you, Jacob, that a hypothetical Cxx Standard that
invalidated nearly every existing C program would never be adopted
by anybody at all. It would be a non-starter, dead on arrival, an
object of derision.

This doesn't seem to bother the Python people. I think Python 3.x is not
backwards compatible with 2.x.

2.x is still available for those who don't want to change. 3.x is available
to those who want to write new code using the latest language without any
obsolete baggage. And I believe conversion tools are available for those who
want to upgrade existing code.

Contrast with C...

jacob navia · Jul 31, 2009

Paul said:
Really? Why do you think that? The Microsoft compiler already
complains at you if you use half of the C library (not that I agree
with their approach, but there's a clear itch being scratched here).
The gcc linker yells at you for using gets(). If the language
actually went in the direction developers with real issues are trying
to go maybe the standard *WOULD* be adopted! Has it not occurred to
you that the reason people didn't adopt C99 is because C99 is in and of
itself worthless? Is it just impossible to believe that the ANSI C
committee are just a bunch of failures who have received their report
card in the form of lack of adoption of their standard?

This was exactly what I have argued in another message in this same
thread. The lack of adoption of C99 is its failure to address the
real issues of the language and add a lot of stuff that looks completely
irrelevant to developers today.

Nothing about

o The C library is too old and full of CRAP. gets() is an example
but there are others, like asctime() etc.

o The zero terminated strings are error prone and unusable. Still
they will fight any proposition as if the language was tied for
all eternity to that outdated data structure.

o Multi threading? Nothing at all.

o New operations like clamped addition etc? Nothing.

o Vector operations? Zero.

The failure of adoption has to do with the content of the standard. If
the standard actually had real value in it, I am sure it would be
taken more seriously.

Exactly.

[...] Finally, we have it on the authority of one J.N. that nobody
does new development in C anyhow.

I have never said that!

This guy is putting words in my mouth.

jacob navia · Jul 31, 2009

Richard said:
"C is for dummies, C++ is for real programming, as everybody
should know by now. C should be kept in the basement and made
obsolete as fast as possible." - Jacob Navia, 2004

You are misquoting. I was trying to FIGHT against that attitude
and I quoted it in an ironic tone.

"The fact is that no new development is done in C" - Jacob Navia, 2004

"This makes C an obsolete language, i.e. one where no new development
is conceivable." - Jacob Navia, 2007

That is, I was trying to AVOID that.

You are just misquoting me in bad faith (as always)

spinoza1111 · Aug 1, 2009

Yes, but the trouble is Jacob, you have a bad reputation, mainly due to
having the interpersonal skills of a borg and Nero's ability to take

What are "interpersonal skills"? Upon investigation one finds that
that word is corporatese for a superficial *schlamperei*. It has
nothing to do with genuine friendliness: quite the opposite. It means
being able to find the local authority and to be subservient to that
authority, while seeking out safe targets to bully or ostracise.

FT Baker was the IBM programmer who developed the automatic "morgue"
or filing system for news articles for IBM in record time in the
1970s, consciously applying then-new structured techniques and getting
it done on time and under budget. He published an article on this
accomplishment only for a letter to the editor of the journal accusing
him of being "self-serving". He'd failed in other words to manifest
that unpleasant combination of fawning humility and self-abnegation,
with willingness to bully the powerless, which most Korporate types
exhibit.

Navia (and Herbert Schildt) are primarily hated because they have
chops as the single authors of programs and in Schildt's case, books.
They aren't little "team players" who fawn upon Power.

This makes them targets.

criticism. It's like asking people to trust Microsoft on security issues,
or the TV Evangelists on morality.

In other words, if a person has clear and distinct views, he's "hard
to get along with" and "like" Microsoft or a TV evangelist. He can be
pictured, imaged, and filed away. Therefore the goal is to be, like
the younger Stalin, a sort of black hole or force field without a
strong POV. In the case of C, it is to adopt an uncritical view which
allows the language to self-reproduce, and propagate.

Weary of being human, people prefer to be programming languages in the
sense that they brook no criticism of their language.

Keith Thompson · Aug 1, 2009

jacob navia said:
You are misquoting. I was trying to FIGHT against that attitude
and I quoted it in an ironic tone.

[snip]

jacob, I say this with no disrespect intended, though I don't
expect you to believe that. You are not good at irony and sarcasm.
In my opinion, your communications here would be much improved if
you did not attempt them. Those who disagree with you are quite
capable of stating their own views; your attempts to parody those
views are neither successful nor helpful.

Keith Thompson · Aug 1, 2009

Mark McIntyre said:
Because you are advocating removing null-terminated strings from the
Standard. Existing code would therefore no longer compile with a Cxx
compiler.

Do you understand this?

I believe he's advocating adding counted strings to the language
while leaving null-terminated strings in place.

Kaz Kylheku · Aug 1, 2009

Zero terminated strings are a continuing security nightmare.

Slashdot reports this today:

"Two researchers, Dan Kaminsky and Moxie Marlinspike, came up with exact
same way to fake being a popular website with authentication from a
certificate authority.

Wired has the details: 'When an attacker who owns his own domain --
badguy.com -- requests a certificate from the CA, the CA, using contact
information from Whois records, sends him an email asking to confirm his
ownership of the site. But an attacker can also request a certificate
for a subdomain of his site, such as Paypal.com\0.badguy.com, using the
null character \0 in the URL.

Obviously, this bug was caused by idiots who thought that they could solve some
imaginary problem by using a ``better'' string library that can represent a
null byte in the middle of a string.

A null byte has absolutely no place in character (i.e. text) strings. If an
array of bytes contains nulls, it's not a character string, but a binary
string, or blob if you will. Null is not really a character, basically. It has
no glyph, and no signaling action for printing control.

There is no legitimate need, ever, in a data representation for text, to
support an embedded null byte. It's not text; it's a special code which says
``I am not text''. So, implicitly, if a null byte follows text, it means either
that the text has ended, or the text is corrupt with the repugnant inclusion of
non-text data.

The moral of this story is that if your language or string library allows nulls
in the middle of a string, it's wrong, and you should fix it such that the null
is treated as a terminator, or such that an exception is triggered if it
occurs.

There are good reasons for working with strings in a representation other than
the null-terminated array, but being able to represent a null in the middle of
a string is not one of those good reasons. Strings that know their own length
should still banish the null byte from being a constituent.

Anand Hariharan · Aug 1, 2009

Zero terminated strings are a continuing security nightmare.
(...)

(Note that C++ uses zero terminated strings too)

~/code$ cat NulTermStr.cc
#include <string>
#include <iostream>

int main()
{
    std::string StrWithNul = "Paypal.com";
    StrWithNul += '\0';
    StrWithNul += ".badguy.com";

    std::cout << StrWithNul << std::endl;
    std::cout << StrWithNul.c_str() << std::endl;

    return 0;
}

~/code$ g++ -Wall -W -ansi -pedantic NulTermStr.cc -O2 -o NulTermStr
~/code$ ./NulTermStr
Paypal.com.badguy.com
Paypal.com
~/code$

robertwessel2 · Aug 1, 2009

Obviously, this bug was caused by idiots who thought that they could solve some
imaginary problem by using a ``better'' string library that can represent a
null byte in the middle of a string.

A null byte has absolutely no place in character (i.e. text) strings. If an
array of bytes contains nulls, it's not a character string, but a binary
string, or blob if you will. Null is not really a character, basically. It has
no glyph, and no signaling action for printing control.

There is no legitimate need, ever, in a data representation for text, to
support an embedded null byte. It's not text; it's a special code which says
``I am not text''. So, implicitly, if a null byte follows text, it means either
that the text has ended, or the text is corrupt with the repugnant inclusion of
non-text data.

The moral of this story is that if your language or string library allows nulls
in the middle of a string, it's wrong, and you should fix it such that the null
is treated as a terminator, or such that an exception is triggered if it
occurs.

While I might accept that nulls have no place in *text* it's a pretty
big jump from text to strings. The basic CS definition of a string is
a sequence of symbols, although often a distinction is made between
text and binary strings.

Consider the simple C statement:

char s[10]="abc\0def";

You can't write a C compiler using normal C style strings to compile
that.

Jacob is abrasive, but he's not wrong on this: C style strings are a
poor general purpose tool. They're fine for some stuff, but they're
far too easy to use in places where they shouldn't be, not least
because there's not much of an alternative, and they *look* like they're
a general purpose tool.

Falcon Kirtaran · Aug 1, 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

jacob said:
Zero terminated strings are a continuing security nightmare.

Slashdot reports this today:

"Two researchers, Dan Kaminsky and Moxie Marlinspike, came up with exact
same way to fake being a popular website with authentication from a
certificate authority.

Wired has the details: 'When an attacker who owns his own domain —
badguy.com — requests a certificate from the CA, the CA, using contact
information from Whois records, sends him an email asking to confirm his
ownership of the site. But an attacker can also request a certificate
for a subdomain of his site, such as Paypal.com\0.badguy.com, using the
null character \0 in the URL.

The CA will issue the certificate for a domain like
PayPal.com\0.badguy.com because the hacker legitimately owns the root
domain badguy.com. Then, due to a flaw found in the way SSL is
implemented in many browsers, Firefox and others theoretically can be
fooled into reading his certificate as if it were one that came from the
authentic PayPal site. Basically when these vulnerable browsers check
the domain name contained in the attacker's certificate, they stop
reading any characters that follow the "\0 in the name.'"

And still we will hear the same old arguments from the same
people again and again...

There is nothing wrong

C is like that

etc etc.

(Note that C++ uses zero terminated strings too)

In no way is this a problem with C. If it is not trusted that a null
character represents the end of a string, the length should be checked.
This is true of any delimiter.

- --
- --Falcon Darkstar Christopher Momot
- --
- --OpenPGP: (7902:4457) 9282:A431

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpz5ioACgkQeQJEV5KCpDFOXgCbBAZg3SebF8BeayFfjfkOBjS+
V08Amweu0dk0SHYv9js5qIBPitgpkBYT
=ZPJe
-----END PGP SIGNATURE-----

Flash Gordon · Aug 1, 2009

Richard said:
(e-mail address removed) said:

Consider the simple C statement:

char s[10]="abc\0def";

You can't write a C compiler using normal C style strings to compile
that.

Why on earth not? (Incidentally, I compiled it just fine using gcc
just now.)

He said *write* a compiler, not compile with a compiler. The naive way
to write the compiler would be to load the characters between the quotes
(after dealing with escaping) into a C string. When your compiler then
naively (and incorrectly) checks the string length, or writes it to the
object file or whatever, the compiler gets it wrong.

Of course, gcc does not do what I just described, so it does not simply
use a null-terminated string to hold string literals in memory during
compilation.
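
The difference between the two views of that literal can be shown in a few lines of standard C. The helper names `literal_size` and `literal_strlen` are made up for this illustration; only `sizeof` and `strlen` come from the language and its library.

```c
#include <stddef.h>
#include <string.h>

/* The literal from the example above, with an embedded '\0'. */
static const char literal[] = "abc\0def";

/* Bytes the literal really holds, not counting the final terminator:
   'a' 'b' 'c' '\0' 'd' 'e' 'f'. The size is known at compile time. */
static size_t literal_size(void)
{
    return sizeof literal - 1;
}

/* What code that treats the data as a zero-terminated string sees:
   it stops at the first '\0' and never reaches "def". */
static size_t literal_strlen(void)
{
    return strlen(literal);
}
```

A compiler that kept string literals in plain zero-terminated strings and measured them with `strlen` would record 3 bytes here instead of 7, which is the failure described above.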

Um, yes, he is. He is claiming that the C feature of null-terminated
strings is responsible for the certification authority bug he
mentioned, which is clearly not true.

I agree that the root cause of this is lack of validation of untrusted
data, both at the CA and in the browser. Both should have rejected it on
the basis of non-printable characters (the CA for non-printable
characters in the request, the browser for non-printable characters in a
text field). Actually it should check for the data containing only valid
characters, which is a smaller set than printable characters.
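A sketch of the "only valid characters" check, under the assumption that valid means ASCII letters, digits, hyphen and dot. That set is chosen for illustration only and is not a full hostname validator; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: accepts a counted name only if every byte is in
   the allowed set. Embedded NULs, backspaces and other control
   characters are rejected because they are simply not in the set. */
bool name_has_only_valid_chars(const char *data, size_t len)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789-.";

    assert(data != NULL);
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        /* sizeof allowed - 1 excludes the terminator, so NUL never matches */
        if (memchr(allowed, data[i], sizeof allowed - 1) == NULL)
            return false;
    }
    return true;
}
```

Checking against a list of permitted bytes, instead of listing forbidden ones, means an odd byte nobody thought of is rejected by default.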

Another thing the attackers could do (using automated domain
registration systems) which might work on some browsers is create a
domain called "l<bs><bs><bs>.com" where <bs> represents the backspace

They have advantages and disadvantages. I do not dispute the utility
of a counted string data structure, and I hope nobody here does. But
this is not the forum in which to get the language changed.

Agreed.

arnuld · Aug 1, 2009

Richard, the assertion that "every programmer except me is careless"
is a sign of a narcissistic personality disorder. You believe that YOU
will never write buggy code and to shore up this narcissistic belief,

I have been hanging around comp.lang.c for a long time, and based on my
experience here, the above statement is completely false. Your
analysis of Richard Heathfield's programming ideas is way off from
reality. Perhaps you need a firmer grounding in C's technical
details in order to understand what *exactly* Heathfield means
when he says something about someone's code.

Read C. Wright Mills' White Collar. Each little white collar employee
believes himself special, and blames his workmates for problems rather
than pitching in to fix the problem as do most blue-collar employees.
Rather than take responsibility for his own poor decisions,

I think I learned exactly the opposite from Richard (and from CLC):
take responsibility for your code. Why should someone else have to pay
for your mistakes? I even wrote an entire set of coding guidelines for
my company based on what I have learned.

You're a twat and a **** of the first water, and a Mean Girl.

History has taught me that average biology students and
academics always had trouble understanding Charles Darwin's ideas.
His ideas were accepted 100 years after his death. Perhaps you need
time to understand ANSI C.

Flash Gordon · Aug 1, 2009

Gareth said:
That depends. It's probably true for long strings.

If your string implementation looks roughly like

struct string
{
    unsigned len;
    char * str;
};

then for short strings, looking up the length could easily cause a
cache miss -- or even a page fault -- depending on memory access
pattern, and string accesses require an extra level of indirection.
Null-terminated strings have guaranteed memory locality.

A more likely structure would be...

struct string
{
    unsigned len; /* possibly size_t rather than unsigned */
    char str[];   /* C99 flexible array member */
};

So the length is guaranteed to be a few bytes before the start of the
character data. So memory locality should not be a problem unless your
pages are very short indeed!
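For illustration, a minimal sketch of allocating such a structure in a single block, so that the length and the characters sit next to each other in memory. It uses size_t for the length; string_new is a hypothetical name, and the extra terminating NUL is an assumption made only so the data can still be passed to ordinary C functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct string
{
    size_t len;
    char str[];   /* C99 flexible array member */
};

/* Hypothetical constructor: copies `len` bytes from `data` (embedded
   NULs included) into one allocation holding header and characters.
   Returns NULL on failure; the caller releases the result with free(). */
struct string *string_new(const char *data, size_t len)
{
    struct string *s;

    assert(data != NULL);
    if (len > SIZE_MAX - sizeof *s - 1)   /* guard the size computation */
        return NULL;
    s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    memcpy(s->str, data, len);
    s->str[len] = '\0';   /* convenience terminator, not counted in len */
    return s;
}
```

With this layout, reading s->len and then s->str touches adjacent memory, which is the locality argument made above.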

If you have lots of short strings (say you are trying to load a massive
dictionary into memory, one string per word) the memory overhead of the
length could become more significant.

Other issues exist, and solutions to them also exist, but tend to add to
the complexity of the string handling library.

robertwessel2 · Aug 1, 2009

(e-mail address removed) said:

Consider the simple C statement:

char s[10]="abc\0def";

You can't write a C compiler using normal C style strings to compile
that.

Why on earth not? (Incidentally, I compiled it just fine using gcc
just now.)

The C compiler cannot internally store (and usefully be able to
retrieve) that C string literal in (from) a C string. It will have to
implement some other type of string representation.

Um, yes, he is. He is claiming that the C feature of null-terminated
strings is responsible for the certification authority bug he
mentioned, which is clearly not true.

They have advantages and disadvantages. I do not dispute the utility
of a counted string data structure, and I hope nobody here does. But
this is not the forum in which to get the language changed.

I was clearly referring to Jacob's general claim that C style strings
have significant problems. Hence the single sentence and colon in
what I wrote.

I agree that this is not the best forum for discussing language
changes, and it's likely moot anyhow. Jacob fails to acknowledge that
by far the biggest value that C has is its ubiquity. For larger
systems, C++ has inherited a large part of that, and has already
solved most of the problems Jacob is trying to solve. Maybe it's not
perfect, but hey, we're talking about C and a language derived from C
here... And for smaller systems, Jacob hasn't engaged them at all –
after all a lot of those folks want no part of dynamic allocation or
anything like garbage collection. So I have no real clue who he's
pitching this at.

That said, a decent string library based on counted strings and some
minimal language support (you might be able to reduce it to a new
string literal format that created a counted string, IOW something
like C"abc") wouldn’t be a bad thing (in addition to the current
stuff), but unless it’s standard enough to be widely implemented, it’s
simply irrelevant. Boost is somewhat instructive here – a fair chunk
of stuff in the next C++ standard has been proven in Boost.
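Without any language change, something close to a counted literal can be approximated in C99 with a macro. The sketch below is only that: struct cstr, CSTR and cstr_at are hypothetical names invented here, not part of any proposal or of lcc-win.

```c
#include <assert.h>
#include <stddef.h>

struct cstr
{
    size_t len;
    const char *data;
};

/* sizeof on a string literal counts every byte, embedded NULs and the
   final terminator included, so subtracting 1 gives the content length.
   The "" lit "" concatenation makes the macro fail to compile when the
   argument is not a string literal. */
#define CSTR(lit) ((struct cstr){ sizeof("" lit "") - 1, "" lit "" })

/* Bounds-checked access by index, relying on len rather than on a
   terminator. */
static inline char cstr_at(struct cstr s, size_t i)
{
    assert(i < s.len);
    return s.data[i];
}
```

CSTR("abc\0def") yields a length of 7, where strlen() on the same literal would give 3. The obvious limitation is that nothing else in the standard library understands the type, which is the standardization problem mentioned above.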

Jacob’s desire to improve C is commendable, but he needs to get some
actual users on board, and that means something beyond his compiler.
