[snips]
It's not clear that length delimited strings are faster than
terminated strings in all cases. [...]
Actually it's clear to anyone who has measured it.
Really. Explain how such strings will improve the speed of strchr.
Using the conventional approach, each operation involves the following:
Dereference the pointer
Compare value against \0
If comparison fails, compare value against "needle"
If comparison fails, increment pointer and loop
With an integer-stored-length string variant, the operation is more like:
Compare index to length
If comparison fails, dereference pointer+index
Compare value against "needle"
If comparison fails, increment index and loop
One might argue that comparing the index to the length might be faster
than comparing the characters, but if so it should be pointed out that the
base+index computation may well be slower than the pointer dereference; it
is not clear that either is guaranteed to be a performance win on all
possible implementations.
Pedantic, perhaps, but the statement involved "all cases", not just some.
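For what it's worth, here is a minimal sketch of the two loops being compared. The struct lstr type and both function names are made up for illustration; nothing like them is in the standard library. Note that find_terminated is not a drop-in strchr, since it does not handle searching for '\0' itself. Which loop wins depends on the compiler and the target, which is rather the point.

```c
#include <stddef.h>

/* Hypothetical length-carrying string; the names are assumptions. */
struct lstr {
    size_t len;
    const char *data;
};

/* Conventional scan: load a character, test it against '\0',
 * then test it against the needle, then bump the pointer. */
const char *find_terminated(const char *s, char needle)
{
    for (; *s != '\0'; s++) {
        if (*s == needle)
            return s;
    }
    return NULL;
}

/* Length-based scan: test the index against the length, load
 * data[i] (base+index), test against the needle, bump the index. */
const char *find_counted(const struct lstr *s, char needle)
{
    for (size_t i = 0; i < s->len; i++) {
        if (s->data[i] == needle)
            return &s->data[i];
    }
    return NULL;
}
```

Either way there are two comparisons per character examined; only what gets compared changes.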
[...] It depends on what you do with them. Using plain C strings, it's
often possible to remember the length rather than recomputing it.
Remember it? Remember it where? In a variable somewhere? How about
just holding it in another field instead? Wait! That's exactly what
length delimited strings are isn't it?
Yes, but that involves additional computation with any string-modifying
operation, when such computation may only be necessary - or even desirable
- in a small number of cases.
A commonly used example is strcat():
char big_string[BIG_ENOUGH];
Of course this may lead to a load-time error. But that's never
discussed.
Depends how large BIG_ENOUGH is; if it's 256 bytes, it is unlikely to
cause problems. If it's 256K, there's no expectation the code will work
at all, except in a limited number of cases.
And these of course are just buffer overflows waiting to happen anyways.
Assuming one doesn't know how long foo and bar are. Generally, when I
write my code, I actually pay attention to such details.
You are also ignoring the fact that each character from foo and bar is
checked against '\0' for no really good reason. Block copying
mechanisms from the underlying platform are not available. So *both*
the strcpy and strcat functions are potentially slower than they need to
be.
Sure; length-specified strings can be faster in some cases. I think
you'll find it hard to prove they're faster in all cases, as you assert.
And it *ALWAYS* increases danger, because there are no language
semantics or assistance available to you to keep big_string and foo_len
in sync.
Umm... so? It doesn't take any particular genius to load a block of data
from a file, for example, then compare length of input + length of
existing data to length of buffer.
It's horrible in this case because what happens if you decide
you want to insert a strcat (big_string, "|"); in between those two
lines?
Then you increment your size by one.
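The bookkeeping in question can be sketched roughly as below. append_checked is a hypothetical helper, not a standard function; it assumes the caller tracks the current length in a variable and that buf already holds a terminated string of that length.

```c
#include <stddef.h>
#include <string.h>

/* Append n bytes from src to buf. buf holds *len bytes plus a
 * terminator and has room for cap bytes in total.
 * Returns 0 on success, -1 if the result would not fit. */
int append_checked(char *buf, size_t cap, size_t *len,
                   const char *src, size_t n)
{
    /* Written as a subtraction so that *len + n cannot wrap around. */
    if (*len >= cap || n >= cap - *len)
        return -1;
    memcpy(buf + *len, src, n);   /* block copy, no per-byte '\0' test */
    *len += n;                    /* the length is updated in one place */
    buf[*len] = '\0';
    return 0;
}
```

Inserting the "|" between foo and bar is then one more call with n equal to 1, and the length stays correct as long as every append goes through the helper.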
And of course, in a discussion about performance you completely miss
the real performance opportunity:
memcpy (big_string, foo, foo_len);
strcpy(big_string + foo_len, bar);
Which, of course, should generally go faster, but does nothing to
alleviate the danger of buffer overflows or maintenance issues.
Hmm. This would seem to require either that memcpy be replaced with a
function which has innate knowledge of the new length-managed string type,
or exposing the string data itself to the outside world - outside the
control of the built-in string functions. This means we can modify the
buffer directly - potentially making the length field completely invalid.
If the point here is to prevent bad programming practices - the sort that
lead to buffer overflows - this strikes me as a not overly good approach.
You need to keep the entire thing in an opaque type, but if you do, this
removes the option of doing the memcpy above.
Besides arguing incorrectly, you argue by straw man:
1) With length delimited strings you can perform unrolled string
searching algorithms without testing each character for '\0'.
You have to compare the index to the length, instead. Replacing one
integer comparison with another doesn't seem all that significant.
Furthermore if the two strings are of similar length, and they
completely mismatch, then you avoid any comparisons which would take the
tail of the search string beyond the tail of the string you are
searching in.
Not sure what that means. Example:
a_str = "abcde";
b_str = "fghijkl";
Similar strings, complete mismatches. Comparison of the strings involves
comparison of _one_ character: 'a' is not equivalent to 'f', so the
strings don't compare equal, so why compare further? Or perhaps you mean
strstr, rather than strcmp? In which case:
a_str = "xxx";
b_str = "abcdef";
Similar lengths, complete mismatch... but a maximum of four character
comparisons, plus determination of length, are required to determine this.
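To make the count concrete, here is a rough sketch of a naive substring search over known lengths that tallies its character comparisons. find_sub and its cmps parameter are invented for this illustration; a real strstr need not work this way.

```c
#include <stddef.h>

/* Naive substring search over known lengths. Stores the number of
 * character comparisons performed in *cmps. Returns the index of the
 * first match, or -1 if there is none. */
long find_sub(const char *hay, size_t hlen,
              const char *needle, size_t nlen, unsigned *cmps)
{
    *cmps = 0;
    if (nlen > hlen)
        return -1;
    /* Stop once the needle's tail would run past the haystack's tail. */
    for (size_t i = 0; i + nlen <= hlen; i++) {
        size_t j = 0;
        while (j < nlen) {
            ++*cmps;
            if (hay[i + j] != needle[j])
                break;
            j++;
        }
        if (j == nlen)
            return (long)i;
    }
    return -1;
}
```

Searching "abcdef" (length 6) for "xxx" (length 3) tries start positions 0 through 3 and fails on the first character each time, which is where the four comparisons come from.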
In the case of long strings, the gains could be more significant, but if
you're comparing long strings on a regular basis, I'd tend to think
Boyer-Moore or the like would be more appropriate, and these perform quite well;
your only gain would be in calculating end-of-string, and that's assuming
the code doesn't already have such information available.
2) String comparison includes length checking which speeds up the
typical scenario of long prefix matching for different strings.
Assuming one doesn't already have such information available. If I'm
writing code to process thousands of strings, each potentially thousands
or even tens of thousands of characters long, I'm _already_ going to be
recording string lengths, and using modified algorithms which make use of
this information.
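A minimal sketch of what such a modified comparison might look like, assuming the lengths have already been recorded by the caller. equal_counted is a hypothetical name, and the shortcut only works for equality tests, not for ordering.

```c
#include <stddef.h>
#include <string.h>

/* Equality test for two strings whose lengths are already known.
 * Strings of different lengths cannot be equal, so a long shared
 * prefix is never scanned in that case. */
int equal_counted(const char *a, size_t alen,
                  const char *b, size_t blen)
{
    if (alen != blen)
        return 0;
    return memcmp(a, b, alen) == 0;
}
```

When the lengths happen to match, the shared prefix still has to be walked, so the gain depends on the data.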
3) Operations such as insert and delete can be done in one pass.
4) And as long as we are talking about "clever tricks", any block-based
algorithms (such as is possible for things like "toupper" or "tolower"
or character scanning) become unavailable unless the length is known.
Yes, yes, again, some operations can be improved. Even most. However,
once again, the key concept was "all cases". It has not been demonstrated
that this mechanism would, in fact, improve all cases. The simple case
of strchr, for example, throws the notion into doubt.