Ben said:
CBFalconer said:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities
You perhaps would like to name one?
I took another two-minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits the reliably
available string length to 32767.
[snip]
[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
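For concreteness, the header in question is shaped roughly like this
(a sketch only; the field names follow the mlen/slen fields discussed
later in the thread, and the actual declaration in Bstrlib may differ):

    struct int_string {           /* illustrative name, not Bstrlib's own */
        int mlen;                 /* allocated size, or a negative flag   */
        int slen;                 /* current string length                */
        unsigned char *data;      /* slen bytes of content plus a '\0'    */
    };

Since the standard only guarantees INT_MAX >= 32767, that figure is all
an int length can portably promise.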
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to its extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs). There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill). Using ints is the best choice
because, at worst, it gives up something (super-long strings) that
nobody cares about.
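To make the "falsely sized mallocs" point concrete, here is a minimal
standalone sketch (hypothetical demonstration, not Bstrlib code) of
what happens on a platform where long is wider than size_t:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <limits.h>

    int main(void) {
    #if LONG_MAX > SIZE_MAX
        long want = (long)SIZE_MAX + 2;  /* a length only long can hold   */
        size_t n = (size_t)want;         /* conversion wraps: n becomes 1 */
        unsigned char *p = malloc(n);    /* "succeeds", but with 1 byte   */
        printf("asked for %ld bytes, malloc saw %zu\n", want, n);
        free(p);
    #else
        puts("long does not exceed size_t on this platform");
    #endif
        return 0;
    }

On such a system the allocation quietly comes back far too small, which
is exactly the failure mode described above.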
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
OK, so you can name a single application of such a thing, right?
What anomalies? Are these a consequence of using signed long, or
size_t?
I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc
with such sizes, there would be a wrap-around to some value that would
just make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.
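In miniature, the contrast in encoding space looks like this
(illustrative checks of my own, not taken from Bstrlib):

    #include <stdio.h>
    #include <stddef.h>

    /* Signed length: every negative value is out-of-band, so flags and
       error markers are rejected with a single comparison. */
    static int length_ok_int(int slen) {
        return slen >= 0;
    }

    /* Unsigned size_t: every representable value is a legitimate length,
       so at best one or two magic values can be reserved near the top. */
    static int length_ok_size_t(size_t slen) {
        return slen != (size_t)-1;
    }

    int main(void) {
        printf("%d %d\n", length_ok_int(-3), length_ok_size_t((size_t)-3));
        return 0;   /* prints "0 1": the size_t check misses the bad value */
    }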
If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.
For the mlen, I need one value that indicates a write-protected string
(that can be unprotected) and one that indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so, in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course, I could just isolate a handful of values; however, this
makes the error space extremely small, which reduces your chances of
catching accidental corruption, and removes a useful debugging
mechanism (where you could pass around useful information through
negative values).
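A sketch of that scheme (the concrete sentinel values and function
names here are my own choices for illustration; Bstrlib's may differ):

    #define MLEN_PROTECTED (-1)   /* write-protected, can be unprotected   */
    #define MLEN_CONSTANT  (-2)   /* compile-time constant, never writable */
    #define SLEN_INVALID   (-1)   /* deterministically invalidated length  */

    struct int_string {
        int mlen;                 /* > 0: writable capacity; else a flag   */
        int slen;                 /* >= 0: current length; else invalid    */
        unsigned char *data;
    };

    /* Writes are allowed only when both fields are in their sane ranges. */
    int is_writable(const struct int_string *b) {
        return b && b->data && b->slen >= 0 && b->mlen >= b->slen + 1;
    }

    /* A protected string can be reopened; a constant cannot. */
    int unprotect(struct int_string *b) {
        if (!b || b->mlen != MLEN_PROTECTED) return -1;
        b->mlen = b->slen + 1;    /* at least the content plus its '\0'    */
        return 0;
    }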
Things will go wrong for at most one possible string length, but that's
more than can be said for using int.
Huh? You *WANT* more erroneous scenarios. You want the mechanism to
require a somewhat tighter form of correctness, so that anything else
leads to the thing stopping or otherwise feeding back detectable
errors. If you have only a small error trap, random behaviour will not
fall into it.
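The arithmetic behind that, assuming 32-bit int and size_t (a toy
calculation of my own, not Bstrlib code):

    #include <stdio.h>

    int main(void) {
        double patterns = 4294967296.0;   /* 2^32 possible 32-bit values */
        double negative = 2147483648.0;   /* 2^31 of them are negative   */
        /* Signed int length: any negative value is trapped. */
        printf("int trap catches %.0f%% of random overwrites\n",
               100.0 * negative / patterns);
        /* One reserved size_t sentinel, e.g. ~(size_t)0: a single value. */
        printf("single sentinel catches %.8f%%\n", 100.0 * 1.0 / patterns);
        return 0;
    }

That is the sense in which a bigger error trap makes random corruption
far more likely to be detected at all.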
But whatever the difference in efficiency, surely correctness and safety
first, efficiency second has to be the rule for a general-purpose
library?
It *IS* correct and safe. (And it's fast, and easy to use/understand,
and powerful, and portable, and secure ...) What are you talking about?
I'll take it you have never tried to use or understand Bstrlib either.