Boost process and C


Rod Pemberton

Ian Collins said:
That kind of assumes everyone uses windows....

Untrue, I learned of it when programming on Stratus Continuums in PL/1...
In fact, I doubt anyone knows the original authors' names for LCC either...


Rod Pemberton
 

websnarf

Ben said:
CBFalconer said:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities

You perhaps would like to name one?

I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.
[snip]
[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?
What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.
If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.

For the mlen, I need one value that indicates a write-protected string
(that can be unprotected) and one that indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values; however, this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions, and removes a useful debugging
mechanism (where you could pass around useful information through
negative values.)
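
Concretely, the header amounts to something like this (the struct
matches bstrlib.h; the check below is an illustrative paraphrase of
the rules described in this thread, not the library's actual code):

struct tagbstring {
    int mlen;             /* allocated size; <= 0 encodes protection  */
    int slen;             /* string length; < 0 is always an error    */
    unsigned char *data;
};

/* Per the description above:
     mlen >  0 : writable, mlen bytes allocated
     mlen <  0 : write-protected for now (can be unprotected)
     mlen == 0 : constant, can never be unprotected
     slen <  0 : deterministically invalid, caught before any use
   One cheap test then rejects protected, constant, and corrupted
   headers alike: */
static int b_is_writable(const struct tagbstring *b)
{
    return b != 0 && b->data != 0
        && b->slen >= 0 && b->mlen > 0 && b->mlen >= b->slen;
}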
Things will go wrong for at most one possible string length, but that's
more than can be said for using int.

Huh? You *WANT* more erroneous scenarios. You want the mechanism to
require a somewhat tighter form of correctness, with violations
otherwise leading to the thing stopping or feeding back detectable
errors.
If you have only a small error trap, random behaviour will not fall
into it.
But whatever the difference in efficiency, surely correctness and safety
first, efficiency second has to be the rule for a general-purpose
library?

It *IS* correct and safe. (And it's fast, and easy to use/understand,
and powerful and portable and secure ...) What are you talking about?
I'll take it you have never tried to use or understand Bstrlib either.
 

Giorgos Keramidas

without operator overloading, how about just an infix notation
for 2-ary functions (with, e.g., functions evaluated left to right,
all with the same priority)?

typedef struct Vect { double x, y; } Vect;

infix Vect Vect_Sub (Vect u, Vect v) {
    return (Vect) { .x = u.x - v.x, .y = u.y - v.y };
}
infix Vect Vect_Scale (double lambda, Vect u) {
    return (Vect) { .x = lambda * u.x, .y = lambda * u.y };
}
infix double Vect_Dot (Vect u, Vect v) {
    return u.x * v.x + u.y * v.y;
}

int main (void) {
    Vect u, v, w, p, q, r, s, t;
    ...
    t = ((v Vect_Sub u) Vect_Dot (w Vect_Sub v))
        Vect_Scale (p Vect_Sub q Vect_Sub r Vect_Sub s);
    ...
}

No, please. This looks strangely familiar if you know LISP :p

Plus, it doesn't really work for functions with an arbitrary number of
arguments, and this creates an inconsistency in the elegantly simple
syntax of C.
 

Ben C

Why? And why do you think objects of user-defined types have to be
"allocated and freed manually"?

They don't _have_ to be, but they _might_ be.

One of the "features" of C is that the programmer has control over
memory allocation and de-allocation.

Usually in practice this just means a lot of bugs and crashes; but there
are good reasons for it too: you can write domain-specific allocators
that are more efficient and/or tunable in the amount of space or time
they use, instead of relying on a general-purpose allocator or
garbage-collector all the time.

The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;

In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Someone has to write the code for that. If the programmer writes it, and
it's not just part of the framework, then it has to get implicitly
called.
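
Spelled out in plain C, the bookkeeping such a constructor hides looks
roughly like this (a sketch with made-up names and minimal error
handling, not any particular library's code):

#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-on-write string: "copying" just bumps a
   reference count; the buffer is duplicated only on first write. */
typedef struct {
    int refs;        /* number of handles sharing this buffer */
    size_t len;
    char *buf;
} cow_buf;

typedef struct { cow_buf *shared; } cow_str;

cow_str cow_copy(cow_str s)        /* the cheap "shallow copy" */
{
    s.shared->refs++;
    return s;
}

void cow_write(cow_str *s, size_t i, char c)
{
    if (s->shared->refs > 1) {     /* shared: duplicate before writing */
        cow_buf *fresh = malloc(sizeof *fresh);
        fresh->refs = 1;
        fresh->len = s->shared->len;
        fresh->buf = malloc(fresh->len);
        memcpy(fresh->buf, s->shared->buf, fresh->len);
        s->shared->refs--;
        s->shared = fresh;
    }
    s->shared->buf[i] = c;
}

In C++ the copy constructor makes the cow_copy step implicit; in C
every call site has to remember to write it.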
struct foo {
    int x, y;
};

foo operator+ (const foo& a, const foo& b)
// or, if you are of the "I hate references" camp:
// foo operator+ (foo a, foo b)
{
    const foo z = {a.x + b.x, a.y + b.y};
    return z;
}

foo x = {1, 2};
foo y = {3, 4};
foo z = x + y;

simplistic, but no constructors.

Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").
 

Ben C

[snip]
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?

No, but I don't assume that everything I can't name an example of
doesn't exist.
I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.

OK, I think I understand that part now.
If you only need a single "special" marker value (for which you were
perhaps using -1), you could consider using ~(size_t) 0.

For the mlen, I need one value that indicates a write-protected string
(that can be unprotected) and one that indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values; however, this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions, and removes a useful debugging
mechanism (where you could pass around useful information through
negative values.)
Things will go wrong for at most one possible string length, but that's
more than can be said for using int.

Huh? You *WANT* more erroneous scenarios.[..]

Sorry, I was unclear; I meant "that's better than you can say of the
situation if you use int".
It *IS* correct and safe. (And it's fast, and easy to use/understand,
and powerful and portable and secure ...)

I have nothing against Bstrlib.
What are you talking about?

What if int is bigger than size_t?
I'll take it you have never tried to use or understand Bstrlib either.

No I'd never heard of it.
 

Flash Gordon

Ben said:
CBFalconer wrote:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities
You perhaps would like to name one?
I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767. [snip]

[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?

Handling an RTF document that you will be writing to a variable-length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document into the database, so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.
I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.

Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

<snip>
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

 

jacob navia

Ben C wrote:
Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

Why not?

Suppose Matrix A,B,C;

C = A+B;

Your operator + function would allocate the space, add the matrix to a
linked list of matrices that lets the GC reclaim unused ones, and
return the results.

Or, instead of taking all this trouble you could just use the GC and
forget about destructors. All intermediate results would be
automatically garbage collected.
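
For instance, roughly like this (illustrative names; the operator
syntax is the lcc-win32 extension under discussion, not standard C,
and GC_MALLOC is the Boehm collector's allocator):

#include <gc.h>   /* Boehm GC */

typedef struct Matrix { int rows, cols; double *e; } Matrix;

/* Extension syntax, not standard C: overloaded + for Matrix. */
Matrix operator+(Matrix a, Matrix b)
{
    Matrix c = { a.rows, a.cols, 0 };
    c.e = GC_MALLOC(sizeof(double) * a.rows * a.cols);
    for (int i = 0; i < a.rows * a.cols; i++)
        c.e[i] = a.e[i] + b.e[i];
    return c;   /* no destructor: the GC reclaims intermediates */
}

An expression like C = A + B + C then leaks nothing: the temporary
from A + B simply becomes unreachable and is collected.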
On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").

The crucial point in this is to know when to stop. There are NO
constructors/destructors in C, and none of the proposed extensions
proposes that.

Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for the
fools!

If you feel that operator overloading would not solve the problem for
matrix addition, then you will have to devise other means of doing that.

The GC, however, is an ELEGANT solution to all these problems. We would
have the ease of use of C++ with its automatic destructors, WITHOUT
PAYING THE PRICE in language and compiler complexity.

This last point is important: compiler complexity increases the effort
that the language implementor must expend and increases the "bug surface".

The module that handles the operator overloading in lcc-win32 is 1732
lines long, including all comments and lines that contain just a '{'
or a '}'.

The compiled operators module is 11K of machine code. All the extensions
of lcc-win32 are conceptually simple, even if operator overloading is the
most complex one. Others, like generic functions, are much simpler to
implement.

jacob
 

jacob navia

Flash Gordon wrote:
Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits of info... For
programs that use strings extensively, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, especially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.
 

Flash Gordon

jacob said:
Flash Gordon wrote:

Well, I have that extra "Flags" field in the string library of
lcc-win32. I have the size as a size_t as you propose, and I need 32
bits for the flags.

The problem is that 32 bits is quite a lot for a few bits of info... For
programs that use strings extensively, 32 bits multiplied by several
thousand small strings can make a big difference in RAM used, especially
for the more common short strings.

I see the point of Bstrlib, and it is a very valid design decision.

I've yet to see software where short strings made up a significant
portion of the memory footprint and where the memory saved by avoiding
the flags would be of real use. Of course, such applications might exist.

Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.
 

Jordan Abel

Ben said:
CBFalconer wrote:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities

You perhaps would like to name one?

I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.
[snip]

[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?
What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.

If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.

Silly encoding tricks buy you nothing, just use another field with bit
flags.
For the mlen, I need one value that indicates a write-protected string
(that can be unprotected) and one that indicates a constant (that can
never be unprotected). The slen has to be of the same type as mlen,
and so in order to check for potential errors, I set it to -1 to
indicate that it has been deterministically set to an invalid value.
Of course I could just isolate a handful of values; however, this
makes the error space extremely small, which reduces your chances of
finding accidental full corruptions,

This shouldn't be left to chance anyway; pretending that it can be
caught invites disaster when inevitably one of the cases comes up where
it _doesn't_ get caught.
 

websnarf

Flash said:
Ben said:
CBFalconer wrote:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities
You perhaps would like to name one?
I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.
[snip]

[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?

Handling an RTF document that you will be writing to a variable-length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document into the database, so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.

So now name the platform where it's *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.
Is an extra byte (or word, or double word) for a flags field really that
big an overhead?

I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help me catch errors in the
scalar fields, and it's space-inefficient.

ssize_t would have been a reasonable *functional* choice, but it's not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure it's the fastest scalar
primitive available.
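
To see the space argument, compare the two header layouts (sketch
structs, not any library's actual definitions; the numbers depend on
the platform):

#include <stddef.h>
#include <stdio.h>

/* int-encoded lengths vs. size_t lengths plus a separate flags
   field. On a typical LP64 system the first is 16 bytes and the
   second 32 after padding, per string header. */
struct hdr_int   { int mlen, slen; unsigned char *data; };
struct hdr_flags { size_t mlen, slen; unsigned int flags;
                   unsigned char *data; };

int main(void)
{
    printf("int-encoded header: %zu bytes\n", sizeof(struct hdr_int));
    printf("flags-field header: %zu bytes\n", sizeof(struct hdr_flags));
    return 0;
}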
 

REH

Ben said:
They don't _have_ to be, but they _might_ be.

One of the "features" of C is that the programmer has control over
memory allocation and de-allocation.
Yes, C++ has this same "feature." Memory allocation is completely
under control of the programmer.
Usually in practice this just means a lot of bugs and crashes; but there
are good reasons for it too: you can write domain-specific allocators
that are more efficient and/or tunable in the amount of space or time
they use, instead of relying on a general-purpose allocator or
garbage-collector all the time.
C++ does not do GC, nor are you required to use any "general-purpose"
allocator.
The programmer also might implement things like shallow-copy and
copy-on-write.

Somehow all of these things need to happen when an expression like this
is evaluated:

string a = b + c;

In C++ the basic mechanism you use for this is constructors. For example
the string copy constructor might set up a shallow copy-on-write copy.
Someone has to write the code for that. If the programmer writes it, and
it's not just part of the framework, then it has to get implicitly
called.
Yes, the programmer can write a constructor to do this. He does not
have to.
Yes exactly, and AFAIK the kind of operator-overloading that has been
proposed for C is something like this-- it's fine for structs
representing things like complex numbers (that are a few words long and
don't contain pointers).

But this is quite limited. You can use it for complex numbers, numbers
longer than the largest machine type, and as has been suggested perhaps
to wrap assembler intrinsics for multimedia instructions.

But you can't easily use it efficiently as it stands for matrices or
strings (which are two other common uses for operator overloading).

On its own it's not enough; with the extra workarounds you need, you end
up with C++ (or some other kind of "octopus made by nailing four extra
legs onto a dog").

I still don't get your point.

REH
 

websnarf

Flash said:
I've yet to see software where short strings made up a significant
portion of the memory footprint and where the memory saved by avoiding
the flags would be of real use. Of course, such applications might exist.

Any program that reads words from any language dictionary. Like a
spell checker, or a word puzzle solver/creator, or a spam filter. For
dictionaries the size of the English language dictionary, these kinds
of applications can typically push the L2 cache of your CPU pretty
hard.
Personally I would say that using negative lengths was asking for
problems because at some point a negative length will be checked without
first changing it to positive.

I think you miss the point. If the string length is negative then it
is erroneous. That's the point of it. But a negative amount of
allocated memory I use to indicate that the memory is not legally
modifiable at the moment, and 0 to mean that it is never modifiable.
The point is that the library blocks erroneous action due to
intentionally or unintentionally bad header values in the same test.
So it reduces overhead, while increasing safety and functionality at
the same time.

You know, you can actually read the explanation of all this in the
documentation if you care to do so.
 

websnarf

Jordan said:
Ben said:
CBFalconer wrote:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities

You perhaps would like to name one?

I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.

[snip]

[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.

You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?
it allows in an efficient way for all desirable encoding scenarios,
and it avoids any wrap-around anomalies causing under-allocations.

What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.

If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.
Huh?

Silly encoding tricks buy you nothing, just use another field with bit
flags.

If I do that, I lose space, speed, and error detection. I see it as
buying me a whole hell of a lot actually.
This shouldn't be left to chance anyway; pretending that it can be
caught invites disaster when inevitably one of the cases comes up where
it _doesn't_ get caught.

Uhh ... that's the situation we have with basically all other string
libraries in existence for C *today*. My library and TR 24731 are the
only ones to attempt to catch these errors *before* any undefined
scenario occurs. In practice this means that a greater percentage of
corruption errors are simply caught in your normal error handling.
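
The TR 24731 version of the same idea is roughly this (a sketch:
rsize_t and RSIZE_MAX are real, but the typedef and the macro value
here are illustrative, since the real value is implementation-defined):

#include <stddef.h>
#include <stdint.h>

/* Sizes above RSIZE_MAX are rejected, so a "negative" length that
   wrapped to a huge unsigned value is caught before any allocation
   or copy happens. */
typedef size_t rsize_t;
#define RSIZE_MAX (SIZE_MAX >> 1)   /* a typical choice, not mandated */

static int length_is_sane(rsize_t n)
{
    return n <= RSIZE_MAX;   /* wrapped lengths fail this test */
}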
 

Ben C

Ben C wrote:

Why not?

Suppose Matrix A,B,C;

C = A+B;

Your operator + function would allocate the space, add the matrix to a
linked list of matrices that lets the GC reclaim unused ones, and
return the results.

A reference to them presumably.

Yes indeed, if you have a garbage collector (and references) there is no
problem.

That's why I say operator-overloading works well in languages where
the framework manages storage for you (e.g. in Python, and apparently
lcc-extended C).

[snip]
Besides, I think that using the addition operator to "add" strings is an
ABOMINATION because:

a+b != b+a
"Hello" + "World" != "World" + "Hello"

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for the
fools!

And what about left-shifting iostreams :)
If you feel that operator overloading would not solve the problem for
matrix addition, then you will have to devise other means of doing that.

The GC, however, is an ELEGANT solution to all these problems. We would
have the ease of use of C++ with its automatic destructors, WITHOUT
PAYING THE PRICE in language and compiler complexity.

This last point is important: compiler complexity increases the effort
that the language implementor must expend and increases the "bug surface".

Yes of course. Although I would say, why not leave poor C alone and
start a new language? Or just use a different language that already
exists... there are a lot out there.

I often get the feeling there's a lot of pain and complexity in C++ that
could have been avoided if it hadn't started out trying to be compatible
with C.
 

Ben C

C++ does not do GC, nor are you required to use any "general-purpose"
allocator.

Yes I know. But you do get constructors, destructors and references, so
you can fit explicit memory management "under the hood" of operator
overloading.
Yes, the programmer can write a constructor to do this. He does not
have to.

I don't know of a way to do it without a constructor (for a
shallow-copied copy-on-write string class).

[snip]
I still don't get your point.

Show me the string example, and hopefully either you will get my point
or I will get yours :)
 

Jordan Abel

Jordan said:
Ben C wrote:
CBFalconer wrote:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities

You perhaps would like to name one?

I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.

[snip]

[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.

You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,

I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.

Ok, so you can name a single application of such a thing right?

it allows in an efficient way for all desirable encoding scenarios,
and it avoids any wrap-around anomalies causing under-allocations.

What anomalies? Are these a consequence of using signed long, or
size_t?

I am describing what int does (*BOTH* the encoding scenarios and
avoiding anomalies). Using a long int would allow for arithmetic on
numbers that exceed the maximum value of size_t on some systems (that
actually *exist*), so when there was an attempt to malloc or realloc on
such sizes, there would be a wrap-around to some value that would just
make it screw up. And if I used a size_t, then there would be no
simple space of encodings that can catch errors, constants and
write-protected strings.

If it's longer than the maximum size_t value, you probably can't have it
anyway, so there's no point in being able to represent it.

Huh?

size_t has to be able to represent the size of any object. To have
a string longer than its maximum value you would have to have an array
of characters longer than that maximum value - which you can't have.
If I do that, I lose space, speed, and error detection. I see it as
buying me a whole hell of a lot actually.

Space and speed are cheap these days.

Even if you have a million strings, that's still only four megabytes
saved. If you make a million calls, that's still only a few million
cycles saved.

It does _not_ buy you error detection in general, and a false sense of
safety can be dangerous.

Probably the best thing to do to prevent errors would be make everything
use your API, and make sure your functions don't have bugs. Once you
have that, the only possible source of errors is bit rot, and you can't
do anything about that.
 

Jordan Abel

It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!

The addition operator on dates would work _exactly_ the same way as the
addition operator on pointers - you can subtract two of them, or add one
to a number (representing an interval).

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems where
it's not trivial, could call localtime, modify tm_sec, and then call
mktime].
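
In standard C that route looks roughly like this (illustrative names;
what an overloaded date + interval might expand to):

#include <stdio.h>
#include <time.h>

/* date + seconds, via the localtime/mktime route described above.
   Assumes the offset fits in int; error handling kept minimal. */
time_t date_add_seconds(time_t when, int seconds)
{
    struct tm tm = *localtime(&when); /* split into calendar fields */
    tm.tm_sec += seconds;             /* mktime renormalizes overflow */
    return mktime(&tm);
}

int main(void)
{
    time_t now = time(NULL);
    time_t later = date_add_seconds(now, 90 * 60); /* + 90 minutes */
    /* date - date gives the interval back, as difftime does: */
    printf("interval: %.0f seconds\n", difftime(later, now));
    return 0;
}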
 

jacob navia

Jordan Abel wrote:
It just makes NO SENSE. The same thing when you apply the addition
operator to dates: it makes NO SENSE to ADD dates. Only subtraction
makes sense. And yes, multiplying dates is left "as an exercise" for
the fools!


The addition operator on dates would work _exactly_ the same way as the
addition operator on pointers - you can subtract two of them, or add one
to a number (representing an interval).

Presumably, the number would be taken as seconds [so that the
subtraction operator would call difftime, and addition, on systems where
it's not trivial, could call localtime, modify tm_sec, and then call
mktime].

Yes adding a number to a date makes sense, but I was speaking about
adding two dates!
 

Flash Gordon

Flash said:
Ben C wrote:
CBFalconer wrote:
(e-mail address removed) wrote:
CBFalconer wrote:
... snip ...
The last time I took an (admittedly cursory) look at Bstrlib, I
found it cursed with non-portabilities
You perhaps would like to name one?
I took another 2 minute look, and was immediately struck by the use
of int for sizes, rather than size_t. This limits reliably
available string length to 32767.
[snip]

[...] I did find an explanation and
justification for this. Conceded, such a size is probably adequate
for most usage, but the restriction is not present in standard C
strings.
You're going to need to concede on more grounds than that. There is a
reason many UNIX systems tried to add an ssize_t type, and why TR 24731
has added rsize_t to their extension. (As a side note, I strongly
suspect that Microsoft, in fact, added this whole rsize_t thing to TR
24731 when they realized that Bstrlib, or things like it, actually has
far better real-world safety because of its use of ints for string
lengths.) Using a long would be incorrect since there are some systems
where a long value can exceed a size_t value (and thus lead to falsely
sized mallocs.) There is also the matter of trying to codify
read-only and constant strings and detecting errors efficiently
(negative lengths fit the bill.) Using ints is the best choice
because at worst it's giving up things (super-long strings) that nobody
cares about,
I think it's fair to expect the possibility of super-long strings in a
general-purpose string library.
Ok, so you can name a single application of such a thing right?
Handling an RTF document that you will be writing to a variable-length
record in a database. Yes, I do have good reason for doing this. No, I
can't stream the document into the database, so I do have to have it all
in memory. Yes, RTF documents are encoded as text. Yes, they can be
extremely large, especially if they have graphics embedded in them
encoded as text.

So now name the platform where it's *possible* to deal with this, but
where Bstrlib fails to be able to deal with them due to its design
choices.

If the DOS port hadn't been dropped then, depending on the compiler, we
might have hit this. A significant portion of the SW I'm thinking of
originated on DOS, so it could have hit it.
I need two *bits* for flags, and I want large ranges to catch errors in
the scalar fields (this is a *safe* library). An extra struct entry is
the wrong way to do this because it doesn't help me catch errors in the
scalar fields, and it's space-inefficient.

ssize_t would have been a reasonable *functional* choice, but it's not
standard. size_t is no good because it can't go negative. long int is
no good because there are plenty of real platforms where long int is
larger than size_t. int solves all the main real problems, and as a
bonus the compiler is designed to make sure it's the fastest scalar
primitive available.

Strangely enough, when a previous developer on the code I'm dealing with
thought he could limit size to a "valid" range and assert if it was out
of range, we found that the asserts kept getting triggered. However, it
was always triggered incorrectly because the size was actually valid! So
I'll stick to not artificially limiting sizes. If the administrator of a
server the SW is installed on wants, then s/he can use system-specific
means to limit the size of a process.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

 
