Two Questions about "strlen", "strcat" and "strcpy"

Keith Thompson

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

Possibly a slight one. Returning a pointer to the beginning doesn't
give you any information you didn't already have. Returning a pointer
to the end (presumably to the trailing '\0') could, for example, let
you catenate more characters onto the end of the string without having
to scan the whole string again to find the end of it.
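
A minimal sketch of that idea, assuming a hypothetical helper named copy_end (it is not a standard function) that copies like strcpy but returns a pointer to the '\0' it just wrote. Each call continues where the previous one stopped, so the destination is never rescanned. The caller still has to supply a buffer that is large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the standard library: copies src
   (including its terminating '\0') to dst and returns a pointer to
   the '\0' just written in dst. */
char *copy_end(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}

/* Joins three pieces in buf without rescanning what was already
   copied. buf must be large enough for all three; no check is made.
   Returns a pointer to the terminating '\0' of the result. */
char *join3(char *buf, const char *a, const char *b, const char *c)
{
    char *end = buf;

    end = copy_end(end, a);
    end = copy_end(end, b);
    end = copy_end(end, c);
    return end;
}
```

With strcat, each of the three appends would first walk the destination from its start to find the end.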

The disadvantage is that it would break code that depends on the
current behavior.
 
Douglas A. Gwyn

Matt said:
1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value? Why is an unsigned value less
appropriate?

It can't simultaneously be both.
This sounds like a homework question. The idea of
homework is to get you to do your own analysis.
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

How would you use such a pointer if it did,
and what must you do instead to achieve a
similar goal using the existing standard
functions?
 
Brian Inglis

On 26 Aug 2004 19:44:06 GMT in comp.lang.c.moderated,
I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value? Why is an unsigned value less
appropriate?

What does the standard say about size_t? What do you think?
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

The non-standard but commonly available stpcpy/stpncpy functions do
just that.
 
Brian Raiter

I have 2 questions:
1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value? Why is an unsigned value less
appropriate?

2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

I also have 2 questions:

1. Posting homework questions to usenet frequently returns
misinformation and insults. Why is this more appropriate than actual
answers?

2. Would there be any advantage in having you do your own homework?

b
 
Dan Pop

In said:
I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value?

o Strings can't have negative sizes.

o size_t is also capable of representing the size of the largest object
supported by the implementation. If you fill this object with a string,
strlen will be able to correctly return its size.
Why is an unsigned value less appropriate?

Unsigned and signed types don't mix well in C. Ideally, unsigned types
should be used only for bit manipulation and modulo arithmetic purposes.
In real life, however, their ability to represent larger positive
values than their signed counterparts imposes their usage for normal
computational purposes.
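
A small illustration of how the two kinds of type mix badly. The function names are invented for this sketch. In the first function the int argument is converted to size_t before the comparison, so a negative value becomes a very large positive one. The second rejects negative values first and only then compares within a single type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy: the cast spells out the conversion the compiler applies
   implicitly when an int meets a size_t. A negative limit becomes a
   huge unsigned value, so every string appears to fit. */
int fits_naive(const char *s, int limit)
{
    assert(s != NULL);
    return strlen(s) <= (size_t)limit;
}

/* Safer: handle negative limits while the value is still signed,
   then compare as size_t. */
int fits_checked(const char *s, int limit)
{
    assert(s != NULL);
    if (limit < 0)
        return 0;
    return strlen(s) <= (size_t)limit;
}
```

Called with a limit of -1, fits_naive reports that "hello" fits, while fits_checked does not.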
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

Most definitely. It would render multiple string concatenation faster.
fgets would also benefit from such a behaviour, since the end of its
string *must* be examined in order to determine if a complete line has
been read.
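
To illustrate the fgets point: with the standard interface, the caller has to locate the end of the buffer itself before it can tell whether a full line arrived. The helper name and its return convention below are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf with fgets. Returns 1 if a complete line
   (ending in '\n') was read, 0 if the line was truncated or the input
   ended without a newline, and -1 on end of file or read error.
   Hypothetical helper for illustration. */
int read_line(char *buf, int size, FILE *fp)
{
    size_t len;

    assert(buf != NULL && size > 0 && fp != NULL);
    if (fgets(buf, size, fp) == NULL)
        return -1;
    len = strlen(buf);      /* the extra scan of the buffer */
    return len > 0 && buf[len - 1] == '\n';
}
```

Had fgets returned a pointer to the end of what it stored, the strlen call would be unnecessary.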

Dan
 
Karthiik Kumar

Matt said:
I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value? Why is an unsigned value less
appropriate?

I cannot think of a string whose length is negative. What would
a string of length -1 look like (as if that made sense!)?
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?


With so many string manipulation functions expecting a pointer to
the beginning of the string, having strcpy / strcat return a
pointer to the beginning of the string is consistent with the rest of
the API, unlike alternatives such as pointing to its 'end'.
 
Paul Hsieh

I have 2 questions:

1. strlen returns an unsigned (size_t) quantity. Why is an unsigned
value more appropriate than a signed value? Why is an unsigned value less
appropriate?

The reasoning is that unsigned values have a larger maximum value and
that negative lengths don't have any meaning. So under the assumption
that strlen can never fail and only needs to represent all possible
outputs of a string length, size_t is the most appropriate output.
2. Would there be any advantage in having strcat and strcpy return a
pointer to the "end" of the destination string rather than returning a
pointer to its beginning?

As others have posted, incremental string concatenation is simplified
by such a scheme. The claims of superior performance are kind of funny
though -- while technically true, they miss the greater point that
calls to strlen(), implicit or not, are the real performance
problem.

People may write strcpy() in "hand coded assembly language" if they
want, but its semantics limit the advantage one can gain from
doing this on most platforms (on modern x86 compilers you will gain
nothing by doing this.) memcpy() on the other hand, can be
*dramatically* improved in performance using assembly language on
most platforms -- the reason is that aligned block copying is
something most hardware has good support for that is far superior to
char by char copying.

So returning an "end pointer" helps half of the problem by implicitly
tracking the end-address for the destination, but does nothing about
the other half of the problem of not knowing the length of the source
and thus still doing the *implicit strlen()* as part of the strcat or
strcpy. If, on the other hand, the length of your source and
destination strings is known beforehand, then one could use memcpy()
instead, which would lead to a *true* performance boost.
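
A minimal sketch of the known-lengths case, assuming the caller tracks the current length and the capacity of the destination itself. The function name and interface are hypothetical and are not taken from bstrlib or any other library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src_len bytes from src to the text already in dst.
   *dst_len is the current length of dst and is updated on success;
   dst_cap is the total size of the dst buffer. Neither string is
   scanned for '\0'. Returns 1 on success, 0 if the result (plus its
   terminator) would not fit. */
int append_known(char *dst, size_t *dst_len, size_t dst_cap,
                 const char *src, size_t src_len)
{
    assert(dst != NULL && dst_len != NULL && src != NULL);
    if (*dst_len >= dst_cap || src_len >= dst_cap - *dst_len)
        return 0;
    memcpy(dst + *dst_len, src, src_len);
    *dst_len += src_len;
    dst[*dst_len] = '\0';
    return 1;
}
```

Because both lengths are known up front, the capacity check costs two comparisons and the copy is a single memcpy.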

People who have seen me post here before already know the punchline.
I've written a string library that does precisely this sort of thing
(as well as all sorts of other things related to speed, safety,
functionality and maintainability). You can learn more by visiting
the second link below.
 
jacob navia

Kenny said:
Innovation is OT here (*). I thought you understood that by now.

Well, that is a sentence that needs to be framed, and
right away kept in the museum.

Down with innovation!

Operator overloading is an accepted way of working with
numerical quantities that has gotten accepted even in
traditional languages like FORTRAN.

I like C. I do not want it to disappear as a computer
language that can provide simplicity and power.

jacob
 
jacob navia

Paul said:
People who have seen me post here before already know the punchline.
I've written a string library that does precisely this sort of thing
(as well as all sorts of other things related to speed, safety,
functionality and maintainability). You can learn more by visiting
the second link below.

The approach you take is the good one. Length delimited strings!

The string library lcc-win32 proposes is the same idea but with some
syntactic sugar around it:
1) You can index Strings like char *:
String S1 = "abc";
S1[1] // yields 'b'
S1[1] = 'X' // String is now "aXc"

2) You can assign them to local variables in the normal way as shown above

3) All strings are garbage collected. No more "free" problems.

4) Function names are cloned from the C library:
Strcpy
Strcat
etc

Easy to use, easy to learn.

Lcc-win32:
http://www.cs.virginia.edu/~lcc-win32

jacob
 
CBFalconer

jacob said:
Paul said:
People who have seen me post here before already know the punchline.
I've written a string library that does precisely this sort of thing
(as well as all sorts of other things related to speed, safety,
functionality and maintainability). You can learn more by visiting
the second link below.

The approach you take is the good one. Length delimited strings!

The string library lcc-win32 proposes is the same idea but with
some syntactic sugar around it:
1) You can index Strings like char *:
String S1 = "abc";
S1[1] // yields 'b'
S1[1] = 'X' // String is now "aXc"

2) You can assign them to local variables in the normal way as
shown above

3) All strings are garbage collected. No more "free" problems.

4) Function names are cloned from the C library:
Strcpy
Strcat
etc

Easy to use, easy to learn.

The problem is that you are incorporating this into the
compiler/library, instead of a separate module available with
source and written in standard C, as does Paul. Things like your
indexing operations above are not possible without an extension in
the compiler, and thus are inherently non-portable. You could
make something that accepted:

s1.body = 'a';

etc. in a fully portable manner, and this would be on-topic here.
Meanwhile your system runs only under Windoze (and even only a
subset of that), and is very far from portable.
 
Dan Pop

In said:
imo that should be "...would possibly be..."

The only way to know is to measure.

To measure what? The behaviour of one specific implementation?

The point is that such a function solves the problem with NO overhead,
while both strcat and sprintf have overheads. One doesn't have to
measure anything to realise this.

Dan
 
jacob navia

CBFalconer said:
The problem is that you are incorporating this into the
compiler/library, instead of a separate module available with
source and written in standard C, as does Paul. Things like your
indexing operations above are not possible without an extension in
the compiler, and thus are inherently non-portable. You could
make something that accepted:

s1.body = 'a';

etc. in a fully portable manner, and this would be on-topic here.
Meanwhile your system runs only under Windoze (and even only a
subset of that), and is very far from portable.


That is *still* possible OF COURSE.

Nothing stops you from writing
s1.content[1] = 'a';
instead of
s1[1] = 'a';

Lcc-win32 is *still* a C compiler and will swallow that
in no time. The first form will be slightly faster since
it avoids the bounds checking that is performed automatically
with length prefixed strings.

The structure is defined in the header file, and you
can avoid any dependencies with lcc-win32 by sticking
to a standard notation if you feel like.
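
A minimal sketch of what such a structure could look like in standard C. The type, field and function names are assumptions made for this example and do not reproduce the lcc-win32 header or bstrlib. An explicit function call stands in for the bounds check an overloaded [] would perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LSTR_MAX 64   /* arbitrary capacity chosen for this sketch */

/* Hypothetical length-delimited string. */
struct lstr {
    size_t count;               /* number of characters in use */
    char   content[LSTR_MAX];   /* not necessarily '\0'-terminated */
};

/* Initialises s from a C string. Returns 1 on success, 0 if the text
   does not fit. */
int lstr_set(struct lstr *s, const char *text)
{
    size_t n;

    assert(s != NULL && text != NULL);
    n = strlen(text);
    if (n > LSTR_MAX)
        return 0;
    memcpy(s->content, text, n);
    s->count = n;
    return 1;
}

/* Bounds-checked store. Returns 1 on success, 0 if i is out of
   range. */
int lstr_put(struct lstr *s, size_t i, char c)
{
    assert(s != NULL);
    if (i >= s->count)
        return 0;
    s->content[i] = c;
    return 1;
}
```

Direct access through s.content[i] remains available and skips the check, which is the trade-off between the two notations.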

The point here is that we need a portable way of using these
length delimited strings in C. Not a particular
implementation such as mine.

What is important is that we have in the standard language a way of using
length prefixed strings in the same way as we now have zero terminated
strings.

The decision as to which strings are more convenient and appropriate
to the task at hand should be left to the C programmer. There is NO
CHOICE now. That is what bothers me.

jacob

P.S. Most languages accept operator overloading now, even
FORTRAN (Fortran 90 standard), including C#, Perl, Ruby,
and many others. It is a proven compiler technology with
few mysteries left...

Operator overloading makes it possible to write such libraries without
too much pain.
 
Alan Balmer

Well, that is a sentence that needs to be framed, and
right away kept in the museum.

Down with innovation!

Operator overloading is an accepted way of working with
numerical quantities that has gotten accepted even in
traditional languages like FORTRAN.

I like C. I do not want it to disappear as a computer
language that can provide simplicity and power.
There's another newsgroup, just down the hall, which discusses the C
standard and things which should go in its next version.
 
CBFalconer

jacob said:
.... snip ...

What is important is that we have in the standard language a way of using
length prefixed strings in the same way as we now have zero
terminated strings.

The decision as to which strings are more convenient and
appropriate to the task at hand should be left to the C programmer.
There is NO CHOICE now. That is what bothers me.

Then present an appropriate module in source form, together with
suggested extensions to the language to ease use, IN THE
comp.std.c newsgroup. The source module can be advertised here,
as does Paul. If the technique is useful, simple, safe, and in
use it will be considered for a future version. And don't call
them strings, that already has a defined implementation in C.
 
Michael Wojcik

No. It returns the pointer to the start so you can chain calls:

Which makes it so much easier to overflow buffers, fail to check
results, and so forth - without which software would not be the
great success that it is today.

This reminds me of Mike Whaler's construction from earlier in this
thread:
char *p = buffer;
p += sprintf(p, "%s", "Hello");
p += sprintf(p, "%s", " world\n");

Certainly in a trivial case there's no danger, but in real code this
sort of thing is a Bad Idea. It's worse with fprintf, since there's
a wider range of error conditions, but *I* don't want to bet that I
never mistype a format string, just so I can write terse code.
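
One way to keep the "advance a pointer" idiom while checking every result is sketched below. It relies on C99 snprintf, which returns the number of characters that would have been written, or a negative value on error. The helper name and interface are assumptions made for this example, and it takes plain text only so that no caller-supplied format string is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Appends text at *pos inside a buffer that ends at end (one past
   the last byte). On success advances *pos to the new terminating
   '\0' and returns 1. On truncation or error restores the previous
   contents, leaves *pos unchanged and returns 0.
   Hypothetical helper for illustration. */
int append_text(char **pos, char *end, const char *text)
{
    size_t room;
    int n;

    assert(pos != NULL && *pos != NULL && end != NULL && text != NULL);
    assert(*pos < end);
    room = (size_t)(end - *pos);
    n = snprintf(*pos, room, "%s", text);
    if (n < 0 || (size_t)n >= room) {
        **pos = '\0';       /* discard any partial output */
        return 0;
    }
    *pos += n;
    return 1;
}
```

The unchecked version adds whatever sprintf returns to the pointer, including a negative value on error.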
 
Default User

Michael said:
Which makes it so much easier to overflow buffers, fail to check
results, and so forth - without which software would not be the
great success that it is today.

No more so than any other use of those functions. Whether they're
chained or not is irrelevant.



Brian Rodenborn
 
Kenny McCormack

There's another newsgroup, just down the hall, which discusses the C
standard and things which should go in its next version.

I think we all understand that (certainly Jacob & I do). And I think we
understand the reasons why it is so. I think we all understand the context
of my little jibe.

It is just that, and this is a point made several times, through the years,
by so many different people, it is so counter-intuitive that a newsgroup
named simply comp.lang.c would be about something so abstract and unrelated
to what most people consider C and for something like comp.lang.std.c (or
whatever that thing to which you obliquely refer is named) would:
a) Not be what clc in fact is (i.e., the home of the abstraction to
which I allude earlier)
b) Actually be closer (according to your statement) to what most
people would intuitively think clc would be.
 
Alan Balmer

I think we all understand that (certainly Jacob & I do). And I think we
understand the reasons why it is so. I think we all understand the context
of my little jibe.

I'm sure that Jacob understands it, but he seems reluctant to accept
it. Personally, I weary of the constant advertisements for his "better
than C" language.
 
Paul Hsieh

jacob navia said:
The point here is that we need a portable way of using these
length delimited strings in C. Not a particular
implementation such as mine.

Uhh ... that's exactly what bstrlib does. bstrlib is extremely
portable -- semantically speaking, I argue it's even more portable than
CLIB itself since its defined usage is maximal and necessarily
identical over all platforms.
What is important is that we have in the standard language a way of using
length prefixed strings in the same way as we now have zero terminated
strings.

No. Zero terminated strings are the whole problem in the first place.
This is where both buffer overflows and retrograde performance come
from. The whole CLIB style of '\0' terminated strings forces the
programmer to think in terms of implementation and constantly respin
solutions to common problems. Every other modern language in
existence basically lets the programmer think about strings
just as strings. So what we want is something closer to how other
programming languages deal with strings (like Python, Perl, Java,
Pascal, etc), while supersetting the CLIB char * nonsense. Guess what
-- bstrlib *does* this.
Operator overloading makes it possible to write such libraries without
too much pain.

Ok, this is a different thing, and it solves a different problem.
This is basically a focus on syntactical concerns more than anything
else -- and it's not a very good solution to it either. The set of
operators available is dictated by the syntax/considerations for
arithmetic on scalars. For example if you want a comprehensive vector
library, which operator are you proposing to assign for cross-product,
dot-product, SIMD-product, tensor and trace, while retaining sum and
difference? Ask for enough operations and you will eventually just
run out of operators.

As I have posted before, it would make more sense if there were
*additional* operators (using multiple operator characters and some
previously unused characters such as @, $ to create things that look
like: @+, $*, @==, $^^, etc) that basically had empty definitions
unless actually defined. There are other operator overloading things
such as:

X"..."
"..."X
3.75X

Where the X could be other letters. This gives programmers a way of
using operator overloading without suffering from the confusion of
using previous operators that someone reading the code can easily
mistake for having different semantics from what they think. See?
It's still explicit (something C is useful for) while providing for
more modern functionality.

The C standards people have the opportunity to truly move the language
forward, and in fact get a small leg up over other languages if they
would consider things like this. But they are clearly
retrograde, short-sighted and, as we now see from the C99 debacle,
ultimately ineffective. The standards committee is stuck in their old
"we have to make sure the VAX people, the embedded people, the super
computer people, the DOS people and the UNIX people can all adopt our
standard while actually doing completely different things in their
compiler" mantra.

The computing universe has changed. People don't want just the minor
silly things in C99 over C89. C still represents the lowest level
language, but there still remain several real problems not solved in
the language.

Jacob, people like you and the guy that is making "D" (Walter B---?)
make me sad. You both seem overly concerned with such
ridiculously superficial aspects of the language -- yet the two of
you are the only ones putting your money where your mouth is and
converting your capability to make *compilers* to demonstrate the
possibility of evolution of C. Yet you haven't figured out that you
haven't sold your ideas to any significant population of programmers.

There are ideas in programming languages, especially the C-class of
programming languages that I *desperately* want to see:

1. A preprocessor with "Turing complete" (or close enough) power.
The point is that the LISP people continue to laugh at C
programmers who have no code generation or "lambda"
programming ability.
a) Compile time type assertions to allow for type safe macros.
2. Exposing more important hardware functions such as a widening
multiply, bit scan and bit count operations. Many CPUs have
instructions for accelerating these operations, and it's
otherwise fairly straightforward to simulate these operations.
a) Include things like round() and trunc() as separate
functions.
3. A more powerful and useful C library.
a) Fix all the stupid problems like strtok(), gets(), etc.
b) More powerful heap API (freeall(), memsize(), sub-heaps,
totalAllocCount(), allAllocsIterate() etc).
4. Some way of performing a restartable vsnprintf (), or
generally va_* reuse (according to the standard, not the
implementation).
5. Real co-routines -- none of this setjmp/longjmp nonsense.
6. A scope specific "goto case ____". This kind of functionality
is *mandatory* for implementing fast state machines. Today
programmers are either forced to have redundant case ____ and
labels that are not scope protected or they do it the slow way
(a switch statement wrapped in a while loop). Think about it -- there is
literally *NO* programming language with control mechanisms
that perfectly match the very common programming idiom of
state machines (except assembly language!)
7. Create an API for making "virtual file" objects. I.e., memory
files, network streams, algorithmic fractal strings, etc. could
be fopen(*,"r")'ed, fread()'ed, etc.
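
For item 6, the "slow way" in standard C looks roughly like the sketch below: a switch wrapped in a loop, where every state transition is stored in a variable and dispatched again on the next iteration. The states and the word-counting task are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum state { BETWEEN_WORDS, IN_WORD };

/* Counts space-separated words in s with a loop-wrapped switch.
   Every transition goes back through the switch dispatch, which is
   the overhead a direct "goto case" would avoid. */
size_t count_words(const char *s)
{
    enum state st = BETWEEN_WORDS;
    size_t words = 0;

    assert(s != NULL);
    while (*s != '\0') {
        switch (st) {
        case BETWEEN_WORDS:
            if (*s != ' ') {
                words++;
                st = IN_WORD;
            }
            break;
        case IN_WORD:
            if (*s == ' ')
                st = BETWEEN_WORDS;
            break;
        }
        s++;
    }
    return words;
}
```

The alternative in standard C is a set of labels and plain gotos, which duplicates the case names and loses the scoping of the switch.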

Think about it. This list is a set of *REAL* programming language
improvements, that are not really duplicated by any other language
(well ok, except for CPP improvements, which are duplicated with LISP
lambdas and MASM's preprocessor, or m4 or whatever). If you add these
things I don't see how anyone could *deny* that you were moving the
language forward in a significant way without just trying to be "me
too" with other languages.
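
As item 2 in the list says, these operations are straightforward to simulate in portable C, which is also why the dedicated instructions often go unused. A sketch using the C99 fixed-width types follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widening multiply: 32 x 32 -> 64 bits, done by converting one
   operand to the wider type before multiplying. */
uint64_t mul_wide_u32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Bit count: clears the lowest set bit on each iteration. */
unsigned bit_count_u32(uint32_t x)
{
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Bit scan: index of the lowest set bit. x must be nonzero. */
unsigned bit_scan_u32(uint32_t x)
{
    unsigned i = 0;

    assert(x != 0);
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}
```

Whether a compiler maps any of these loops onto a single instruction depends on the implementation, which is the gap a standard interface would close.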
 
jacob navia

Paul said:
No. Zero terminated strings are the whole problem in the first place.
This is where both buffer overflows and retrograde performance come
from. The whole CLIB style of '\0' terminated strings forces the
programmer to think in terms of implementation and constantly respin
solutions to common problems. Every other modern language in
existence basically lets the programmer think about strings
just as strings. So what we want is something closer to how other
programming languages deal with strings (like Python, Perl, Java,
Pascal, etc), while supersetting the CLIB char * nonsense. Guess what
-- bstrlib *does* this.

By giving the programmer the choice of zero terminated strings
OR length prefixed strings the language would retain compatibility
and at the same time allow the development of more robust
applications.
Ok, this is a different thing, and it solves a different problem.
This is basically a focus on syntactical concerns more than anything
else -- and it's not a very good solution to it either. The set of
operators available is dictated by the syntax/considerations for
arithmetic on scalars. For example if you want a comprehensive vector
library, which operator are you proposing to assign for cross-product,
dot-product, SIMD-product, tensor and trace, while retaining sum and
difference? Ask for enough operations and you will eventually just
run out of operators.

1: Overloading the [] operator makes it possible to design container
objects that add semantics to object access. Strings can be implemented
that use the [] indexing operator to bounds check the access, and to
implement length delimited strings. Above all, instead of hard wiring
length prefixed strings into the compiler it allows the user to define
string libraries that use this feature.

2: Overloading the assignment operator allows the user to write:
String S1 = "abc";
using the usual semantics. Yes, this *is* syntactic sugar since the user
could write:
String S1 = new_string("abc");
but syntactic sugar *is* important to ease the use of the new data
type. If not, we would all program in assembler.
As I have posted before, it would make more sense if there were
*additional* operators (using multiple operator characters and some
previously unused characters such as @, $ to create things that look
like: @+, $*, @==, $^^, etc) that basically had empty definitions
unless actually defined. There are other operator overloading things
such as:

X"..."
"..."X
3.75X

Where the X could be other letters. This gives programmers a way of
using operator overloading without suffering from the confusion of
using previous operators that someone reading the code can easily
mistake for having different semantics from what they think. See?
It's still explicit (something C is useful for) while providing for
more modern functionality.

This is quite correct Paul. But I have an incredibly difficult time
trying to convince people about the need for evolution in C, and from the
other answers in this thread you can immediately see that the
conservative mood (innovation is off topic here) prevents any discussion
about improvements like the one you propose. I wanted to use Greek
letters for new operators like
SIGMA j=1,inf (expression in j)
The C standards people have the opportunity to truly move the language
forward, and in fact get a small leg up over other languages if they
would consider things like this. But they are clearly
retrograde, short-sighted and, as we now see from the C99 debacle,
ultimately ineffective. The standards committee is stuck in their old
"we have to make sure the VAX people, the embedded people, the super
computer people, the DOS people and the UNIX people can all adopt our
standard while actually doing completely different things in their
compiler" mantra.

The standards committee refuses any change, since change should not exist
in C. Change and improvements are only allowed in C++. The committee
has refused until now to correct even outright *bugs* like the buffer
overflow specified in the asctime function. I pointed out that bug, and
was told that the committee rejected any correction in 2001.
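
For context: the C standard describes asctime in terms of a sprintf into a 26-character static buffer, so a struct tm whose year needs more than four digits produces more characters than the buffer holds. A hedged sketch of how a caller can sidestep this with the standard strftime, which takes the buffer size explicitly. The wrapper name is an assumption, and the output omits the trailing newline that asctime adds.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Formats *tm in a layout similar to asctime's, into a caller-supplied
   buffer of the given size. Returns the number of characters written
   (not counting the '\0'), or 0 if the result does not fit.
   Hypothetical wrapper around strftime; %e requires C99. */
size_t format_time(char *buf, size_t size, const struct tm *tm)
{
    assert(buf != NULL && tm != NULL);
    return strftime(buf, size, "%a %b %e %H:%M:%S %Y", tm);
}
```

A too-small buffer yields a return value of 0 instead of a write past the end.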

The computing universe has changed. People don't want just the minor
silly things in C99 over C89. C still represents the lowest level
language, but there still remain several real problems not solved in
the language.

The standards committee is convinced that all development should be only
in C++. There isn't even a public forum where innovation can be
presented and discussed. The comp.lang.c group discusses only important
questions like the eternal:
I wrote i++ = i++ and it doesn't work...
Jacob, people like you and the guy that is making "D" (Walter B---?)
make me sad. You both seem overly concerned with such
ridiculously superficial aspects of the language -- yet the two of
you are the only ones putting your money where your mouth is and
converting your capability to make *compilers* to demonstrate the
possibility of evolution of C. Yet you haven't figured out that you
haven't sold your ideas to any significant population of programmers.

I spoke with Walter about D. He has overall good ideas, and his
language/compiler *is* an improvement. The problem with it is that it is
object oriented, i.e. it provides support for a specific way of
organizing data. I think that C should be paradigm neutral, without
forcing *any* preconceived schema down the user's throat.
There are ideas in programming languages, especially the C-class of
programming languages that I *desperately* want to see:

1. A preprocessor with "Turing complete" (or close enough) power.
The point is that the LISP people continue to laugh at C
programmers who have no code generation or "lambda"
programming ability.

Can you give an example?
You mean anonymous code blocks?
a) Compile time type assertions to allow for type safe macros.

typeof() ?
2. Exposing more important hardware functions such as a widening
multiply, bit scan and bit count operations. Many CPUs have
instructions for accelerating these operations, and it's
otherwise fairly straightforward to simulate these operations.

Lcc-win32 introduces intrinsics like _overflow(), MMX intrinsics, etc.
a) Include things like round() and trunc() as separate
functions.

They are separate functions now in C99.
3. A more powerful and useful C library.
a) Fix all the stupid problems like strtok(), gets(), etc.

I have been saying this for quite a long time but nobody wants to
change anything.
b) More powerful heap API (freeall(), memsize(), sub-heaps,
totalAllocCount(), allAllocsIterate() etc).
Ditto.

4. Some way of performing a restartable vsnprintf (), or
generally va_* reuse (according to the standard, not the
implementation).
5. Real co-routines -- none of this setjmp/longjmp nonsense.
6. A scope specific "goto case ____". This kind of functionality
is *mandatory* for implementing fast state machines. Today
programmers are either forced to have redundant case ____ and
labels that are not scope protected or they do it the slow way
(a switch statement wrapped in a while loop). Think about it -- there is
literally *NO* programming language with control mechanisms
that perfectly match the very common programming idiom of
state machines (except assembly language!)
7. Create an API for making "virtual file" objects. I.e., memory
files, network streams, algorithmic fractal strings, etc. could
be fopen(*,"r")'ed, fread()'ed, etc.

These things should be done in a library. A low level language can be
improved with powerful libraries.

Think about it. This list is a set of *REAL* programming language
improvements, that are not really duplicated by any other language
(well ok, except for CPP improvements, which are duplicated with LISP
lambdas and MASM's preprocessor, or m4 or whatever). If you add these
things I don't see how anyone could *deny* that you were moving the
language forward in a significant way without just trying to be "me
too" with other languages.

Well "me too" is not intrinsically bad. Better to improve C than leave
it as it is now.

C is becoming obsolete, as FORTRAN did. Of course there are still places
where FORTRAN is good today, and it is still used.

C will go the same road. Nobody will want to program new software in it,
like now in FORTRAN. (Yes there are still some applications being
developed in FORTRAN, and yes, there will be always some people that
will develop in C. But the mainstream of programming will forget C)

This future is not inevitable, but by refusing innovation, this group
contributes a lot to make it real.

I have never seen a newsgroup about a programming language where you can
write:

"Innovation is off topic here as you should know by now"

and get away with it.

jacob
 
