strcpy - my implementation

arnuld · Sep 9, 2008

It isn't very good camouflage, though, since any experienced C
programmer will recognise it for what it is. It's just a more elegant
way to write the code. We don't write char foo[8] = { 'H', 'e', 'l',
'l', 'o', '\0', '\0', '\0' } just because it makes explicit the fact
that eight characters are being copied into the array. We write char
foo[8] = "Hello", and trust that competent programmers will understand.

[snip]

That whole discussion leaves me wondering whether:

char arrc[100] = {0};

is the same as:

char arrc[100];
memset(arrc, '\0', 100);

or whether the latter is more expensive than the former? And which one
does c.l.c advise using?

arnuld · Sep 9, 2008

You would be amazed at how few would actually do it this way. There are
many people out there who discourage such stuff. I even read here once
that it "misuses C" ... the mind boggles. I have seen many code bases
where you hardly ever see a pointer used in its natural habitat. A
crying shame IMO.

I am not a native English speaker, so I don't know what you mean by
"....used in its natural habitat..."

Do you want to say that c.l.c discourages *p++ = *q++ ?

arnuld · Sep 9, 2008

On Mon, 08 Sep 2008 11:51:45 +0500, arnuld wrote:

Here is the new version, which puts a check on maximum input:

/* My version of "strcpy - a C Library Function
 *
 * version 1.1
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { ARRSIZE = 4 };

char* my_strcpy( char*, char* );

int main( int argc, char** argv )
{
    char* pc;
    int input_size;

    char src[ARRSIZE+1] = {0};
    char dest[ARRSIZE+1] = {0};

    if( 2 != argc )
    {
        perror("USAGE: ./exec \" your input \"\n");
        exit( EXIT_FAILURE );
    }
    else
    {
        input_size = strlen( argv[1] );

        if( ARRSIZE < input_size )
        {
            fprintf(stderr, "Input must be %d characters or less\n", ARRSIZE);
            exit(EXIT_FAILURE);
        }

        strcpy( src , argv[1] );
    }

    pc = my_strcpy( dest, src );

    while( *pc )
    {
        printf("*pc = %c\n", *pc++);
    }

    return EXIT_SUCCESS;
}

char* my_strcpy( char* dest, char* src )
{
    char *const pc = dest;

    while( (*dest++ = *src++) )
    {
        ;
    }

    return pc;
}

The one thing I do not understand here: whether the arrays are created
with size ARRSIZE or ARRSIZE+1 (+1 for the extra NULL character), the
output is not affected. The user can enter 4 characters in this case,
like "Love", plus 1 for NULL, but even with ARRSIZE = 4 in total it works
fine. Is there some problem here?

CBFalconer · Sep 9, 2008

arnuld said:
.... snip ...

That whole discussion leaves me wondering whether:

char arrc[100] = {0};

is the same as:

char arrc[100];
memset(arrc, '\0', 100);

or whether the latter is more expensive than the former? And which one
does c.l.c advise using?

They are basically the same in code generated, although that is
obviously up to the decisions of the implementor. The advantage of
the two statement version is that you can separate the activity
from the sawing off of the memory space (barring use of const), and
the generated code is closer to what you actually typed. To me,
this adds understanding.
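
For the char array in the question, the equivalence of the two forms can be checked directly. The following is only a small sketch; the helper name `same_bytes` is invented for this illustration and does not come from the thread.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if both ways of zeroing a char array
 * leave identical contents, 0 otherwise. */
int same_bytes(void)
{
    char a[100] = {0};          /* initializer: the remaining elements are zeroed too */
    char b[100];

    memset(b, '\0', sizeof b);  /* separate statement; sizeof avoids repeating 100 */

    return memcmp(a, b, sizeof a) == 0;
}
```

This says nothing about the code a given compiler generates for either form; it only shows that the resulting array contents match for char.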

Flash Gordon · Sep 9, 2008

CBFalconer wrote, On 09/09/08 05:39:

However, notice that replacing strcpy is different than adding
strlcpy.

On some implementations it is the same.

strcpy exists in the current libraries. strlcpy does
not.

Apart from the implementations which provide it as an extension,
something they are allowed to do.

Any possible problem is somewhere in the future.

Apart from it being undefined behaviour and the fact that some
implementations do define it.

And, as I
pointed out, those names are alterable in the source code.

I agree with that. However they do not have anything to do with what the
OP was doing which was as an exercise re-implementing strcpy.

Nick Keighley · Sep 9, 2008

It does, of course; that's not the point.

Since I share pete's opinion on this style issue, I'll try to explain.

When I write an if or while statement, I prefer to use an expression
that is *conceptually* boolean. By "conceptually boolean", I mean
that the value of the expression can be thought of as either true or
false, and carries no additional information.

For example, if I'm examining the value of a character, the following
are equivalent:
if (c) { ... }
if (c != '\0') { ... }
I prefer to write the latter, because the value of c by itself isn't
just a true or false value, but the result of the "!=" operator is.

Similarly, I would write
if (strcmp(s1, s2) == 0) { ... }
rather than
if (!strcmp(s1, s2)) { ... }

and I would write
if ((ptr = malloc(N)) != NULL) { ... }
rather than
if (ptr = malloc(N)) { ... }

and so forth.

I'm perfectly well aware (as is pete, I'm sure) that in each case the
two forms are precisely equivalent, and will most likely result in
identical generated code. I'm also aware that some C programmers
(including, if I'm not mistaken, Kernighan and Ritchie themselves)
prefer the terser forms and consider the forms that I prefer to be too
verbose. I don't necessarily think that preference is wrong, I just
don't share it. Finally, I don't have any real difficulty
understanding either form; it might sometimes take me a marginally
longer time to understand something in the shorter form, but it's not
really significant.

I use the same conventions for the same reasons

Old Wolf · Sep 9, 2008

I disagree. The (c) expression worries only about whether the
character is or is not something with a zero value. The (c !=
'\0') expression expressly converts that zeroness into either the
value 0 or the value 1 before testing. Optimization may affect
this.

I would normally expect the second expression to generate larger
code than does the first, with optimization disabled.

I don't know what you're smoking today, but
the above two if statements are exactly
equivalent in effect, and I challenge you
to come up with a compiler that generates
different code for them, let alone one that
actually causes a different code branch to
execute.

Old Wolf · Sep 9, 2008

That whole discussion leaves me wondering whether:

char arrc[100] = {0};

is the same as:

char arrc[100];
memset(arrc, '\0', 100);

or whether the latter is more expensive than the former? And which one
does c.l.c advise using?

They both have the same effect, but the first
one is far better because it is less error-prone.

Anyone who disagrees needs to notice that the
1970s have ended, IMHO.

Examples of memset errors:
http://www.google.com/codesearch?hl=en&lr=&q=memset\s*\(.+,\s*0\s*\)&btnG=Search

vippstar · Sep 9, 2008

Old Wolf said:
arnuld said:
That whole discussion leaves me wondering whether:
char arrc[100] = {0};
is the same as:
char arrc[100];
memset(arrc, '\0', 100);
or whether the latter is more expensive than the former? And which one
does c.l.c advise using?

They both have the same effect, but the first
one is far better because it is less error-prone.

In this particular case, yes. I have explained in my other post in
this thread why they are not equivalent for pointers/floating points.

The first way makes it more obvious to the compiler
what it is that needs to be done.

I like code that gives the compiler a better chance
to take advantage of any available information.

Also it's far fewer characters to type, and it's more obvious to the
programmer as well.
Consider what happens when there are more object definitions:

char arrc[100];
int i;
size_t n;
/* ... */

memset(arrc, 0, sizeof arrc);

It's not immediately obvious that the array is going to be zeroed.

Richard · Sep 9, 2008

pete said:
We were talking about which kind of expressions
I compare explicitly to zero
and which kind of expressions
I don't explicitly compare to zero.

And why. But you snipped, so I don't know...

Richard · Sep 9, 2008

Richard Heathfield said:
CBFalconer said:

Does everyone really expect every poster to explain every nuance of every
aspect of C they use in every article?

Chuck is unable to follow a thread. He also has zero retention for
posters' obvious skills and previous posts when he feels he can utilise
his ignorance of their posting history to belittle the other poster.

James Kuyper · Sep 9, 2008

arnuld said:
I am not a native English speaker, so I don't know what you mean by
"....used in its natural habitat..."

He's using a metaphor, talking about pointers as if they were a species
of animal. C is one of the natural habitats for pointers, far more so
than most other languages. What he's saying is that some people
deliberately write C so that there's practically no use of pointers,
which is a shame, because C is the place where pointers belong. I
presume that he means explicitly declared variables of pointer type. I
can't imagine any significant amount of C code being written without the
use of expressions with a pointer type.

Do you want to say that c.l.c discourages *p++ = *q++ ?

No. He's saying that many people never use that idiom, and that some
people deliberately avoid all use of pointers; he's not saying that clc
has that opinion. I can confirm that first point. I've been asked to
interview dozens of people since 1994 for a variety of entry-level C
positions. I've given every single one of them a test built around

while(*p++ = *q++);

Few of them could even tell me what it does. Almost none of them could
explain to me how it works. The sticking point seems to be figuring out
why it is that the loop actually stops looping. Even the ones who could
tell me that it stops when it reaches a null character, generally could
not tell me WHY it stops when it reaches a null character. We had to
hire such people (!), because people who could actually pass this test
seem to be unavailable, at least at the salary levels we could afford to
offer.
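
The loop stops because an assignment expression has the value that was assigned, so once the null character has been copied the condition is 0. Writing the same steps out one at a time may make that easier to see. The function names `copy_terse` and `copy_spelled_out` are invented for this sketch, and both assume the destination is large enough for the source string and its terminator.

```c
/* The idiom from the interview test. */
char *copy_terse(char *dest, const char *src)
{
    char *p = dest;
    const char *q = src;

    while ((*p++ = *q++))   /* extra parentheses only quiet compiler warnings */
        ;
    return dest;
}

/* The same steps written out one at a time. */
char *copy_spelled_out(char *dest, const char *src)
{
    char *p = dest;
    const char *q = src;
    char copied;

    do {
        copied = *q;    /* read the current source character */
        *p = copied;    /* store it, terminator included */
        p++;            /* both pointers advance after the copy */
        q++;
    } while (copied != '\0');   /* the value just assigned decides whether to loop */

    return dest;
}
```

In both versions the terminating null character is copied before the loop ends, which is why the destination ends up as a valid string.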

James Kuyper · Sep 9, 2008

arnuld said:
On Mon, 08 Sep 2008 17:48:20 +0000, Richard Heathfield wrote:

It isn't very good camouflage, though, since any experienced C
programmer will recognise it for what it is. It's just a more elegant
way to write the code. We don't write char foo[8] = { 'H', 'e', 'l',
'l', 'o', '\0', '\0', '\0' } just because it makes explicit the fact
that eight characters are being copied into the array. We write char
foo[8] = "Hello", and trust that competent programmers will understand.

[snip]

That whole discussion leaves me wondering whether:

char arrc[100] = {0};

is the same as:

char arrc[100];
memset(arrc, '\0', 100);

or whether the latter is more expensive than the former? And which one
does c.l.c advise using?

I strongly favor the first form, despite what I'm about to say. Whether
or not the same code is generated depends upon the implementation, and
some implementations get this very wrong. In particular, I remember my
horror when I found out why a particular piece of code was running so
slow. It basically said something like this:

double array[40][5416] = {0.0};

What I eventually figured out is that the compiler was generating code
(at default optimization levels!) equivalent to the following:

double array[40][5416];
array[0][0] = 0.0;
array[0][1] = 0.0;
// 216637 similar lines
array[39][5415] = 0.0;

My object file and executable both got hundreds of thousands of bytes
smaller when I removed the {0} and replaced it with an explicit
initialization loop. The initialization loop actually made the code run
faster, too; probably because of the time wasted in the {0} version by
loading those 216640 initialization lines into memory.

However, implementations that stupid are rare (I hope!). Note: I
complained to the vendor, who defended the compiler by saying that this
was the "natural" way of handling the initialization.
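
The post does not show the explicit initialization loop that replaced the {0} initializer, so the following is only a guess at its general shape. The dimensions are the ones quoted above; the function name `zero_fill` is hypothetical.

```c
#include <stddef.h>

enum { ROWS = 40, COLS = 5416 };

/* Set every element to 0.0 with a loop instead of relying on "= {0.0}". */
void zero_fill(double a[][COLS], size_t rows)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < COLS; c++)
            a[r][c] = 0.0;
}
```

A caller would declare `double array[ROWS][COLS];` without an initializer and then call `zero_fill(array, ROWS);` before first use.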

Keith Thompson · Sep 9, 2008

CBFalconer said:
I disagree.

You disagree with my statement about my own preference?

Remarkable.

The (c) expression worries only about whether the
character is or is not something with a zero value. The (c !=
'\0') expression expressly converts that zeroness into either the
value 0 or the value 1 before testing.

C99 6.8.4.1:

In both forms, the first substatement is executed if the
expression compares unequal to 0. In the else form, the second
substatement is executed if the expression compares equal to 0.

I suppose you could call that a conversion to 0 or 1 (since the "=="
and "!=" operators yield only those values), but it certainly doesn't
have to be implemented that way.

Optimization may affect
this.

I would normally expect the second expression to generate larger
code than does the first, with optimization disabled.

I wouldn't -- or rather, I'd expect most compilers to perform some
minimal level of optimization, and therefore to generate identical
code for the two forms, even with no optimization options specified.

(A quick experiment shows that one compiler does generate identical
code, another does not. Sun's SPARC compiler generates "cmp %l0,%g0"
for one form, "cmp %l0,0" for the other. "%g0" is a register whose
value is always 0. The effect is obviously identical.)

But the difference goes away when optimization is enabled. I'm sure
you weren't suggesting this as a reason to use the shorter form.

Keith Thompson · Sep 9, 2008

Richard Heathfield said:
arnuld said: [...]

That whole discussion leaves me wondering whether:

char arrc[100] = {0};

is the same as:

char arrc[100];
memset(arrc, '\0', 100);

or whether the latter is more expensive than the former?

Since you're initialising an array of integers (for chars are integers),
they do the same thing. The first version does it in fewer lines of code.
If the type were non-integer (e.g. pointer, or floating point, or struct
or union of any kind), the two versions would not be equivalent and the
memset version would simply be wrong.

A struct, union, or array whose only non-composite sub-members are of
integer type [*] can safely be initialized with memset, setting the
whole thing to all-bits-zero. It's only unsafe if some of the
sub-members are of floating-point or pointer type.

(Note that it's still not entirely equivalent, since memset will zero
any padding bytes, and {0} won't necessarily do so. Also, {0} might,
on some exotic implementation, use a representation of 0 other than
all-bits-zero -- I think.)

The guarantee that all-bits-zero is a valid representation of 0 for
all integer types doesn't appear in the C90 or C99 standard; it was
added in one of the post-C99 Technical Corrigenda, and can be found in
n1256.pdf.

But it's always possible that a pointer or floating-point member might
be added later during maintenance.

[*] That's a clumsy way of saying it, but I couldn't think of a better
one. You have to consider members of the struct or union, or elements
of the array, and recursively for all sub-members and/or sub-elements,
until you get down to things that are of scalar (numeric or pointer)
type.

That depends on which c.l.c. subscriber you ask, but I'd go for the version
that is right every time, wouldn't you?

Agreed. {0} expresses the intent more clearly and avoids certain
problems. The problems it avoids are rare -- which means they're
difficult to detect.
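
A small sketch of the case described above, where a struct has floating-point and pointer members. The struct `record` and the function name are hypothetical. With {0}, each member receives the zero value of its own type (0, 0.0, null pointer); memset only produces all-bits-zero, which the standard does not promise is the same thing for the last two.

```c
#include <stddef.h>

/* Hypothetical struct mixing integer, floating-point and pointer members. */
struct record {
    int     count;
    double  weight;
    char   *name;
};

/* Returns 1 if every member of a {0}-initialized struct compares equal
 * to the zero value of its type. */
int zero_initialized_ok(void)
{
    struct record r = {0};   /* count = 0, weight = 0.0, name = null pointer */

    return r.count == 0 && r.weight == 0.0 && r.name == NULL;
}
```

On most current platforms a memset version would pass the same check, which is exactly why the difference is hard to detect by testing.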

viza · Sep 10, 2008

Here is the new version, which puts a check on maximum input:

/* My version of "strcpy - a C Library Function
*
* version 1.1
*
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { ARRSIZE = 4 };

This isn't what enum is for. Use

#define ARRSIZE 4

char* my_strcpy( char*, char* );

const !!!!!

int main( int argc, char** argv )
{
    char* pc;
    int input_size;

    char src[ARRSIZE+1] = {0};
    char dest[ARRSIZE+1] = {0};

    if( 2 != argc )
    {
        perror("USAGE: ./exec \" your input \"\n");
        exit( EXIT_FAILURE );
    }

Don't use perror unless a library function that sets errno has failed or
you have set it yourself.
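
The distinction can be sketched as follows. The function names are invented for this illustration, and the second function assumes a POSIX-like system, where a failed fopen() sets errno.

```c
#include <stdio.h>

/* A usage error is not an errno failure, so write the message directly. */
int print_usage(FILE *out, const char *prog)
{
    return fprintf(out, "USAGE: %s \"your input\"\n", prog);
}

/* perror() fits here: a library call has failed and (on POSIX) set errno. */
int can_open(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        perror(path);   /* prints path, a colon, and the errno message */
        return 0;
    }
    fclose(fp);
    return 1;
}
```

Calling perror() for a wrong argument count would append whatever message matches the current value of errno, which has nothing to do with the actual mistake.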

The one thing I do not understand here: whether the arrays are created
with size ARRSIZE or ARRSIZE+1 (+1 for the extra NULL character), the
output is not affected. The user can enter 4 characters in this case,
like "Love", plus 1 for NULL, but even with ARRSIZE = 4 in total it works
fine. Is there some problem here?

No, it's the opposite of a problem. It should have exploded your
computer, but it didn't, because when you ran it there just happened to
be a null character (NB: not NULL character) where you are supposed to
put one yourself.

You cannot rely on it just being there by coincidence, any more than you
can jump out of a window relying on a truck full of feathers to be
passing by.
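
To make the size arithmetic concrete: "Love" is 4 characters, so copying it needs 5 bytes, and with an array of only ARRSIZE bytes the terminator is written one past the end, which is undefined behaviour. A sketch that also adds the const asked for above; `copy_if_fits` is a hypothetical wrapper, not something from the original program.

```c
#include <stddef.h>
#include <string.h>

enum { ARRSIZE = 4 };

/* const on src: the function promises not to modify the source string. */
char *my_strcpy(char *dest, const char *src)
{
    char *const start = dest;

    while ((*dest++ = *src++))
        ;
    return start;
}

/* Hypothetical wrapper: copies only if src plus its terminator fits in
 * cap bytes; returns NULL otherwise. */
char *copy_if_fits(char *dest, size_t cap, const char *src)
{
    if (strlen(src) + 1 > cap)   /* +1 for the terminating null character */
        return NULL;
    return my_strcpy(dest, src);
}
```

With `char dest[ARRSIZE + 1]`, the call `copy_if_fits(dest, sizeof dest, "Love")` succeeds; with `char dest[ARRSIZE]` it returns NULL instead of writing past the array.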

CBFalconer · Sep 10, 2008

Old said:
I don't know what you're smoking today, but the above two if
statements are exactly equivalent in effect, and I challenge you
to come up with a compiler that generates different code for them,
let alone one that actually causes a different code branch to
execute.

I didn't say the effect differed. I said the code generated
differed, before optimization. It has to, because one uses the
value of c, and the other converts that to 0 or 1 before testing.

Keith Thompson · Sep 10, 2008

CBFalconer said:
I didn't say the effect differed. I said the code generated
differed, before optimization. It has to, because one uses the
value of c, and the other converts that to 0 or 1 before testing.

No, it doesn't have to differ. I suppose a compiler could generate
painfully naive code with some options, but there's no reason for the
value to be converted to 0 or 1. Even in the case where I found a
difference, there was no such conversion.

In any case, by definition the statement "if (c) { ... }" compares
the value of c to 0. That comparison is done by the equivalent of
"c != 0", which yields a value of 0 or 1. There's just as much basis
(i.e., practically none) for assuming that "if (c)" will convert the
result to 0 or 1 as for assuming that "if (c != '\0')" will do so.
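
The two spellings under discussion, side by side. This is only a sketch with invented function names; it shows that the forms select the same branch and says nothing about what any particular compiler emits for them.

```c
/* Terse form: the controlling expression is the character itself. */
int nonzero_terse(char c)
{
    if (c)
        return 1;
    return 0;
}

/* Explicit form: the controlling expression is a comparison. */
int nonzero_explicit(char c)
{
    if (c != '\0')
        return 1;
    return 0;
}
```

Both functions return 1 for any character other than the null character and 0 for the null character, as the quoted wording of 6.8.4.1 requires.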

Keith Thompson · Sep 10, 2008

viza said:
This isn't what enum is for.

Says who?

Use

#define ARRSIZE 4

Why?

Here's what the standard says:

The identifiers in an enumerator list are declared as constants
that have type int and may appear wherever such are permitted.

Perhaps enum wasn't designed for the purpose of declaring single
constants, but it does it quite well (if you can live with the
restriction to type int). It's a common and clever idiom, and I see
nothing wrong with using it.
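
A short sketch of the idiom, showing two places where an enumeration constant is accepted because it is an integer constant expression. The names `buffer`, `buffer_bytes` and `classify` are made up for this example.

```c
#include <stddef.h>

enum { ARRSIZE = 4 };            /* an int constant, no macro involved */

static char buffer[ARRSIZE + 1]; /* legal at file scope: the size is a constant expression */

size_t buffer_bytes(void)
{
    return sizeof buffer;
}

int classify(int n)
{
    switch (n) {
    case ARRSIZE:                /* enumeration constants may also label cases */
        return 1;
    default:
        return 0;
    }
}
```

Unlike a macro, the enumerator obeys normal scope rules; the price is the restriction to type int mentioned above.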

arnuld · Sep 10, 2008

Nevertheless, it is still basically the K&R2 code. Where do you think
Stroustrup first saw it?

I knew you would be ready to rub it in.

I meant that I did not see that code in K&R2; I learned it from
Stroustrup, so I don't know whether K&R2 has this code or not. And I
don't know where Stroustrup first saw it.
