Problem in Strdup()

Gordon Burditt · Aug 29, 2005

I appreciate the opportunity to learn something
new, but statements to the effect of "I like X
better than Y" without any other explanation
rarely afford the opportunity to learn much of
anything beyond the statement itself.

I like X better than Y because X is more like Y than Y is like X.

Gordon L. Burditt

Anton Petrusevich · Aug 29, 2005

CBFalconer said:
Which brings up a style point. Why create multiple exit points
unnecessarily, instead of simply making the copy conditional on
suitable conditions? I.e.:

char *dupstr(const char *src)
{
    char *p;

    if (p = malloc(strlen(src) + 1)) strcpy(p, src);
    return p;
}

Great! But why not to scan src only once?

char *dupstr(const char *src)
{
    size_t len;
    char *dst;
    if((dst = malloc((len = strlen(src) + 1))))
        memcpy(dst, src, len);
    return dst;
}

Keith Thompson · Aug 29, 2005

Anton Petrusevich said:
Great! But why not to scan src only once?

char *dupstr(const char *src)
{
    size_t len;
    char *dst;
    if((dst = malloc((len = strlen(src) + 1))))
        memcpy(dst, src, len);
    return dst;
}

That still scans src twice, once for strlen() and once for memcpy().

strcpy() scans until it sees a '\0'. memcpy() scans for a specified
number of bytes. It's not at all obvious that one is going to be
faster than the other (though one or the other could well be on some
particular platform).
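
To make the comparison concrete, here is a minimal sketch of what the two loops look like when written byte by byte. The names naive_strcpy and naive_memcpy are hypothetical, and real library versions are usually far more elaborate; the only point is that each loop performs one test per byte, against '\0' in one case and against the remaining count in the other.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for strcpy(): the loop test is
   a comparison of each copied byte against '\0'. */
char *naive_strcpy(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0') {
        /* nothing: the copy and the test both happen in the condition */
    }
    return dst;
}

/* Hypothetical, simplified stand-in for memcpy(): the loop test is
   a comparison of the remaining byte count against zero. */
void *naive_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0) {
        *d++ = *s++;
    }
    return dst;
}
```

Written this way, neither loop is obviously cheaper than the other, which is why the question comes down to what a given implementation actually does.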

pete · Aug 29, 2005

Tim said:
That's all well and good, but what qualities allow
the first to be judged better than the second?
What is your metric?

Keith's is simpler.

Mark B · Aug 29, 2005

pete said:
Keith's is simpler.

They are both 'simple'... the usage of the conditional operator
in the example is not complex.

My personal preference however would be to see the != NULL
and braces used in Keith's condition dropped.
Then I too would like Keith's better for no other reason than
it's how I would have coded it.

As for 2nd example: 2 spaces after return, but none between
the arguments to strcpy()? And why the blank line in the middle?
Sloppy? Maybe. Definitely not complicated. Either could be
used as an acceptable solution as they get the job done
efficiently and properly.

Mark

pete · Aug 29, 2005

Mark said:
They are both 'simple'... the usage of the conditional operator
in the example is not complex.

My personal preference however would be to see the != NULL

I like the != NULL.
When you see
(p != NULL)
then you have enough context to know that p is a pointer,
without looking anywhere else.

and braces used in Keith's condition dropped.

I prefer compound statements with all if, else, and loop statements.
For reasons similar to these:
http://groups.google.com/group/comp.lang.c/msg/1e94fc33830c019b?hl=en&

Then I too would like Keith's better for no other reason than
it's how I would have coded it.

As for 2nd example: 2 spaces after return, but none between
the arguments to strcpy()? And why the blank line in the middle?
Sloppy?

I prefer to separate declarations from statements
with a blank line in compound statements
because it's a common convention.

Mark B · Aug 30, 2005

pete said:
I like the != NULL.
When you see
(p != NULL)
then you have enough context to know that p is a pointer,
without looking anywhere else.

You have enough context to know that p 'should be' a pointer
without having to look anywhere else... doesn't necessarily
mean that it is though, does it? But even if you could ascertain
that it was in fact a pointer (which I don't know how you can do
without a peek at the definition) what does it point to?
What good is knowing you have a pointer if you don't know
what it points to? Don't you still need to familiarize yourself
with the code before making any modifications? Doesn't that
still entail a peek at the definition?

I prefer compound statements with all if, else, and loop statements.
For reasons similar to these:
http://groups.google.com/group/comp.lang.c/msg/1e94fc33830c019b?hl=en&

(refers to a post made by akarl on 8/22 - applicable portions quoted)

Regarding advantage 1:
* Bugs are avoided (multiple statements on one line, the classical
* else-if situation, wrong indentation...)

I don't put multiple statements on one line, I always indent properly, and
there is no classical else-if situation, as I do use braces when necessary...
(when the compiler bitches about something when I have warnings turned up,
I deem it necessary).

Regarding advantage 2:
* If you use braces only when necessary, you have to decide if they are
* required or not at every control statement you write. If you add one
* statement to a single statement you have to add the braces, and if you
* remove one of two statements you probably want to remove the braces as
* well to achieve consistency throughout the code. This requires some
* extra editing. Since all choices in programming are distracting, the
* irrelevant ones should be removed.

I typically know when I'm coding if I intend to write one or many statements
before I even begin to form the condition... by that point the decision has
already been made. If my requirements later change and I have to add
(or remove) a line I don't find it distracting to do so... though I do find
it distracting to know that some coders consider 'all choices in
programming' to be distracting! Maybe they should familiarize themselves
with the code a little more before making changes. You should know what
your modifications are going to do before you make them, no?

Regarding advantage 3:
* With explicit control statement termination the language gets less
* complicated and more regular.

Claim doesn't make sense. There is 'explicit control statement termination'
regardless of whether or not braces are used. In one instance it terminates
with the brace... in the other it terminates after the statement(s).
Shouldn't be any ambiguity there.

Regarding disadvantage 1 (and only):
* The code gets somewhat more cluttered and slightly less readable in
* case of short control statements.

I agree

I prefer to separate declarations from statements
with a blank line in compound statements
because it's a common convention.

Yet you liked Keith's code which omitted it?

In general I also tend to separate my declarations from the first
statement with a blank line, but for no other reason than for
code aesthetics (yes, I think it looks nice).

But: if I'm writing a 2 line function which consists of:
1) a function call to initialize a variable in its declaration
2) a return statement

I consider 1 to be a statement which no longer requires
a blank line separating it from the only other statement
in the function... call me silly

Mark

Joe Wright · Aug 30, 2005

Gordon said:
I like X better than Y because X is more like Y than Y is like X.

Gordon L. Burditt

You know Y is better than X. You just like to fight.

Mac · Aug 30, 2005

That still scans src twice, once for strlen() and once for memcpy().

strcpy() scans until it sees a '\0'. memcpy() scans for a specified
number of bytes. It's not at all obvious that one is going to be
faster than the other (though one or the other could well be on some
particular platform).

Well, it is VERY hard for me to imagine that strcpy() will execute faster
than memcpy() when the string length is known ahead of time. I think in
general, memcpy() will be as fast as the implementers know how to make it.

But from the standard's perspective, I guess there is no reason to favor
either over the other.

--Mac

Walter Roberson · Aug 30, 2005

Well, it is VERY hard for me to imagine that strcpy() will execute faster
than memcpy() when the string length is known ahead of time. I think in
general, memcpy() will be as fast as the implementers know how to make it.

A memcpy() implementation could potentially detect the case where
both memory areas had the same word alignment, copy a few bytes to
reach the aligning point, and then copy words (or longwords) until
it reached near the end and then copied bytes again. This would,
for example, work if both areas happened to be the beginning of
malloc()'d memory, as malloc() uses the most restrictive alignment.
It could do similarly for the case where the areas could be aligned
with shorter words... even moving by short int might be faster than
moving by byte.

Unfortunately, in a lot of cases, the two areas will not share any
mutual alignment, such as when the destination starts at an "odd" memory
location and the source starts at an "even" location. And in that
particular case, strcpy() is potentially faster. When the byte
is moved, many architectures will automatically "test" the byte
value, allowing a "branch on zero" without the overhead of an
explicit test. That can be faster than decrementing a counter
location (which requires a write operation, even if only a
register write) and branching on the implicitly-tested value of it --
it takes one less operation, since one has moved the byte anyhow...
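
The "mutual alignment" condition can be sketched as a small check. This is only an illustration: share_alignment is a hypothetical helper, and it assumes that uintptr_t is available and that converting a pointer to it yields a plain byte address, which is typical but not guaranteed by the standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when both pointers sit at the same offset within a word of
   `word` bytes, so that a byte-wise prologue could bring both onto a
   word boundary at the same time; returns 0 otherwise.
   Assumption: a pointer converted to uintptr_t is a plain byte address. */
int share_alignment(const void *a, const void *b, size_t word)
{
    return (uintptr_t)a % word == (uintptr_t)b % word;
}
```

Two blocks fresh from malloc() would pass this check for any ordinary word size, whereas a source at an even address and a destination at an odd one would fail it, leaving a byte-wise copy as the simple fallback.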

pete · Aug 30, 2005

Mark said:
Yet you liked Keith's code which omitted it?

Yes.
I didn't say it was perfect.

In general I also tend to separate my declarations from the first
statement with a blank line, but for no other reason than for
code aesthetics (yes, I think it looks nice).

But: if I'm writing a 2 line function which consists of:
1) a function call to initialize a variable in its declaration
2) a return statement

I consider 1 to be a statement which no longer requires
a blank line separating it from the only other statement
in the function... call me silly

Then you are wrong.
Declarations and definitions are not statements.

Mark B · Aug 30, 2005

pete said:
Yes.
I didn't say it was perfect.

Then you are wrong.
Declarations and definitions are not statements.

Nor did I mean to claim that it actually was a statement...
I was implying that I would treat that particular declaration as if it were
a statement.
In retrospect, I should have left the last part of the post off completely,
as I'd already (appropriately) stated that for me personally I separate
declarations from statements for code aesthetics, and no other reason.
When it looks better with a blank line, it gets a blank line...

Can I assume (based on the fact that you snipped about 90% of my previous
post) that you agreed with everything else I had written?

Mark

Tim Rentsch · Aug 30, 2005

pete said:
Keith's is simpler.

Is this some objective metric that you're applying,
or just your personal subjective metric? Most of
the metrics I'm used to would rank the second
definition as simpler.

Mac · Aug 31, 2005

A memcpy() implementation could potentially detect the case where
both memory areas had the same word alignment, copy a few bytes to
reach the aligning point, and then copy words (or longwords) until
it reached near the end and then copied bytes again. This would,
for example, work if both areas happened to be the beginning of
malloc()'d memory, as malloc() uses the most restrictive alignment.
It could do similarly for the case where the areas could be aligned
with shorter words... even moving by short int might be faster than
moving by byte.

Unfortunately, in a lot of cases, the two areas will not share any
mutual alignment, such as when the destination starts at an "odd" memory
location and the source starts at an "even" location. And in that
particular case, strcpy() is potentially faster. When the byte
is moved, many architectures will automatically "test" the byte
value, allowing a "branch on zero" without the overhead of an
explicit test. That can be faster than decrementing a counter
location (which requires a write operation, even if only a
register write) and branching on the implicitly-tested value of it --
it takes one less operation, since one has moved the byte anyhow...

I'm not going to enter into this kind of argument here. Anyone who cares
which way is faster should profile both ways. Enough said.
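
For anyone who does want to measure, a rough harness might look like the sketch below. The names dupstr_strcpy, dupstr_memcpy and time_dup are hypothetical, clock() is a coarse timer, and an optimizing compiler may remove or reshape a malloc()/free() pair, so treat any numbers as indicative at best.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* dupstr() variant that copies with strcpy(). */
char *dupstr_strcpy(const char *src)
{
    char *p = malloc(strlen(src) + 1);

    if (p != NULL) {
        strcpy(p, src);
    }
    return p;
}

/* dupstr() variant that reuses the length from strlen() for memcpy(). */
char *dupstr_memcpy(const char *src)
{
    size_t len = strlen(src) + 1;
    char *p = malloc(len);

    if (p != NULL) {
        memcpy(p, src, len);
    }
    return p;
}

/* Calls dup() on src `reps` times and returns the CPU seconds used,
   or -1.0 if an allocation fails. */
double time_dup(char *(*dup)(const char *), const char *src, long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++) {
        char *p = dup(src);

        if (p == NULL) {
            return -1.0;
        }
        free(p);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_dup(dupstr_strcpy, s, reps) and time_dup(dupstr_memcpy, s, reps) for a range of string lengths, with reps large enough to swamp the timer resolution, gives a first impression on a particular platform.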

;-)

--Mac

Tim Rentsch · Aug 31, 2005

Keith Thompson said:
That still scans src twice, once for strlen() and once for memcpy().

strcpy() scans until it sees a '\0'. memcpy() scans for a specified
number of bytes. It's not at all obvious that one is going to be
faster than the other (though one or the other could well be on some
particular platform).

Of course, what Anton meant was that strcpy has to "scan" the
string in the sense of examining the values to see where the
terminating zero is; and that memcpy doesn't have to compare
values to zero, only move them. The entry for "scan" at m-w.com
gives:

to examine by point-by-point observation or checking

It seems reasonable to say that strcpy does this examining/checking
and that memcpy doesn't.

As it appears that Anton has learned English not as his first
language, that makes it more likely that he would use the word in
the sense that a dictionary defines it.

Keith Thompson · Aug 31, 2005

Tim Rentsch said:
Of course, what Anton meant was that strcpy has to "scan" the
string in the sense of examining the values to see where the
terminating zero is; and that memcpy doesn't have to compare
values to zero, only move them. The entry for "scan" at m-w.com
gives:

to examine by point-by-point observation or checking

It seems reasonable to say that strcpy does this examining/checking
and that memcpy doesn't.

As it appears that Anton has learned English not as his first
language, that makes it more likely that he would use the word in
the sense that a dictionary defines it.

As strcpy() traverses the string, it has to check each element to see
whether it's equal to '\0'.

As memcpy() traverses the string, it has to check for each element
whether it's reached the number of bytes it was asked to copy.

My point was simply that it's not clear that either is going to be
faster than the other; I wasn't quibbling about the meaning of "scan".

Tim Rentsch · Sep 1, 2005

Keith Thompson said:
As strcpy() traverses the string, it has to check each element to see
whether it's equal to '\0'.

Right, strcpy() has to check with each char transferred.

As memcpy() traverses the string, it has to check for each element
whether it's reached the number of bytes it was asked to copy.

The memcpy() code doesn't have to check after each char. Because
memcpy() knows the length ahead of time, units larger than char
units can be transferred at once. Standard loop unrolling stuff.
There still will be loop tests, but very likely fewer of them.

My point was simply that it's not clear that either is going to be
faster than the other; I wasn't quibbling about the meaning of "scan".

Of course, there's no guarantee that memcpy() will run faster than
strcpy() on all platforms, but wouldn't you expect it to run faster
on most platforms? Certainly I would, at least for lengths more
than five or ten characters.

Just as a datapoint, timing on an x86 showed dupstr()-using-strcpy()
running behind dupstr()-using-memcpy(), anywhere from about 1%
slower to about 75% slower, depending on how many characters were
being moved.
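
The "fewer loop tests" point can be sketched with a plain unrolled copy. This is not how any particular memcpy() is implemented; unrolled_copy is a hypothetical name, and the factor of four is an arbitrary choice for illustration.

```c
#include <stddef.h>

/* Hypothetical copy routine: four bytes per loop test, then a byte-wise
   tail for the 0 to 3 bytes that remain. */
void *unrolled_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 4) {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
        n -= 4;
    }
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Knowing the count up front is what makes this possible: a copy that must stop at '\0' cannot skip ahead four bytes without first examining each of them.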

pete · Sep 9, 2005

Tim said:
Is this some objective metric that you're applying,
or just your personal subjective metric? Most of
the metrics I'm used to would rank the second
definition as simpler.

Keith's only decides whether or not to call strcpy.
The other one also decides which value to return.
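
The two definitions being compared are not quoted in this part of the thread. Going only by how they are described above (a != NULL test with braces in one, a conditional operator in a two-line body in the other), they presumably had roughly the following shapes. The function names are hypothetical and the bodies are a guess, not the posters' actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Guess at the first shape: the only decision is whether to call
   strcpy(); the same pointer is returned either way. */
char *dupstr_if(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL) {
        strcpy(p, s);
    }
    return p;
}

/* Guess at the second shape: the conditional operator also selects
   which value is returned (strcpy() returns its first argument). */
char *dupstr_cond(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    return p ? strcpy(p, s) : NULL;
}
```

Both return the same result for every input; the disagreement is purely about which form reads as simpler.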
