Hmmm, It's Curious

Mark

Netocrat said:
As for your implementation: a little messy, and it contains unnecessary
overhead... [snip code]
Untested (I know it won't work with the driver below for, as I've
previously stated, it's flawed), but it should outperform both your and
the original poster's functions ;) If you do decide to test it, let us
know the results.

Approximately 4 times slower on my machine. Why did you expect better
performance? Even though you are getting back the length and don't have
to do a separate strlen, sprintf is by its nature much more complex than
strcat or strcpy and no doubt these two simple library functions are
much more highly optimised for this task, probably by using processor
instructions that copy many bytes at a time, which I would not expect
sprintf to be optimised to do.

But yes, your implementation is neat and concise.

You're right... I guess I should have tested it first.
I expected sprintf()'s performance to be nearly the same
as strcpy()'s when no conversion specifiers were present
in the format string... and that 1 pass with sprintf()
would outperform the 2 passes needed by strcpy() and strlen().
Guess I was mistaken. :)
Here's another version... tested this time... and this one should
substantially outperform the other.

char *
vstrcat(char *s, ...)
{
    register char *p = &s[strlen(s)];
    register char *arg;
    va_list ap;

    va_start(ap, s);
    while ((arg = va_arg(ap, char *)) != NULL)
        while ((*p++ = *arg++) != 0)
            ;
    va_end(ap);
    return s;
}

Mark
 
Mark

Mark said:
[previous post quoted in full; snipped]

Mark

Might as well post it before someone else does...
without optimizations - my new function still sucks!
Following results with 10,000,000 iterations on my system

gcc X.c
strcat()     = 12.460928
pete-vstrcat =  9.812500
mine         = 15.085938

with optimizations, very different results
gcc -O1 X.c
strcat()     = 12.437500
pete-vstrcat =  9.164062
mine         =  4.445312

gcc -O2 X.c
strcat()     = 12.468750
pete-vstrcat =  9.156250
mine         =  4.468750

gcc -O3 X.c
strcat()     = 12.453125
pete-vstrcat =  9.148438
mine         =  4.460938

Neither strcat() nor pete's vstrcat() seems to be
affected much by the optimizations, though they make
one hell of a difference to my code!

Mark
 
Netocrat

Here's another version... tested this time...

Not too thoroughly though :p

I already posted an equivalent solution to that except that it didn't use
the register keyword, and boy was it slow. With register though it is the
best solution.

Anyhow the problem is that the increments should occur within the loop,
not in the test. A for loop is more appropriate. Also the != 0 is
redundant.
while((*p++ = *arg++) != 0);
becomes
for( ; *p = *arg; p++, arg++) ;
and this one should
substantially outperform the other.

It sure does. It's about 25% faster than multiple strcats.
 
Netocrat

Might as well post it before someone else does... without optimizations -
my new function still sucks! Following results with 10,000,000 iterations
on my system

gcc X.c
strcat() = 12.460928
pete-vstrcat=9.812500
mine=15.085938

with optimizations, very different results gcc -O1 X.c
strcat() = 12.437500
pete-vstrcat=9.164062
mine=4.445312

gcc -O2 X.c
strcat() = 12.468750
pete-vstrcat=9.156250
mine=4.468750

gcc -O3 X.c
strcat() = 12.453125
pete-vstrcat=9.148438
mine=4.460938

Neither strcat() nor pete's vstrcat() seems to be affected much by the
optimizations, though they make one hell of a difference to my code!

Actually that's it - I thought it was the register keyword that improved
my originally very slow for loop, but I tested it with/without that
keyword and with/without optimisations:

          reg      no reg
opt       6.98      6.98
no opt   15.59     18.88

Seems that registers are being used anyhow under optimisation; without
optimisation there's not very much improvement using registers. Which
goes to show what I'm coming to believe more and more... leave
optimisation to the compiler. Keywords like register are funky and all,
but in most cases that sort of decision is best made by the compiler.
 
Mark

Netocrat said:
Pretty thoroughly ;) results posted elsewhere ;)
Anyhow the problem is that the increments should occur within the loop,
not in the test.
It is being done in the loop, after the test... no?
A for loop is more appropriate. Also the != 0 is
redundant.

I prefer the redundancy in that instance (assignment in condition) as it
makes the code clear to others who read it someday!

for instance, consider:
if(a = b)
valid line, may be exactly what you want to do, but forces others to examine
surrounding code to see if you've made a mistake... not to mention the fact
that your compiler may complain about it if you've got warnings turned on.

if((a = b))
also a valid line, may stop your compiler from complaining, but forces
others to examine surrounding code to wonder if you threw that in merely
to stop the compiler from bitching...

if((a = b) != 0)
equivalent to both previous examples... but now both your compiler
AND your co-workers should be happy. No question as to your intentions.

while((*p++ = *arg++) != 0);
becomes
for( ; *p = *arg; p++, arg++) ;
I did use a for() loop initially (noticed that's what my implementation
did in strcpy()) but I got slightly better results with the while() for some
reason when I was testing... so that's the version I posted.

Mark
 
Stan Milam

CBFalconer said:
Stan Milam wrote:

... snip ...

Which has widely variable running time, and has to access every
byte on the way up. In the bad old days it was also subject to
failure to restart after an interrupt.

I never heard about the failure to restart, or maybe I forgot (it was a
long time ago), and it is true that it scanned every byte on the way up,
but since it is a single instruction executing in the processor it is
way faster than, say, C code scanning down the string to find the end,
as a C version of strcat() would have to do.

Regards,
Stan.
 
Netocrat

Pretty thoroughly ;) results posted elsewhere ;)

You have a good case to make that you did some benchmarking. As for
thorough testing... well, here's the output of your function:

"This is a "

which should have been

"This is a bunch of strings that we will concatenate very efficiently by
always knowing where the end of the string is going to be. This makes
vstrcat() much more efficient than successive calls to strcat!"

Doh! :p
It is being done in the loop, after the test... no?

Sure thing. The problem is that even on the final iteration, when the
test fails and the loop exits, the increments have already occurred. So
the pointers get incremented past the terminating '\0'.

When you use a for loop or put the increments into the loop body
rather than the test, this isn't a problem because the increments don't
occur when the loop exits.
A for loop is more appropriate. Also the != 0 is
redundant.

I prefer the redundancy in that instance (assignment in condition) as it
makes the code clear to others who read it someday!
[snip]
if((a = b) != 0)
equivalent to both previous examples... but now both your compiler AND
your co-workers should be happy. No question as to your intentions.

Fair enough, it's a matter of preference and style and you have reasons
for your choice.
I did use a for() loop initially (noticed that's what my implementation
did in strcpy()) but I got slightly better results with the while() for
some reason when I was testing... so that's the version I posted.

Yes, I noticed that the incorrect version is slightly faster than the
fixed version under gcc (the compiler we're both using). I don't know
why; it's counter-intuitive: the incorrect code requires more increment
operations, yet it's faster...
 
Mark

Netocrat said:
You have a good case to make that you did some benchmarking. As for
thorough testing... well, here's the output of your function:
"This is a "
which should have been
"This is a bunch of strings that we will concatenate very efficiently by
always knowing where the end of the string is going to be. This makes
vstrcat() much more efficient than successive calls to strcat!"
Doh! :p
:) :) :) :) :) :) :) :)
ouch, you got me ;-)
(those are my shit-eating grins)
 
Ben Pfaff

Stan Milam said:
I have heard that before, but like I said it was a long time ago for
me and I have not kept up with that corner of the tech field. Now I
use C for just about everything :).

The point I was trying to make is that performance is very much
system-dependent and thus not a great topic for comp.lang.c,
which focuses on the C language, independent of particular
implementations.
 
Jean-Claude Arbaut

The point I was trying to make is that performance is very much
system-dependent and thus not a great topic for comp.lang.c,
which focuses on the C language, independent of particular
implementations.

But it's good to write portable C code that is likely to be optimised
well, so that's not completely off-topic either. Indeed, it's worth the
"pain" of learning the C standard and complying with it if we know that
the compiler is able to optimise, and in which way, because then we
don't need processor-specific extensions, or at least we need them less
often. So showing how portable C is translated makes a good
advertisement. There were already examples on this NG, with the thread
on "switch" and the other on "converting 4 bytes to an int", and
probably many more.
 
Stan Milam

pete said:
Stan Milam wrote:

... snip ...

EINVAL isn't standard C.

From the C standard for errno.h:

Additional macro definitions, beginning with E and a digit or E and an
upper-case letter, may also be specified by the implementation.

Regards,
Stan Milam.
 
Chris Torek

From the C standard for errno.h:

Additional macro definitions, beginning with E and a digit or E and an
upper-case letter, may also be specified by the implementation.

Unfortunately, all this means is that you, the programmer, cannot
do things like:

#include <errno.h>
...
/* flags for object description */
#define ELASTIC 1 /* stretchy */
#define ELEPHANTINE 2 /* very large */
#define EMERALD 4 /* a nice green color */
#define EMERY 8 /* rough */
#define EXPENSIVE 16 /* costs over a million bucks */
#define EXQUISITE 32 /* and worth it */

because the implementor might put conflicting definitions of each of
those macros in his <errno.h>. You cannot count on them being in
<errno.h> either, of course -- nor, at least in the context of
"pure ANSI/ISO C", can you count on EINVAL, EISDIR, and E2BIG
being present.

(Aside: I am not sure what is very large, green, rough, stretchy,
and worth over a million bucks... :) )

I tend to ignore this particular part of the standard, as I consider
taking away "almost every identifier beginning with uppercase E"
a form of overreaching. In fact, I tend to ignore almost everything
the C standard says about <errno.h>, preferring other standards that
are at least more useful (such as POSIX). But here in comp.lang.c
I will not recommend that others do so. :)
 
CBFalconer

Chris said:
... snip ...

Unfortunately, all this means is that you, the programmer, cannot
do things like:

#include <errno.h>
...
/* flags for object description */
#define ELASTIC 1 /* stretchy */
#define ELEPHANTINE 2 /* very large */
#define EMERALD 4 /* a nice green color */
#define EMERY 8 /* rough */
#define EXPENSIVE 16 /* costs over a million bucks */
#define EXQUISITE 32 /* and worth it */

because the implementor might put conflicting definitions of each
of those macros in his <errno.h>. You cannot count on them being
in <errno.h> either, of course -- nor, at least in the context of
"pure ANSI/ISO C", can you count on EINVAL, EISDIR, and E2BIG
being present.

(Aside: I am not sure what is very large, green, rough, stretchy,
and worth over a million bucks... :) )

ROTFL. Kermit the Frog meets most of the requirements. Will you
settle for 61?
 
jdallen2000

Stan said:
I never heard about the failure to restart, or maybe I forgot ...

IIRC the problem arose when there were two prefixes, e.g.
REP and segment-override. Restart overlooked one of the
prefixes.

Talk of fast strcpy()/memcpy() loops reminds me of the
68020-based Workstation-3 from WellKnown Respected Vendor, Inc. (tm).
I was amazed that a simple memcpy() in C dramatically outperformed
the library routine from this Highly Respected company, so I
examined their (assembly language) source code. It was a 2-instruction
loop and the comment told the story:
"optimized for the cache of the 68010."
The 68010 had been used in the company's Workstation-2 which had
been out of production for several years!

James
 
Netocrat

IIRC the problem arose when there were two prefixes, e.g. REP and
segment-override. Restart overlooked one of the prefixes.

Talk of fast strcpy()/memcpy() loops reminds me of the 68020-based
Workstation-3 from WellKnown Respected Vendor, Inc. (tm). I was amazed
that a simple memcpy() in C dramatically outperformed the library routine
from this Highly Respected company, so I examined their (assembly
language) source code. It was a 2-instruction loop and the comment told
the story:
"optimized for the cache of the 68010." The 68010 had been used in the
company's Workstation-2 which had been out of production for several
years!

I'm finding something similar with memcmp as implemented by glibc for
pentiums. A naive C implementation compiled with full optimisation
outperforms the library version by more than a factor of three for the
first 3 bytes and is well ahead for at least the first 35. After 100
bytes there doesn't seem to be any significant difference.

The reason: glibc uses the instruction repe; cmpsb which is apparently
quite slow on some chips.
 
Randy Howard

I'm finding something similar with memcmp as implemented by glibc for
pentiums. A naive C implementation compiled with full optimisation
outperforms the library version by more than a factor of three for the
first 3 bytes and is well ahead for at least the first 35. After 100
bytes there doesn't seem to be any significant difference.

If you think that is interesting, try the MSVC implementation of
memcmp(): it is simply HORRIBLE on large blocks. The native
Linux glibc implementation is pretty good, about 30-40% faster
on the same hardware. Resorting to an inline asm implementation
of memcmp() on Windows will get almost identical performance.
Basically, memcmp() with MSVC is not worth using if it forms any
part of the critical path for your application.
 
