memcpy() vs. for() performance


Xenos

But how do you know when the transfer is complete, then? I assume that
even in synchronous mode, using DMA for large transfers can be beneficial.
DMA engines usually generate an interrupt or have a status register or
similar to indicate completion.
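
For what it's worth, a minimal sketch of the polling approach might look
like the code below; the register name, address, and bit layout are
hypothetical, not taken from any real DMA controller:

#include <stdint.h>

/* Hypothetical memory-mapped DMA status register and "done" bit;
   real names, addresses and bit layouts are device-specific. */
#define DMA_STATUS  (*(volatile uint32_t *)0x40001000u)
#define DMA_DONE    (1u << 0)

static void wait_for_dma_completion(void)
{
    /* Busy-wait until the controller sets its "done" bit.  An
       interrupt-driven design would instead set a flag in the ISR
       and let the caller block until it is raised. */
    while ((DMA_STATUS & DMA_DONE) == 0)
        ;
}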
 

Case -

Dan said:
ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.

Thanks Dan, I've moved over to always using memcpy(). And as
you say in a later post, it's shorter/more elegant too; this is
an important thing (for me) too.

gcc is smart enough to inline memcpy calls for short memory blocks
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
    memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s
        .file   "test.c"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movl    12(%ebp), %ecx
        movl    (%edx), %eax
        movl    %eax, (%ecx)
        movl    4(%edx), %eax
        movl    %eax, 4(%ecx)
        popl    %ebp
        ret
        .size   foo, .-foo
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.3"

Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.

Yes, this clearly makes the point!
 

Barry Schwarz

#include <string.h>

#define SIZE 100
#define USE_MEMCPY

int main(void)
{
    char a[SIZE];
    char b[SIZE];
    int n;

    /* code 'filling' a[] */

#ifdef USE_MEMCPY
    memcpy(b, a, sizeof(a));
#else
    for (n = 0; n < sizeof(a); n++)
    {
        b[n] = a[n];
    }
#endif

    return 0;
}

While the two techniques are equivalent for char, they are not for any
type where sizeof(type) is not 1. You can change the limit check in
the for loop from n<sizeof(a) to n<SIZE to eliminate this restriction.
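
To make the restriction concrete, here is a small sketch (not part of the
original example) using an int array: with the byte count from sizeof(a)
the loop would run off the end of both arrays, so the bound has to be the
element count, while memcpy still takes a size in bytes.

#include <string.h>

#define SIZE 100

void copy_ints(void)
{
    int a[SIZE];
    int b[SIZE];
    size_t n;

    /* code 'filling' a[] */

    /* Wrong for int: sizeof(a) is SIZE * sizeof(int) bytes, so this
       loop would index far past the end of both arrays:
       for (n = 0; n < sizeof(a); n++) b[n] = a[n];             */

    /* Correct: bound the loop by the element count ...         */
    for (n = 0; n < SIZE; n++)
        b[n] = a[n];

    /* ... while memcpy still takes the size in bytes.          */
    memcpy(b, a, sizeof(a));
}
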
Case asked:
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

The call to memcpy has a certain amount of overhead. The break-even
point is when this overhead balances out the "extra efficiency" that
may be built into memcpy. The only practical way to tell is to run
some tests.
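
As a rough starting point, such a test might look like the sketch below;
the iteration count and the use of clock() are arbitrary choices, and an
optimising compiler may elide copies whose results are never used, so the
destination buffer is read at the end:

#include <stdio.h>
#include <string.h>
#include <time.h>

#define SIZE 100
#define ITERATIONS 1000000L

int main(void)
{
    static char a[SIZE], b[SIZE];
    clock_t start;
    long i;
    int n;

    /* code 'filling' a[] */

    start = clock();
    for (i = 0; i < ITERATIONS; i++)
        memcpy(b, a, sizeof(a));
    printf("memcpy:   %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    start = clock();
    for (i = 0; i < ITERATIONS; i++)
        for (n = 0; n < SIZE; n++)
            b[n] = a[n];
    printf("for loop: %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    /* Read the result so the copies are less likely to be optimised away. */
    return b[0];
}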



 

Case

Dan said:
ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.

I did some tests myself, and found out that this is only true
when the block size is fixed/known. Neither GCC nor Sun CC will
'inline/optimize' the memcpy() when the size is a variable.
Unfortunately, at many places in my code, the size is variable.
Although my understanding of this issue has increased, I must admit
this was a flaw in my initial question: an oversimplification.

I'd be interested to hear comments/insights about this variable
case.
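
For reference, a small sketch of the two situations being compared;
whether either call is expanded inline is entirely up to the compiler
and its options:

#include <string.h>

void copy_fixed(int *dst, const int *src)
{
    /* Size known at compile time: gcc -O2 may expand this into a
       few load/store instructions, as in the listing earlier. */
    memcpy(dst, src, 2 * sizeof *src);
}

void copy_variable(int *dst, const int *src, size_t n)
{
    /* Size known only at run time: the compiler typically emits a
       call to the library memcpy instead of open-coding the copy. */
    memcpy(dst, src, n * sizeof *src);
}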

Case
 

Dan Pop

Case said:
I did some tests myself, and found out that this is only true
when the block size is fixed/known. Neither GCC nor Sun CC will
'inline/optimize' the memcpy() when the size is a variable.
Unfortunately, at many places in my code, the size is variable.
Although my understanding of this issue has increased, I must admit
this was a flaw in my initial question: an oversimplification.

I'd be interested to hear comments/insights about this variable
case.

It would be *very* helpful if you didn't mix up things. Inlining is one
thing and providing a highly optimised library version of memcpy is a
completely different one.

When the size is unknown at compile time (or too large), the compiler
cannot or won't inline the memcpy call; it will call the library version.
But the library version can still be much faster than the code generated
by the compiler from a for loop. Especially when dealing with arrays of
characters.
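
Purely as an illustration of the kind of trick an optimised library
memcpy can use (this is not how any particular implementation actually
does it), a copy routine can move a machine word at a time when alignment
allows, instead of one char per iteration:

#include <stddef.h>

/* Simplified word-at-a-time copy.  It assumes the regions do not
   overlap and that both pointers are suitably aligned for unsigned
   long; a real memcpy also handles misalignment, tiny sizes, etc. */
static void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned long *d = dst;
    const unsigned long *s = src;
    unsigned char *dc;
    const unsigned char *sc;

    while (n >= sizeof *d) {        /* bulk of the data, one word at a time */
        *d++ = *s++;
        n -= sizeof *d;
    }

    dc = (unsigned char *)d;        /* remaining tail, byte by byte */
    sc = (const unsigned char *)s;
    while (n--)
        *dc++ = *sc++;

    return dst;
}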

If you want ultimate answers, benchmark the two versions yourself.
Keep in mind that the results cannot be extrapolated to other implementations.

Dan
 

Case

Dan said:
It would be *very* helpful if you didn't mix up things. Inlining is one
thing and providing a highly optimised library version of memcpy is a
completely different one.

I know the difference. What the compiler does looks like (in my eyes)
a form of inlining (the function call is replaced). But at the same
time the code that is inserted is highly optimized for the particular
block size; it's not just inserting a standard piece of memcpy code.
That's why I wrote 'inline/optimize', and quoted the expression to
mark it as not to be taken too literally, because it's a combination.

Dan said:
When the size is unknown at compile time (or too large), the compiler
cannot or won't inline the memcpy call; it will call the library version.

If I had to make a choice between the two, I would call it
optimization. I'm surprised that you seem to prefer the
term inlining. Why?

Dan said:
But the library version can still be much faster than the code generated
by the compiler from a for loop. Especially when dealing with arrays of
characters.

Agreed. And, for simplicity, I'd rather use one way all the time,
instead of context-dependently (either at code time or even at run time)
choosing between a couple of alternatives. Otherwise this will
easily fall within the famous 97%.

Dan said:
If you want ultimate answers, benchmark the two versions yourself.
Keep in mind that the results cannot be extrapolated to other implementations.

Yep, one other good reason to always use memcpy(). However, how did
the saying go again ... "Never say always!" :)

Thanks,

Case
 

Dan Pop

Case said:
If I had to make a choice between the two, I would call it
optimization. I'm surprised that you seem to prefer the
term inlining. Why?

Because this is the specific name of that particular optimisation.
What is so difficult to understand?

As I said, inlining is NOT the only way an implementation can optimise
a memcpy call. There are plenty of optimisations that can be applied
to the library version of memcpy (especially if it's not written in C).

Case said:
Yep, one other good reason to always use memcpy(). However, how did
the saying go again ... "Never say always!" :)

Another failed attempt at humour...

Dan
 

Keith Thompson

Would that be vitreous humor or aqueous humor :)

checking for [OT] tag ... ok

Yes, the eye certainly lens itself to puns. But enough of this
ocularity. If the jokes get any cornea, I'll give you 40 lashes.
 

Case -

Dan said:
Only when a large enough number of beholders perceive it as such.

No, on the contrary! Needing only the personal (i.e.,
individual) observation is at the heart of the original
'beholder-saying'.

Case
 

Dan Pop

Case said:
No, on the contrary! Needing only the personal (i.e.,
individual) observation is at the heart of the original
'beholder-saying'.

Which is why it doesn't apply to humour ;-)

Dan
 
