memcpy() vs. for() performance

Case

#include <string.h>

#define SIZE 100
#define USE_MEMCPY

int main(void)
{
    char a[SIZE];
    char b[SIZE];
    size_t n; /* size_t matches the type of sizeof(a) */

    /* code 'filling' a[] */

#ifdef USE_MEMCPY
    memcpy(b, a, sizeof(a));
#else
    for (n = 0; n < sizeof(a); n++)
    {
        b[n] = a[n];
    }
#endif
    return 0;
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.
</OT>
*/
 
jacob navia

memcpy implementations tend to be very optimized and well done,
especially for machines that have a block move instruction.

On the other hand, a very clever compiler would recognize
that you are doing a memory move and replace the whole
"for" loop with a memory move instruction if available.

There is no way to know without measuring the
relative performance on your machine and with your compiler
options.

Contrary to what many people think, measuring speeds is not
a waste of time. It provides you with concrete data concerning
your choice.

Why depend on what some "gurus" tell you in C.L.C?

Better to find out exactly what is best: measure it.
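
As a rough illustration of the kind of measurement suggested here, a minimal timing sketch might look like the following. The names time_copy, loop_copy and BUF_SIZE are made up for this example, clock() measures CPU time only coarsely, and an optimising compiler may still transform either variant, so treat the numbers as indicative at best.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE 100 /* same size as SIZE in the original post */

/* Copy n bytes with an explicit byte-by-byte loop. */
static void loop_copy(char *dst, const char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Time `reps` copies of BUF_SIZE bytes and return the CPU seconds used.
   A non-zero use_memcpy selects memcpy(), zero selects the loop. */
double time_copy(int use_memcpy, long reps)
{
    static char a[BUF_SIZE], b[BUF_SIZE];
    volatile char sink = 0; /* reading the result discourages the
                               compiler from dropping the copies */
    clock_t start, end;
    long r;

    memset(a, 'x', sizeof a);
    start = clock();
    for (r = 0; r < reps; r++) {
        a[0] = (char)r; /* vary the input a little on each pass */
        if (use_memcpy)
            memcpy(b, a, sizeof a);
        else
            loop_copy(b, a, sizeof a);
        sink = b[r % BUF_SIZE];
    }
    end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(1, 10000000L) and time_copy(0, 10000000L) from main() and printing both results, once per set of compiler options, gives a first impression of which variant is faster on a given setup.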

jacob
 
Alex Fraser

Case said:
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

I would always use memcpy(). Using a loop instead is a last-resort
optimisation (after a performance problem has been found, and attempts to
reduce the need for copying have failed or been rejected).

In practice I would expect the loop to be slower for anything more than a
few bytes, as memcpy() is likely to be implemented efficiently (more so than
can possibly be done in standard C).

Alex
 
Dan Pop

jacob navia said:
memcpy implementations tend to be very optimized and well done,
especially for machines that have a block move instruction.

They tend to be very optimised and well done for machines without a
block move instruction, too. Been there, done that.

Dan
 
Dan Pop

Case said:
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.

Case said:
<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.
</OT>

gcc is smart enough to inline memcpy calls for short memory blocks,
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
    memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s
        .file   "test.c"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movl    12(%ebp), %ecx
        movl    (%edx), %eax
        movl    %eax, (%ecx)
        movl    4(%edx), %eax
        movl    %eax, 4(%ecx)
        popl    %ebp
        ret
        .size   foo, .-foo
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.3"

Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.

Dan
 
Thomas Matthews

Here are some guidelines for copying data (objects).
1. For small, intrinsic types, use assignment. *
2. For small amounts of data, use a "for" loop. **
3. For large amounts of data, prefer memcpy. **
4. For large amounts of data, don't copy; use pointers.
   Copying pointers takes less time.
5. For huge amounts of data, seek hardware assistance.
   [Yep, this is not portable.]
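
To illustrate guideline 4, here is a minimal sketch of exchanging two buffers by swapping the pointers that refer to them instead of copying their contents. The function name swap_buffers is invented for this example, and it assumes the rest of the program reaches the buffers only through these pointers.

```c
/* Exchange two buffer pointers; the buffers themselves are not touched,
   so the cost does not depend on how large the buffers are. */
void swap_buffers(char **first, char **second)
{
    char *tmp = *first;
    *first = *second;
    *second = tmp;
}
```

After swap_buffers(&front, &back), front refers to the data back used to refer to and vice versa, without a single byte of the data being moved.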

* Repeated assignments may be faster and more efficient
than a small "for" loop. Many processors execute
data processing instructions faster than branch
instructions. For example, 4 separate assignments may be
faster than executing one assignment statement
4 times in a loop.

Also try to use your processor's native integer
size. For example, if your processor likes 32-bit
quantities, copy 32 bits at a time, rather than
8 bits.
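
The point about repeated assignments can be sketched as follows. Both functions are hypothetical examples, not code from any poster, and an optimising compiler may well generate the same instructions for both, so only a measurement on the actual platform can tell whether the second form gains anything.

```c
#include <stddef.h>

/* Copy four ints with a counted loop: one assignment executed four
   times, with a compare-and-branch per pass unless the compiler
   unrolls the loop itself. */
void copy4_loop(int *dst, const int *src)
{
    size_t i;
    for (i = 0; i < 4; i++)
        dst[i] = src[i];
}

/* Copy four ints with four straight-line assignments: no loop branch. */
void copy4_unrolled(int *dst, const int *src)
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```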

** The threshold of when to use "for" vs. memcpy
depends on how your compiler uses memcpy. An
inlined version will have less overhead. A
memcpy function will have the minimum overhead
of executing the calling and return sequences.
Measure this overhead. Then determine how many
copy statements can be executed within this
time frame. This will be your threshold of
when to use memcpy vs. for-loop.

I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).

The best you can do is to profile. Is the copy
the bottleneck of your system? Is it executed
often?

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 
Arthur J. O'Dwyer

Dan Pop said:
ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.

Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.
Of course, I don't write many programs in which "copy a chunk of
memory from A to B" is much of a bottleneck... :)

Case said:
<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.
</OT>

Dan Pop said:
gcc is smart enough to inline memcpy calls for short memory blocks,
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
    memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s [...]
Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.

Unfortunately for your example, "The Dev Team Thinks Of Everything"
in GCC, too:

% cat test.c
#include <string.h>

void foo(int *p, int *q)
{
    memcpy(q, p, 2 * sizeof *p);
}
% gcc -O2 -S test.c
% cat test2.c
#include <string.h>

void foo(int *p, int *q)
{
    int i;
    for (i = 0; i < 2; ++i)
        q[i] = p[i];
}
% gcc -O2 -S test2.c
% diff test.s test2.s
1c1
< .file "test.c"
---
> .file "test2.c"
%


One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task :) .

-Arthur
 
Arthur J. O'Dwyer

One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task :) .

And to clarify: I mean the function call 'foo', not the function
call 'memcpy'. 'memcpy' is good. 'foo' itself is unnecessary and
ought to be removed. :)
Okay, I think that's clearer.

-Arthur
 
luc wastiaux

Thomas said:
I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).

Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function?
 
Richard Bos

luc wastiaux said:
Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function?

In ISO C, you don't. It all depends on the architecture, and therefore
will differ between, say, an Intel machine and a Sparc.

Richard
 
Thomas Matthews

luc said:
Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function?

I use assembly language. The DMA is not a part of the processor,
but a component on the platform. The DMA has a setup overhead,
so it should only be used for large or automated transfers.

 
Dan Pop

Arthur J. O'Dwyer said:
Unfortunately for your example, "The Dev Team Thinks Of Everything"
in GCC, too:

% cat test.c
#include <string.h>

void foo(int *p, int *q)
{
    memcpy(q, p, 2 * sizeof *p);
}
% gcc -O2 -S test.c
% cat test2.c
#include <string.h>

void foo(int *p, int *q)
{
    int i;
    for (i = 0; i < 2; ++i)
        q[i] = p[i];
}
% gcc -O2 -S test2.c
% diff test.s test2.s
1c1
< .file "test.c"
---
> .file "test2.c"
%


Which shows that the memcpy version is still at least as good as the
for loop ;-)

Arthur J. O'Dwyer said:
One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task :) .

To me, the memcpy alternative is more readable than the other: it
consists of a single, very simple, even idiomatic (for objects that can't
be directly assigned) function call, which I wouldn't hide behind a
function in real C code: either use it as such, inline, or hide it behind
a macro.
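
One way the "hidden behind a macro" option could look is sketched below. COPY_ARRAY is a hypothetical name, not something from the thread or from any library, and the macro only works when its first argument is a real array, because sizeof applied to a pointer would give the size of the pointer.

```c
#include <string.h>

/* Hypothetical helper macro: copies src into dst. dst must be a true
   array (not a pointer) so that sizeof(dst) is the size of the array;
   src must be at least that large and must not overlap dst. */
#define COPY_ARRAY(dst, src) memcpy((dst), (src), sizeof(dst))
```

With int a[3] and int b[3] in scope, COPY_ARRAY(b, a); expands to memcpy((b), (a), sizeof(b)); and copies all three elements.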

Dan
 
Edmund Bacon

Arthur said:
Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.

Aren't there issues with memcpy and overlapping memory locations?

In the following program, isn't the call to memcpy an error?

#include <stdio.h>
#include <string.h>

int main(void)
{
    int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int *to = x;
    int *from = &x[1];

    memcpy(to, from, sizeof x - sizeof *x); /* UB ? */

    return 0;
}
 
Dan Pop

And to clarify: I mean the function call 'foo', not the function
call 'memcpy'. 'memcpy' is good. 'foo' itself is unnecessary and
ought to be removed. :)
Okay, I think that's clearer.

Indeed. foo() was introduced for the sole reason of having a minimal
translation unit ;-)

Dan
 
Dan Pop

Thomas Matthews said:
I use assembly language. The DMA is not a part of the processor,
but a component on the platform. The DMA has a setup overhead,
so it should only be used for large or automated transfers.

By "automated" I guess you mean "asynchronous to the program execution".
Which has obvious advantages and disadvantages.

Dan
 
Dan Pop

Edmund Bacon said:
Aren't there issues with memcpy and overlapping memory locations?

Yes, there are.

Edmund Bacon said:
In the following program, isn't the call to memcpy an error?

#include <stdio.h>
#include <string.h>

int main(void)
{
    int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int *to = x;
    int *from = &x[1];

    memcpy(to, from, sizeof x - sizeof *x); /* UB ? */

    return 0;
}

Use memmove() in such cases. It has well defined behaviour for
overlapping memory blocks. Depending on the nature of the overlap,
it will either perform an ordinary memcpy() or a copy in the opposite
direction.
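
For the array in the quoted program, a minimal sketch of the memmove() version could look like this. The function name shift_left_one is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Move elements 1..n-1 of x down to positions 0..n-2. Source and
   destination overlap, so memmove() is used instead of memcpy().
   The last element keeps its old value. */
void shift_left_one(int *x, size_t n)
{
    if (n > 1)
        memmove(x, x + 1, (n - 1) * sizeof *x);
}
```

Applied to the ten-element array {1, 2, ..., 10}, this leaves {2, 3, ..., 10, 10}.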

Dan
 
Eric Sosman

Edmund said:
Arthur J. O'Dwyer wrote:

Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.


Aren't there issues with memcpy and overlapping memory locations?

In the following program, isn't the call to memcpy an error?
[snip example with overlapping source and destination]

Yes: The behavior of memcpy() is not defined if the
source and destination objects overlap. If that's a
possibility, use memmove() instead.
 
luc wastiaux

Dan said:
By "automated" I guess you mean "asynchronous to the program execution".
Which has obvious advantages and disadvantages.

But how do you know when the transfer is complete, then? I assume that
even in synchronous mode, using DMA for large transfers can be beneficial.
 
Old Wolf

Arthur J. O'Dwyer said:
Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.
Of course, I don't write many programs in which "copy a chunk of
memory from A to B" is much of a bottleneck... :)

I have a slight aversion to memcpy because of one compiler I had to
use, which would copy 65535 bytes if you called it with a third
argument of 0. (I think this is not standard-conforming, but
unfortunately the real world rears its ugly head sometimes.)

FWIW this was Hitech C for the Z80. I guess the problem came about
because the Z80's block-move instruction does this if you pass 0 as
the length (it decrements and then checks the zero flag), and the
implementers must not have been aware of this behaviour.
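
On an implementation with that kind of bug, one possible workaround is a thin wrapper that never passes a zero length to the library. This is only a sketch; guarded_memcpy is a hypothetical name and not part of any compiler's library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skips the library call entirely when n is 0,
   so a memcpy() that mishandles a zero length is never exercised. */
void *guarded_memcpy(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;
    return memcpy(dst, src, n);
}
```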
 
