memcpy() vs. for() performance

Case · Jun 30, 2004

#include <string.h>

#define SIZE 100
#define USE_MEMCPY

int main(void)
{
    char a[SIZE];
    char b[SIZE];
    size_t n; /* same type as sizeof(a), avoids a signed/unsigned comparison */

    /* code 'filling' a[] */

#ifdef USE_MEMCPY
    memcpy(b, a, sizeof(a));
#else
    for (n = 0; n < sizeof(a); n++)
    {
        b[n] = a[n];
    }
#endif
    return 0;
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.
</OT>
*/

jacob navia · Jun 30, 2004

memcpy implementations tend to be highly optimized and well done,
especially for machines that have a block move instruction.

On the other hand, a very clever compiler would recognize
that you are doing a memory move and replace the whole
"for" loop with a memory move instruction, if available.

There is no way to know without measuring the
relative performance on your machine and with your compiler
options.

Contrary to what many people think, measuring speeds is not
a waste of time. It provides you with concrete data concerning
your choice.

Why depend on what some "gurus" tell you in C.L.C?

Better to find out exactly which is best: measure it.

jacob
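
A minimal sketch of such a measurement, assuming the standard clock() function is precise enough for the chosen repetition count. The names copy_fn, copy_loop, copy_memcpy, time_copy and sink are hypothetical helpers introduced for illustration; they are not from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by both copy strategies under test. */
typedef void (*copy_fn)(char *dst, const char *src, size_t n);

/* Byte-by-byte loop, as in the original post. */
void copy_loop(char *dst, const char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Thin wrapper so that memcpy() matches copy_fn. */
void copy_memcpy(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Reading the destination into a volatile makes it harder for the
   optimiser to discard the copies (it is not a guarantee). */
static volatile unsigned long sink;

/* Returns the CPU seconds spent on `reps` copies of `n` bytes. */
double time_copy(copy_fn fn, char *dst, const char *src,
                 size_t n, unsigned long reps)
{
    clock_t start, stop;
    unsigned long r;

    assert(fn != NULL && n > 0);
    start = clock();
    for (r = 0; r < reps; r++) {
        fn(dst, src, n);
        sink += (unsigned char)dst[r % n];
    }
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Call time_copy() once with copy_loop and once with copy_memcpy for each SIZE of interest, using the same compiler options as the real build. The call through a function pointer adds the same overhead to both variants, so only the difference between the two figures is meaningful.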

Alex Fraser · Jun 30, 2004

Case said:
#define SIZE 100
#define USE_MEMCPY

int main(void)
{
char a[SIZE];
char b[SIZE];
int n;

/* code 'filling' a[] */

#ifdef USE_MEMCPY
memcpy(b, a, sizeof(a));
#else
for (n = 0; n < sizeof(a); n++)
{
b[n] = a[n];
}
#endif
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

I would always use memcpy(). Using a loop instead is a last-resort
optimisation (after a performance problem has been found, and attempts to
reduce the need failed or were rejected).

In practice I would expect the loop to be slower for anything more than a
few bytes, as memcpy() is likely to be implemented efficiently (more so than
can possibly be done in standard C).

Alex

Dan Pop · Jun 30, 2004

jacob navia said:
memcpy implementations tend to be very optimized and well done,
especially for machines that have a block move instruction.

They tend to be very optimised and well done for machines without a
block move instruction, too. Been there, done that.

Dan

Dan Pop · Jun 30, 2004

Case said:
#define SIZE 100
#define USE_MEMCPY

int main(void)
{
char a[SIZE];
char b[SIZE];
int n;

/* code 'filling' a[] */

#ifdef USE_MEMCPY
memcpy(b, a, sizeof(a));
#else
for (n = 0; n < sizeof(a); n++)
{
b[n] = a[n];
}
#endif
}

/*
Any (general) ideas about when (depending on SIZE) to use
memcpy(), and when to use for()?

ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.

<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.

gcc is smart enough to inline memcpy calls for short memory blocks,
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s
.file "test.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
movl (%edx), %eax
movl %eax, (%ecx)
movl 4(%edx), %eax
movl %eax, 4(%ecx)
popl %ebp
ret
.size foo, .-foo
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.3"

Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.

</OT>
*/

Dan

Thomas Matthews · Jun 30, 2004

Here are some guidelines for copying data (objects).
1. For small, intrinsic types, use assignment.*
2. For small amounts of data use a "for" loop. **
3. For large amounts of data prefer memcpy. **
4. For large amounts of data don't copy, use pointers.
Copying pointers takes less time.
5. For huge amounts of data, seek hardware assistance.
[Yep, this is not portable.]

* Repeated assignments may be faster and more efficient
than a small "for" loop. Many processors execute
data processing instructions faster than branch
instructions. For example, 4 assignments may be
faster than executing one assignment statement
4 times.

Also try and use your processor's native integer
size. For example, if your processor likes 32-bit
quantities, copy 32-bits at a time, rather than
8-bits.

** The threshold of when to use "for" vs. memcpy
depends on how your compiler uses memcpy. An
inlined version will have less overhead. A
memcpy function will have the minimum overhead
of executing the calling and return sequences.
Measure this overhead. Then determine how many
copy statements can be executed within this
time frame. This will be your threshold of
when to use memcpy vs. for-loop.

I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).

The best you can do is to profile. Is the copy
the bottleneck of your system? Is it executed
often?
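
The first footnote above (repeated assignments, native word size) could be sketched as follows. This assumes a 32-bit native word, buffers that do not overlap and are suitably aligned, and a C99 compiler for uint32_t; the function name copy_words_unrolled is hypothetical. An optimising compiler may well perform this unrolling itself, so it is only worth keeping if measurement shows a gain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies nwords 32-bit words, four assignments per loop iteration.
   Assumes dst and src do not overlap and are suitably aligned. */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    /* Main body: one loop branch per four word copies. */
    for (; i + 4 <= nwords; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Tail: the remaining 0 to 3 words. */
    for (; i < nwords; i++)
        dst[i] = src[i];
}
```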

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Arthur J. O'Dwyer · Jun 30, 2004

ALWAYS use memcpy(), NEVER use for loops, unless you have empirical
evidence that your memcpy() is very poorly implemented.

A well implemented memcpy() can use many tricks to accelerate its
operation.

Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.
Of course, I don't write many programs in which "copy a chunk of
memory from A to B" is much of a bottleneck...

<OT>
Any remarks about this issue using GCC, or the Sun compiler,
are welcome.

gcc is smart enough to inline memcpy calls for short memory blocks,
when optimisations are enabled:

fangorn:~/tmp 273> cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
fangorn:~/tmp 274> gcc -O2 -S test.c
fangorn:~/tmp 275> cat test.s [...]
Even if you have no clue about x86 assembly, you can easily see that there
is no memcpy call in the code generated by gcc for this function. One
more reason to prefer memcpy to for loops.

Unfortunately for your example, "The Dev Team Thinks Of Everything"
in GCC, too:

% cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
% gcc -O2 -S test.c
% cat test2.c
#include <string.h>

void foo(int *p, int *q)
{
int i;
for (i=0; i < 2; ++i)
q[i] = p[i];
}
% gcc -O2 -S test2.c
% diff test.s test2.s
1c1
< .file "test.c"
---
> .file "test2.c"
%

One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task).

-Arthur

Arthur J. O'Dwyer · Jun 30, 2004

One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task).

And to clarify: I mean the function call 'foo', not the function
call 'memcpy'. 'memcpy' is good. 'foo' itself is unnecessary and
ought to be removed.

Okay, I think that's clearer.

-Arthur

luc wastiaux · Jun 30, 2004

Thomas said:
I've written my own memcpy function which uses the
processor's specialized instructions. However,
it has a minimum overhead. The threshold between
using memcpy for large areas vs. the DMA device
is very close (on my platform).

Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?

Richard Bos · Jun 30, 2004

luc wastiaux said:
Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?

In ISO C, you don't. It all depends on the architecture, and therefore
will differ between, say, an Intel machine and a Sparc.

Richard

Thomas Matthews · Jun 30, 2004

luc said:
Out of curiosity, how do you instruct your processor to use DMA in your
custom memcpy function ?

I use assembly language. The DMA is not a part of the processor,
but a component on the platform. The DMA has a setup overhead,
so it should only be used for large or automated transfers.

--
Thomas Matthews

Dan Pop · Jun 30, 2004

Arthur J. O'Dwyer said:
Unfortunately for your example, "The Dev Team Thinks Of Everything"
in GCC, too:

% cat test.c
#include <string.h>

void foo(int *p, int *q)
{
memcpy(q, p, 2 * sizeof *p);
}
% gcc -O2 -S test.c
% cat test2.c
#include <string.h>

void foo(int *p, int *q)
{
int i;
for (i=0; i < 2; ++i)
q[i] = p[i];
}
% gcc -O2 -S test2.c
% diff test.s test2.s
1c1
< .file "test.c"
---
> .file "test2.c"
%

Which shows that the memcpy version is still at least as good as the
for loop ;-)

One more reason to prefer whichever alternative is the more readable
(in this case, the alternative that doesn't involve a function call
to do a one-line task).

To me, the memcpy alternative is more readable than the other: it
consists of a single, very simple, idiomatic even (for objects that can't
be directly assigned) function call. Which I wouldn't hide behind a
function in real C code: either use as such, inline, or hidden behind
a macro.
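
One way the "hidden behind a macro" option might look, as a sketch only. COPY_ARRAY and copy_demo are hypothetical names, and the macro assumes both arguments are true arrays of identical size (not pointers), since it relies on sizeof.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: copies the whole array src into the array dst.
   Both arguments must be true arrays (not pointers) of identical size,
   otherwise sizeof would not describe the object being copied. */
#define COPY_ARRAY(dst, src) \
    (assert(sizeof (dst) == sizeof (src)), \
     memcpy((dst), (src), sizeof (dst)))

/* Example use: returns 1 when b matches a after the copy. */
int copy_demo(void)
{
    char a[100];
    char b[100];

    memset(a, 'x', sizeof a);
    COPY_ARRAY(b, a);
    return memcmp(a, b, sizeof a) == 0;
}
```

The size check disappears when NDEBUG is defined, so it is a debugging aid rather than a safety guarantee.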

Dan

Edmund Bacon · Jun 30, 2004

Arthur said:
Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.

Aren't there issues with memcpy and overlapping memory locations?

In the following program, isn't the call to memcpy an error?

#include <stdio.h>
#include <string.h>

int main()
{

int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int *to = x;
int *from = &x[1];

memcpy(to, from, sizeof x - sizeof *x); /* UB ? */

return 0;
}

Dan Pop · Jun 30, 2004

And to clarify: I mean the function call 'foo', not the function
call 'memcpy'. 'memcpy' is good. 'foo' itself is unnecessary and
ought to be removed.
Okay, I think that's clearer.

Indeed. foo() was introduced for the sole reason of having a minimal
translation unit ;-)

Dan

Dan Pop · Jun 30, 2004

Thomas Matthews said:
I use assembly language. The DMA is not a part of the processor,
but a component on the platform. The DMA has a setup overhead,
so it should only be used for large or automated transfers.

By "automated" I guess you mean "asynchronous to the program execution".
Which has obvious advantages and disadvantages.

Dan

Dan Pop · Jun 30, 2004

Edmund Bacon said:
Aren't there issues with memcpy and overlapping memory locations?

Yes, there are.

In the following program, isn't the call to memcpy an error?

#include <stdio.h>
#include <string.h>

int main()
{

int x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int *to = x;
int *from = &x[1];

memcpy(to, from, sizeof x - sizeof *x); /* UB ? */

return 0;
}

Use memmove() in such cases. It has well defined behaviour for
overlapping memory blocks. Depending on the nature of the overlap,
it will either perform an ordinary memcpy() or a copy in the opposite
direction.
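
A short sketch of the memmove() version of the overlapping copy above, wrapped in a hypothetical helper named shift_left_by_one so that it can be exercised on its own:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shifts the n elements of x one position towards the front,
   discarding x[0]; the last element keeps its old value.
   Source and destination overlap, so memmove() is required. */
void shift_left_by_one(int *x, size_t n)
{
    assert(x != NULL);
    if (n > 1)
        memmove(x, x + 1, (n - 1) * sizeof *x);
}
```

Applied to the array {1, 2, ..., 10} from the example, this leaves {2, 3, ..., 10, 10}.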

Dan

Eric Sosman · Jun 30, 2004

Edmund said:
Arthur J. O'Dwyer wrote:

Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.

Aren't there issues with memcpy and overlapping memory locations?

In the following program, isn't the call to memcpy an error?
[snip example with overlapping source and destination]

Yes: The behavior of memcpy() is not defined if the
source and destination objects overlap. If that's a
possibility, use memmove() instead.

luc wastiaux · Jun 30, 2004

Dan said:
By "automated" I guess you mean "asynchronous to the program execution".
Which has obvious advantages and disadvantages.

But how do you know when the transfer is complete then ? I assume that
even in synchronous mode, using DMA for large transfers can be beneficial.

Old Wolf · Jun 30, 2004

Arthur J. O'Dwyer said:
Agreed and agreed. I use 'memcpy' any time I can guarantee it
will be safe, which in C is all the time, as far as I can recall.
Of course, I don't write many programs in which "copy a chunk of
memory from A to B" is much of a bottleneck...

I have a slight aversion to memcpy, because of one compiler I had to
use, which would copy 65535 bytes if you called it with a third
argument of 0. (I think this is not standard-conforming, but
unfortunately the real world rears its ugly head sometimes).

FWIW this was Hitech C for the Z80. I guess the problem came
about because the Z80's block-move instruction does this if you
pass 0 as the length (it decrements and then checks the zero flag),
and the implementers must not have been aware of this behaviour.
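
On such an implementation, one possible workaround is a wrapper that never passes a zero length to the library. This is a sketch only; guarded_memcpy is a hypothetical name. The C standard does say that a zero count copies no characters, so a conforming memcpy() does not need this guard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skips the library call entirely for n == 0,
   so a memcpy() that mishandles a zero length is never reached. */
void *guarded_memcpy(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```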
