std::copy versus pointer copy

W

White Wolf

As the application I am writing involves a large amount of array
copies, it is very important to make it work as fast as possible ...
http://magnonel.guild.net/~schwern/talks/How_To_Be_Lazy/full_slides/rules_of_optimization.html

http://tinyurl.com/ncpc

Indeed, that was my first thought, namely that the implementors of the
library would know best how to make their function work optimally, as
they developed the library themselves.

In case of g++ also the compiler.
On the other hand, as somebody else replied, it might not be very
optimal, since it is written with generality in mind, that is, it
should work on classes too ... while if you would write your own
copying routine, it is optimized for the application at hand (in this
case, an array copy).

This is only half true. Nothing whatsoever prevents the library implentor
to specialize its templates for cases dealing with POD types etc. There are
many kinds of magic which can be used to achieve that.
Yet, even in the case of g++, I am sure it is written as optimal as
much as it is made generic ...

As above: it can be made generic *and* optimal. The magic of templates.
 
W

White Wolf

Yes, I need to check the code to see what's going on ... but I would
not be asking this question if it didn't matter of if it wasn't a
serious problem!

Did you *measure*? Not little test programs outside, but *inside* your
application. Does the profiler show these copy operations to be a
bottleneck?
 
F

franky.backeljauw

This is still wrong.

Yes, as I had e-mailed it before I had read your message ...
friend void iCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) *dst++ = *src++; }
friend void mCopy( Element *dst, const Element *src, int n ) { int i = n + 1; while ( i-- > 0 ) dst[ i ] = src; }
friend void sCopy( Element *dst, const Element *src, int n ) { std::copy( src, src+n+1, dst ); }

These are all going 1 element too far.


No, this depends on which n is given ...
b and c have uninitialized memory that you are trying to print. This
gives undefined behavior.

Why? Must a double be initialized, in order to get it's memory location
initialized?
I had to make a number of changes to get sensible output. The result was
being truncated to 0. Here are the results (Visual C++ 6.0, default
project settings, original library with a few of the fixes published by
Dinkumware, none of which seem likely to affect this test):

Normal optimizations (release build)
----
time for iCopy : 0.735
time for mCopy : 0.671
time for sCopy : 0.672
time for iCopy : 0.703
time for mCopy : 0.672
time for sCopy : 0.687

Not a terribly significant difference in any case. I wouldn't expect
this to be a bottleneck

No, but your results also seem to indicate that the member copy is the
fastest ... maybe I should have put my question like this: what is the
fastest way to copy arrays of pod?

Regards,

Franky B.
 
F

franky.backeljauw

As for your issue, I would recommend writing your copy function to call
std::copy. Later, when everything's working, you can replace it with a
faster function if you feel it's necessary. But the "faster function"
might be slower on some systems, so you have to watch that. Simply using
std::copy is probably a good compromise anyway.

It is not, at least on my system, because an ordinary member copy takes
only two thirds of the time that std::copy takes to copy the same array.

Regards,

Franky B.
 
K

Kevin Goodsell

No, this depends on which n is given ...

OK, but I submit that it's very odd to use an function interface where
the size parameter actually means "One less than the size".
Why? Must a double be initialized, in order to get it's memory location
initialized?

I don't understand your second question. But C and C++ both forbid
"examining" an object that has not been given a value. The following
program has undefined behavior:

int main()
{
int i;
i;
}

No, but your results also seem to indicate that the member copy is the
fastest ... maybe I should have put my question like this: what is the
fastest way to copy arrays of pod?

The answer is simple: Unspecified.

-Kevin
 
W

White Wolf

I just meant : Yet Another Bookmark ...

I gave up bookmarking. I have too many and too many broken. My bookmark
nowadays is few stable sites (FAQs etc.) and Google is for the rest. :)
 
T

tom_usenet

No, but your results also seem to indicate that the member copy is the
fastest ... maybe I should have put my question like this: what is the
fastest way to copy arrays of pod?

It depends on the compiler. Good candidates are std::copy,
std::memcpy, memberwise loop, and, often the winner for poorer
compilers, Duff's Device.

In my tests, the results depend very much on the precise optimization
settings. With loop unrolling, iCopy, mCopy and dCopy are generally
better than cCopy (memcpy) and sCopy (std::copy). Without, dCopy is
generally best, followed by the rest all about even.

Here are my results (all with max optimization), followed by the code:

GCC 2.95 (-O3 -funroll-loops):
time for iCopy : 0.546
time for mCopy : 0.625
time for sCopy : 1.11
time for cCopy : 1.093
time for dCopy : 0.61
time for iCopy : 0.531
time for mCopy : 0.703
time for sCopy : 1.094
time for cCopy : 1.094
time for dCopy : 0.593

GCC 3.2 (-O3 -funroll-loops -mcpu=athlon):
time for iCopy : 0.516
time for mCopy : 0.609
time for sCopy : 1.156
time for cCopy : 1.032
time for dCopy : 0.64
time for iCopy : 0.531
time for mCopy : 0.625
time for sCopy : 1.204
time for cCopy : 1.046
time for dCopy : 0.641

VC7.1 (/Ox /Og /Oi /Ot /Oy /G7 /GA):
time for iCopy : 0.813
time for mCopy : 0.625
time for sCopy : 1.093
time for cCopy : 1.032
time for dCopy : 0.656
time for iCopy : 0.812
time for mCopy : 0.672
time for sCopy : 1.11
time for cCopy : 1.031
time for dCopy : 0.656


#include <time.h>
#include <algorithm>
#include <iostream>
#include <string.h>

using namespace std;

typedef double Element;
void iCopy( Element *dst, const Element *src, int n ) { while ( n-- >
0 ) *dst++ = *src++; }
void mCopy( Element *dst, const Element *src, int n ) { while ( n-- >
0 ) dst[n] = src[n]; }
void sCopy( Element *dst, const Element *src, int n ) { std::copy(src,
src+n, dst); }
void cCopy( Element *dst, const Element *src, int n ) { memcpy(dst,
src, n*sizeof *dst); }
void dCopy( Element *dst, const Element *src, int n )
{
const Element* const end = src + n;
if (n <= 0)
return;
switch(n % 8)
{
while(src != end)
{
case 0:
*dst++ = *src++;
case 7:
*dst++ = *src++;
case 6:
*dst++ = *src++;
case 5:
*dst++ = *src++;
case 4:
*dst++ = *src++;
case 3:
*dst++ = *src++;
case 2:
*dst++ = *src++;
case 1:
*dst++ = *src++;
}
}
}

class Vector
{
public:

Vector( unsigned int newsize ) { array = new Element[ newsize ];
size = newsize; }
~Vector() { delete[] array; }

void init() { for ( unsigned int i = 0; i<size; i++ ) array[ i ] =
i; }
void print() { for ( unsigned int i = 0; i<size; i++ ) cout <<
array[ i ] << " "; cout << endl; }

unsigned int size;
Element *array;
};

int main()
{
Vector a( 10 ), b( 10 ), c( 10 ); a.init();
cout << "a : "; a.print();
cout << "b : "; b.print();
cout << "c : "; c.print();

iCopy( b.array, a.array, 7 );
cout << "b after iCopy : "; b.print();

sCopy( c.array, a.array, 7 );
cout << "c after sCopy : "; c.print();

clock_t start, stop;
double taken;

int VECTOR_SIZE = 100;
int LOOP_COUNT = 5000000;

Vector d( VECTOR_SIZE ), e( VECTOR_SIZE );
d.init();

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) iCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) mCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for mCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) sCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for sCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) cCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for cCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) dCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for dCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) iCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for iCopy : " << taken << endl;

start = clock();
for ( int i = 0 ; i < LOOP_COUNT; i++ ) mCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for mCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) sCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for sCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) cCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for cCopy : " << taken << endl;

start = clock();
for ( int i = 0; i < LOOP_COUNT; i++ ) dCopy( e.array, d.array,
VECTOR_SIZE-10 );
stop = clock();
taken = ( stop - start ) / (double)CLOCKS_PER_SEC;
cerr << "time for dCopy : " << taken << endl;

return 0;
}
 
F

franky.backeljauw

It depends on the compiler. Good candidates are std::copy,
std::memcpy, memberwise loop, and, often the winner for poorer
compilers, Duff's Device.

In my tests, the results depend very much on the precise optimization
settings. With loop unrolling, iCopy, mCopy and dCopy are generally
better than cCopy (memcpy) and sCopy (std::copy). Without, dCopy is
generally best, followed by the rest all about even.

Thanks! I did not yet test with the maximum optimizations ...

Regards,

Franky B.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,141
Messages
2,570,818
Members
47,367
Latest member
mahdiharooniir

Latest Threads

Top