creating a wrapper around a vendor api

ma740988 · Oct 10, 2006

Assume I have a vendor file called 'vendor.h'. Within the file
there are two functions, memalign and cforward. It is my understanding that
the memalign function is a wrapper around malloc. cforward is just a
vendor function for doing forward FFTs. At issue: CSL_COMPLEX is
'cumbersome' to work with, so I created a wrapper. So now,
given the pseudo code:

#include <climits>   // INT_MAX
#include <complex>
#include <cstdlib>   // malloc
#include <iostream>
#include <vector>

// in vendor file - vendor.h
struct CSL_COMPLEX {
float r;
float i ;
};

void *memalign(size_t blocksize, size_t bytes)
{
return ( malloc ( blocksize * bytes ) ) ; // test stub only; not sure I have this right
}
void cforward (
CSL_COMPLEX* ptr_input,
CSL_COMPLEX* ptr_output,
int num_sample
)
{}
// end vendor.h

typedef std::vector < std::complex < float > > CFLOAT_VEC ;

struct wrapper {
int previous ;
CSL_COMPLEX *ptr_mem_f ;
wrapper ( )
: previous ( INT_MAX )
, ptr_mem_f ( 0 )
{}

void execute_f ( CFLOAT_VEC& vec )
{
CFLOAT_VEC::size_type const current_sz = vec.size() ;
if ( !current_sz )
return ;

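if ( previous != static_cast<int>( current_sz ) )
{
std::cout << current_sz << std::endl;
// release the old buffer first (assumes free() is valid for memalign'd memory)
free ( ptr_mem_f ) ;
ptr_mem_f =
( CSL_COMPLEX *) memalign ( 32, 2 * current_sz * sizeof ( CSL_COMPLEX ) ) ;
}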

// copy contents from vec to ptr_mem_f
for ( CFLOAT_VEC::size_type idx ( 0 ); idx < current_sz; ++idx ) {
ptr_mem_f [ idx ].r = vec[ idx ].real() ;
ptr_mem_f [ idx ].i = vec[ idx ].imag() ;
}

// at this point we run the forward FFT;
// for in-place operation, input and output are the same buffer
cforward ( ptr_mem_f, ptr_mem_f, static_cast<int>( current_sz ) ) ;

// now copy back to vec ..
for ( CFLOAT_VEC::size_type idx ( 0 ); idx < current_sz; ++idx ) {
std::complex < float > temp (
ptr_mem_f [ idx ].r,
ptr_mem_f [ idx ].i
);
vec[ idx ] = temp ;
}
previous = current_sz;
}
};

int main()
{

CFLOAT_VEC fv ( 5 );
for ( CFLOAT_VEC::size_type jdx ( 0 ); jdx < fv.size(); ++jdx )
fv [ jdx ] = std::complex < float > ( jdx , jdx ) ;

wrapper w_obj;
w_obj.execute_f ( fv ) ;
}

The fundamental issue today is the copy from vec to ptr_mem_f
and from ptr_mem_f back to vec, which literally kills my timing. Besides my
current approach, I'm unsure how to put a decent wrapper around the basket
case of a function, cforward. Ideas welcome; thanks in advance.

Alf P. Steinbach · Oct 10, 2006

* ma740988:

Assume I have a vendor file called ' vendor.h'. Within the file
there's two methods memalign and cforward. It is my understanding that
the memalign function is a wrapper around malloc. cforward is just a
vendor function for doing forward FFT's. At issue CSL_COMPLEX is
'cumbersome' to work with. As a result I created a wrapper. So now -
given the pseudo code.

#include <iostream>
#include <complex>
#include <vector>

// in vendor file - vendor.h
struct CSL_COMPLEX {
float r;
float i ;
};

void *memalign(size_t blocksize, size_t bytes)
{
return ( malloc ( blocksize * bytes ) ) ; // test stub only; not sure I have this right
}
void cforward (
CSL_COMPLEX* ptr_input,
CSL_COMPLEX* ptr_ouput,
int num_sample
)
{}
// end vendor.h

Are you sure the cforward function has special alignment requirements
above those provided by C++ new? If not, just use a
std::vector<CSL_COMPLEX> in your client code. Or rather, two of them.

Client code might look like this (off the cuff, not test-compiled):

typedef CSL_COMPLEX CslComplex; // Get rid of uppercase, principle.
typedef std::vector<CslComplex> CslValues;

void transform( CslValues const& data, CslValues& result )
{
if( data.size() == 0 ) { return; }

CslValues resultBuffer( data.size() );
cforward(
const_cast<CslComplex*>( &data[0] ),
&resultBuffer[0],
static_cast<int>( data.size() )
);
// Possible error checking here, I don't see that provided? Then:
std::swap( resultBuffer, result ); // Constant time.
}

// Wrapper for notational ease, relying on modern compiler with RVO.
// If measurements show unacceptable time used on copying, one might
// need to rewrite client code to use 'transform' above directly.
// But that would be a premature optimization without measurements.
inline CslValues transformed( CslValues const& data )
{
CslValues result;
transform( data, result );
return result;
}

struct MakeCslComplex: CslComplex
{
MakeCslComplex( CslComplex const& v ) { r = v.r; i = v.i; }
MakeCslComplex( float re, float im ) { r = re; i = im; }
// Etc., eg. conversion from std::complex<double>
};

int main()
{
CslValues data;
data.push_back( MakeCslComplex( 1.0, 2.0 ) );
data.push_back( MakeCslComplex( 3.0, 4.0 ) );
// etc.
CslValues result = transformed( data );
}

Hth.,

- Alf

Alf P. Steinbach · Oct 10, 2006

* Alf P. Steinbach:

void transform( CslValues const& data, CslValues& result )
{
if( data.size() == 0 ) { return; }

Argh. Replace the simple 'return' with something like

{ CslValues empty; empty.swap( result ); return; }

Grumble.

ma740988 · Oct 10, 2006

Alf P. Steinbach wrote:

Alf, for starters, thanks a lot for the suggestions.

Are you sure the cforward function has special alignment requirements
above those provided by C++ new?

Yes, the cforward function has special alignment requirements. You see,
under the hood the cforward routine uses hardware tricks to achieve
these 'fast FFT times', meaning it uses an AltiVec engine. Now
I'm not too savvy on the AltiVec specifics, but my understanding is that
AltiVec requires (don't quote me on this) 4-byte alignment.

From what I understand (sadly I can't recall all this), the alignment
allows the engine to do 4 multiplies (or some such thing) in one
clock cycle. Bottom line: lots of hardware tricks, but your data has
to be 'just right'.

Does this mean that, because of the alignment, your approach doesn't work?

ma740988 · Oct 10, 2006

Alf said:
Are you sure the cforward function has special alignment requirements
above those provided by C++ new?

By the way, what alignment does new guarantee? I'm trying
to find information on this but I'm coming up short.

[OT]
The first link below references the memalign function. The VxWorks
manual is in some respects 'silent' on memalign, but I think the premise
is the same as that in the second link.

http://www-kryo.desy.de/documents/vxWorks/V5.4/vxworks/ref/memLib.html#memalign
http://www.mkssoftware.com/docs/man3/memalign.3.asp

Alf P. Steinbach · Oct 10, 2006

* ma740988:

Alf P. Steinbach wrote:

Alf, for starters, thanks a lot for the suggestions.

Yes, the cforward function has special alignment requirements. You see,
under the hood the cforward routine uses hardware tricks to achieve
these 'fast FFT times', meaning it uses an AltiVec engine. Now
I'm not too savvy on the AltiVec specifics, but my understanding is that
AltiVec requires (don't quote me on this) 4-byte alignment.

I'd just try and see. Also, most compilers let you specify the default
alignment for dynamic allocation, via command line options and via
pragmas. However, the default is unlikely to be less than 4 (if higher
and a power of two the 4 byte alignment is automatically satisfied), so,
disregarding extreme Murphy activity, things should work by default.

stevenj · Oct 10, 2006

ma740988 said:
I'm not too savvy on the AltiVec specifics, but my understanding is that
AltiVec requires (don't quote me on this) 4-byte alignment.

Altivec requires data to be 16-byte aligned. On MacOS X, this is the
default for malloc, and I believe for new, as documented by Apple. On
most other systems (e.g. GNU/Linux, and Windows I think), memory
allocation uses only 8-byte alignment by default. Hence the need for
memalign.
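
One way to "try and see" whether a given buffer meets the 16-byte requirement is to test the pointer value directly. The helper below is only a sketch: the name is_aligned is hypothetical (it is not part of the vendor API), and it assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not from vendor.h: returns true when the address
// held in p is a multiple of `alignment`. The alignment must be a power
// of two (16 for the AltiVec case discussed above).
inline bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Calling is_aligned(&vec[0], 16) on a non-empty vector right before the cforward call would show whether the default allocator happens to be good enough on a given platform.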

Since your routine expects a struct { float real; float imag; } complex
data type, you shouldn't actually need to do any copying. It should be
sufficient to do a reinterpret_cast<CSL_COMPLEX*>(x). The reason for
this is that essentially all extant C++ compilers store complex<foo> as
real part followed by imaginary part, with no padding, for any floating
point type foo. This is actually slated to become part of the standard
(see http://anubis.dkuug.dk/JTC1/SC22/WG21/docs/papers/2002/1388.pdf).
So, the only problem is to override the memory allocation to fix the
alignment.
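
A minimal sketch of the no-copy call, under stated assumptions: the CSL_COMPLEX struct matches the one quoted from vendor.h, the cforward body here is a placeholder (it only conjugates each sample so the effect is visible; the real routine does the FFT), and execute_in_place is a hypothetical name. The cast relies on the layout described above; accessing a complex<float> through an unrelated struct type is not something the language formally promises, so it is worth verifying on the compiler actually in use.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Stand-in for the vendor type quoted from vendor.h.
struct CSL_COMPLEX {
    float r;
    float i;
};

// Placeholder for the vendor routine: conjugates each sample so that the
// call has an observable effect. The real cforward performs a forward FFT.
void cforward(CSL_COMPLEX* ptr_input, CSL_COMPLEX* ptr_output, int num_sample)
{
    for (int k = 0; k < num_sample; ++k) {
        ptr_output[k].r = ptr_input[k].r;
        ptr_output[k].i = -ptr_input[k].i;
    }
}

// The cast only makes sense if both types have the same size (no padding).
static_assert(sizeof(std::complex<float>) == sizeof(CSL_COMPLEX),
              "std::complex<float> and CSL_COMPLEX differ in size");

// Hypothetical wrapper: runs cforward in place on the vector's own storage,
// with no copy into or out of a separate buffer.
void execute_in_place(std::vector<std::complex<float> >& vec)
{
    if (vec.empty()) {
        return;
    }
    CSL_COMPLEX* p = reinterpret_cast<CSL_COMPLEX*>(&vec[0]);
    cforward(p, p, static_cast<int>(vec.size()));
}
```

This sketch does nothing about alignment; the vector's storage is only as aligned as its allocator makes it.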

Regards,
Steven G. Johnson

ma740988 · Oct 10, 2006

Since your routine expects a struct { float real; float imag; } complex
data type, you shouldn't actually need to do any copying. It should be
sufficient to do a reinterpret_cast<CSL_COMPLEX*>(x). The reason for
this is that essentially all extant C++ compilers store complex<foo> as
real part followed by imaginary part, with no padding, for any floating
point type foo. This is actually slated to become part of the standard
(see http://anubis.dkuug.dk/JTC1/SC22/WG21/docs/papers/2002/1388.pdf).
So, the only problem is to override the memory allocation to fix the
alignment.

I see. I'll check into achieving this override.
Thanks a lot.

Alf P. Steinbach · Oct 10, 2006

* ma740988:

I see. I'll check into achieving this override.

Check out the allocator template parameter to std::vector.
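
For illustration, here is a rough sketch of what such an allocator might look like. Everything in it is an assumption rather than something from the thread: the names AlignedAllocator and AlignedCFloatVec are hypothetical, it needs a C++11 compiler, and it gets its alignment by over-allocating with malloc and rounding the pointer up instead of calling the vendor's memalign.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator: hands out storage aligned to Alignment bytes.
// It over-allocates with malloc, rounds the address up, and stores the
// original malloc pointer just below the aligned block for deallocate().
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    static_assert(Alignment >= sizeof(void*) &&
                  (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two, at least pointer-sized");

    typedef T value_type;

    // Explicit rebind is required because of the non-type parameter.
    template <typename U>
    struct rebind { typedef AlignedAllocator<U, Alignment> other; };

    AlignedAllocator() noexcept {}
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n)
    {
        const std::size_t slack = Alignment + sizeof(void*);
        if (n > (static_cast<std::size_t>(-1) - slack) / sizeof(T)) {
            throw std::bad_alloc();
        }
        void* raw = std::malloc(n * sizeof(T) + slack);
        if (!raw) {
            throw std::bad_alloc();
        }
        // Leave room for the bookkeeping pointer, then round up.
        std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned =
            (start + Alignment - 1) & ~static_cast<std::uintptr_t>(Alignment - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<T*>(aligned);
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        if (p) {
            std::free(reinterpret_cast<void**>(p)[-1]);
        }
    }
};

// All instances are interchangeable: they carry no state.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept { return true; }

template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept { return false; }

// A vector of complex<float> whose storage is 16-byte aligned.
typedef std::vector<std::complex<float>,
                    AlignedAllocator<std::complex<float>, 16> >
    AlignedCFloatVec;
```

With a typedef like AlignedCFloatVec replacing CFLOAT_VEC, the address of the first element should come out 16-byte aligned after every reallocation. If the vendor routine requires memory obtained from its own memalign, allocate and deallocate would have to call that instead.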

ma740988 · Oct 10, 2006

Alf said:
Check out the allocator template parameter to std::vector.

Yeah, that's where I'm going through overload. I never thought I'd
ever need to 'write my own allocator' (I thought it was too hi-tech for
me), so I skipped those chapters in Josuttis and the like.
