Creating a wrapper around a vendor API

ma740988

Assume I have a vendor file called 'vendor.h'. The file declares two
functions, memalign and cforward. It is my understanding that memalign
is a wrapper around malloc; cforward is a vendor function that performs
forward FFTs. The issue is that CSL_COMPLEX is cumbersome to work with,
so I created a wrapper. Given the pseudo code:

#include <climits>   // INT_MAX
#include <complex>
#include <cstddef>   // size_t
#include <cstdlib>   // malloc, free
#include <iostream>
#include <vector>

// in vendor file - vendor.h
struct CSL_COMPLEX {
float r;
float i ;
};

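void *memalign(size_t alignment, size_t bytes)
{
// Test stub only: the real memalign returns 'bytes' bytes aligned on an
// 'alignment' boundary. Plain malloc does not honor the alignment; not
// sure if I have this right, but it will do for test purposes.
return ( malloc ( bytes ) ) ;
}
void cforward (
CSL_COMPLEX* ptr_input,
CSL_COMPLEX* ptr_output,
int num_sample
)
{}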
// end vendor.h


typedef std::vector < std::complex < float > > CFLOAT_VEC ;

struct wrapper {
int previous ;            // sample count used for the current buffer
CSL_COMPLEX *ptr_mem_f ;  // aligned scratch buffer, owned by this object
wrapper ( )
: previous ( INT_MAX )
, ptr_mem_f ( 0 )
{}

// Release the scratch buffer. Copying a wrapper is not supported as written.
~wrapper ( ) { free ( ptr_mem_f ) ; }

void execute_f ( CFLOAT_VEC& vec )
{
// cforward takes an int sample count, so work with an int throughout
int const current_sz = static_cast < int > ( vec.size() ) ;
if ( !current_sz )
return ;

// reallocate only when the sample count changes
if ( previous != current_sz )
{
std::cout << current_sz << std::endl;
free ( ptr_mem_f ) ; // release the old buffer before replacing it
ptr_mem_f =
( CSL_COMPLEX *) memalign ( 32, 2 * current_sz * sizeof ( CSL_COMPLEX ) ) ;
}

// copy contents from vec to ptr_mem_f
for ( int idx ( 0 ); idx < current_sz; ++idx ) {
ptr_mem_f [ idx ].r = vec[ idx ].real() ;
ptr_mem_f [ idx ].i = vec[ idx ].imag() ;
}

// at this point we run the forward FFT;
// for in-place operation, input and output are the same
cforward ( ptr_mem_f, ptr_mem_f, current_sz ) ;

// now copy back to vec
for ( int idx ( 0 ); idx < current_sz; ++idx ) {
std::complex < float > temp (
ptr_mem_f [ idx ].r,
ptr_mem_f [ idx ].i
);
vec[ idx ] = temp ;
}
previous = current_sz;
}
};

int main()
{

CFLOAT_VEC fv ( 5 );
for ( CFLOAT_VEC::size_type jdx ( 0 ); jdx < fv.size(); ++jdx )
fv [ jdx ] = std::complex < float > ( jdx , jdx ) ;

wrapper w_obj;
w_obj.execute_f ( fv ) ;
}

The fundamental issue today is the copy from vec to ptr_mem_f and from
ptr_mem_f back to vec: it literally kills my timing. Besides my current
approach, I'm unsure how to put a decent wrapper around this basket
case of a function, cforward. Ideas welcome, thanks in advance.
 
Alf P. Steinbach

Are you sure the cforward function has special alignment requirements
above those provided by C++ new? If not, just use a
std::vector<CSL_COMPLEX> in your client code. Or rather, two of them.

Client code might look like this (off the cuff, not test-compiled):

typedef CSL_COMPLEX CslComplex; // Get rid of uppercase, on principle.
typedef std::vector<CslComplex> CslValues;

void transform( CslValues const& data, CslValues& result )
{
if( data.size() == 0 ) { return; }

CslValues resultBuffer( data.size() );
cforward(
const_cast<CslComplex*>( &data[0] ),
&resultBuffer[0],
static_cast<int>( data.size() )
);
// Possible error checking here, I don't see that provided? Then:
std::swap( resultBuffer, result ); // Constant time.
}

// Wrapper for notational ease, relying on modern compiler with RVO.
// If measurements show unacceptable time used on copying, one might
// need to rewrite client code to use 'transform' above directly.
// But that would be a premature optimization without measurements.
inline CslValues transformed( CslValues const& data )
{
CslValues result;
transform( data, result );
return result;
}

struct MakeCslComplex: CslComplex
{
MakeCslComplex( CslComplex const& v ) { r = v.r; i = v.i; }
MakeCslComplex( float re, float im ) { r = re; i = im; }
// Etc., e.g. conversion from std::complex<double>
};

int main()
{
CslValues data;
data.push_back( MakeCslComplex( 1.0, 2.0 ) );
data.push_back( MakeCslComplex( 3.0, 4.0 ) );
// etc.
CslValues result = transformed( data );
}

Hth.,

- Alf
 
Alf P. Steinbach

* Alf P. Steinbach:
void transform( CslValues const& data, CslValues& result )
{
if( data.size() == 0 ) { return; }

Argh. Replace the simple 'return' with something like

{ CslValues empty; empty.swap( result ); return; }

Grumble.
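
For reference, here is a minimal sketch of 'transform' with the correction folded in, so that 'result' is always overwritten, even for empty input. The CSL_COMPLEX struct and the cforward stub below are stand-ins assumed for illustration (the stub only copies its input); the real vendor declarations come from vendor.h.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Stand-in for the vendor type, assumed for illustration only.
struct CSL_COMPLEX { float r; float i; };

// Hypothetical stub: copies input to output instead of computing an FFT.
inline void cforward(CSL_COMPLEX* in, CSL_COMPLEX* out, int n)
{
    for (int k = 0; k < n; ++k) { out[k] = in[k]; }
}

typedef std::vector<CSL_COMPLEX> CslValues;

// Out-of-place transform. 'result' is replaced on every call, so an
// empty input yields an empty result rather than stale data.
inline void transform(CslValues const& data, CslValues& result)
{
    assert(data.size() <= static_cast<std::size_t>(INT_MAX)); // cforward takes an int
    CslValues resultBuffer(data.size());
    if (!data.empty()) {
        cforward(const_cast<CSL_COMPLEX*>(&data[0]),
                 &resultBuffer[0],
                 static_cast<int>(data.size()));
    }
    resultBuffer.swap(result);   // constant time, no element copies
}
```

Building the output in a local buffer and swapping at the end also means 'result' is left untouched if the allocation throws.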
 
ma740988

Alf P. Steinbach wrote:

Alf, for starters, thanks a lot for the suggestions.
Are you sure the cforward function has special alignment requirements
above those provided by C++ new?
Yes, the cforward function has special alignment requirements. Under
the hood, the cforward routine uses hardware tricks to achieve these
'fast FFT times', meaning it uses an AltiVec engine. I'm not too savvy
on the AltiVec specifics, but my understanding is that AltiVec requires
(don't quote me on this) 4-byte alignment. From what I understand
(sadly I can't recall all of this), the alignment allows the engine to
do 4 multiplies (or some such thing) in one clock cycle. Bottom line:
lots of hardware tricks, but your data has to be 'just right'.

Does this mean that, because of the alignment, your approach doesn't work?
 
ma740988

Alf said:
Are you sure the cforward function has special alignment requirements
above those provided by C++ new?
By the way, what alignment does new guarantee? I'm trying
to find information on this but I'm coming up short.
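
One way to answer this empirically is to ask the compiler. The sketch below assumes a C++11 or later compiler, which postdates parts of this discussion; the helper names are hypothetical. In C++17 the macro __STDCPP_DEFAULT_NEW_ALIGNMENT__ states what plain new guarantees; before that, alignof(std::max_align_t) is a reasonable approximation for allocations large enough to hold such an object.

```cpp
#include <cassert>
#include <cstddef>

// Alignment (in bytes) that plain operator new can be expected to provide.
// Hypothetical helper, not a standard function.
inline std::size_t default_new_alignment()
{
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
    return __STDCPP_DEFAULT_NEW_ALIGNMENT__;   // C++17 and later
#else
    return alignof(std::max_align_t);          // C++11 fallback
#endif
}

// True if plain new is already good enough for a 'required'-byte alignment.
inline bool new_satisfies(std::size_t required)
{
    assert(required != 0);                     // zero is not a meaningful alignment
    return default_new_alignment() >= required; // both are powers of two
}
```

If new_satisfies(16) is false on the target, an aligned allocation routine such as memalign is needed for AltiVec-style data.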


[OT]
The first link below references the memalign function. The VxWorks
manual is in some respects 'silent' on memalign, but I think the premise
is the same as that in the second link.

http://www-kryo.desy.de/documents/vxWorks/V5.4/vxworks/ref/memLib.html#memalign
http://www.mkssoftware.com/docs/man3/memalign.3.asp
 
Alf P. Steinbach

* ma740988:
Alf P. Steinbach wrote:

Alf, for starters, thanks a lot for the suggestions.

Yes, the cforward function has special alignment requirements. Under
the hood, the cforward routine uses hardware tricks to achieve these
'fast FFT times', meaning it uses an AltiVec engine. I'm not too savvy
on the AltiVec specifics, but my understanding is that AltiVec requires
(don't quote me on this) 4-byte alignment.

I'd just try and see. Also, most compilers let you specify the default
alignment for dynamic allocation, via command line options and via
pragmas. However, the default is unlikely to be less than 4 (if higher
and a power of two the 4 byte alignment is automatically satisfied), so,
disregarding extreme Murphy activity, things should work by default.
 
stevenj

ma740988 said:
I'm not too savvy on the AltiVec specifics, but my understanding is that
AltiVec requires (don't quote me on this) 4-byte alignment.

Altivec requires data to be 16-byte aligned. On MacOS X, this is the
default for malloc, and I believe for new, as documented by Apple. On
most other systems (e.g. GNU/Linux, and Windows I think), memory
allocation uses only 8-byte alignment by default. Hence the need for
memalign.
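
Whether a given buffer meets the 16-byte requirement can be checked at run time. A minimal sketch follows; is_aligned is a hypothetical helper name, and std::uintptr_t assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if 'p' lies on an 'alignment'-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For example, is_aligned(&vec[0], 16) tells you whether a particular std::vector buffer happens to be usable as-is on your platform, although a passing check on one run does not guarantee the same for every allocation.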

Since your routine expects a struct { float real; float imag; } complex
data type, you shouldn't actually need to do any copying. It should be
sufficient to do a reinterpret_cast<CSL_COMPLEX*>(x). The reason for
this is that essentially all extant C++ compilers store complex<foo> as
real part followed by imaginary part, with no padding, for any floating
point type foo. This is actually slated to become part of the standard
(see http://anubis.dkuug.dk/JTC1/SC22/WG21/docs/papers/2002/1388.pdf).
So, the only problem is to override the memory allocation to fix the
alignment.
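
A minimal sketch of that idea, assuming the layouts match as described. The CSL_COMPLEX struct and the cforward stub are stand-ins for the vendor header (the stub just doubles every value so the effect is visible), and execute_in_place is a hypothetical name. Alignment of the vector's buffer is not handled here and remains the caller's responsibility.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for the vendor type, assumed for illustration only.
struct CSL_COMPLEX { float r; float i; };

// Hypothetical stub for the vendor FFT: doubles each float in place.
inline void cforward(CSL_COMPLEX* in, CSL_COMPLEX* out, int n)
{
    const float* src = reinterpret_cast<const float*>(in);
    float* dst = reinterpret_cast<float*>(out);
    for (int k = 0; k < 2 * n; ++k) { dst[k] = 2.0f * src[k]; }
}

// Catch a layout mismatch at compile time rather than at run time.
static_assert(sizeof(std::complex<float>) == sizeof(CSL_COMPLEX),
              "std::complex<float> and CSL_COMPLEX must have the same size");

// Runs the transform directly on the vector's storage: no copy in, no copy out.
template <typename Alloc>
void execute_in_place(std::vector<std::complex<float>, Alloc>& vec)
{
    if (vec.empty()) { return; }
    assert(vec.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    CSL_COMPLEX* p = reinterpret_cast<CSL_COMPLEX*>(&vec[0]);
    cforward(p, p, static_cast<int>(vec.size()));
}
```

The function is templated on the allocator so that it also accepts a vector whose allocator provides the stricter alignment. Strictly speaking, the language only blesses viewing a std::complex<float> as an array of two floats, so treating it as a CSL_COMPLEX relies on compiler behavior rather than a formal guarantee.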

Regards,
Steven G. Johnson
 
ma740988

stevenj said:
Since your routine expects a struct { float real; float imag; } complex
data type, you shouldn't actually need to do any copying. It should be
sufficient to do a reinterpret_cast<CSL_COMPLEX*>(x). The reason for
this is that essentially all extant C++ compilers store complex<foo> as
real part followed by imaginary part, with no padding, for any floating
point type foo. This is actually slated to become part of the standard
(see http://anubis.dkuug.dk/JTC1/SC22/WG21/docs/papers/2002/1388.pdf).
So, the only problem is to override the memory allocation to fix the
alignment.
I see. I'll look into achieving this override.
Thanks a lot.
 
ma740988

Alf said:
Check out the allocator template parameter to std::vector.
Yeah, that's where I'm heading, and it's a bit of an overload. I never
thought I'd ever need to 'write my own allocator' (I thought it was too
hi-tech for me), so I skipped those chapters in Josuttis and the like :)
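
For anyone landing here later, a custom allocator for this purpose is smaller than it sounds. The sketch below is one possible shape, not the vendor's or anyone's official solution: AlignedAllocator and AlignedCFloatVec are hypothetical names, and it assumes a C++17 compiler for the aligned forms of operator new and operator delete. On an older toolchain the same structure should work with memalign and free in their place.

```cpp
#include <complex>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator that hands out storage on an Alignment-byte boundary.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    typedef T value_type;

    // Spelled out because Alignment is a non-type parameter, which the
    // default rebind in std::allocator_traits cannot handle.
    template <typename U>
    struct rebind { typedef AlignedAllocator<U, Alignment> other; };

    AlignedAllocator() noexcept {}
    template <typename U>
    AlignedAllocator(AlignedAllocator<U, Alignment> const&) noexcept {}

    T* allocate(std::size_t n)
    {
        // Guard against overflow in n * sizeof(T).
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) { throw std::bad_alloc(); }
        void* p = ::operator new(n * sizeof(T), std::align_val_t(Alignment));
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(AlignedAllocator<T, A> const&, AlignedAllocator<U, A> const&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(AlignedAllocator<T, A> const&, AlignedAllocator<U, A> const&) { return false; }

// Drop-in replacement for CFLOAT_VEC with 16-byte aligned storage.
typedef std::vector<std::complex<float>,
                    AlignedAllocator<std::complex<float>, 16> > AlignedCFloatVec;
```

With a typedef like AlignedCFloatVec in place of CFLOAT_VEC, the vector's buffer stays aligned across resizes, so it can be passed to the vendor routine without an intermediate copy.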
 
