No Macros, No for loops, Pure C++

S

Steven T. Hatton

It blows up on a 9 dimensional space, but I'm pretty sure that's due to the
size of size_t.

--- begin: DataGeneration_Test ---
Tensor<T 3, 3>
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
TensorIndices<3, 3>
::ORDER : 3
::RANK : 3
::SIZE : 27
::_gslices:
GSlices<3,3>::_slices:
start: 0 sizes: 1,3,3 strides: 9,3,1
start: 9 sizes: 1,3,3 strides: 9,3,1
start: 18 sizes: 1,3,3 strides: 9,3,1
GSlices::_subslices:
GSlices<3,2>::_slices:
start: 0 sizes: 1,3 strides: 3,1
start: 3 sizes: 1,3 strides: 3,1
start: 6 sizes: 1,3 strides: 3,1
GSlices::_subslices:
GSlices<3,1>::_slices:
start: 0 sizes: 1,1,1 strides: 1,1,1
start: 1 sizes: 1,1,1 strides: 1,1,1
start: 2 sizes: 1,1,1 strides: 1,1,1

GSlices<3,1>::_slices:
start: 3 sizes: 1,1,1 strides: 1,1,1
start: 4 sizes: 1,1,1 strides: 1,1,1
start: 5 sizes: 1,1,1 strides: 1,1,1

GSlices<3,1>::_slices:
start: 6 sizes: 1,1,1 strides: 1,1,1
start: 7 sizes: 1,1,1 strides: 1,1,1
start: 8 sizes: 1,1,1 strides: 1,1,1


GSlices<3,2>::_slices:
start: 9 sizes: 1,3 strides: 3,1
start: 12 sizes: 1,3 strides: 3,1
start: 15 sizes: 1,3 strides: 3,1
GSlices::_subslices:
GSlices<3,1>::_slices:
start: 9 sizes: 1,1,1 strides: 1,1,1
start: 10 sizes: 1,1,1 strides: 1,1,1
start: 11 sizes: 1,1,1 strides: 1,1,1

GSlices<3,1>::_slices:
start: 12 sizes: 1,1,1 strides: 1,1,1
start: 13 sizes: 1,1,1 strides: 1,1,1
start: 14 sizes: 1,1,1 strides: 1,1,1

GSlices<3,1>::_slices:
start: 15 sizes: 1,1,1 strides: 1,1,1
start: 16 sizes: 1,1,1 strides: 1,1,1
start: 17 sizes: 1,1,1 strides: 1,1,1


GSlices<3,2>::_slices:
start: 18 sizes: 1,3 strides: 3,1
start: 21 sizes: 1,3 strides: 3,1
start: 24 sizes: 1,3 strides: 3,1
GSlices::_subslices:
GSlices<3,1>::_slices:
start: 18 sizes: 1,1,1 strides: 1,1,1
start: 19 sizes: 1,1,1 strides: 1,1,1
start: 20 sizes: 1,1,1 strides: 1,1,1

GSlices<3,1>::_slices:
start: 21 sizes: 1,1,1 strides: 1,1,1
start: 22 sizes: 1,1,1 strides: 1,1,1
start: 23 sizes: 1,1,1 strides: 1,1,1

GSlices<3,1>::_slices:
start: 24 sizes: 1,1,1 strides: 1,1,1
start: 25 sizes: 1,1,1 strides: 1,1,1
start: 26 sizes: 1,1,1 strides: 1,1,1




--- end: DataGeneration_Test ---

--
"If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true." - Bertrand
Russell
 
H

Howard

Steven T. Hatton said:
It blows up on a 9 dimensional space, but I'm pretty sure that's due to
the
size of size_t.

--- begin: DataGeneration_Test ---
Tensor<T 3, 3>
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
TensorIndices<3, 3>

<snip>


Care to enlighten us as to exactly what you're talking about here? :)

-Howard
 
P

Phlip

Care to enlighten us as to exactly what you're talking about here? :)

Loop unrolling with expression metatemplates, without for loops.

The compiler will output essentially a huge brick of solid opcodes.
Electrons go in one end, and the result comes out another, with no moving
parts inside.
 
O

Old Wolf

Phlip said:
Loop unrolling with expression metatemplates, without for loops.

The compiler will output essentially a huge brick of solid opcodes.

It doesn't output much for me:
i.cc:2: error: expected constructor, destructor, or type conversion
before '<' token
i.cc:2: error: expected `,' or `;' before '<' token

Is there meant to be some compilable code somewhere?
 
S

Steven T. Hatton

Phlip said:
Loop unrolling with expression metatemplates, without for loops.

The compiler will output essentially a huge brick of solid opcodes.
Electrons go in one end, and the result comes out another, with no moving
parts inside.

In some cases compilers can unroll loops without the templates. For me, one
of the advantages of using templates and compile-time construction of the
data structures is that I can use recursion without paying for it at
run-time. Run-time recursion usually means a lot of dereferencing which can
be expensive (in comparison to loops). Oh, I should add that I have no
pointers either. There are very few conditionals as well.

I have to admit, I don't fully understand the beast I've created. Nor do I
have solid ideas of how best to use it. I started out by creating my own
vector and matrix classes treating them as separate types. That didn't
settle well with me because I know they should ideally be derived from a
common tensor type. One library I looked at has this backwards. In that
library tensor derives from vector.

I believe there is a correct foundation that can be laid, upon which all
subsequent functionality should be easily and naturally derivable. There
should be two fundamental algorithms: inner product, and outer product.
From these I should be able to produce all the operations such as M*V, V*M,
M*M, M^n, V*V, VxV, etc. I'm inclined to believe the correct design can be
expressed in fewer than 500 loc.
--
"If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true." - Bertrand
Russell
 
C

Chris Theis

Steven T. Hatton said:
Phlip wrote:
[SNIP]

In some cases compilers can unroll loops without the templates. For me, one
of the advantages of using templates and compile-time construction of the
data structures is that I can use recursion without paying for it at
run-time. Run-time recursion usually means a lot of dereferencing which can
be expensive (in comparison to loops). Oh, I should add that I have no
pointers either. There are very few conditionals as well.

As always with optimization, care is advisable. In some situations this
"manual" loop unrolling might pay off and in others it won´t. This depends
very much on what you´re doing in the loop, if there might be aliasing
effects and so on. In general it´s good practice to profile and decide
whether the compiler´s built in optimization is better or the template
unrolling.
I have to admit, I don't fully understand the beast I've created. Nor do I
have solid ideas of how best to use it. I started out by creating my own
vector and matrix classes treating them as separate types. That didn't
settle well with me because I know they should ideally be derived from a
common tensor type. One library I looked at has this backwards. In that
library tensor derives from vector.

Well, then the design of that library is wrong (if we´re strictly speaking).
In terms of mathematics a vector is a tensor of first order and all
operations can be expressed in tensor notation. Consequently, it makes sense
that a vector is a tensor. However, from a progarmmer´s view the library
creators might have had their reasons for their design.

Cheers
Chris

[SNIP]
 
P

Phlip

Steven said:
I have to admit, I don't fully understand the beast I've created. Nor do I
have solid ideas of how best to use it. I started out by creating my own
vector and matrix classes treating them as separate types. That didn't
settle well with me because I know they should ideally be derived from a
common tensor type. One library I looked at has this backwards. In that
library tensor derives from vector.

If nothing can compell you to show Old Wolf your code, you could at least
show us your tests.
 
S

Steven T. Hatton

Chris said:
Steven T. Hatton said:
Phlip wrote:
[SNIP]

In some cases compilers can unroll loops without the templates. For me, one
of the advantages of using templates and compile-time construction of the
data structures is that I can use recursion without paying for it at
run-time. Run-time recursion usually means a lot of dereferencing which can
be expensive (in comparison to loops). Oh, I should add that I have no
pointers either. There are very few conditionals as well.

As always with optimization, care is advisable. In some situations this
"manual" loop unrolling might pay off and in others it won´t. This depends
very much on what you´re doing in the loop, if there might be aliasing
effects and so on. In general it´s good practice to profile and decide
whether the compiler´s built in optimization is better or the template
unrolling.

Can you provide an example of where using template recursion was not as
efficient as the compilers own optimization? Also, can you explain what you
mean by "aliasing effects", and how it might impact object code generated
from a recursive expression template?
Well, then the design of that library is wrong (if we´re strictly
speaking). In terms of mathematics a vector is a tensor of first order and
all operations can be expressed in tensor notation. Consequently, it makes
sense that a vector is a tensor. However, from a progarmmer´s view the
library creators might have had their reasons for their design.

It may be possible to define tensors in terms nested vectors of vectors and
show them to mathematically equivalent to tensors defined in the
conventional manner. I haven't thought about it for more than the time it
took to write this paragraph, so I can't say for sure.
--
"If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true." - Bertrand
Russell
 
P

Phlip

Steven said:
Can you provide an example of where using template recursion was not as
efficient as the compilers own optimization?

Easy. Imagine function A calls B in seven different ways. Now they both fit
in the CPU cache, so this just discusses A and B to itself, without
involving the system bus.

Now force B to inline, via the keyword, or compiling options, or a macro.
Now A is seven times longer. It no longer fits in the CPU. When A evaluates,
it must constantly flush itself out of the CPU cache to load its other half.
Performance plumets.

Always remember: Inline does not automatically mean better performance.
Also, can you explain what you
mean by "aliasing effects",

int q = 5;
int * p = &q;
*p = 4;
if (q == 5)...

Between those lines, the compiler cannot optimize q into a register. It must
fetch q from memory, because the compiler may or may not be able to track
whether aliases, meaning alternate names, for q have affected its value.
and how it might impact object code generated
from a recursive expression template?

I don't know; the general point to learn is you can never predict this or
that code's effect on performance.
 
S

Steven T. Hatton

Phlip said:
Easy. Imagine function A calls B in seven different ways. Now they both
fit in the CPU cache, so this just discusses A and B to itself, without
involving the system bus.

Now force B to inline, via the keyword, or compiling options, or a macro.
Now A is seven times longer. It no longer fits in the CPU. When A
evaluates, it must constantly flush itself out of the CPU cache to load
its other half. Performance plumets.

Always remember: Inline does not automatically mean better performance.

I'm not sure how that addresses my question. Can you provide an actual
compilable example of template code and analogos for() loop code that
demonstrates that compiler-produced op-code from the for() loop version is
more efficient than the template version? I don't care if you have to use
a particular compiler to demonstrate.

--
"If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true." - Bertrand
Russell
 
C

Chris Theis

Steven T. Hatton said:
Chris said:
Steven T. Hatton said:
Phlip wrote:
[SNIP]

In some cases compilers can unroll loops without the templates. For me, one
of the advantages of using templates and compile-time construction of the
data structures is that I can use recursion without paying for it at
run-time. Run-time recursion usually means a lot of dereferencing which can
be expensive (in comparison to loops). Oh, I should add that I have no
pointers either. There are very few conditionals as well.

As always with optimization, care is advisable. In some situations this
"manual" loop unrolling might pay off and in others it won´t. This depends
very much on what you´re doing in the loop, if there might be aliasing
effects and so on. In general it´s good practice to profile and decide
whether the compiler´s built in optimization is better or the template
unrolling.

Can you provide an example of where using template recursion was not as
efficient as the compilers own optimization?

Giving general examples regarding optimization is always tricky because it
depends on so many things. What I want to stress is that template loop
unrolling is certainly worth trying but it might result in code-bloat and
you might take away the compiler's possibility to optimize the code with its
own loop unrolling procedure. This "built-in" feature heavily relies on the
internal CPU registers and might or might not be more efficient. This
depends very much on the code.
Also, can you explain what you
mean by "aliasing effects", and how it might impact object code generated
from a recursive expression template?

Let's take a look at the following function which can be used to set the
values of an array:

void Set( float y[], const float* pArray, int n )
{
for( int i = 0; i < n; i++ )
y = 1.0f - *pArray;
}

Under the assumption that pArray is invariant the optimizer could move the
subtraction out of the loop and everybody is happy. But the user could pass
an address of an element of pArray as the second parameter (e.g., y[10] =
1.0; Set( y, &y[10], 20 ); ).
This means that pArray is not invariant anymore and the compiler cannot
safely move the subtraction to the front. However, you might force the
compiler or at least hint to the compiler that there is certainly no
aliasing-effect and then the built-in loop-unrolling can kick in. Naturally
this should be done with care only. If you use the template approach the
compiler will have a hard time to unroll the loop as it's not there anymore.
Don't get me wrong - I'm not against the template solution and I personally
favor it very much, but one has to decide on a case to case basis after
profiling which way to go. There simply is no general rule.

[SNIP]
It may be possible to define tensors in terms nested vectors of vectors and
show them to mathematically equivalent to tensors defined in the
conventional manner. I haven't thought about it for more than the time it
took to write this paragraph, so I can't say for sure.

You can implement tensors in terms of vectors but mathematically speaking
its wrong. A vector is by definition a tensor of n=1 order with 3^n elements
and a generic tensor of order n is expressed by 3^n elements. So every
vector is a tensor but not every tensor is a vector and consequently such an
implementation would violate the "is-a inheritance" rule.

Cheers
Chris
 
S

Steven T. Hatton

Chris said:
Don't get me wrong - I'm not against the template solution
and I personally favor it very much, but one has to decide on a case to
case basis after profiling which way to go. There simply is no general
rule.

I don't believe profiling at this stage is really very useful. In the case
of what I'm doing, I really have to take the design approach as an
hypothesis and build a fairly extensive framework before I can evaluate the
value of the approach. Since the design approach using template
metaprogramming is new to me, and fairly new to the field of computer
science (notice I did not say 'industry'), I am hard pressed to evaluate
one design approach in comparison with the other until I have more
experience. That only comes from doing.
[SNIP]
It may be possible to define tensors in terms nested vectors of vectors and
show them to mathematically equivalent to tensors defined in the
conventional manner. I haven't thought about it for more than the time
it took to write this paragraph, so I can't say for sure.

You can implement tensors in terms of vectors but mathematically speaking
its wrong. A vector is by definition a tensor of n=1 order with 3^n
elements and a generic tensor of order n is expressed by 3^n elements.

Actually, a vector in n-space is a rank 1 tensor of order n, having n
components. A general tensor in n-space is a rank m order n object with
n^m components. Hence the need for the previously discussed PowerOf
template to calculate the number of components.
So
every vector is a tensor but not every tensor is a vector and consequently
such an implementation would violate the "is-a inheritance" rule.

As I said, it may be possible to define tensors in terms of nested vectors
of vectors and demonstrate that a tensor thus defined is mathematically
equivalent to the conventional definition of a tensor. This would, I
believe, follow rather naturally from a generalization of the concept of
diadic matrices.
--
"If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true." - Bertrand
Russell
 
C

Chris Theis

Steven T. Hatton said:
I don't believe profiling at this stage is really very useful. In the case
of what I'm doing, I really have to take the design approach as an
hypothesis and build a fairly extensive framework before I can evaluate the
value of the approach. Since the design approach using template
metaprogramming is new to me, and fairly new to the field of computer
science (notice I did not say 'industry'), I am hard pressed to evaluate
one design approach in comparison with the other until I have more
experience. That only comes from doing.

You should certainly go ahead and evaluate the designs first and the
optimization will come in the end. BTW metaprogramming is not really "new"
to the field of computer science either. The first time I saw it in an
academic context was in the book "Generative Programming" by Ulrich
Eisenecker published in 2000. There might be some scientific context even
before that time which I´m not aware of.
[SNIP]

As I said, it may be possible to define tensors in terms of nested vectors
of vectors and demonstrate that a tensor thus defined is mathematically
equivalent to the conventional definition of a tensor. This would, I
believe, follow rather naturally from a generalization of the concept of
diadic matrices.

You might be right on that, although dyadic matrices requires some
conditions (which makes them dyadic) that is not required by the generic
definition of tensors. Anyway, whether the generalization from the special
case is possible is something left for mathematicians and OT here, so I
won´t continue nitpicking ;-)

Cheers
Chris
 
C

chris

Steven said:
I'm not sure how that addresses my question. Can you provide an actual
compilable example of template code and analogos for() loop code that
demonstrates that compiler-produced op-code from the for() loop version is
more efficient than the template version? I don't care if you have to use
a particular compiler to demonstrate.

I would think a good example would be any sufficently large loop? as the
for loop would only unfold a few iteratons, whereas the template version
would unroll the whole loop.

Chris
 
C

chris

Steven said:
I'm not sure how that addresses my question. Can you provide an actual
compilable example of template code and analogos for() loop code that
demonstrates that compiler-produced op-code from the for() loop version is
more efficient than the template version? I don't care if you have to use
a particular compiler to demonstrate.

I would think a good example would be any sufficently large loop? as the
for loop would only unfold a few iteratons, whereas the template version
would unroll the whole loop.

Chris
 
C

chris

Steven said:
I'm not sure how that addresses my question. Can you provide an actual
compilable example of template code and analogos for() loop code that
demonstrates that compiler-produced op-code from the for() loop version is
more efficient than the template version? I don't care if you have to use
a particular compiler to demonstrate.

I would think a good example would be any sufficently large loop? as the
for loop would only unfold a few iteratons, whereas the template version
would unroll the whole loop.

Chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,183
Messages
2,570,965
Members
47,512
Latest member
FinleyNick

Latest Threads

Top