Paul Brown
Thanks for the replies Tristan, Eric, Steven & Kurt. They have given
me some good leads.
I present justification for a lot of the comments that drew
(constructive) criticism below. Firstly, let me summarise the feedback
that has pointed me in the right direction:
> 20.13: What's the best way of making my program efficient?
> A: By picking good algorithms and implementing them carefully.
<hand up> Count me in for that, there was no intention or suggestion
of micro-optimisation or use of inline assembler in my post. What I am
trying to find out is what algorithm is effective and what C syntax is
more efficient, albeit only on the small Turbo-C development system
that I have.
Thanks, I am busy downloading the whole list by sequentially stepping
through all the URLs at this site.
> The biggest speed gains in an FFT (and most algorithms) come from much
> higher-level transformations. For example, our FFTW (www.fftw.org) ...
I see that I previously downloaded

> FFTW 3.0.1 is the latest official version of FFTW

a while back. I am overwhelmed by the volume:
458 *.c files (2.9MB)
42 *.h files
Could take me a while to assimilate all of this.
> you should use a different routine than clock()

I thought I did quite well on the Athlon/Win98, synchronising on the
20ms ticks and counting in between I could improve resolution to
600ns. Still dependent on the system time sharing on the CPU though.
> The best resolution comes from reading the CPU cycle counter directly (see
> www.fftw.org/download.html for code to do this on many CPUs).

Yes, I knew there was something called the "software stopwatch";
hopefully I will come right with CYCLE.H.
I can't go and peep at this URL at the mo' as my C-FAQ download is
still stepping through each answer page. CYCLE.H has quite a bit of
code to it; I suppose I just chuck out most of the CPU-specific code
and keep:
/* Pentium cycle counter */
typedef unsigned long long ticks;

static __inline__ ticks getticks(void)
{
    ticks ret;
    __asm__ __volatile__("rdtsc" : "=A" (ret));
    return ret;
}
I followed my usual method of
QUERY GOOGLE -> READ -> FILTER -> POST
and using this I selected the four newsgroups that had hits on
C Speed Optimisation FFT
Subject: Re: Optimizing array bounds checking
Newsgroups: comp.lang.oberon
Subject: Re: Academia isn't interested...
Newsgroups: comp.lang.functional
.....
> But what you *can* do is generate low-level code from high-level
> specifications. This has already been well-demonstrated in e.g. the
> FFTW (written in OCaml), which generates extremely efficient
> C-algorithms for the FFT from specifications and specialises them for
> the given architecture. This, however, is high-level programming
> again, not low-level programming. There is no escape from
> imperativeness if you work with von-Neumann architectures on a low
> level.
> You'd have to examine the actual generated code to have a hope
> of figuring out what's really happening in any particular case.

Don't want to do that; anyway, my solution has to be fairly portable:
from me (design engineer) it goes to a software engineer, is rebuilt
under Visual C++, and is then targeted to a TMS320 (or equivalent) DSP.
> You realize, of course, that your first mistake was starting with the
> radix-2 Numerical Recipes in C code and spending your time attempting
> micro-optimizations. The biggest speed gains in an FFT (and most
> algorithms) come from much higher-level transformations.

Actually I started with someone else's Fortran->C implementation of an
algorithm that looks like it came from the Cooley-Tukey paper. Then I
stepped to the Numerical Recipes (C) algorithm (Danielson-Lanczos?)
(which, by the way, still carries the legacy of Fortran array indexing).
This took 72% off the time.
I have refined this in about five major steps, so that at the point of
posting I was 23% below the time of the NumRep routine as published
(only high-level tweaks).
I will perform my last trial of converting the original code to double
without changing the indexing, just to see what I would have got,
before ...
Now, however, I will trawl through all the source in FFTW and see what
I can find.
> If you just want a fast FFT and are not doing this as a learning
> experience, you'll be much better off downloading an existing
> optimized code.

Even though I will probably do better with an FFTW routine, the
lessons learned will be valuable for other pieces of time critical
code.
> A decent compiler should transform array references a[j] in a loop
> into separate incremented pointers for you, if it's advantageous to do
> so. By being clever you can easily just confuse the compiler's
> optimizer unless you know what you're doing.

The two sets of indices do not change sequentially, which is why I
hoped that forcing the use of pointers would be faster.
> I've done the above many a time. The only way I've been able to cope is
> by learning the instruction set and reading the (equivalent) assembly code.
> It's pretty easy to tell when you've confused the optimizer.

This would be interesting given unlimited time. I do know that the DSP
code was definitely hand-tweaked in the past, as portability and
maintainability were far lower priorities than performance.
Regards,
Paul