Fast 1D DCT Implementation in C/C++

S

Soumyadip Rakshit

Hi,

Could anyone direct me to a very good 1D DCT Implementation in C/C++

My data is in sets of 8 numbers.

Thanks a ton,
Soumyadip.
 
V

Victor Bazarov

Soumyadip said:
Could anyone direct me to a very good 1D DCT Implementation in C/C++

Is it supposed to be obvious what "1D DCT" is? I don't remember its being
defined in the language itself. Are you in the right newsgroup?

V
 
M

ma740988

Soumyadip said:
Hi,

Could anyone direct me to a very good 1D DCT Implementation in C/C++

You could expand on the DFT found at the thread (below). Mr Bux
overhauled my version of the DFT and it's excellent. I realize you
don't care if it's C versus C++ but I'm sort of sick of all the C
versions of transfrorms I've seen ( all the malloc's, free's make me
nervous ).


http://groups.google.com/group/comp...b97e23?q=ma740988+DFT&rnum=2#b3d97752f4b97e23

An aside: I suspect you're probably transforming image data? In that
regard, the the DCT can be performed by a DFT with a post-rotation
 
S

stevenj

Calling that code "excellent" is overstating things a bit in my
opinion. First, without a doubt, it is incredibly slow: it's not just
vanilla radix-2, which is slow to begin with, it uses explicit
recursion down to size 1 which adds a substantial overhead...you need
to coarsen recursion in order for it to be efficient. I would expect
it's around a factor of 10 slower than an optimized code, based on my
experience in benchmarking similar FFT codes. Second, it uses an
unstable iteration to obtain the trigonometric coefficients, which has
errors that grow as O(sqrt(N)) vs. O(sqrt(log N))) for an accurate FFT.
Third, the code is more than five times longer than a simple C function
using the same slow algorithm: I can do it in about 20 lines, vs. your
160. So I'm not sure what all the C++ iterator syntax has gained you.

In any case, it doesn't answer the question of the original poster, who
wanted to compute the DCT and not the DFT.

Our FFTW (www.fftw.org) library (in C) includes DCTs (discrete cosine
transforms) of types I-IV. There is also a C library by Ooura (Google
it) that includes type-II/III DCTs, and the FFTPACK library (Fortran)
includes types I-III DCTs. (Probably the original poster wants
type-II, as that is the most common kind).

Regards,
Steven G. Johnson
 
M

ma740988

Calling that code "excellent" is overstating things a bit in my
opinion. First, without a doubt, it is incredibly slow:

With some improvements we managed to fix that. Still not as fast as a
vendory library we recently purchased but I'll get around to that as
soon as time permits.
vanilla radix-2, which is slow to begin with, it uses explicit
recursion down to size 1 which adds a substantial overhead...you need
to coarsen recursion in order for it to be efficient. I would expect
it's around a factor of 10 slower than an optimized code, based on my
experience in benchmarking similar FFT codes. Second, it uses an
unstable iteration to obtain the trigonometric coefficients, which has
errors that grow as O(sqrt(N)) vs. O(sqrt(log N))) for an accurate FFT.
Third, the code is more than five times longer than a simple C function
using the same slow algorithm: I can do it in about 20 lines, vs. your
160. So I'm not sure what all the C++ iterator syntax has gained you.

As far I'm concerned it was a fairly decent start that didn't embody
the typical 'C' approach. I've experimented with cascaded
summation/Kahan computational tricks, O(sqrt(N)) with an _fairly_
accruate trig recurrence, etc. etc. The point. It's been overhauled.
Again certainly not as fast as the vendors scientific library but I'm
still experimenting. I think part of that surrounds - in some cases -
the vendors use of the cache. I've got a host of 7410/7457 Dual/Quad
powerPC's.
Lots of benchmarking left to do. One of which is to compare a fairly
decent implementation on sourceforge so we'll see.
 
S

stevenj

ma740988 said:
With some improvements we managed to fix that. Still not as fast as a
vendory library we recently purchased but I'll get around to that as
soon as time permits.

If you have indeed "fixed that" and have a code that is even remotely
competitive with vendor codes or high-performance portable codes, we'd
love to include it in our FFT benchmarks (www.fftw.org/speed).

However, I'm skeptical unless you made major structural changes to your
code. You have to stop focusing on syntactical details and
micro-optimization and realize that there is more to FFT algorithms
than textbook radix-2.
I've experimented with cascaded summation/Kahan computational tricks,

Any Cooley-Tukey FFT already uses cascade summation. In any case, the
limiting factor in the accuracy of the FFT code you posted is the trig.
factors, so, until you fix this, trying to make other parts more
accurate is silly.

(Unless you're trying to improve the "DFT" part of your code, where the
summation error is more significant for large N. But this is silly too
because no one should use the O(N^2) algorithm for large N.)

By the way, I understated the error of the trig. recurrence in your
posted code; IIRC the mean error was O(N), not O(sqrt(N)). You can get
O(sqrt(N)) by switching to a slightly different recurrence, but to get
O(sqrt(log N))) accuracy in the trig factors you need to either
precompute them or use a much more complicated recurrence by Buneman
that requires O(log N) storage.
As far I'm concerned it was a fairly decent start that didn't embody the typical 'C' approach.

It was fine as a first try for learning, but for the reasons I
explained (slow, inaccurate, much longer than necessary) I don't think
you should hold it forth as an example for others to follow.

Best of luck with your efforts at improving it, though.

Cordially,
Steven G. Johnson
 
M

ma740988

However, I'm skeptical unless you made major structural changes to your
code. You have to stop focusing on syntactical details and
micro-optimization and realize that there is more to FFT algorithms
than textbook radix-2.


Short of the fact that - it appears to me you're promoting your own
site, you're not providing information I dont know. I've got a vendor
library here that does fine. I already made structural changes to the
source but again, I'm not done yet as I'm still running comparisons
against the vendor library and a version obtained off sourceforge.

As this a newsgroup for discussing on the c++ language, algorithm
details we could discuss off line.

[snipped]
Other verbiage snipped.
 
S

stevenj

you're not providing information I dont know

I'm glad to hear that you know what you're doing with FFTs. I look
forward to seeing you demonstrate it someday, when your code is ready.

Sorry that you're offended by my posting links to code that actually
answers the OP's question, and references to real performance numbers,
which happen to be on our site.

Best of luck!
Steven G. Johnson
 
M

ma740988

I'm glad to hear that you know what you're doing with FFTs. I look
forward to seeing you demonstrate it someday, when your code is ready.

Sorry that you're offended by my posting links to code that actually
answers the OP's question, and references to real performance numbers,
which happen to be on our site.

Best of luck!
Steven G. Johnson

It was a surprise to me that the OP even asked when you find hordes of
implementations on the web. Revisiting his post, he didn't care if
the solution was C/C++. Having said that, it appears to me that it's
incumbent upon you to toot your own horn on a continuum. You'll take
issues with the standard algorithms used to re-express the DCT as a
real FFT (Numerical Recipes, FFTPACK etc) of the same size - calling
them unstable and then promote your own. With respect to the
algorithms. Do you valid points? Indeed. If memory serves FFTPACK
has O(square root ( n )) errors for realtively large data sets. ( Again
it's been awhile but if I recall correctly that's a true statement ).


It may be that most folks use a relatively small dataset and in that
regard it's a minor issue.

Here's my issue: I'm not arguing all that. I'm interested in a C++
solution. Period!!

I might add that my knowledge of C++ is sub-par. But that's where I'm
at. I'm at the language level. void main's and mallocs don't cut it
for me. Furthermore, I dont care if malloc saves me 5 microseconds.

Bottom line. if you want to promote your 'stuff' promote it. Some of
us aren't interested in a C solution and I'm willing to bet bottom
dollar. When the dust settles ( algorithm details aside ) there's a
comparable C++ solution out there.

Am I there yet, NO.
At first I developed a template class that's fairly canonical and
general. It uses a technique, where the 'twiddle' factors are stored
in a precomputed bit-reversed table. exp[0], exp[pi*i/2], exp[pi*i/4],
exp[3*pi*i/4], exp[pi*i/8] ...

I could go on but in the interest of time. I'm in a state of flux right
now, in that there's more pressing issues. I think I got side track
because I needed to finalize a prediction architecture. In essence,
for a broadband ( stochatic ) signal with interference from a
narrowband ( periodic ) source. I needed to work through the
prediction architecture such that the adaptive filter attempts to find
the correlation between d (k) and y (k)

Of course, I'm not here to re-invent the wheel. For a general purpose
FFT, initially I ran with link what's here:
http://www.relisoft.com/Science/Physics/fft.html

Today I've got a vendor library for handling the FFTs. It's pure C and
it's got the right 'flags' (if memory serves) to take advantage of the
Altivec engine on my cards. Admittedly, I've used your implementation.
Of course I don't recall what the numbers looked like but it's evident
I've - literally - been all over the place. Come to think of it, I
might bounce it (your FFTW) against the 'gold standard' ( a scientific
library from a vendor ) in my line of business.

In any event (algorithm details aside), I haven't had a chance to
solidify my implementation, but hopefully, I'll be able to continue/get
it together real soon. Again part of that is my ignorance of C++ but
...

Having said that. I think I'm/have been - way off topic at least 2/3
posts ago.
 
S

stevenj

ma740988 said:
You'll take
issues with the standard algorithms used to re-express the DCT as a
real FFT (Numerical Recipes, FFTPACK etc) of the same size - calling
them unstable and then promote your own.

Wow, you're apparently responding to things I said in completely
different forums in response to completely different posts! I'm
flattered that you read my writings so avidly!

I don't need to promote my code; plenty of people use it already. I
care mostly about having correct information about FFT algorithms
rather than misinformation floating about on Usenet. (Such as people
using inaccurate algorithms without realizing it.) You'll note that in
response to the OP I gave several examples of codes that would solve
his problem, not just ours.
With respect to the
algorithms. Do you valid points? Indeed. If memory serves FFTPACK
has O(square root ( n )) errors for realtively large data sets. ( Again
it's been awhile but if I recall correctly that's a true statement ).

Actually, it's a false statement when made in general as you just did;
you should go back to read my posts before mis-repeating them. The
only case in which FFTPACK has O(sqrt(n)) error is for type-I DCTs. It
is perfectly fine, i.e. O(sqrt(log n)) error, for type-II/III DCTs and
for DFTs.
Here's my issue: I'm not arguing all that. I'm interested in a C++ solution. Period!!

I don't care what language you write in, putting aside that most C FFTs
are also valid C++. I *do* care if you mislead other readers by
pointing to a slow, inaccurate, bloated FFT code and calling it an
"excellent" example to follow.
Furthermore, I dont care if malloc saves me 5 microseconds.

Um, memory allocation is completely unnecessary for an FFT that uses
trig. recurrences, and is completely irrelevant to the performance
optimization. I'm not going to guess at why you're bringing this up.

Best of luck in your implementation!
Steven G. Johnson
 
M

ma740988

Wow, you're apparently responding to things I said in completely
different forums in response to completely different posts! I'm
flattered that you read my writings so avidly!
Don't flatter yourself. I used to frequent comp.dsp. Certainly not as
much as I used to. I recalled in the - good ole days - discussions
about 'my' library being faster than 'yours'.

Actually, it's a false statement when made in general as you just did;
you should go back to read my posts before mis-repeating them. The
only case in which FFTPACK has O(sqrt(n)) error is for type-I DCTs. It
is perfectly fine, i.e. O(sqrt(log n)) error, for type-II/III DCTs and
for DFTs.
I didn't specify the type-type-I. That's correct, I stand corrected.
I don't care what language you write in, putting aside that most C FFTs
are also valid C++. I *do* care if you mislead other readers by
pointing to a slow, inaccurate, bloated FFT code and calling it an
"excellent" example to follow.
Hang loose, I'll revisit yours and at some point send you the vendor
numbers and my refined source numbers. I suspect the best I could do
with the vendor stuff is pdf - for obvious reasons. So we'll revisit
this in an email through your site.

Take care
 
M

ma740988

Wow, you're apparently responding to things I said in completely
different forums in response to completely different posts! I'm
flattered that you read my writings so avidly!

Don't flatter yourself. I used to frequent comp.dsp. Certainly not as

much as I used to. I recalled in the - good ole days - discussions
about 'my' library being faster than 'yours'.

Actually, it's a false statement when made in general as you just did;
you should go back to read my posts before mis-repeating them. The
only case in which FFTPACK has O(sqrt(n)) error is for type-I DCTs. It
is perfectly fine, i.e. O(sqrt(log n)) error, for type-II/III DCTs and
for DFTs.

I didn't specify the type-type-I. That's correct, I stand corrected.
I don't care what language you write in, putting aside that most C FFTs
are also valid C++. I *do* care if you mislead other readers by
pointing to a slow, inaccurate, bloated FFT code and calling it an
"excellent" example to follow.

Hang loose, I'll revisit yours and at some point send you the vendor
numbers and my refined source numbers. I suspect the best I could do
with the vendor stuff is pdf - for obvious reasons. So we'll revisit
this in an email through your site.
Um, memory allocation is completely unnecessary for an FFT that uses
trig. recurrences, and is completely irrelevant to the performance
optimization. I'm not going to guess at why you're bringing this up.

http://cluster.earlham.edu/detail/bazaar/src/fftw-2.1.5/fftw/malloc.c
http://cluster.earlham.edu/detail/bazaar/src/fftw-2.1.5/fftw/fftw.h
"extern void *fftw_malloc(size_t n);"
fftwnd.c - this is where it happens - i.e compute the FFT:
http://cluster.earlham.edu/detail/bazaar/src/fftw-2.1.5/fftw/fftwnd.c
etc. etc.

Take care
 
M

ma740988

Wow, you're apparently responding to things I said in completely
different forums in response to completely different posts! I'm
flattered that you read my writings so avidly!

Don't flatter yourself. I used to frequent comp.dsp. Certainly not as

much as I used to. I recalled in the - good ole days - discussions
about 'my' library being faster than 'yours'.

Actually, it's a false statement when made in general as you just did;
you should go back to read my posts before mis-repeating them. The
only case in which FFTPACK has O(sqrt(n)) error is for type-I DCTs. It
is perfectly fine, i.e. O(sqrt(log n)) error, for type-II/III DCTs and
for DFTs.

I didn't specify the type-type-I. That's correct, I stand corrected.
I don't care what language you write in, putting aside that most C FFTs
are also valid C++. I *do* care if you mislead other readers by
pointing to a slow, inaccurate, bloated FFT code and calling it an
"excellent" example to follow.

Hang loose, I'll revisit yours and at some point send you the vendor
numbers and my refined source numbers. I suspect the best I could do
with the vendor stuff is pdf - for obvious reasons. So we'll revisit
this in an email through your site.
Um, memory allocation is completely unnecessary for an FFT that uses
trig. recurrences, and is completely irrelevant to the performance
optimization. I'm not going to guess at why you're bringing this up.

http://cluster.earlham.edu/detail/bazaar/src/fftw-2.1.5/fftw/malloc.c
http://cluster.earlham.edu/detail/bazaar/src/fftw-2.1.5/fftw/fftw.h
"extern void *fftw_malloc(size_t n);"
fftwnd.c - this is where it happens - i.e compute the FFT:
http://cluster.earlham.edu/detail/bazaar/src/fftw-2.1.5/fftw/fftwnd.c
etc. etc.

Take care
 
S

stevenj

ma740988 said:
I'll revisit yours and at some point send you the vendor
numbers and my refined source numbers.

Better yet, if your version ends up being competitive, post the source
code somewhere! As I've said before, we're very interested in keeping
track of fast FFT codes ... it's important to have things to compare
against when optimizing, and to see what the tradeoffs are.

But I'm glad you've apparently realized that the posted code you
pointed to earlier is not "excellent" as it stands (either in speed,
accuracy, or size).

Regards,
Steven G. Johnson
 
M

ma740988

Better yet, if your version ends up being competitive, post the source
code somewhere! As I've said before, we're very interested in keeping
track of fast FFT codes ... it's important to have things to compare
against when optimizing, and to see what the tradeoffs are.

But I'm glad you've apparently realized that the posted code you
pointed to earlier is not "excellent" as it stands (either in speed,
accuracy, or size).

I was more concerned about the algorithm and made changes from the
outset. Admittidely I didn't spend much time on speed and size.
Algorithm details aside and time permitting, I'll work the
implementation details. In the end, I hope to have an approach that'll
be comparable to the ' C ' mess I encounter on the web. For now I'm
still struggling with C++ but I'll solicit the right help and wade
through it. No disrespect to 'C' source, but the vast majority of
those libraries doesn't cut it for _me_ and no one can convince me
_today_ (again algorithm aside) that I can't get a comparable/even
better implementation with C++.
 
S

stevenj

ma740988 said:
no one can convince me
_today_ (again algorithm aside) that I can't get a comparable/even
better implementation with C++.

We'd waste less time if you stopped responding to arguments no one has
made.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,175
Messages
2,570,944
Members
47,491
Latest member
mohitk

Latest Threads

Top