Fast fmodf(..., 1.0f)

jason.cipriani · Jun 29, 2008

Two questions:

1. On a modern 32-bit Intel machine, but not using SSE, what is the
fastest way to compute the fractional part of a float? E.g.,
"fmodf(num, 1.0f)", or "num - (float)(int)num", etc. Is the something
faster than fmodf? I don't know the input range, so I can't do tricks
like "if (num < 0.0f) num += 1.0f;", for example.

2. I know that this is not the correct newsgroup for this, although I
figured somebody here would have some good advice. It seemed like a
better option than c.l.c++ or comp.unix.programmer, the only other two
newsgroups I really frequent. In the future, what is the best
newsgroup for questions about machine-specific optimizations like
this, not necessarily in any specific programming language?

Thanks,
Jason

jason.cipriani · Jun 29, 2008

1. On a modern 32-bit Intel machine, but not using SSE, what is the
fastest way to compute the fractional part of a float? E.g.,
"fmodf(num, 1.0f)", or "num - (float)(int)num", etc. Is the something
faster than fmodf? I don't know the input range, so I can't do tricks
like "if (num < 0.0f) num += 1.0f;", for example.

Oops, I screwed up that question. I am looking for more than just
finding the fractional component of a number, I need negative numbers
to wrap around, too, so, unlike fmodf, I'm looking for:

assert(magic_function(-0.1f) == 0.9f);
assert(magic_function(-3.8f) == 0.2f);
assert(magic_function( 0.1f) == 0.1f);
assert(magic_function( 2.8f) == 0.8f);

So it's more the fastest equivalent of:

float magic_function (float f) {
while (f < 0.0f) f += 1.0f;
while (f > 1.0f) f -= 1.0f;
return f;
}

Thanks,
Jason

Barry Schwarz · Jun 29, 2008

Two questions:

1. On a modern 32-bit Intel machine, but not using SSE, what is the
fastest way to compute the fractional part of a float? E.g.,

Why do you care. Premature optimization is the source of many
otherwise avoidable problems.

"fmodf(num, 1.0f)", or "num - (float)(int)num", etc. Is the something

There is no guarantee that num can be converted to int. It could very
easily exceed INT_MAX.

faster than fmodf? I don't know the input range, so I can't do tricks
like "if (num < 0.0f) num += 1.0f;", for example.

Sure you can. Just include this statement in a while loop. Be aware
there are a few problems with the approach: 1) it could take very many
iterations (there are ways to code around this), 2) for sufficiently
large num, num+1 equals num (there are ways to code around this too),
and 3) it could introduce rounding errors.

2. I know that this is not the correct newsgroup for this, although I
figured somebody here would have some good advice. It seemed like a
better option than c.l.c++ or comp.unix.programmer, the only other two

Since C and C++ are completely separate languages, decide which one
you want to code in and ask in only the appropriate group.

newsgroups I really frequent. In the future, what is the best
newsgroup for questions about machine-specific optimizations like
this, not necessarily in any specific programming language?

A newsgroup devoted to your system. There may be a hardware
instruction that does exactly what you want.

Remove del for email

jason.cipriani · Jun 29, 2008

Barry Schwarz said:
Why do you care. Premature optimization is the source of many
otherwise avoidable problems.

Because it's a measured bottleneck in a performance-critical inner
loop... nothing premature about that...

There is no guarantee that num can be converted to int. It could very
easily exceed INT_MAX.

I know the relative range of the values, I should have mentioned that.
I don't know exactly what it can be but I can guarantee it won't
exceed INT_MAX. And I'm not using this method anyways.

Sure you can. Just include this statement in a while loop. Be aware
there are a few problems with the approach: 1) it could take very many
iterations (there are ways to code around this), 2) for sufficiently
large num, num+1 equals num (there are ways to code around this too),
and 3) it could introduce rounding errors.

That's not faster than fmodf, and therefore it's not an option I
would consider, of course...

Since C and C++ are completely separate languages, decide which one
you want to code in and ask in only the appropriate group.

A newsgroup devoted to your system. There may be a hardware
instruction that does exactly what you want.

This doesn't really help. Do you know of a newsgroup devoted to my
system? I don't, that's why I asked...

Thanks anyways for your response,
Jason

jason.cipriani · Jun 29, 2008

I would expect modff to be faster at breaking a float
into integer and fractional parts, simply because,
unlike fmodf, it can't be used for anything else.

Thanks! I didn't know about this function. I'll see if it improves the
speed when I get a chance to test. To make it work with negative
numbers like I want it to, it will be (using doubles, although in
reality I'm doing this in C++ and using the C++ std::modf version that
works with floats) something like this:

double wrap_one (double v) {
double ipart;
if (v >= 0.0)
return modf(v, &ipart);
else
return 1.0 - modf(-v, &ipart);
}

Jason

Boon · Jun 30, 2008

Jason said:
On a modern 32-bit Intel machine, but not using SSE, what is the
fastest way to compute the fractional part of a float?

In the future, what is the best newsgroup for questions about
machine-specific optimizations like this, not necessarily in any
specific programming language?

I'd give comp.lang.asm.x86 a try.

christian.bau · Jun 30, 2008

Because it's a measured bottleneck in a performance-critical inner
loop... nothing premature about that...

I know the relative range of the values, I should have mentioned that.
I don't know exactly what it can be but I can guarantee it won't
exceed INT_MAX. And I'm not using this method anyways.

It is unlikely that your real problem is "calculate x mod 1 for a
gazillion different values of x". It is highly likely that looking at
the problem from a higher level will produce a faster solution by
looking at things from a completely different angle. So you should
post what you _really_ want to achieve.

jason.cipriani · Jun 30, 2008

It is unlikely that your real problem is "calculate x mod 1 for a
gazillion different values of x". It is highly likely that looking at
the problem from a higher level will produce a faster solution by
looking at things from a completely different angle. So you should
post what you _really_ want to achieve.

I'm working with video post-processing in HSL color space and at one
stage I need to correct out-of-range hue components, where the correct
range is [0,1] (representing 0 to 360 degrees). I promise, from the
bottom of my heart, that's what I really want to achieve. I am,
indeed, looking for a way to calculate x mod 1 for a gazillion
different values of x. I'm not looking for alternative high level
algorithms. The way it works now is already correct. I promise.

Incidently, fmod still seems to be significantly faster than modf. See
my other post.

Jason

jason.cipriani · Jun 30, 2008

Thanks! I didn't know about this function. I'll see if it improves the
speed when I get a chance to test. To make it work with negative
numbers like I want it to, it will be (using doubles, although in
reality I'm doing this in C++ and using the C++ std::modf version that
works with floats) something like this:

It seems that modff is significantly slower than fmodf, taking about
1.5 to 2 times longer -- I'm imagining it has something to do with the
fact that it returns both the fractional AND integer parts. The
program I used to time it is at the end of this message. If you run it
provide a non-zero argument, fmodf is significantly slower when you
pass 0.0f to it for some reason.

Compiled with MinGW GCC 3.4.5:

gcc -std=c99 -O0 fmodtest.c

Results, in performance counter ticks (actual units here are
arbitrary, relative values are meaningful), Intel T2600 at 2.16 GHz:

$ ./a.exe -1.5
fmodf = 1514
modff = 2482
$ ./a.exe 0
fmodf = 2176
modff = 2409
$ ./a.exe 1.5
fmodf = 1411
modff = 2411

Differences when the value is in or out of the range [-1,1] are
negligible.

Program follows; if you aren't using Windows you'll have to fill in
something appropriate for timerval_t, tick(), and tickdiff(). You can
use gettimeofday() on Linux.

Jason

======== modtest.c =========

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* do what you need to do here */
#ifdef _WIN32
# include <windows.h>
typedef LARGE_INTEGER timerval_t;
# define tick QueryPerformanceCounter
# define tickdiff(a,b) ((b).QuadPart - (a).QuadPart)
#else
/* typedef ... timerval_t;
void tick (timerval_t *t) { ... }
timerval_t tickdiff (timerval_t a, timerval_t b) { ... } */
#endif

#define ITERS 1000

inline float mymoda (float v) {
if (v >= 0.0f)
return fmodf(v, 1.0f);
else
return 1.0f - fmodf(-v, 1.0f);
}

inline float mymodb (float v) {
float ipart;
if (v >= 0.0f)
return modff(v, &ipart);
else
return 1.0f - modff(-v, &ipart);
}

int main (int argc, char **argv) {

timerval_t a1, a2, b1, b2;
float v, s = 0.0f;
unsigned n;
unsigned long at, bt;

/* v is non-constant to prevent optimizations with > -O0 */
v = atof(argv[1]);
printf("using value %f\n", v);

tick(&a1);
for (n = 0; n < ITERS; ++ n) {
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
}
tick(&a2);

tick(&b1);
for (n = 0; n < ITERS; ++ n) {
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
}
tick(&b2);

at = (unsigned long)tickdiff(a1, a2);
bt = (unsigned long)tickdiff(b1, b2);
printf("fmodf = %lu\nmodff = %lu\n", at, bt);

/* prevent calculations from being optimized away with > -O0 */
return (int)s;

}

jason.cipriani · Jun 30, 2008

I'd give comp.lang.asm.x86 a try.

Thanks!

Jason

Flash Gordon · Jul 22, 2008

Eric Sosman wrote, On 30/06/08 20:45:

Also, there's a trick I learned many years ago in my days of
writing code for radars: Represent angles as unsigned integers of
a suitable width, dividing the circle into Uxxx_MAX+1 parts. (The
trick even had a name, "Binary Angular Measure," and angles were
measured in so-and-so many "bams.") The advantage, of course, is
that the wrap-around at two pi just falls out of the properties of
unsigned arithmetic with no extra effort.

Back in the day I used the same trick (and term) for Infra-Red cameras
and turret systems of various types. It's not only for the reason you
give, also integer arithmetic was *vastly* faster back before the days
of FPUs

user923005 · Jul 22, 2008

Thanks! I didn't know about this function. I'll see if it improves the
speed when I get a chance to test. To make it work with negative
numbers like I want it to, it will be (using doubles, although in
reality I'm doing this in C++ and using the C++ std::modf version that
works with floats) something like this:

Click to expand...

It seems that modff is significantly slower than fmodf, taking about
1.5 to 2 times longer -- I'm imagining it has something to do with the
fact that it returns both the fractional AND integer parts. The
program I used to time it is at the end of this message. If you run it
provide a non-zero argument, fmodf is significantly slower when you
pass 0.0f to it for some reason.

Compiled with MinGW GCC 3.4.5:

gcc -std=c99 -O0 fmodtest.c

Results, in performance counter ticks (actual units here are
arbitrary, relative values are meaningful), Intel T2600 at 2.16 GHz:

$ ./a.exe -1.5
fmodf = 1514
modff = 2482
$ ./a.exe 0
fmodf = 2176
modff = 2409
$ ./a.exe 1.5
fmodf = 1411
modff = 2411

Differences when the value is in or out of the range [-1,1] are
negligible.

Program follows; if you aren't using Windows you'll have to fill in
something appropriate for timerval_t, tick(), and tickdiff(). You can
use gettimeofday() on Linux.

Jason

======== modtest.c =========

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* do what you need to do here */
#ifdef _WIN32
# include <windows.h>
typedef LARGE_INTEGER timerval_t;
# define tick QueryPerformanceCounter
# define tickdiff(a,b) ((b).QuadPart - (a).QuadPart)
#else
/* typedef ... timerval_t;
void tick (timerval_t *t) { ... }
timerval_t tickdiff (timerval_t a, timerval_t b) { ... } */
#endif

#define ITERS 1000

inline float mymoda (float v) {
if (v >= 0.0f)
return fmodf(v, 1.0f);
else
return 1.0f - fmodf(-v, 1.0f);

}

inline float mymodb (float v) {
float ipart;
if (v >= 0.0f)
return modff(v, &ipart);
else
return 1.0f - modff(-v, &ipart);

}

int main (int argc, char **argv) {

timerval_t a1, a2, b1, b2;
float v, s = 0.0f;
unsigned n;
unsigned long at, bt;

/* v is non-constant to prevent optimizations with > -O0 */
v = atof(argv[1]);
printf("using value %f\n", v);

tick(&a1);
for (n = 0; n < ITERS; ++ n) {
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
}
tick(&a2);

tick(&b1);
for (n = 0; n < ITERS; ++ n) {
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
}
tick(&b2);

at = (unsigned long)tickdiff(a1, a2);
bt = (unsigned long)tickdiff(b1, b2);
printf("fmodf = %lu\nmodff = %lu\n", at, bt);

/* prevent calculations from being optimized away with > -O0 */
return (int)s;

}

Using visual studio in 64 bit mode, I got even more dramatic
difference (showing fmodf() to be faster than integer truncation!):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* do what you need to do here */
#ifdef _WIN32
# include <windows.h>
typedef LARGE_INTEGER timerval_t;
# define tick QueryPerformanceCounter
# define tickdiff(a,b) ((b).QuadPart - (a).QuadPart)
#define inline __inline
#else
/* typedef ... timerval_t;
void tick (timerval_t *t) { ... }
timerval_t tickdiff (timerval_t a, timerval_t b) { ... } */
#endif

#define ITERS 1000

inline float mymoda (float v) {
if (v >= 0.0f)
return fmodf(v, 1.0f);
else
return 1.0f - fmodf(-v, 1.0f);
}

inline float mymodb (float v) {
float ipart;
if (v >= 0.0f)
return modff(v, &ipart);
else
return 1.0f - modff(-v, &ipart);
}

inline float primitive_mod(float v)
{
return v - (float) (long long) v;
}

int main (int argc, char **argv) {
timerval_t a1, a2, b1, b2, c1, c2;
float v, s = 0.0f;
unsigned n;
unsigned long at, bt, ct;
/* v is non-constant to prevent optimizations with > -O0 */
if (argc < 2)
{
puts("Error -- you must supply a floating point number.");
exit(EXIT_FAILURE);
}
v = atof(argv[1]);
printf("using value %f\n", v);
tick(&a1);
for (n = 0; n < ITERS; ++ n) {
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
s += mymoda(v);
}
tick(&a2);
tick(&b1);
s = 0.0f;
for (n = 0; n < ITERS; ++ n) {
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
s += mymodb(v);
}
tick(&b2);
tick(&c1);
s = 0.0f;
for (n = 0; n < ITERS; ++ n) {
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
s += primitive_mod(v);
}
tick(&c2);

at = (unsigned long)tickdiff(a1, a2);
bt = (unsigned long)tickdiff(b1, b2);
ct = (unsigned long)tickdiff(c1, c2);
printf("fmodf = %lu\nmodff = %lu\nprimitive = %lu\n", at, bt, ct);
/* prevent calculations from being optimized away with > -O0 */
return (int)s;
}

/*
C:\tmp\testmod\x64\Release>testmod 3.1415926535
using value 3.141593
fmodf = 9
modff = 793
primitive = 156

C:\tmp\testmod\x64\Release>testmod 3.1415926535
using value 3.141593
fmodf = 8
modff = 834
primitive = 153

C:\tmp\testmod\x64\Release>testmod 3.1415926535
using value 3.141593
fmodf = 9
modff = 794
primitive = 156
*/

Confused about benchmarks calculating averages of large arrays.	3	May 26, 2008
Computation slow with float than double.	13	Jun 6, 2005
how to: fast & easy updates??	8	Oct 14, 2003
Where to Find Linux API Information?	15	Nov 27, 2006
Hello!	5	Dec 21, 2003
Hello!	2	Dec 21, 2003
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006
comp.lang.c Changes to Answers to Frequently Asked Questions (FAQ)	1	Jul 3, 2004

Fast fmodf(..., 1.0f)

jason.cipriani

jason.cipriani

Barry Schwarz

jason.cipriani

jason.cipriani

Boon

christian.bau

jason.cipriani

jason.cipriani

jason.cipriani

Flash Gordon

user923005

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads