How to write a small graceful gcd function?

T

Tom St Denis

Dann said:
How can the division way be quadratic? It takes out repeated factors in
every iteration. The above example takes 2 modulus cycles.

You assume division is atomic and equivalent to subtraction?

.... really?

Try saying that for 1000 it numbers.

Tom
 
D

Dann Corbit

Ben Pfaff said:
Are we talking about the same binary GCD algorithm that's in
Knuth? There's a wikipedia page about it:
http://en.wikipedia.org/wiki/Binary_GCD_algorithm

(Every algorithm is in Knuth if you look hard enough.)

He wasn't aware of using a priority queue to do a merge. I sent him some
email about it a long time ago. Lots better than the snowplow.
--
int main(void){char
p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int
putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof
p-1;putchar(p\
);}return 0;}
 
T

Tom St Denis

Ben said:
Are we talking about the same binary GCD algorithm that's in
Knuth? There's a wikipedia page about it:
http://en.wikipedia.org/wiki/Binary_GCD_algorithm

(Every algorithm is in Knuth if you look hard enough.)

Hmm, lots of them are, but not all. Specially when you start moving
off into more crypto like problems [e.g. elliptic curves, real
exponentiation [*]]. Also he misses a lot more practical stuff than
you'd think. If we all implemented multiplication by his book we
wouldn't be using comba techniques for instance.

[*] I realize he talks about addition chains but I don't recall how
much he talks about sliding windows for instance.

Definitely worth having them. I bought the entire set when I was still
in high scool. Served me well through college and afterwards so far.

Tom
 
D

Dann Corbit

Tom St Denis said:
You assume division is atomic and equivalent to subtraction?

... really?

On modern CPUs in hardware, it is. Modulus will be slower by a small,
constant factor. Since the algorithm itself has the same performance, and
since the loop does not execute very many times, I guess that the difference
will be very small.
Try saying that for 1000 it numbers.

I guess that you mean for 1000 digit numbers. If you have to emulate the
math in software, it might be a significant difference.
 
T

Tom St Denis

Dann said:
On modern CPUs in hardware, it is. Modulus will be slower by a small,
constant factor. Since the algorithm itself has the same performance, and
since the loop does not execute very many times, I guess that the difference
will be very small.

First off, that's not even remotely true. Division is a LOT slower
than subtraction. Or put it another way, if they took the same number
of cycles you'd be looking at a 37MHz processor.

Second, division is often dozens if not hundreds of cycles.

Third, most processors don't even have division opcodes [mips, ARM, PPC
anyone?]
I guess that you mean for 1000 digit numbers. If you have to emulate the
math in software, it might be a significant difference.

digit bit whatever.

Division is normally quadratic unless you start getting fancy with the
Karatsuba.

Since division is quadratic, doing gcd with it is also quadratic.
think about it.

Division: O(n^2)

GCD: O(cn^2) for any [say] small c

What does the O() of GCD reduce to?

Subtraction is fully linear O(n). So GCD == O(cn) means it's also
linear.

*In practice* for small numbers they may act comparably. Specially if
a mod b is small. Which is why it's not entirely a bad approach.

For small types, unless you can prove otherwise it's usually best to
use the binary approach. It's more "portable" [in that you don't need
division] and usually just as fast [*]

Tom

[*] in fact it's guaranteed to be no slower on platforms without a
division opcode.
 
W

Walter Roberson

Tom St Denis said:
Third, most processors don't even have division opcodes [mips, ARM, PPC
anyone?]

The original MIPS design did not have division, but division was
provided right from the first commercial offering, the R2000.

It could be that there is no division in some of the processors aimed
at the embedded market; I'm only familiar with the MIPS general purpose
chips.
 
T

Tom St Denis

Walter said:
The original MIPS design did not have division, but division was
provided right from the first commercial offering, the R2000.

It could be that there is no division in some of the processors aimed
at the embedded market; I'm only familiar with the MIPS general purpose
chips.

And likely it's not one cycle right?

Kinda violates the MIPS RISC principles ...

Of course, if they could add a divider why can't they add an "add with
carry" hehehehe...

Though the 4Km series is nice in that regard.

tom
 
D

Dann Corbit

Tom St Denis said:
Dann said:
On modern CPUs in hardware, it is. Modulus will be slower by a small,
constant factor. Since the algorithm itself has the same performance,
and
since the loop does not execute very many times, I guess that the
difference
will be very small.

First off, that's not even remotely true. Division is a LOT slower
than subtraction. Or put it another way, if they took the same number
of cycles you'd be looking at a 37MHz processor.

Second, division is often dozens if not hundreds of cycles.

Third, most processors don't even have division opcodes [mips, ARM, PPC
anyone?]
I guess that you mean for 1000 digit numbers. If you have to emulate the
math in software, it might be a significant difference.

digit bit whatever.

Division is normally quadratic unless you start getting fancy with the
Karatsuba.

Since division is quadratic, doing gcd with it is also quadratic.
think about it.

Division is not quadratic in hardware.
It will take a definite, fixed number of cycles at most. (17 for a 64 bit
AMD CPU, IIRC).

Subtraction will be faster, but only by a small, fixed constant.

If your hardware does not support division, then the subtraction method
would be a significant advantage.
Division: O(n^2)

GCD: O(cn^2) for any [say] small c

What does the O() of GCD reduce to?

Subtraction is fully linear O(n). So GCD == O(cn) means it's also
linear.

*In practice* for small numbers they may act comparably. Specially if
a mod b is small. Which is why it's not entirely a bad approach.

For small types, unless you can prove otherwise it's usually best to
use the binary approach. It's more "portable" [in that you don't need
division] and usually just as fast [*]

Tom

[*] in fact it's guaranteed to be no slower on platforms without a
division opcode.
/*
Now that I actually understand how it works, I like the subtraction trick a
lot better. In this instance, (lots of small repeated factors) it takes the
same number of iterations as the modulo version
*/
unsigned long long modulos = 0;
unsigned long long subtractions = 0;
unsigned long long gcdm(unsigned long long a, unsigned long long b)
{
unsigned long long k, t;

k = 0;
while (!(a & 1 || b & 1)) {
++k; a >>= 1; b >>= 1;
}
while (!(a & 1)) { a >>= 1; }
while (!(b & 1)) { b >>= 1; }

while (b) {
if (a > b) { t = a; a = b; b = t; }
b = b % a;
modulos++;
while (b && !(b & 1)) { b >>= 1; }
}
return a << k;
}
unsigned long long gcds(unsigned long long a, unsigned long long b)
{
unsigned long long k, t;

k = 0;
while (!(a & 1 || b & 1)) {
++k; a >>= 1; b >>= 1;
}
while (!(a & 1)) { a >>= 1; }
while (!(b & 1)) { b >>= 1; }

while (b) {
if (a > b) { t = a; a = b; b = t; }
b = b - a;
subtractions++;
while (b && !(b & 1)) { b >>= 1; }
}
return a << k;
}
unsigned long long big=129746337890625;
unsigned long long med=8649755859375*7;

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
unsigned long long randvals[1000000];
int main(void)
{
clock_t start;
clock_t end;
unsigned long long answer=0;
size_t index;
for (index = 0; index < 1000000; index++)
{
randvals[index] = rand();
randvals[index] <<= 32;
randvals[index] += rand();
}
start = clock();
for (index = 0; index < 1000000-1; index++)
answer += gcdm(randvals[index],randvals[index+1]);
end = clock();
printf("clocks to do one million modular gcd's = %u\n", (unsigned)
end-start) ;
answer = 0;
start = clock();
for (index = 0; index < 1000000-1; index++)
answer += gcds(randvals[index],randvals[index+1]);
end = clock();
printf("clocks to do one million subtraction gcd's = %u\n", (unsigned)
end-start) ;
return 0;
}

/*
clocks to do one million modular gcd's = 1031
clocks to do one million subtraction gcd's = 703
Press any key to continue . . .
*/
 
T

Tom St Denis

Dann said:
Division is not quadratic in hardware.
It will take a definite, fixed number of cycles at most. (17 for a 64 bit
AMD CPU, IIRC).

Actually you're still wrong. It's either quadratic in time or space
[or both]. You can make it O(1) time by taking O(n^2) space but that's
still "quadratic". You linearize it by taking linear space O(n) for
O(n) which is NOT quadratic [just not practical for the size of numbers
you work with in processors] by using interpolation [e.g. Karatsuba]

hint: Why do you think multipliers are fast. It isn't because they
violated the O(n^2) rule. It's because the multipliers are big [*].
They take up nearly O(n^2) space to do the multiplication in a trivial
amount of time.
Subtraction will be faster, but only by a small, fixed constant.

If your hardware does not support division, then the subtraction method
would be a significant advantage.

Or if you have to work with larger numbers. Even with an O(1) division
opcode, the division of n-digit numbers is O(n^2) time. So even if
divide == sub for time on a single digit it'd be slower for larger
numbers [proof: division by shift/subtract is slower than a single
subtraction].

Tom

[*] Some processors use a Karatsuba trick but so far I haven't heard of
any x86 series using it as a fact....
 
D

Dann Corbit

Ben Pfaff said:
Are we talking about the same binary GCD algorithm that's in
Knuth? There's a wikipedia page about it:
http://en.wikipedia.org/wiki/Binary_GCD_algorithm

(Every algorithm is in Knuth if you look hard enough.)

Tom's version seems somewhat faster than the web article version.
The timings below are after profile guided optimization.

/*
Now that I actually understand how it works, I like the subtraction trick a
lot better. In this instance, (lots of small repeated factors) it takes the
same number of iterations as the modulo version
*/
unsigned long long modulos = 0;
unsigned long long subtractions = 0;
unsigned long long gcdm(unsigned long long a, unsigned long long b)
{
unsigned long long k, t;

k = 0;
while (!(a & 1 || b & 1)) {
++k; a >>= 1; b >>= 1;
}
while (!(a & 1)) { a >>= 1; }
while (!(b & 1)) { b >>= 1; }

while (b) {
if (a > b) { t = a; a = b; b = t; }
b = b % a;
while (b && !(b & 1)) { b >>= 1; }
}
return a << k;
}
unsigned long long gcds(unsigned long long a, unsigned long long b)
{
unsigned long long k, t;

k = 0;
while (!(a & 1 || b & 1)) {
++k; a >>= 1; b >>= 1;
}
while (!(a & 1)) { a >>= 1; }
while (!(b & 1)) { b >>= 1; }

while (b) {
if (a > b) { t = a; a = b; b = t; }
b = b - a;
while (b && !(b & 1)) { b >>= 1; }
}
return a << k;
}

unsigned long long gcd(unsigned long long u, unsigned long long v) {
unsigned long long k = 0;
if (u == 0)
return v;
if (v == 0)
return u;
while ((u & 1) == 0 && (v & 1) == 0) { /* while both u and v are even
*/
u >>= 1; /* shift u right, dividing it by 2 */
v >>= 1; /* shift v right, dividing it by 2 */
k++; /* add a power of 2 to the final result */
}
/* At this point either u or v (or both) is odd */
do {
if ((u & 1) == 0) /* if u is even */
u >>= 1; /* divide u by 2 */
else if ((v & 1) == 0) /* else if v is even */
v >>= 1; /* divide v by 2 */
else if (u >= v) /* u and v are both odd */
u = (u-v) >> 1;
else /* u and v both odd, v > u */
v = (v-u) >> 1;
} while (u > 0);
return v << k; /* returns v * 2^k */
}

unsigned long long big=129746337890625;
unsigned long long med=8649755859375*7;

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
unsigned long long randvals[1000000];
int main(void)
{
clock_t start;
clock_t end;
unsigned long long answer=0;
size_t index;
for (index = 0; index < 1000000; index++)
{
randvals[index] = rand();
randvals[index] <<= 32;
randvals[index] += rand();
}
start = clock();
for (index = 0; index < 1000000-1; index++)
answer += gcdm(randvals[index],randvals[index+1]);
end = clock();
printf("clocks to do one million modular gcd's = %u, summed GCD =
%llu\n", (unsigned) end-start, answer) ;
answer = 0;
start = clock();
for (index = 0; index < 1000000-1; index++)
answer += gcds(randvals[index],randvals[index+1]);
end = clock();
printf("clocks to do one million subtraction gcd's = %u, summed GCD =
%llu\n", (unsigned) end-start, answer) ;
answer = 0;
start = clock();
for (index = 0; index < 1000000-1; index++)
answer += gcd(randvals[index],randvals[index+1]);
end = clock();
printf("clocks to do one million subtraction gcd's {web article version} =
%u, summed GCD = %llu\n", (unsigned) end-start, answer) ;
return 0;
}

/*
clocks to do one million modular gcd's = 1031, summed GCD = 9203554
clocks to do one million subtraction gcd's = 766, summed GCD = 9203554
clocks to do one million subtraction gcd's {web article version} = 1109,
summed GCD = 9203554
Press any key to continue . . .

*/

--
int main(void){char
p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int
putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof
p-1;putchar(p\
);}return 0;}
 
D

Dann Corbit

Ben Pfaff said:
Tom St Denis said:
Ben said:
Are we talking about the same binary GCD algorithm that's in
Knuth? There's a wikipedia page about it:
http://en.wikipedia.org/wiki/Binary_GCD_algorithm

(Every algorithm is in Knuth if you look hard enough.)

Hmm, lots of them are, but not all. [...]

For what it's worth, that was meant as a joke.

It's pretty darn close to correct. It is very unusual to need an algorithm
that is not covered in one of the volumes.

As an aside, some of his algorithms are superior to those in common use...
{And TAOCP was written when?!}

E.g. Merge insertion sorting is fabulous.
 
D

Dann Corbit

Tom St Denis said:
Dann said:
Division is not quadratic in hardware.
It will take a definite, fixed number of cycles at most. (17 for a 64 bit
AMD CPU, IIRC).

Actually you're still wrong. It's either quadratic in time or space
[or both]. You can make it O(1) time by taking O(n^2) space but that's
still "quadratic". You linearize it by taking linear space O(n) for
O(n) which is NOT quadratic [just not practical for the size of numbers
you work with in processors] by using interpolation [e.g. Karatsuba]

hint: Why do you think multipliers are fast. It isn't because they
violated the O(n^2) rule. It's because the multipliers are big [*].
They take up nearly O(n^2) space to do the multiplication in a trivial
amount of time.

I'm sorry, but I'm not wrong. You fundamentally misunderstand what O(f(n))
means.

If something takes a fixed number of cycles to complete at most, then it is
not O(log(n)) or O(n^2) or anything else. It is O(1).

Subtraction, division, multiplication, modulus, etc. are ALL O(1) on modern
64 bit hardware if the integer values are also 64 bit.
Subtraction will be faster, but only by a small, fixed constant.

If your hardware does not support division, then the subtraction method
would be a significant advantage.

Or if you have to work with larger numbers. Even with an O(1) division
opcode, the division of n-digit numbers is O(n^2) time. So even if
divide == sub for time on a single digit it'd be slower for larger
numbers [proof: division by shift/subtract is slower than a single
subtraction].

I am not arguing this point. I also agree that the subtraction method is
better. The notion that the hardware version of the algorithm is O(n^2) is
pure fantasy on your part, though.
Tom

[*] Some processors use a Karatsuba trick but so far I haven't heard of
any x86 series using it as a fact....
 
K

Keith Thompson

Dann Corbit said:
I'm sorry, but I'm not wrong. You fundamentally misunderstand what O(f(n))
means.

If something takes a fixed number of cycles to complete at most, then it is
not O(log(n)) or O(n^2) or anything else. It is O(1).

Subtraction, division, multiplication, modulus, etc. are ALL O(1) on modern
64 bit hardware if the integer values are also 64 bit.

In this case, n is fixed at 64, so O(n) and O(1) are effectively the
same thing.

If you want to handle n > 64 (multiplication of arbitrarily large
numbers), it's probably going to exceed O(1).
 
T

Tom St Denis

Dann said:
I'm sorry, but I'm not wrong. You fundamentally misunderstand what O(f(n))
means.

If something takes a fixed number of cycles to complete at most, then it is
not O(log(n)) or O(n^2) or anything else. It is O(1).

No, that's wrong. In the case of multiplication they traded space for
performance. That's still a O(n^2) algorithm even though it may take
O(1) time.

What do you think processors just magically multiply 64 bit quantities
with a O(1) space algorithm?

If that were the case we'd be doing RSA in x86 processors in a dozen
cycles with a single opcode.
Subtraction, division, multiplication, modulus, etc. are ALL O(1) on modern
64 bit hardware if the integer values are also 64 bit.

They may take O(1) TIME [which even then is debatable] but they
certainly are not O(1) algorithms.

Think of O() as a unit of resources. Multiplication takes n^2
bit-multiplications to compute for n bits [where here a mult is and
AND]. You can compute it serially in n^2 time or if you have n^2
parallel multipliers in O(1) time. The algorithm still takes O(n^2)
resources.

By your logic, expanding a basic multiplier would always be more
efficient than Karatsuba since "it's linear" which simply isn't true.
Or if you have to work with larger numbers. Even with an O(1) division
opcode, the division of n-digit numbers is O(n^2) time. So even if
divide == sub for time on a single digit it'd be slower for larger
numbers [proof: division by shift/subtract is slower than a single
subtraction].

I am not arguing this point. I also agree that the subtraction method is
better. The notion that the hardware version of the algorithm is O(n^2) is
pure fantasy on your part, though.

I said the ALGORITHM is quadratic. Not the implementation. For all I
know your processor HAS a linear time divider [a real one] and
therefore the algorithm is linear. But using the typical notions of a
processor that algorithm is fully quadratic.

Think about it, replace "unsigned long" with "bigint_t". The ALGORITHM
is the same just the numbers changed size. Why is it now quadratic and
before it wasn't?

Tom
 
B

Ben Pfaff

Tom St Denis said:
No, that's wrong. In the case of multiplication they traded space for
performance. That's still a O(n^2) algorithm even though it may take
O(1) time.

You seem to believe that you multiply time by space to obtain the
cost of an algorithm. I've never seen anyone do that before.
Merge sort takes O(n lg n) time and O(n) additional space, so
does that make it an O(n^2 lg n) algorithm?

I could accept that multiplication of fixed-size integers can be
implemented given O(1) time and O(n^2) space. That doesn't make
it "an O(n^2) algorithm"; that's a meaningless thing to say. It
makes it an O(1) time, O(n^2) space algorithm.
 
T

Tom St Denis

Ben said:
You seem to believe that you multiply time by space to obtain the
cost of an algorithm. I've never seen anyone do that before.
Merge sort takes O(n lg n) time and O(n) additional space, so
does that make it an O(n^2 lg n) algorithm?

No, because it doesn't take n space per unit of time.

Basic multiplication is O(n^2) because there are n^2 steps. Whether
they happen in parallel or not doesn't matter.

Karatsuba is a O(n^1.583) time algorithm because there are n^1.583
multiplications. Whether you do them all in parallel doesn't matter
(and in fact in hardware that's usually the case).

The complexity of an algorithm doesn't change just because you trade
memory or space for time.

Besides, you could do all compares in parallel. Therefore, merge sort
is a O(ln n) algorithm. But wait, space complexity doesn't matter. So
therefore all sorting should be O(1) complexity!!!
I could accept that multiplication of fixed-size integers can be
implemented given O(1) time and O(n^2) space. That doesn't make
it "an O(n^2) algorithm"; that's a meaningless thing to say. It
makes it an O(1) time, O(n^2) space algorithm.

Well first off O() refers to complexity. If you TRADE space for time
that doesn't lower the complexity. It lowers the time. I'll grant you
the TIME complexity of a O(n^2) space multiplier is O(1).

Tom
 
D

Dann Corbit

Tom St Denis said:
No, that's wrong. In the case of multiplication they traded space for
performance. That's still a O(n^2) algorithm even though it may take
O(1) time.

If you use the martian definition.
What do you think processors just magically multiply 64 bit quantities
with a O(1) space algorithm?

Space is constant also. Do you think that the table grows and shrinks as I
multiply different numbers? It does not.
If that were the case we'd be doing RSA in x86 processors in a dozen
cycles with a single opcode.

RSA has to worry about 512 bit numbers and larger, which are not even in
this conversation.
Subtraction, division, multiplication, modulus, etc. are ALL O(1) on
modern
64 bit hardware if the integer values are also 64 bit.

They may take O(1) TIME [which even then is debatable] but they
certainly are not O(1) algorithms.

O(1) means that it completes in constant time. I guess that A*B or A/C or
A-C will complete in constant time if A,B,C are hardware integers.
Think of O() as a unit of resources. Multiplication takes n^2
bit-multiplications to compute for n bits [where here a mult is and
AND]. You can compute it serially in n^2 time or if you have n^2
parallel multipliers in O(1) time. The algorithm still takes O(n^2)
resources.

The size of the lookup table is fixed. The time to complete the unit of
work is fixed. Both are O(1) by the raw definition of the terms.

If you are talking about implementing arbitrary precision operations, then
your statements make sense. They make no sense at all in this context.
By your logic, expanding a basic multiplier would always be more
efficient than Karatsuba since "it's linear" which simply isn't true.

Something that executes in a fixed amount of time is O(1). That's the
definition of O(1). An O(1) algorithm could take a century to complete
(that's a fixed amount of time) and an exponential algorithm could take a
nanosecond to complete (because of the problem given to the algorithm).
Or if you have to work with larger numbers. Even with an O(1) division
opcode, the division of n-digit numbers is O(n^2) time. So even if
divide == sub for time on a single digit it'd be slower for larger
numbers [proof: division by shift/subtract is slower than a single
subtraction].

I am not arguing this point. I also agree that the subtraction method is
better. The notion that the hardware version of the algorithm is O(n^2)
is
pure fantasy on your part, though.

I said the ALGORITHM is quadratic. Not the implementation. For all I
know your processor HAS a linear time divider [a real one] and
therefore the algorithm is linear. But using the typical notions of a
processor that algorithm is fully quadratic.

The algorithm is not quadratic. The divider finishes in constant time.
Think about it, replace "unsigned long" with "bigint_t". The ALGORITHM
is the same just the numbers changed size. Why is it now quadratic and
before it wasn't?

Why should I replace unsigned long long with arbitrary precision? I made it
very clear that I was discussing the algorithm for hardware implementations.
 
T

Tom St Denis

Dann said:
If you use the martian definition.

I don't see the problem here. The algorithm has a complexity of O(n^2)
whether that is space or time it's still that complex.

Why do things get more efficient as you use more resources?
Space is constant also. Do you think that the table grows and shrinks as I
multiply different numbers? It does not.

Different SIZED NUMBERS. Yes, in fact the size grows quadratically
with the size of the input if you want to keep O(1) time.

What? You think a 16-bit and 64-bit multiplier are the same size for
constant time?
RSA has to worry about 512 bit numbers and larger, which are not even in
this conversation.

They're the same thing. Fundamentally a "multiplication algorithm" is
used even in the cpu. You're treating the processor as a huge black
box where everything is linear time/space. That's just not the case.
[hint: if it were processors would have more multipliers]
Subtraction, division, multiplication, modulus, etc. are ALL O(1) on
modern
64 bit hardware if the integer values are also 64 bit.

They may take O(1) TIME [which even then is debatable] but they
certainly are not O(1) algorithms.

O(1) means that it completes in constant time. I guess that A*B or A/C or
A-C will complete in constant time if A,B,C are hardware integers.

TIME is not the only measure of complexity.
Think of O() as a unit of resources. Multiplication takes n^2
bit-multiplications to compute for n bits [where here a mult is and
AND]. You can compute it serially in n^2 time or if you have n^2
parallel multipliers in O(1) time. The algorithm still takes O(n^2)
resources.

The size of the lookup table is fixed. The time to complete the unit of
work is fixed. Both are O(1) by the raw definition of the terms.

You're either ignorant of computer science or really obtuse. O()
notation is in terms of the size of the input. And no, for O(1) time
multiplication the table size is neither fixed, nor linearly dependent
on the size of the input.
If you are talking about implementing arbitrary precision operations, then
your statements make sense. They make no sense at all in this context.

It applies just as much to processor components as it does software.
You're just really naive and think otherwise [regardless, gcdm is still
quadratic unless you specify a different way to divide so it doesn't
matter]
Something that executes in a fixed amount of time is O(1). That's the
definition of O(1). An O(1) algorithm could take a century to complete
(that's a fixed amount of time) and an exponential algorithm could take a
nanosecond to complete (because of the problem given to the algorithm).

TIME is not the only measure of complexity.
I said the ALGORITHM is quadratic. Not the implementation. For all I
know your processor HAS a linear time divider [a real one] and
therefore the algorithm is linear. But using the typical notions of a
processor that algorithm is fully quadratic.

The algorithm is not quadratic. The divider finishes in constant time.

This is impossible. The algorithm isn't constant . The implementation
may be [due to a specific multiplier] but as you wrote it the algorithm
itself is quadratic time because division is of O(n^2) complexity.
Why should I replace unsigned long long with arbitrary precision? I made it
very clear that I was discussing the algorithm for hardware implementations.

Ok but unless you use a specific division opcode which is of O(1)
complexity [none exist btw] then the algorithm itself isn't of O(1)
complexity.

You're confusing runtime with the actual implementation details. Your
multiplier is fast because it takes O(n^2) space. It's still the same
O(n^2) multiplication algorithm you learned as a 6 year old just all of
the mults are done in parallel.

Complexity is a measure of both space and time not just time.

I suppose it's too much to tell you that addition is O(n) complexity
too then right?

....



Tom
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,184
Messages
2,570,979
Members
47,579
Latest member
CharaS3188

Latest Threads

Top