processing limitation in Python

P

pdr14

If I un-comment any line in this program below the line where I
commented " all OK up to this point " This program locks up my
computer.

Windows task manager will show "Not Responding" for Python in the
Applications tab and in the Performance tabe the CPU usage will be
locked at %100.

I've experienced the same problem on 2 different computers, one running
2000 pro. the other running XP home eddition. both computers run
Python 2.4.2

I'm just wondering if any one else has noticed any problems with
working with large numbers in Python ind if there is anything that can
work around this issue.

Thankd for reading
David

def factor(n):
d = 2
factors = [ ]
while n > 1:
if n % d == 0:
factors.append(d)
n = n/d
else:
d = d + 1
print factors
factor (12)
factor (123)
factor (1234)
factor (12345)
factor (123456)
factor (1234567)
factor (12345678)
factor (123456789)
factor (1234567898)
factor (12345678987)
factor (123456789876)
factor (1234567898765) # all OK up to this point
#factor (12345678987654) # locks up computer if I run this line
#factor (123456789876543)
#factor (1234567898765432)
#factor (12345678987654321)
 
T

Tim Peters

[[email protected]]
If I un-comment any line in this program below the line where I
commented " all OK up to this point " This program locks up my
computer.

Windows task manager will show "Not Responding" for Python in the
Applications tab and in the Performance tabe the CPU usage will be
locked at %100.

I've experienced the same problem on 2 different computers, one running
2000 pro. the other running XP home eddition. both computers run
Python 2.4.2

I'm just wondering if any one else has noticed any problems with
working with large numbers in Python ind if there is anything that can
work around this issue.

Thankd for reading
David

def factor(n):
d = 2
factors = [ ]
while n > 1:
if n % d == 0:
factors.append(d)
n = n/d
else:
d = d + 1
print factors

You primary problem is that this is a horridly inefficient way to
factor, taking time propotional to n's largest prime divisor (which
may be n).
factor (12)
factor (123)
factor (1234)
factor (12345)
factor (123456)
factor (1234567)
factor (12345678)
factor (123456789)
factor (1234567898)
factor (12345678987)
factor (123456789876)
factor (1234567898765) # all OK up to this point
#factor (12345678987654) # locks up computer if I run this line

It doesn't lock up for me, using Python 2.3.5 or 2.4.2 on Windows (XP
Pro SP2, but the specific flavor of Windows shouldn't matter). I ran
it from a DOS box, and while it was plugging away on 12345678987654,
hitting Ctrl+C stopped it.

If you let it continue running, and you & your computer were immortal
(something worth shooting for :)), it would eventually print the
factorization. Since

12345678987654 = 2 * 3 * 2057613164609

the loop would have to go around over 2 trillion times to find the
final 2057613164609 prime factor. A simple enormous improvement is to
get out of the loop when d*d > n. Then n must be prime or 1. That
would slash the worst-care runtime from being proportional to n to
being proportional to sqrt(n).
 
B

bearophileHUGS

Using CPython or GMPY with a smarter algorithm in acceptable time you
can find that:

12345678987654 == 2 * 3 * 2057613164609

It's a very big number to factorize with that naive algorithm, so the
program hangs... (I have used an online factoring service).

Bye,
bearophile
 
D

Dave Hansen

If I un-comment any line in this program below the line where I
commented " all OK up to this point " This program locks up my
computer.

Hmm. Ctrl-C gets me out just fine. In Idle, at least.
Windows task manager will show "Not Responding" for Python in the
Applications tab and in the Performance tabe the CPU usage will be
locked at %100.

Well sure. It's pretty busy code.
I've experienced the same problem on 2 different computers, one running
2000 pro. the other running XP home eddition. both computers run
Python 2.4.2

I'm just wondering if any one else has noticed any problems with
working with large numbers in Python ind if there is anything that can
work around this issue.

Try running with the changes I've made below, and see if that tells
you anything.

def factor(n):
d = 2
pctr = 0
factors = [ ]
while n > 1:
if n % d == 0:
factors.append(d)
n = n/d
else:
d = d + 1
pctr += 1
if pctr >= 1000000:
print "So Far: " + str(factors)
pctr = 0
print factors
factor (12)
factor (123)
factor (1234)
factor (12345)
factor (123456)
factor (1234567)
factor (12345678)
factor (123456789)
factor (1234567898)
factor (12345678987)
factor (123456789876)
factor (1234567898765) # all OK up to this point
factor (12345678987654) # locks up computer if I run this line
#factor (123456789876543)
#factor (1234567898765432)
#factor (12345678987654321)


Hint: 2057613164609L is a Really Big Number (tm). If it's prime (I
don't know if it is or not), it will take more than 46 days on my
computer to figure that out. Did you wait that long?

Regards,
-=Dave
 
S

Steven D'Aprano

If I un-comment any line in this program below the line where I
commented " all OK up to this point " This program locks up my
computer.

It locks up the operating system as well? Shame on Windows.

What happens if you type ctrl-C to interrupt the processing?

I'm just wondering if any one else has noticed any problems with
working with large numbers in Python ind if there is anything that can
work around this issue.

Yes, you need a better algorithm.

You can prove to yourself that it isn't a problem with Python handling big
numbers by running this simple test:

factor(100000000000000000)

and watch how quickly it completes -- a fraction of a second. The problem
is that your test data have few factors, and so your function spends a
LONG time increasing d by one each iteration. Worst case, if your number
is a prime, it has to try to divide against *every* number, 2, 3, 4, ...
all the way up to the prime itself. This takes a LONG time if the number
is very large.

Some improvements you might like to try:

You have to check for factors of two. But after you have done that, you
are just wasting time to check for factors of 4, 6, 8, ... because they
can't possibly be factors. Pull the factor of two test out of the loop,
then start the test with d = 3 and increase by two instead of one.

You can stop looking for factors once you have reached the square root of
the original number. The square root is the biggest possible factor.

There are other improvements you can make, but they make the code more
complicated. Just these few things will give you a HUGE performance boost:

def factor(n):
factors = []
while n % 2 == 0:
factors.append(2)
n = n/2
d = 3
mx = int(n**0.5)
while (n > 1) and (d <= mx):
if n % d:
d = d+2
else:
factors.append(d)
n = n/d
if n != 1:
factors.append(n)
return factors


Using this new improved version, I get this calculation in about two
seconds:
[2, 3, 2057613164609L]

and this in less than a second:
[3, 3, 3, 3, 37, 37, 333667, 333667]
 
P

pdr14

Big help guys, thanks. There does seem to be a problem with Pythons
IDLE. If I run my oridgional program from a dos shell I can hit Ctrl-C
and free up the procesor, But running it in IDLE just locks up the
computer. Bad Windows. Thanks for the help with a more efficent
algorythm.

David KG2LI
 
A

Atanas Banov

But running it in IDLE just locks up the
computer. Bad Windows.

yeah, right - blame it all on Microsoft!

try ctrl-F6 (or Shell / Restart Shell from the menu) in IDLE, which
stops programs from infinite looping

- nas
 
D

Dennis Lee Bieber

Windows task manager will show "Not Responding" for Python in the
Applications tab and in the Performance tabe the CPU usage will be
locked at %100.
Windows will respond "Not Responding" to ANY program that is in such
a tight process loop that the OS can't tickle it (IE, it never calls a
system service).
I've experienced the same problem on 2 different computers, one running
2000 pro. the other running XP home eddition. both computers run
Python 2.4.2
How much memory...
def factor(n):
d = 2
factors = [ ]
while n > 1:
if n % d == 0:
factors.append(d)
n = n/d
else:
d = d + 1
print factors

Why bother collecting the "factors" into a list only to print them
at the bottom of the procedure -- you aren't returning a value from
factor(), so I'd suggest dropping the
factors = []
and
print factors

replace
factors.append(d)
with
print d #optional , if you want as many on a line as possible

Among other things, besides not needing memory to store the entire
list, you also get to see it working as each one is found and printed.
--
 
S

Steven D'Aprano

Why bother collecting the "factors" into a list only to print them
at the bottom of the procedure -- you aren't returning a value from
factor(), so I'd suggest dropping the
factors = []
and
print factors

replace
factors.append(d)
with
print d #optional , if you want as many on a line as possible

This is generally bad practice. You now have a function which can't
easily feed its output into any other function (without pipes, file
redirection and other tricks). The function is now locked down into one
rather limited use: printing the factors.

If you need the factors one at a time, the right way is to use a
generator, and yield them as they are found. Otherwise, the correct way to
solve this problem is to accumulate all the factors in a list and return
the list.

Among other things, besides not needing memory to store the entire
list, you also get to see it working as each one is found and printed.

The amount of memory needed to store the entire list is trivial for all
but the hugest numbers. A number with 1,000 factors would be enormous,
and yet the memory needed for a list of 1,000 ints could be as small as
8K -- trivial on modern computers.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,285
Messages
2,571,415
Members
48,107
Latest member
jigyasauniversity

Latest Threads

Top