Performance issue

Tom Carrick · Apr 2, 2005

Hi,

In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this. Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Anyway, the code:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s):
l = []
for i in range(0, len(s)):
l.append(s)
return l

words = []
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt', 'r')
file = f.read()
f.close()

words = file.splitlines()

sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))

while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]

print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]

Irmen de Jong · Apr 2, 2005

Tom said:
Hi,

In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this.

I like your attitude, not thinking that it's just Python that is slow

Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

String to list: list("irmen") # --> ['i','r','m','e','n']
Sorted list: sorted("irmen") # --> ['e', 'i', 'm', 'n', 'r']
(the latter works in Python 2.4+)

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Anyway, the code:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s):
l = []
for i in range(0, len(s)):
l.append(s)
return l

.... see above... just replace string2list(s) with sorted(s)

words = []
found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt', 'r')
file = f.read()
f.close()

Click to expand...

Style: don't use 'file' as a variable name, you're hiding the
builtin 'file' function

words = file.splitlines()

Click to expand...

You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("words.txt").readlines()

sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))

Click to expand...

The lambda is optional and only slows it down
But to get a sorted list of letters, just use sorted(s)
if you're on Python 2.4+

while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist

Click to expand...

(same here.. replacing this by sorted(words[0]) probably
will speed it up rather significantly, partly because
it avoids the creation of those temporary lists)

if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]

print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]

Click to expand...

print " ".join(found)

Cheers

--Irmen de Jong

Marc 'BlackJack' Rintsch · Apr 2, 2005

Tom Carrick said:
[â€¦] Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

Use the `list()` builtin on the string and *just* the `sort()` method::

In [2]: characters = list('hello')

In [3]: characters
Out[3]: ['h', 'e', 'l', 'l', 'o']

In [4]: characters.sort()

In [5]: characters
Out[5]: ['e', 'h', 'l', 'l', 'o']

sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))

sorted_anagram = list(anagram.lower())
sorted_anagram.sort()

while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]

And here's the performance issue. Deleting the first element of a list
results in moving all remaining elements one index down. Better iterate
over the words in a for loop::

for word in words:
# use `word` instead of `word[0]` in the loop body.
...

Ciao,
Marc 'BlackJack' Rintsch

Marc 'BlackJack' Rintsch · Apr 2, 2005

You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("words.txt").readlines()

This leaves the newline characters at the end of each line while
`str.splitlines()` removes them.

Ciao,
Marc 'BlackJack' Rintsch

vincent wehren · Apr 2, 2005

| Hi,
|
| In my attempted learning of python, I've decided to recode an old
| anagram solving program I made in C++. The C++ version runs in less
| than a second, while the python takes 30 seconds. I'm not willing to
| think it's just python being slow, so I was hoping someone could find
| a faster way of doing this. Also, I was wondering if there was a more
| builtin, or just nicer way of converting a string to a list (or using
| the sort function on a list) than making a function for it.
|
| The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
|
| Anyway, the code:
|
| import string

You're importing string, but never use it, so you can omit that line.

|
| # Need a function to convert a string to a list to be
| # able to use the sort() function
| def string2list(s):
| l = []
| for i in range(0, len(s)):
| l.append(s)
| return l

No need to write your own function. list(s) already does the trick.

|
| words = []
| found = []
|
| anagram = raw_input("Find anagrams of word: ")
|
| f = open('words.txt', 'r')
| file = f.read()
| f.close()

I don't have a copy of words.txt, but here's what I would try
(untested):

anagram = raw_input("Find anagrams of word: ")
sorted_anagram = list(sorted(anagram.lower()))
# If you're Python is pre 2.4 ise
# sorted_anagram = list(anagram.lower())
# sorted_anagram.sort() #--sort list in place

found = []
# assuming "words.txt" contains a word per line
# iterate over the lines of the file

for line in open("/path/to/words.txt"):
word = line[:-1] # Get rid of trailing newline
sorted_word = list(sorted(word.lower()))
if sorted_word == sorted_anagram:
found.append(word)
if found:
print "Anagrams of %s:" % anagram
for w in found:
print w
else:
print "No anagrams for %s" % anagram

Thomas Rast · Apr 2, 2005

Tom Carrick said:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds.

Indeed, your program can be improved to run about ten times as fast,
which (on my system, with 96274 entries in /usr/share/dict/words) is
below a second.

In general you should try to move the loops into C code, i.e. use
built-in functions instead of long 'for' blocks.

Some comments on your version:

import string

# Need a function to convert a string to a list to be
# able to use the sort() function
def string2list(s): [snipped]

list() achieves the same thing a lot faster.

words = []

You do not need to initialize 'words' here, as you're overwriting it a
few lines afterwards.

found = []

anagram = raw_input("Find anagrams of word: ")

f = open('words.txt', 'r')
file = f.read()
f.close()

words = file.splitlines()

Try to avoid assigning to the names of built-in functions if you can.
Names like 'file', 'list', 'dict', 'map' etc. are often an obvious
choice, but overwriting them means that you don't "just" know what a
later use refers to.

sorted_anagram = anagram.lower()
sorted_anagram = string2list(anagram)
sorted_anagram.sort(lambda x, y: cmp(x, y))

Unless you *really* have to, don't use comparison functions with
sort(), as they slow the operation considerably. In this (as in most)
cases, a plain sorted_anagram.sort() does the trick, and in version
2.4 you can achieve custom sort orders with the optional 'key'
argument. The sorted() built-in also comes in handy here.

while words:
if len(words[0]) == len(sorted_anagram):
wordlist = string2list(words[0])
wordlist.sort(lambda x, y: cmp(x, y))
sorted_wordlist = wordlist
if sorted_anagram == sorted_wordlist:
found.append(words[0])
del words[0]

Avoid this style of looping at all times! Removing the first element
of a list is O(n), so looping through the whole list as above is
O(n**2). In most cases you should use a for loop:

for word in words:
# do something

which is O(n) of course. If you do have to loop destructively, pop()
from the end (which is the default) like so:

while words:
word = words.pop()
# do something

This is also O(n), because removing the *last* element of a list is
O(1) (amortized; I suppose the implementation will occasionally shrink
the underlying array at linear cost).

print "Anagrams of " + anagram + ": "
while found:
print found[0] + " "
del found[0]

I assume you meant not to print a newline between the words, which
'print' does by default. The best solution in that case is "
".join(found).

A better version (2.4+ only):

-- 8< -- 8< --
anagram = raw_input("Find anagrams of word: ")

words = open('words.txt', 'r')

sorted_anagram = sorted(anagram.lower())

found = []

for word in words.read().splitlines():
if len(word) == len(anagram) and sorted(word) == sorted_anagram:
found.append(word)

print "Anagrams of %s: %s" % (anagram, ' '.join(found))
-- >8 -- >8 --

Interestingly, the length comparison makes quite a difference! I
removed it at first, thinking it was unnecessary. Here are some
timings:

* Your original version (for comparison):

$ time echo stop | python2.4 anagram_slow.py
[...]
real 0m9.090s
user 0m8.790s
sys 0m0.013s

* Your version, but with the O(n**2) loop replaced by an O(n) 'for':

$ time echo stop | python2.4 anagram_forloop.py
[...]
real 0m0.221s
user 0m0.134s
sys 0m0.014s

* My version but with the length comparison removed:

$ time echo stop | python2.4 anagram_no_lencmp.py
[...]
real 0m0.408s
user 0m0.353s
sys 0m0.010s

* My version as above:

$ time echo stop | python2.4 anagram_fast.py
[...]
real 0m0.144s
user 0m0.099s
sys 0m0.008s

Hope that helps

- Thomas

stelios xanthakis · Apr 2, 2005

Scott said:
if __name__ == '__main__':
import sys
main(sys.argv[1:] or ['anagrams.py'])

This is *exactly* the kind of testcases I'm looking for to test
the soon-to-be-released pyvm. Great! I'll be back with results.

For now, a fast anagrams.py is
--------------------------------------
import sys

WORDS = [ i.rstrip () for i in file ('/usr/share/dict/words') ]

def findana (anagram):
sorted_anagram = sorted(anagram.lower())
len_anagram = len (anagram)
found = [ word for word in WORDS if len(word)==len_anagram and
sorted(word)==sorted_anagram ]
print "Anagrams of %s: %s" % (anagram, ' '.join(found))

for i in sys.argv [1:]:
findana (i)
-----------------------------------------

And timings....

time python anagram.pyc stop step words lots pool eat fast slow lamp
cold door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m1.491s
user 0m1.390s
sys 0m0.040s

time pyvm anagram.pyc stop step words lots pool eat fast slow lamp cold
door xyzzy
Anagrams of stop: opts post pots spot stop tops
Anagrams of step: pest pets sept step
Anagrams of words: sword words
Anagrams of lots: lost lots slot
Anagrams of pool: loop polo pool
Anagrams of eat: ate eat tea
Anagrams of fast: fast fats
Anagrams of slow: lows owls slow
Anagrams of lamp: lamp palm
Anagrams of cold: clod cold
Anagrams of door: door odor
Anagrams of xyzzy:

real 0m0.923s
user 0m0.760s
sys 0m0.070s

Scott David Daniels · Apr 2, 2005

Thomas said:
Indeed, your program can be improved to run about ten times as fast, ...
<great stuff>

This problem inspired an "all anagrams" program. Using it I was able
to find the largest anagram group in Shakespeare's first folio in about
the time you originally found anagrams for an individual word.

7: owers = rowse = sower = sowre = swore = woers = worse

====

def words(source):
for line in source:
for word in line.split():
yield word

def all_anagrams(words):
seen = dict()

for word in words:
word = word.lower()
if word not in seen:
dorw = ''.join(sorted(word))
try:
seen[dorw].append(word)
except KeyError:
seen[dorw] = [word]
if word == dorw:
continue
seen[word] = ()
for group in seen.itervalues():
if len(group) > 1:
yield -len(group), sorted(group) # conveniently sortable

def main(sources):
for filename in sources:
dictionary = open(filename, 'r')
print "All anagrams from %s:" % filename
try:
for nsize, group in sorted(all_anagrams(words(dictionary))):
print '%2s: %s' % (-nsize, ' = '.join(group))
finally:
dictionary.close()
print

if __name__ == '__main__':
import sys
main(sys.argv[1:] or ['anagrams.py'])

Marc 'BlackJack' Rintsch · Apr 2, 2005

Tom Carrick said:
In my attempted learning of python, I've decided to recode an old
anagram solving program I made in C++. The C++ version runs in less
than a second, while the python takes 30 seconds. I'm not willing to
think it's just python being slow, so I was hoping someone could find
a faster way of doing this. Also, I was wondering if there was a more
builtin, or just nicer way of converting a string to a list (or using
the sort function on a list) than making a function for it.

The words.txt here is just a copy of FreeBSD's /usr/share/dict/words

Here's my attempt which builds an anagram dictionary ("sorted word" ->
list of anagrams) for fast lookup of anagrams::

#!/usr/bin/env python2.4
from itertools import imap, ifilter

WORDS = '/usr/share/dict/words'

def make_anagram_map(words):
anagram_map = dict()
for word in imap(lambda w: w.strip().lower(), words):
sorted_word = ''.join(sorted(list(word)))
anagram_map.setdefault(sorted_word, list()).append(word)

return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))

def main():
words_file = open(WORDS, 'r')
anagram_map = make_anagram_map(words_file)
words_file.close()

while True:
word = raw_input('Find anagrams of word (just enter to end): ')
if not word:
break
try:
print anagram_map[''.join(sorted(list(word.strip().lower())))]
except KeyError:
print 'No anagrams found for %r' % word

# # Print all anagrams sorted by number of anagrams.
# print '\n'.join(map(str, sorted(anagram_map.values(), key=len)))
# print len(anagram_map)

if __name__ == '__main__':
main()

Ciao,
Marc 'BlackJack' Rintsch

Steven Bethard · Apr 2, 2005

Marc said:
def make_anagram_map(words):
anagram_map = dict()
for word in imap(lambda w: w.strip().lower(), words):
sorted_word = ''.join(sorted(list(word)))
anagram_map.setdefault(sorted_word, list()).append(word)

return dict(ifilter(lambda x: len(x[1]) > 1, anagram_map.iteritems()))

Or if you're afraid of map and filter like me, you can try:

def make_anagram_map(words):
anagram_map = {}
for word in (w.strip().lower() for w in words):
anagram_map.setdefault(''.join(sorted(word)), []).append(word)
return dict(sortedword_wordlist
for sortedword_wordlist in anagram_map.iteritems()
if len(sortedword_wordlist[1]) > 1)

py> make_anagram_map(['owers', 'pest', 'rowse', 'pets', 'sower', 'step'])
{'epst': ['pest', 'pets', 'step'], 'eorsw': ['owers', 'rowse', 'sower']}

STeVe

I have to finish this code for my assignment but I cant figure out how to solve it	1	Jun 27, 2023
Issue with textbox script?	0	Sep 5, 2022
Pygame project sound mixing issue...	4	Dec 29, 2023
HMAC encription issue	4	Aug 31, 2013
I Need Fix In Code	1	Apr 12, 2023
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Range / empty list issues??	1	Dec 11, 2023
Translater + module + tkinter	1	Feb 16, 2023

Performance issue

Tom Carrick

Irmen de Jong

Marc 'BlackJack' Rintsch

Marc 'BlackJack' Rintsch

vincent wehren

Thomas Rast

stelios xanthakis

Scott David Daniels

Marc 'BlackJack' Rintsch

Steven Bethard

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads