Python speed-up

Guyon Morée

Hi all,

I am working on a Huffman encoding exercise, but it is kinda slow. This is
not a big problem, I do this to educate myself :)

So I started profiling the code and the slowdown was actually taking place
at places where I didn't expect it.

After I have created a lookup-table dictionary with encodings like
{'d':'0110', 'e':'01', etc.} to encode the original text like this:

for c in original_text:
    encoded_text += table[c]

I can appreciate the length of the text is big, but this isn't a problem at
character frequency counting for example. Why is this slow?


The second place the slowdown occurs is when I try to chop the encoded string
of 0's and 1's into pieces of eight like this:

chr_list = [] # resulting list
while 1:
    chr_list.append(encoded_text[:8]) # take 8 bits from the string and put them in the list
    encoded_text = encoded_text[8:]   # truncate the string
    if len(encoded_text) < 8:         # end of string reached
        chr_list.append(encoded_text)
        break


I hope someone can tell me why these are slow.


regards,

Guyon
 
Jeff Epler

The statements
    s += t
and
    s = s[j:]
(strings s and t; integer j) both take time related to the length of s.
The first creates a new string, copies len(s)+len(t) characters into it,
and then assigns the new string to s. The second creates a new string,
copies len(s)-j characters into it, and assigns the new string to s.

This is mostly a consequence of string immutability, though it's possible
that sneaky optimizations might be able to make these operations faster
in a future version of Python.

You might write the first loop as
    encoded_text = "".join([table[c] for c in original_text])
which can be written out as
    encoded_text_parts = []
    for c in original_text:
        encoded_text_parts.append(table[c])
    encoded_text = "".join(encoded_text_parts)
Accumulating string parts in a list and then joining them is one common
way to increase performance of code that repeatedly joins strings.

For the second, consider writing
    chr_list = [encoded_text[i:i+8] for i in range(0, len(encoded_text), 8)]
which can be written out as
    chr_list = []
    for i in range(0, len(encoded_text), 8):
        chr_list.append(encoded_text[i:i+8])
Instead of whittling away encoded_text, just take the 8 characters
you're interested in each time.

Jeff

 
Thomas Guettler

On Wed, 22 Sep 2004 16:06:04 +0200, Guyon Morée wrote:
Hi all,

I am working on a Huffman encoding exercise, but it is kinda slow. This is
not a big problem, I do this to educate myself :)

So I started profiling the code and the slowdown was actually taking place
at places where I didn't expect it.

After I have created a lookup-table dictionary with encodings like
{'d':'0110', 'e':'01', etc.} to encode the original text like this:


for c in original_text:
    encoded_text += table[c]

Hi Guyon,

This is slow. Try this:

e_t = []
for c in original_text:
    e_t.append(table[c])
e_t = ''.join(e_t)


Your solution creates a new string for every +=.

HTH,
Thomas
 
Phil Frost

String concatenation in Python might be slower than you think it is,
because it requires building a new string, which involves a memory
allocation and copy. Suggested reading:
<http://www.skymind.com/~ocrow/python_string/>

For the second part, try replacing

encoded_text = encoded_text[8:]

with

del encoded_text[:8]

Or, use an index variable and don't mutate the list at all.
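
A rough sketch of that index-variable approach (untested), which never
modifies encoded_text at all:

i = 0
chr_list = []
while i < len(encoded_text):
    chr_list.append(encoded_text[i:i+8])  # copy out 8 bits; a slice past the end just clamps
    i += 8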
 
Des Small

Guyon Morée said:
Hi all,

I am working on a Huffman encoding exercise, but it is kinda slow. This is
not a big problem, I do this to educate myself :)

So I started profiling the code and the slowdown was actually taking place
at places where I didn't expect it.

After I have created a lookup-table dictionary with encodings like
{'d':'0110', 'e':'01', etc.} to encode the original text like this:

for c in original_text:
    encoded_text += table[c]

I probably shouldn't be guessing like this, but I know for sure that
Python strings are immutable, and that this has to allocate a new
string every time through the loop.

How about

encoded_text = ''.join([table[c] for c in original_text]) # untested, beware

instead?
I can appreciate the length of the text is big, but this isn't a problem at
character frequency counting for example. Why is this slow?


The second place the slowdown occurs is when I try to chop the encoded string
of 0's and 1's into pieces of eight like this:

chr_list = [] # resulting list
while 1:
    chr_list.append(encoded_text[:8]) # take 8 bits from the string and put them in the list
    encoded_text = encoded_text[8:]   # truncate the string
    if len(encoded_text) < 8:         # end of string reached
        chr_list.append(encoded_text)
        break

Again with the immutable strings! Probably wizards can improve on the
first thing that came to my mind:

chr_list = [encoded_text[i:i+8]
            for i in range(0, len(encoded_text), 8)]  # Tested, but carelessly

At the very least it's shorter my way.
I hope someone can tell me why these are slow.

Des
isn't sure.
 
Eric Brunel

Guyon said:
Hi all,

I am working on a Huffman encoding exercise, but it is kinda slow. This is
not a big problem, I do this to educate myself :)

So I started profiling the code and the slowdown was actually taking place
at places where I didn't expect it.

After I have created a lookup-table dictionary with encodings like
{'d':'0110', 'e':'01', etc.} to encode the original text like this:

for c in original_text:
    encoded_text += table[c]

I can appreciate the length of the text is big, but this isn't a problem at
character frequency counting for example. Why is this slow?

Usual suspect: concatenating to strings reallocates a string each time, so you'd
better do:

l = []
for c in original_text:
    l.append(table[c])
encoded_text = ''.join(l)

The second place the slowdown occurs is when I try to chop the encoded string
of 0's and 1's into pieces of eight like this:

chr_list = [] # resulting list
while 1:
    chr_list.append(encoded_text[:8]) # take 8 bits from the string and put them in the list
    encoded_text = encoded_text[8:]   # truncate the string
    if len(encoded_text) < 8:         # end of string reached
        chr_list.append(encoded_text)
        break

Again, the statement encoded_text = encoded_text[8:] re-allocates the string
each time.

Your whole code chunk may be replaced by a list comprehension:

chr_list = [encoded_text[i:i+8] for i in range(0, len(encoded_text), 8)]

which is likely to be faster, since it only allocates the space for each
8-character chunk.

HTH
 
Peter Otten

Phil said:
For the second part, try replacing

encoded_text = encoded_text[8:]

with

del encoded_text[:8]

Did you try it?

>>> encoded_text = "thus quoth the raven"
>>> del encoded_text[:8]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support slice deletion

The immutability of strings slows down the OP's approach and makes your
remedy impossible.

Peter
 
Alex Martelli

Guyon Morée said:
I hope someone can tell me why these are slow.

I see many others have offered excellent answers: they all boil down
to...:

"Strings are immutable; each time you code
astring = <some operation on> astring
you're making a new string object (and throwing away the old one, unless
it's also known by other names), including the times in which you spell
this as:
astring <someop>= whatever
which, since astring is immutable, is only a shortcut for the former."
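
You can watch the rebinding happen in the interpreter:

>>> a = "spam"
>>> b = a          # b is another name for the same object
>>> a += "eggs"    # builds a new string and rebinds the name a
>>> a
'spameggs'
>>> b              # the old object, still intact under its other name
'spam'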

Putting together a big string with a loop of 'bigstring+=piece' is
perhaps the best-known performance trap in Python. Python 2.4 has
turned somersaults to lessen the penalty you pay for this, but it's
still pretty slow compared to "accumulate pieces in a list, ''.join the
list when it's done". There's really nothing better than this _in
general_. For your specific case, since you know in advance how long
you want the resulting sequence to be, you have options: make
encoded_text a list of the same length as it will be at the end, and
loop filling it. In other words, you might be able to shave a little
time by coding, for example:

encoded_text = list(original_text)
for i, c in enumerate(encoded_text):
    encoded_text[i] = table[c]

another possibility that might turn out to be faster (particularly in
2.4) is:

encoded_text = map(table.__getitem__, original_text)

or similarly, in 2.3, with table.get (table.__getitem__ is slower than
it should be in 2.3... fortunately 2.4 has fixed that issue). I'm not
sure which of these options will be faster: time them with timeit.py,
that's what it's _there_ for!-). Anyway, encoded_text will end up as a
list, so you'll ''.join it when you're done.
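
For example, with a toy table (untested command lines, just to show the
shape of the measurement):

$ python timeit.py -s "table = {'a': '01', 'b': '100'}; text = 'ab' * 500" "''.join([table[c] for c in text])"
$ python timeit.py -s "table = {'a': '01', 'b': '100'}; text = 'ab' * 500" "''.join(map(table.__getitem__, text))"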

"consuming a string a piece at a time from the front" is a lesser-known
trap -- interestingly, it would apply to lists as well, for different
reasons (namely, that a list is a compact array in memory, so removing
some stuff from the front requires "sliding down" all of the rest, an
operation taking time proportional to the sequence length, O(N) in
common parlance -- consuming a whole sequence that way is therefore O(N
squared), i.e., pretty bad). The list-comprehension-of-slices solution
I've seen more than one responder propose is in fact quite a good one
(it's O(1) per step, O(N) for the whole 'consumption' operation).
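
If you want to watch the quadratic behaviour yourself, a throwaway timing
like this (an untested sketch) should show the time roughly quadrupling
each time the length doubles:

import time

def eat_from_front(s):
    while s:
        s = s[8:]                 # each slice copies the whole remainder: O(N) per step

for n in (10000, 20000, 40000):
    s = '0' * n
    start = time.clock()
    eat_from_front(s)
    print n, time.clock() - start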

You might also consider whether you want encoded_text to be a string at
all, rather than a list or array.array of 1's and 0's. If the latter,
then 'encoded_text.extend(table[c])' as the first loop's body would do
it -- and you could make the values of table into tuples of numbers,
rather than strings, optionally.
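
A minimal sketch of that variant (hypothetical table entries, untested):

table = {'d': (0, 1, 1, 0), 'e': (0, 1)}   # values are tuples of bits, not strings
original_text = "deed"                      # stand-in input

encoded_bits = []
for c in original_text:
    encoded_bits.extend(table[c])           # extend flattens each tuple into the list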


Alex
 
Gerrit

Alex said:
Putting together a big string with a loop of 'bigstring+=piece' is
perhaps the best-known performance trap in Python. Python 2.4 has
turned somersaults to lessen the penalty you pay for this, but it's
still pretty slow compared to "accumulate pieces in a list, ''.join the
list when it's done". There's really nothing better than this _in
general_.

Why isn't cStringIO faster than concatenating strings?

Using python2.4:

$ python timeit.py -s 'from cStringIO import StringIO; s=StringIO()' "s.write('foo')"
1000000 loops, best of 3: 0.555 usec per loop

$ python timeit.py -s 's=""' "s+='foo'"
1000000 loops, best of 3: 0.383 usec per loop

$ python timeit.py -s 'L=[]' "L.append('foo')"
1000000 loops, best of 3: 0.32 usec per loop

$ python timeit.py -s 'from StringIO import StringIO; s=StringIO()' "s.write('foo')"
100000 loops, best of 3: 3.19 usec per loop

I was about to advise cStringIO as being much faster than concatenating
strings, but just before I was about to send the e-mail, I spied with my
little eye that the numbers were swapped - concatenating strings is
actually *faster* than using cStringIO! What's happening?

yours,
Gerrit Holl.
 
Skip Montanaro

gerrit> Why isn't cStringIO faster than concatenating strings?

gerrit> Using python2.4:
...

Because in 2.4 the common case of string concatenation gets a big speed boost:

% python2.3 ~/local/bin/timeit.py -s 's=""' "s+='foo'"
10000 loops, best of 3: 68.3 usec per loop
% python2.4 ~/local/bin/timeit.py -s 's=""' "s+='foo'"
1000000 loops, best of 3: 0.855 usec per loop

Skip
 
Alex Martelli

Gerrit said:
Why isn't cStringIO faster than concatenating strings?

Using python2.4:

Python 2.4 did reduce the cost of the bigstring+=littlepiece trap. But
I do agree there's nevertheless something strange here. cStringIO is
using a very different strategy, a buffer that doubles and gets
realloc'd every time it would fill up -- maybe that's tripping up its
performance...?


Alex
 
Gerrit

Skip said:
gerrit> Why isn't cStringIO faster than concatenating strings?

gerrit> Using python2.4:
...

Because in 2.4 the common case of string concatenation gets a big speed boost:

% python2.3 ~/local/bin/timeit.py -s 's=""' "s+='foo'"
10000 loops, best of 3: 68.3 usec per loop
% python2.4 ~/local/bin/timeit.py -s 's=""' "s+='foo'"
1000000 loops, best of 3: 0.855 usec per loop

Ah. Am I correct to conclude that StringIO should either be rewritten to use
string concatenation, or be deprecated, or both?

Gerrit.
 
Bengt Richter

Why isn't cStringIO faster than concatenating strings?
You are timing extra overheads besides the actual concatenation of the data, I believe.
Using python2.4:

$ python timeit.py -s 'from cStringIO import StringIO; s=StringIO()' "s.write('foo')"
1000000 loops, best of 3: 0.555 usec per loop
It should improve a little if you hoist the s.write attribute-lookup handicap
out of the loop, e.g., with swrite = s.write, and then use swrite('foo');
but you are still timing a function call in the mix.
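
I.e., something like (untested):

$ python timeit.py -s 'from cStringIO import StringIO; s=StringIO(); swrite=s.write' "swrite('foo')"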
$ python timeit.py -s 's=""' "s+='foo'"
1000000 loops, best of 3: 0.383 usec per loop
It should slow down a bit if you give it the same function-call handicap as swrite
one way or another.
$ python timeit.py -s 'L=[]' "L.append('foo')"
1000000 loops, best of 3: 0.32 usec per loop

$ python timeit.py -s 'from StringIO import StringIO; s=StringIO()' "s.write('foo')"
100000 loops, best of 3: 3.19 usec per loop

I was about to advise cStringIO as being much faster than concatenating
strings, but just before I was about to send the e-mail, I spied with my
little eye that the numbers were swapped - concatenating strings is
actually *faster* than using cStringIO! What's happening?
Apples vs. oranges, partly; and Skip mentioned the s += 'xxx' optimization in 2.4.

Regards,
Bengt Richter
 
Alex Martelli

Gerrit said:
Ah. Am I correct to conclude that StringIO should either be rewritten to use
string concatenation, or be deprecated, or both?

StringIO is precious in many cases which have nothing to do with
"concatenating string" but are rather "easily capturing in memory the
input/output of some piece of code which writes to / reads from a
file-like object", so deprecating it would be unwarranted. A little
rewrite might perhaps be warranted, admittedly;-).
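
For instance, a minimal sketch of that capturing use (nothing beyond the
standard library assumed):

import sys
from StringIO import StringIO

capture = StringIO()
saved_stdout, sys.stdout = sys.stdout, capture
try:
    print "hello"                 # anything printed now lands in the buffer
finally:
    sys.stdout = saved_stdout

assert capture.getvalue() == "hello\n"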


Alex
 
Raymond Hettinger

[Skip Montanaro]
in 2.4 the common case of string concatenation gets a big speed boost:

% python2.3 ~/local/bin/timeit.py -s 's=""' "s+='foo'"
10000 loops, best of 3: 68.3 usec per loop
% python2.4 ~/local/bin/timeit.py -s 's=""' "s+='foo'"
1000000 loops, best of 3: 0.855 usec per loop
[Gerrit]
Ah. Am I correct to conclude that StringIO should either be rewritten to use
string concatenation, or be deprecated, or both?

Nope. StringIO correctly makes heavy use of the ''.join() idiom, which is still
preferred over += for reasons of portability and speed. The purpose of the +=
optimization was to make it less catastrophic in places where it does occur.
Guido was clear that ''.join() remains king.

I will submit a patch to tighten up the code in StringIO.write() which needs
localized variables and elimination of an unnecessary step.
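
For the curious, "localized variables" is the usual trick of binding
attribute and method lookups to local names outside a hot loop; a generic
illustration (not the actual patch), applied to the OP's encoding loop:

def encode(original_text, table):
    parts = []
    append = parts.append        # bind the bound method once; local-name
    lookup = table.__getitem__   # lookups are cheaper per iteration
    for c in original_text:
        append(lookup(c))
    return ''.join(parts)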


Raymond Hettinger
 
Jeremy Fincher

Gerrit said:
Why isn't cStringIO faster than concatenating strings?

It is, when you're doing it repetitively. Using cStringIO or str.join
is O(n) in the number of strings, whereas using += repetitively is
O(n**2) in the number of strings.
Using python2.4:

$ python timeit.py -s 'from cStringIO import StringIO; s=StringIO()' "s.write('foo')"
1000000 loops, best of 3: 0.555 usec per loop

$ python timeit.py -s 's=""' "s+='foo'"
1000000 loops, best of 3: 0.383 usec per loop

$ python timeit.py -s 'L=[]' "L.append('foo')"
1000000 loops, best of 3: 0.32 usec per loop

$ python timeit.py -s 'from StringIO import StringIO; s=StringIO()' "s.write('foo')"
100000 loops, best of 3: 3.19 usec per loop

Don't let the "100000 loops" confuse you; in this case you're only
appending *one* string, and the significantly lower constant cost of
str.__iadd__ (aka +=) matters more than the lower complexity of the
other methods.
I was about to advise cStringIO as being much faster than concatenating
strings, but just before I was about to send the e-mail, I spied with my
little eye that the numbers were swapped - concatenating strings is
actually *faster* than using cStringIO! What's happening?

Perhaps this will be more elucidating:

$ python ~/timeit.py 's = ""' 'for i in xrange(1):' ' s += "foo"'
100000 loops, best of 3: 4.38 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio = StringIO()' 'for i in xrange(1):' ' sio.write("foo")' 's = sio.getvalue()'
100000 loops, best of 3: 9.77 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(10):' ' s += "foo"'
10000 loops, best of 3: 15.8 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio = StringIO()' 'for i in xrange(10):' ' sio.write("foo")' 's = sio.getvalue()'
10000 loops, best of 3: 36.6 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(100):' ' s += "foo"'
1000 loops, best of 3: 154 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio = StringIO()' 'for i in xrange(100):' ' sio.write("foo")' 's = sio.getvalue()'
1000 loops, best of 3: 264 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(1000):' ' s += "foo"'
100 loops, best of 3: 2.23e+03 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio = StringIO()' 'for i in xrange(1000):' ' sio.write("foo")' 's = sio.getvalue()'
100 loops, best of 3: 2.62e+03 usec per loop

$ python ~/timeit.py 's = ""' 'for i in xrange(10000):' ' s += "foo"'
10 loops, best of 3: 7.14e+05 usec per loop

$ python ~/timeit.py -s 'from cStringIO import StringIO' 'sio = StringIO()' 'for i in xrange(10000):' ' sio.write("foo")' 's = sio.getvalue()'
10 loops, best of 3: 2.66e+04 usec per loop

Do be sure not to use -s before the 's = ""' argument or before the
'sio = StringIO()' argument, otherwise you'll prove your point well,
but not fairly :)

Jeremy
 
