> Sorry guys, I'm "only" able to see this (with the Python versions an end
> user can download):
> [snip timeit results]
While you have been all doom and gloom, insisting that Python has
"destroyed" Unicode, I've actually done some testing. It seems that,
possibly, there is a performance regression in the "replace" method.
This is on Debian squeeze, using the latest release candidate of 3.3
(3.3.0rc3):
py> timeit.repeat("('b'*1000).replace('b', 'a')")
[28.308280900120735, 29.012173799797893, 28.834429003298283]
Notice that Unicode doesn't come into it; these are pure ASCII strings.
Here's the same thing using 3.2.2:
py> timeit.repeat("('b'*1000).replace('b', 'a')")
[3.4444618225097656, 3.147739887237549, 3.132185935974121]
That's a factor of 9 slowdown in 3.3, and no Unicode. Obviously Python
has "destroyed ASCII".
(I get similar slowdowns for Unicode strings too, so clearly Python hates
all strings, not just ASCII.)
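(By the way, for anyone repeating these measurements: timeit.repeat runs
the statement a million times per repeat by default, so the figures above
are total seconds for one million calls. A minimal sketch that converts
the result to a per-call time, taking the minimum of the repeats as the
least noisy figure:)

import timeit

number = 1000000  # timeit's default
# The minimum of the repeats is the most representative figure; the
# higher ones are noise from whatever else the machine is doing.
results = timeit.repeat("('b'*1000).replace('b', 'a')", number=number)
print("%.3f usec per call" % (min(results) / number * 1e6))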
Now, for unrelated reasons, I switched to CentOS for the remaining tests.
[steve@ando ~]$ python2.7 -m timeit "'b'*1000"
1000000 loops, best of 3: 0.48 usec per loop
[steve@ando ~]$ python3.2 -m timeit "'b'*1000"
1000000 loops, best of 3: 1.3 usec per loop
[steve@ando ~]$ python3.3 -m timeit "'b'*1000"
1000000 loops, best of 3: 0.397 usec per loop
Clearly 3.3 is the fastest at string multiplication, at least for this
trivial example. Just to prove that the result also applies to Unicode:
[steve@ando ~]$ python3.3 -m timeit "('ä½ '*1000)"
1000000 loops, best of 3: 1.38 usec per loop
Almost identical to 3.2. And the reason it is slower than the 3.3 test
using 'b' above is almost certainly that the non-ASCII string needs a
wider per-character representation, and hence more memory:
[steve@ando ~]$ python3.3 -m timeit "('abcd'*1000)"
1000000 loops, best of 3: 0.919 usec per loop
So the non-ASCII string is a little slower than a pure-ASCII string of
comparable size, but not significantly so.
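You don't have to infer the memory use from timings, though;
sys.getsizeof reports it directly (exact numbers will vary with
interpreter and build):

import sys

# Footprint of each test string. Under 3.3's flexible representation
# the per-character width depends on the widest character present.
print(sys.getsizeof('b' * 1000))     # one byte per character
print(sys.getsizeof('你' * 1000))    # wider characters cost more each
print(sys.getsizeof('abcd' * 1000))  # 4000 one-byte characters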
But add a call to replace, and things are very different:
[steve@ando ~]$ python2.7 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 9.3 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 5.43 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = 'b'*1000" "s.replace('b', 'a')"
100000 loops, best of 3: 18.3 usec per loop
Three times slower, even for pure-ASCII strings. I get comparable results
for Unicode. Notice how slow Python 2.7 is:
[steve@ando ~]$ python2.7 -m timeit -s "s = u'你'*1000" "s.replace(u'你', u'a')"
10000 loops, best of 3: 65.6 usec per loop
[steve@ando ~]$ python3.2 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
100000 loops, best of 3: 2.79 usec per loop
[steve@ando ~]$ python3.3 -m timeit -s "s = '你'*1000" "s.replace('你', 'a')"
10000 loops, best of 3: 23.7 usec per loop
Even with the performance regression, 3.3 is still over twice as fast as
Python 2.7.
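If you want to re-run these comparisons without retyping each command,
here is a rough sketch that drives the same timeit one-liner under each
interpreter (assuming all three binaries are on your PATH):

import subprocess

# Run the identical micro-benchmark under each interpreter and
# print the results side by side for easy comparison.
for py in ("python2.7", "python3.2", "python3.3"):
    out = subprocess.check_output([py, "-m", "timeit",
                                   "-s", "s = 'b'*1000",
                                   "s.replace('b', 'a')"])
    print(py, out.decode().strip())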
Nevertheless, I think there is something here. The consequences are nowhere
near as dramatic as jmf claims, but it does seem that replace() has taken a
serious performance hit. Perhaps it is unavoidable, but perhaps not.
If anyone else can confirm similar results, I think this should be raised as
a performance regression.
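To make confirming easier, here is a minimal standalone sketch (the file
name is just a suggestion) that runs under 2.7 and 3.x alike and prints a
figure suitable for pasting into a bug report:

# repro.py -- run under python2.7, python3.2 and python3.3, then
# compare the per-call times for str.replace().
import sys
import timeit

NUMBER = 100000
best = min(timeit.repeat("s.replace('b', 'a')",
                         setup="s = 'b' * 1000",
                         number=NUMBER))
print("%s: %.2f usec per call" % (sys.version.split()[0],
                                  best / NUMBER * 1e6))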