Cryptographically random numbers

Tuvas

I've actually run the tests on this one: odd as it may sound, += was
actually faster than using a list. I ran into this one a while back.
The best way to do it is to build a list from scratch, fill it, and
then join it, but I didn't have time to do it that way...
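The list-then-join approach Tuvas describes can be sketched like this (the names build_with_join, items, and g are stand-ins, not from the thread):

```python
# Collect the pieces in a list, then join them once at the end,
# instead of repeatedly concatenating with +=.
def build_with_join(items, g=str):
    pieces = []
    for x in items:
        pieces.append(g(x))
    return ''.join(pieces)

print(build_with_join([1, 2, 3]))  # -> 123
```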
 

Steven D'Aprano

Frederik said:
In 2.4 and later, += on strings does the operation in place when
possible. This means that += is often faster than append/join.

As much as we all love optimizations, I'm unhappy about this one. It used
to be a real simple rule: to append lots of strings, use append/join. That
worked for any Python you used.

But now code that runs really fast in CPython 2.4 will run like a dog in
Jython or CPython 2.3. Python's implementation independence is weakened.

Worse, because nobody really knows what "often" means (or at least,
those who know haven't said), a conscientious Python developer who wants
fast performance now has to measure all their string-appending code using
both idioms to be sure that *this particular case* is/isn't one of the
"often".

Is there some heuristic for telling when CPython can do the in-place
appending, and when it can't? If we were talking about a tiny (1% say)
performance penalty for getting it wrong, then it wouldn't matter, but the
penalty for using += when the optimization doesn't apply is severe.
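Measuring the two idioms side by side, as Steven describes, can be done with the stdlib timeit module; a minimal harness (the function names and N are made up for illustration):

```python
import timeit

N = 10000

def concat_plus():
    # Repeated += on a str: may or may not hit the in-place path.
    a = ''
    for _ in range(N):
        a += 'x'
    return a

def concat_join():
    # The append/join idiom: one allocation at the end.
    return ''.join('x' for _ in range(N))

# Absolute numbers depend on the interpreter and version,
# which is exactly the portability worry raised above.
print('+=  :', timeit.timeit(concat_plus, number=100))
print('join:', timeit.timeit(concat_join, number=100))
```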
 

Paul Rubin

Tuvas said:
I've actually done the tests on this one, it's actually faster to use
the += than a list, odd as it may sound.

Frederik explained the reason; there's an optimization in Python 2.4
that I'd forgotten about, for that specific case. It's not in earlier
versions. It's a bit fragile in 2.4:

a = ''
for x in something:
    a += g(x)

is fast, but if a is aliased, Python can't do the optimization, so

a = ''
b = a
for x in something:
    a += g(x)

is slow. Figuring out which case to use relies on CPython's reference
counting storage allocator (the interpreter keeps track of how many
pointers there are to any given object) and so the optimization may
not be feasible at all in other implementations which use different
storage allocation strategies (e.g. Lisp-style garbage collection).
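The reference counts Paul is talking about can be inspected directly with sys.getrefcount (a CPython-specific function; the extra temporary reference from the call itself cancels out when you compare counts):

```python
import sys

# Use a computed string so we don't accidentally measure a shared,
# interned object like a short literal.
a = 'spam' * 100
before = sys.getrefcount(a)   # includes the temporary ref from the call
b = a                         # alias the string, as in Paul's second example
after = sys.getrefcount(a)
print(after - before)         # -> 1: the alias added exactly one reference
```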

All in all I think it's best to use a completely different approach
(something like StringBuffer) but my effort to start a movement here
on clpy a couple months ago didn't get anywhere.
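Paul doesn't say what his StringBuffer proposal looked like, but a minimal class along those lines (the name and interface are assumptions) is essentially the append/join idiom wrapped in an object:

```python
class StringBuffer:
    """Accumulate pieces in a list; build the string only on demand."""

    def __init__(self):
        self._pieces = []

    def append(self, s):
        self._pieces.append(s)

    def __str__(self):
        # One join at the end, regardless of interpreter or version.
        return ''.join(self._pieces)

buf = StringBuffer()
for word in ('spam', 'and', 'eggs'):
    buf.append(word)
print(str(buf))  # -> spamandeggs
```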
 

Tim Hochberg

Paul said:
Frederik explained the reason; there's an optimization in Python 2.4
that I'd forgotten about, for that specific case. It's not in earlier
versions. It's a bit fragile in 2.4:

a = ''
for x in something:
    a += g(x)

is fast, but if a is aliased, Python can't do the optimization, so

a = ''
b = a
for x in something:
    a += g(x)

is slow.

Is this really true? After the first time through the loop, 'a' won't be
aliased any more since strings are immutable. After that the loops
should be equivalent. To check whether I was missing something, I tried
this out with Python 2.4.1: the following two snippets timed essentially
the same for N up to 2**20 (I didn't try any higher):

def concat1():
    a = ''
    for x in ' '*N:
        a += x
    return a

def concat2():
    a = ''
    b = a
    for x in ' '*N:
        a += x
    return a

Regards,


-tim
 

Paul Rubin

Tim said:
Is this really true? After the first time through the loop, 'a' won't
be aliased any more since strings are immutable.

Hmm, OK, there was some previous discussion of this problem and I
guess I misremembered it. Thanks. I think there is still a fairly
natural case where the optimization doesn't work. Anyone know?
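One commonly cited case (this is an assumption about CPython internals, not something guaranteed by the language) is += on an instance attribute: the attribute slot keeps its own reference to the string, so the refcount is never low enough for the in-place path and CPython falls back to copying:

```python
class Log:
    def __init__(self):
        self.text = ''

    def add(self, line):
        # self.text is also referenced from the instance dict, so
        # (assumption) CPython cannot resize the string in place here.
        self.text += line

log = Log()
for word in ('a', 'b', 'c'):
    log.add(word)
print(log.text)  # -> abc
```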
 
