Performance of int/long in Python 3


rusi

Mark,

Thanks for asking this question.

It seems to me that jmf *might* be moving towards a vindicated
position.  There is some interest now in duplicating, understanding and
(hopefully!) extending his test results, which can only be a Good Thing
- whatever the outcome and wherever the facepalm might land.

Whew! Very reassuring to hear some sanity in this discussion at long
last!
 

Steven D'Aprano

It seems to me that jmf *might* be moving towards a vindicated position.
There is some interest now in duplicating, understanding and
(hopefully!) extending his test results, which can only be a Good Thing
- whatever the outcome and wherever the facepalm might land.

Some interest "now"? Oh please.

http://mail.python.org/pipermail/python-list/2012-September/629810.html

Mark Lawrence even created a bug report to track this, also back in
September.

http://bugs.python.org/issue16061

I'm sure you didn't intend to be insulting, but some of us *have* taken
JMF seriously, at least at first. His repeated overblown claims of how
Python is destroying Unicode, his lack of acknowledgement that other
people have seen string handling *speed up* not slow down, and his
refusal to assist in diagnosing this performance regression except to
repeatedly quote the same artificial micro-benchmarks over and over again
have lost him whatever credibility he started with.

This feature is a *memory optimization*, not a speed optimization, and
yet as a side-effect of saving memory, it also saves time. Real-world
benchmarks of actual applications demonstrate this. One or two trivial
slowdowns of artificial micro-benchmarks simply are not important, even
if they are genuine. I believe they are genuine, but likely operating
system and hardware dependent.
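
For readers who want to see the memory side of this claim directly, here is a minimal sketch (not from the original posts) using sys.getsizeof; exact byte counts vary with Python version, build and platform:

import sys

# PEP 393 stores each string in the narrowest form that can hold its
# widest character, so the per-character cost only grows when wider
# characters are actually present.
ascii_s  = 'a' * 1000              # 1 byte per character
latin_s  = 'é' * 1000              # still 1 byte per character (Latin-1 range)
bmp_s    = '\u20ac' * 1000         # 2 bytes per character (BMP, e.g. the euro sign)
astral_s = '\U00010001' * 1000     # 4 bytes per character (outside the BMP)

for name, s in [('ascii', ascii_s), ('latin-1', latin_s),
                ('bmp', bmp_s), ('astral', astral_s)]:
    print(name, sys.getsizeof(s))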
 

Mark Lawrence

Mark,

Thanks for asking this question.

It seems to me that jmf *might* be moving towards a vindicated
position. There is some interest now in duplicating, understanding and
(hopefully!) extending his test results, which can only be a Good Thing
- whatever the outcome and wherever the facepalm might land.

The position that is already documented in PEP393, how so?

However, as you rightly point out, there is only value in following this
through if the functionality is (at least near) 100% correct. I am sure
there are some that will disagree but in most cases, functionality is
the primary requirement and poor performance can be managed initially
and fixed in due time.

I've already raised an issue about performance and Neil Hodgson has
raised a new one. To balance this out perhaps we should have counter
issues asking for the amount of memory being used to be increased to old
levels and the earlier buggier behaviour of Python to be reintroduced?
Swings and roundabouts?
 

Steve Simmons

Some interest "now"? Oh please.

http://mail.python.org/pipermail/python-list/2012-September/629810.html

Mark Lawrence even created a bug report to track this, also back in
September.

http://bugs.python.org/issue16061

I'm sure you didn't intend to be insulting, but some of us *have* taken
JMF seriously, at least at first. His repeated overblown claims of how
Python is destroying Unicode, his lack of acknowledgement that other
people have seen string handling *speed up* not slow down, and his
refusal to assist in diagnosing this performance regression except to
repeatedly quote the same artificial micro-benchmarks over and over again
have lost him whatever credibility he started with.

This feature is a *memory optimization*, not a speed optimization, and
yet as a side-effect of saving memory, it also saves time. Real-world
benchmarks of actual applications demonstrate this. One or two trivial
slowdowns of artificial micro-benchmarks simply are not important, even
if they are genuine. I believe they are genuine, but likely operating
system and hardware dependent.

First off, no insult intended and I haven't been part of this list long
enough to be fully immersed in the history of this so I'm sure there are
events of which I am unaware.

However, it seems to me that, for whatever reason, JMF has reached the
end of his capacity (time, capability, patience, ...) to extend his
benchmarks into a more credible test set - i.e. one that demonstrates an
acceptably 'real life' profile with a marked drop in performance. As a
community we have choices. We can brand him a Troll - and some of his
behaviour may mandate that - or we can put some additional energy into
drawing this 'disagreement' to a more amicable and constructive conclusion.

My post was primarily aimed at recognising the work that people like
Mark, Neil and others have done to move the problem forward and was
intended to help shift the focus to a more productive approach. Again,
my apologies if it was ill timed or ill-directed.

Steve Simmons
 

Mark Lawrence

My post was primarily aimed at recognising the work that people like
Mark, Neil and others have done to move the problem forward and was
intended to help shift the focus to a more productive approach. Again,
my apologies if it was ill timed or ill-directed.

Steve Simmons

I must point out that I only raised issue 16061 based on data provided
by Steven D'Aprano and Serhiy Storchaka.
 

Steve Simmons

I've already raised an issue about performance and Neil Hodgson has
raised a new one.

Recognised in a separate post

To balance this out perhaps we should have counter issues asking for
the amount of memory being used to be increased to old levels and the
earlier buggier behaviour of Python to be reintroduced? Swings and
roundabouts?

I don't think I came anywhere near suggesting that we should regress
correct functionality or memory usage improvements. I just don't
believe that we can't have good performance alongside it.

Steve S
 

Ethan Furman

First off, no insult intended and I haven't been part of this list long enough to be fully immersed in the history of
this so I'm sure there are events of which I am unaware.

Yes, that would be his months of trollish behavior on this subject.

However, it seems to me that, for whatever reason, JMF has reached the end of his capacity

His capacity, maybe; his time? Not by a long shot. I am positive we will continue to see his uncooperative, bratty*
behavior ad nauseam.
 

jmfauth

On Tue, 02 Apr 2013 11:58:11 +0100, Steve Simmons wrote:

I'm sure you didn't intend to be insulting, but some of us *have* taken
JMF seriously, at least at first. His repeated overblown claims of how
Python is destroying Unicode ...


Sorry, I never claimed this; I'm just seeing how Python is becoming
less Unicode friendly.

This feature is a *memory optimization*, not a speed optimization,

I totally agree, and utf-8 does that with great art (see Neil
Hodgson's comment).
(Do not interpret this as me saying "Python should use utf-8", as
I have read.)

jmf
 

Ethan Furman

Recognised in a separate post


I don't think I came anywhere near suggesting that we should regress correct functionality or memory usage
improvements. I just don't believe that we can't have good performance alongside it.

It's always a trade-off between time and memory.

However, as it happens, there are plenty of instances where the new FSR is faster -- and this in real world code, not
useless micro-benchmarks.

Simmons (too many Steves!), I know you're new so don't have all the history with jmf that many of us do, but consider
that the original post was about numbers, had nothing to do with characters or unicode *in any way*, and yet jmf still
felt the need to bring unicode up.
 

Mark Lawrence

Sorry, I never claimed this; I'm just seeing how Python is becoming
less Unicode friendly.

Please explain this. I see no justification for this comment. How can
an implementation that fixes bugs be less Unicode friendly than its
earlier, buggier equivalents?
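
A concrete example of the bug fixes being referred to: on a narrow build of Python 3.2 (e.g. the Windows installers), characters outside the Basic Multilingual Plane were stored as surrogate pairs, so length, indexing and slicing could give surprising answers; PEP 393 removed that distinction. A small illustrative snippet:

# One astral character, before and after PEP 393. The commented results
# for 3.2 assume a narrow build; 3.3+ behaves the same on every platform.
s = '\U0001F600'

print(len(s))          # narrow 3.2: 2              3.3+: 1
print(ascii(s[0]))     # narrow 3.2: '\ud83d' (half a character)
                       # 3.3+: '\U0001f600' (the full character)
print(ascii(s[::-1]))  # narrow 3.2: surrogates come out in the wrong order
                       # 3.3+: the single character, unchanged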
 

rusi

Simmons (too many Steves!), I know you're new so don't have all the history with jmf that many
of us do, but consider that the original post was about numbers, had nothing to do with
characters or unicode *in any way*, and yet jmf still felt the need to bring unicode up.

Just for reference, here is the starting para of Chris' original mail
that started this thread.

The Python 3 merge of int and long has effectively penalized
small-number arithmetic by removing an optimization. As we've seen
from PEP 393 strings (jmf aside), there can be huge benefits from
having a single type with multiple representations internally. Is
there value in making the int type have a machine-word optimization in
the same way?

ie it mentions numbers, strings, PEP 393 *AND jmf.* So while it is
true that jmf has been butting in with trollish behavior into
completely unrelated threads with his unicode rants, that cannot be
said for this thread.
 

rusi

Sorry, I never claimed this; I'm just seeing how Python is becoming
less Unicode friendly.

jmf: I suggest you try to use less emotionally loaded and more precise
language if you want people to pay heed to your technical observations/
contributions.
In particular, while you say unicode, your examples always (as far as
I remember) refer to BMP.
Also words like 'friendly' are so emotionally charged that people stop
being friendly :)

So may I suggest that you rephrase your complaint as
"I am seeing python is becoming poorly performant on BMP-chars at the
expense of correct support for the whole (6.0?) charset"

(assuming that's what you want to say)

In any case PLEASE note that 'performant' and 'correct' are different
for most practical purposes.
If you don't respect this distinction, people are unlikely to pay heed to
your complaints.
 

jmfauth

Just for reference, here is the starting para of Chris' original mail
that started this thread.


ie it mentions numbers, strings, PEP 393 *AND jmf.*  So while it is
true that jmf has been butting in with trollish behavior into
completely unrelated threads with his unicode rants, that cannot be
said for this thread.

-----

That's because you did not understand the analogy, int/long <-> FSR.

Another illustration:

...     if 0 < i <= 100:
...         return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
...     elif 100 < i <= 1000:
...         return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
...     else:
...         return i + 1
...

Does it work? Yes.
Is it "correct"? That can be discussed.

Now replace i with a char, a representative of each "subset"
of the FSR, select a method where this FSR behaves badly,
and take a look at what happens.

>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6532032148133153, 0.6407248807756699, 0.6407264561239894]
>>> timeit.repeat("'a' * 1000 + '9'")
[0.6429508479509245, 0.6242782443215589, 0.6240490311410927]

>>> timeit.repeat("'a' * 1000 + '€'")
[1.095694927496563, 1.0696347279235603, 1.0687741939041082]
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.0796421281222877, 1.0348612767961853, 1.035325216876231]
>>> timeit.repeat("'a' * 1000 + '\u2345'")
[1.0855414137412112, 1.0694677410017164, 1.0688096392412945]

>>> timeit.repeat("'œ' * 1000 + '\U00010001'")
[1.237314015362017, 1.2226262553064657, 1.21994619397816]
>>> timeit.repeat("'œ' * 1000 + '\U00010002'")
[1.245773635836997, 1.2303978424029651, 1.2258257877430765]
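
For what it's worth, the pattern in these timings matches how PEP 393 works: the result of a concatenation must use the widest representation of any of its characters, so appending one wider character to a block of ASCII forces a widening copy. A minimal sketch (not part of the original post) that makes the width change visible:

import sys

base = 'a' * 1000
print(sys.getsizeof(base))                  # 1 byte per character plus a small header
print(sys.getsizeof(base + 'z'))            # still 1 byte per character
print(sys.getsizeof(base + '\u20ac'))       # jumps: the result is 2 bytes per character
print(sys.getsizeof(base + '\U00010001'))   # jumps again: 4 bytes per character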

Where does it come from? Simple, the FSR breaks the
simple rules used in all coding schemes (unicode or not).
1) a unique set of chars
2) the "same" algorithm for all chars.

And again that's why utf-8 is working very smoothly.

The "corporates" which understood this very well and
wanted to incorporate, let say, the used characters
of the French language had only the choice to
create new coding schemes (eg mac-roman, cp1252).

In unicode, the "latin-1" range is real plague.

After years of experience, I'm still fascinated to see
the corporates has solved this issue easily and the "free
software" is still relying on latin-1.
I never succeed to find an explanation.

Even the TeX folks, when they shifted to the Cork
encoding in 199?, were aware of this and consequently
provided special package(s).

No offense, but this is in my mind why "corporate software"
will always be "corporate software" and "hobbyist software"
will always stay at the level of "hobbyist software".

A French Windows user, understanding nothing of the
coding of characters, assuming he is even aware of its
existence (!), certainly has no problem.

Fascinating how it is possible to use Python to teach,
to illustrate, to explain the coding of characters. No?


jmf
 

rusi

Just for reference, here is the starting para of Chris' original mail
that started this thread.
ie it mentions numbers, strings, PEP 393 *AND jmf.*  So while it is
true that jmf has been butting in with trollish behavior into
completely unrelated threads with his unicode rants, that cannot be
said for this thread.

-----

That's because you did not understand the analogy, int/long <-> FSR.

Another illustration:

...     if 0 < i <= 100:
...         return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
...     elif 100 < i <= 1000:
...         return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
...     else:
...         return i + 1
...

Does it work? Yes.
Is it "correct"? That can be discussed.

Now replace i with a char, a representative of each "subset"
of the FSR, select a method where this FSR behaves badly,
and take a look at what happens.

>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6532032148133153, 0.6407248807756699, 0.6407264561239894]
>>> timeit.repeat("'a' * 1000 + '9'")
[0.6429508479509245, 0.6242782443215589, 0.6240490311410927]

>>> timeit.repeat("'a' * 1000 + '€'")
[1.095694927496563, 1.0696347279235603, 1.0687741939041082]
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.0796421281222877, 1.0348612767961853, 1.035325216876231]
>>> timeit.repeat("'a' * 1000 + '\u2345'")
[1.0855414137412112, 1.0694677410017164, 1.0688096392412945]

>>> timeit.repeat("'œ' * 1000 + '\U00010001'")
[1.237314015362017, 1.2226262553064657, 1.21994619397816]
>>> timeit.repeat("'œ' * 1000 + '\U00010002'")
[1.245773635836997, 1.2303978424029651, 1.2258257877430765]

Where does it come from? Simple, the FSR breaks the
simple rules used in all coding schemes (unicode or not).
1) a unique set of chars
2) the "same" algorithm for all chars.

Can you give me a source for this requirement?
Numbers are after all numbers. SO we should use the same code/
algorithms/machine-instructions for floating-point and integers?
And again that's why utf-8 is working very smoothly.

How wonderful. Here's a suggestion.
Code up the UTF-8 and any of the python string reps in C and profile
them.
Please come back and tell us if UTF-8 outperforms any of the python
representations for strings on any operation (except straight copy).
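
One way to see the point without writing any C: character indexing into a UTF-8 buffer is not a constant-time operation, because the position of the n-th character is unknown until the bytes are scanned. A rough Python-level sketch, with decoding standing in for the scan a C implementation would do; the absolute numbers are meaningless, the shape is the point:

import timeit

text = 'é' * 100000            # 1 byte/char in the FSR, 2 bytes/char in UTF-8
utf8 = text.encode('utf-8')

# str (PEP 393) indexes by character position directly.
t_str = timeit.timeit(lambda: text[99999], number=1000)

# Getting the same character out of the UTF-8 bytes means scanning or
# decoding up to that point first.
t_utf8 = timeit.timeit(lambda: utf8.decode('utf-8')[99999], number=1000)

print('str index:           ', t_str)
print('utf-8 decode + index:', t_utf8)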
The "corporates" which understood this very well and
wanted to incorporate, let say, the used characters
of the French language had only the choice to
create new coding schemes (eg mac-roman, cp1252).

In unicode, the "latin-1" range is real plague.

After years of experience, I'm still fascinated to see
the corporates has solved this issue easily and the "free
software" is still relying on latin-1.
I never succeed to find an explanation.

Even the TeX folks, when they shifted to the Cork
encoding in 199?, were aware of this and consequently
provided special package(s).

No offense, but this is in my mind why "corporate software"
will always be "corporate software" and "hobbyist software"
will always stay at the level of "hobbyist software".

A French Windows user, understanding nothing of the
coding of characters, assuming he is even aware of its
existence (!), certainly has no problem.

Fascinating how it is possible to use Python to teach,
to illustrate, to explain the coding of characters. No?

jmf

You troll with eclat and elan!
 

Ian Kelly

It is somehow funny to see, the FSR "fails" precisely
on problems Unicode will solve/handle, eg normalization or
sorting [3].

Neither of these problems have anything to do with the FSR. Can you
give us an example of normalization or sorting where Python 3.3 fails
and Python 3.2 does not?
[3] I only tested these "chars" blindly with the help
of the doc I have. Btw, when I tested complicated "Arabic chars",
I noticed Py33 "crashes"; it does not really crash, it gets stuck
in some kind of infinite loop (or is it due to "timeit"?).

Without knowing what the actual test that you ran was, we have no way
of answering that. Unless you give us more detail, my assumption
would be that the number of repetitions that you passed to timeit was
excessively large for the particular test case.
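
If the apparent hangs really are just timeit choosing a huge repetition count, pinning the count down makes that easy to rule out. A hedged sketch (the statement is a made-up stand-in for whatever "Arabic chars" test was actually run):

import timeit

# timeit.repeat() runs the statement number=1000000 times per repetition
# by default, so a statement that is slow per call can look like a hang.
# An explicit, smaller number bounds the total run time.
stmt = "'a' * 1000 + '\u0627'"     # illustrative only
print(timeit.repeat(stmt, number=10000, repeat=3))

# The command-line form picks a sensible count automatically:
#   python -m timeit "'a' * 1000 + '\u0627'"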
[4] Am I the only one who tests this kind of stuff?

No, you're just the only one who considers it important.
Micro-benchmarks like the ones you have been reporting are *useful*
when it comes to determining what operations can be better optimized,
but they are not *important* in and of themselves. What is important
is that actual, real-world programs are not significantly slowed by
these kinds of optimizations. Until you can demonstrate that real
programs are adversely affected by PEP 393, there is not in my opinion
any regression that is worth worrying over.
 

Terry Jan Reedy

On 2 Apr, 16:03, Steven D'Aprano wrote:

... "usability in Python" or some variation on that.

Sorry, I never claimed this; I'm just seeing how Python is becoming
less Unicode friendly.

Let us see what Jim has claimed, starting in 2012 August.

http://mail.python.org/pipermail/python-list/2012-August/628826.html
"Devs are developing sophisticed tools based on a non working basis."

http://mail.python.org/pipermail/python-list/2012-August/629514.html
"This "Flexible String Representation" fails."

http://mail.python.org/pipermail/python-list/2012-August/629554.html
"This flexible representation is working absurdly."

Reader can decide whether 'non-working', 'fails', 'working absurdly' are
closer to 'destroying Unicode usability or just 'less friendly'.

On speed:

http://mail.python.org/pipermail/python-list/2012-August/628781.html
"Python 3.3 is "slower" than Python 3.2."

http://mail.python.org/pipermail/python-list/2012-August/628762.html
"I can open IDLE with Py 3.2 ou Py 3.3 and compare strings
manipulations. Py 3.3 is always slower. Period."

False. Period. Here is my followup at the time.
http://mail.python.org/pipermail/python-list/2012-August/628779.html
"You have not tried enough tests ;-).

On my Win7-64 system:
from timeit import timeit

print(timeit(" 'a'*10000 "))
3.3.0b2: .5
3.2.3: .8

print(timeit("c in a", "c = '…'; a = 'a'*10000"))
3.3: .05 (independent of len(a)!)
3.2: 5.8 100 times slower! Increase len(a) and the ratio can be made as
high as one wants!

print(timeit("a.encode()", "a = 'a'*1000"))
3.2: 1.5
3.3: .26"

If one runs stringbench.py with its 40 or so tests, 3.2 is sometimes
faster and 3.3 is sometimes faster.
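
A hedged sketch of one way to run identical statements under both interpreters and compare, assuming python3.2 and python3.3 are both on the PATH (the statement list is illustrative, not stringbench itself):

import subprocess

SETUP = "c = '\u2026'; a = 'a' * 10000"
STATEMENTS = ["'a' * 10000", "c in a", "a.encode()"]

# Drive `python -m timeit` for each installed interpreter so the same
# statement is measured under 3.2 and 3.3; adjust the names to taste.
for exe in ('python3.2', 'python3.3'):
    for stmt in STATEMENTS:
        print(exe, '--', stmt)
        subprocess.call([exe, '-m', 'timeit', '-s', SETUP, stmt])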

On to September:

http://mail.python.org/pipermail/python-list/2012-September/630736.html
"Avoid Py3.3"

In other words, ignore all the benefits and reject because a couple of
selected microbenchmarks show a slowdown.

http://mail.python.org/pipermail/python-list/2012-September/631730.html
"Py 3.3 succeeded to somehow kill unicode"

I will stop here and let Jim explain how 'kill unicode' is different
from 'destroy unicode'.
 

Joshua Landau

The initial post posited:
"The Python 3 merge of int and long has effectively penalized
small-number arithmetic by removing an optimization. As we've seen
from PEP 393 strings (jmf aside), there can be huge benefits from
having a single type with multiple representations internally. Is
there value in making the int type have a machine-word optimization in
the same way?"

Thanks to the fervent response jmf has gotten, the point above has been
mostly abandoned. May I request that, next time such an obvious diversion
(aka jmf) occurs, responses happen in a different thread?
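
To bring it back to that original question, a minimal sketch of the kind of measurement it invites: arithmetic on a machine-word-sized operand versus an arbitrary-precision one (the numbers chosen are purely illustrative):

import timeit

# Python 3 ints are a single arbitrary-precision type, so small values
# pay some of the bignum bookkeeping that a machine-word fast path
# would avoid; any gap shows up in tight arithmetic loops.
small = timeit.timeit('x + 1', setup='x = 10', number=10000000)
big   = timeit.timeit('x + 1', setup='x = 10**40', number=10000000)
print('machine-word-sized operand:', small)
print('40-digit operand:          ', big)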
 

Lele Gaifax

jmfauth said:
Now replace i with a char, a representative of each "subset"
of the FSR, select a method where this FSR behaves badly,
and take a look at what happens.

You insist on cherry-picking a single "method where this FSR behaves
badly", even when it is so obviously a corner case (IMHO it is not
really a common case to have a relatively big chunk of ASCII
characters to which you are adding one single non-ASCII char...)

Anyway, these are my results on the opposite case, where you have a big
chunk of non-ASCII characters and a single ASCII char added:

Python 2.7.3 (default, Jan 2 2013, 13:56:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.repeat("'€' * 1000 + 'z'")
[0.2817099094390869, 0.2811391353607178, 0.2811310291290283]
>>> timeit.repeat("u'œ' * 1000 + u'\U00010001'")
[0.549591064453125, 0.5502040386199951, 0.5490291118621826]
>>> timeit.repeat("u'\U00010001' * 1000 + u'œ'")
[0.3823568820953369, 0.3823089599609375, 0.3820679187774658]
>>> timeit.repeat("u'\U00010002' * 1000 + 'a'")
[0.45046305656433105, 0.45000195503234863, 0.44980502128601074]

Python 3.3.0 (default, Mar 18 2013, 12:00:52)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.repeat("'€' * 1000 + 'z'")
[0.23264244200254325, 0.23299441300332546, 0.2325888039995334]
>>> timeit.repeat("'œ' * 1000 + '\U00010001'")
[0.3760241370036965, 0.37552819900156464, 0.3755163860041648]
>>> timeit.repeat("'\U00010001' * 1000 + 'œ'")
[0.28259182300098473, 0.2825558360054856, 0.2824251129932236]
>>> timeit.repeat("'\U00010002' * 1000 + 'a'")
[0.28227063300437294, 0.2815949220021139, 0.2829978369991295]

IIUC, while it may be true that Py3 is slightly slower than Py2 when the
string operation involves an internal representation change (all your
examples, and the second operation above), in the much more common case
it is considerably faster. This, and the fact that Py3 actually handles
the whole Unicode space without glitches, make it a better environment
in my eyes. Kudos to the core team!

Just my 0.45-0.28 cents,
ciao, lele.
 

rusi

    Sorting a million string list (all the file paths on a particular
computer) went from 0.4 seconds with Python 3.2 to 0.78 with 3.3 so
we're out of the 'not noticeable by humans' range. Perhaps this is still
a 'micro-benchmark' - I'd just like to avoid adding email access to get
this over the threshold.

What does that last statement mean?
 

Roy Smith

On the other hand, how long did it take you to do the directory tree
walk required to find those million paths? I'll bet a lot longer than
0.78 seconds, so this gets lost in the noise.

Still, it is unfortunate if sort performance got hurt significantly. My
mind was blown a while ago when I discovered that python could sort a
file of strings faster than the unix command-line sort utility. That's
pretty impressive.
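
For anyone who wants to try the experiment without walking a real filesystem, a hedged sketch that sorts a million synthetic path-like strings (the path shapes are invented; a listing of real paths would be more representative):

import random
import string
import timeit

def fake_path(rng):
    name = ''.join(rng.choice(string.ascii_lowercase) for _ in range(12))
    return '/usr/share/doc/%s/README' % name

rng = random.Random(42)
paths = [fake_path(rng) for _ in range(1000000)]

# Run the same script under 3.2 and 3.3 to compare sort times.
print(timeit.timeit(lambda: sorted(paths), number=1))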
 
