Python is far from a top performer according to benchmark test...

Ken

Do you spend a "significant" amount of time actually optimizing your
Python applications? (Significant is here defined as "more than five
percent of your time", which is for example two hours a week in a
40-hour work week.)

Some of them.

I'm using python for data transformations: some feeds are small and
easily handled by python, however the large ones (10 million rows per
file) require a bit of thought to be spent on performance. However,
this isn't exactly python optimization - more like shifting high-level
pieces around in the architecture: merge two files or do a binary
lookup (nested-loop join) on one? etc...
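
A minimal sketch of those two strategies, assuming both feeds are lists of (key, value) rows sorted by a unique key. The function names and sample rows are made up for illustration and are not taken from the application described here.

```python
from bisect import bisect_left


def merge_join(left, right):
    """Join two lists of (key, value) rows that are both sorted by key.

    Assumes keys are unique within each list. Each input is read once.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk == rk:
            out.append((lk, lv, rv))
            i += 1
            j += 1
        elif lk < rk:
            i += 1
        else:
            j += 1
    return out


def lookup_join(left, right):
    """For each row of `left`, binary-search the sorted `right` list."""
    right_keys = [k for k, _ in right]
    out = []
    for lk, lv in left:
        pos = bisect_left(right_keys, lk)
        if pos < len(right_keys) and right_keys[pos] == lk:
            out.append((lk, lv, right[pos][1]))
    return out


small = [(2, "b"), (5, "e")]
large = [(1, "A"), (2, "B"), (3, "C"), (5, "E"), (8, "H")]
print(merge_join(small, large))
print(lookup_join(small, large))
```

The merge reads every row of both inputs once, while the lookup does one binary search per row of the first input, so the lookup tends to pay off only when one side is much smaller than the other.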

To make matters worse we just implemented a metadata-driven
transformation engine entirely written in python. It'll work great on
the small files, but the large ones...

Luckily, the nature of this application lends itself towards
distributed processing - so my plan is to:
1. check out psyco for the metadata-driven tool
2. partition the feeds across multiple servers
3. rewrite performance-intensive functions in c
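
Step 2 could look roughly like the sketch below, which assigns rows to workers by a stable hash of a key field. The row layout, the `partition_rows` name, and the choice of CRC32 are assumptions for illustration; the post does not say how the feeds would actually be split.

```python
import zlib


def partition_rows(rows, n_workers, key_index=0):
    """Assign each row to one of n_workers buckets by hashing its key field.

    zlib.crc32 is used because it gives the same value on every machine
    and in every process, so a given key always lands in the same bucket.
    """
    buckets = [[] for _ in range(n_workers)]
    for row in rows:
        key = str(row[key_index]).encode("utf-8")
        buckets[zlib.crc32(key) % n_workers].append(row)
    return buckets


rows = [("cust-%d" % i, i * 10) for i in range(10)]
buckets = partition_rows(rows, 3)
for worker, bucket in enumerate(buckets):
    print(worker, len(bucket))
```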

But I think I'll get by with just options #1 and #2: we're using
python and it's working well - exactly because it is so adaptable.
The cost in performance is inconsequential in this case compared to
the maintainability.
 
Terry Reedy

Krzysztof Stachlewski said:
With "heavy use of Numeric module" you were calling functions
written in C. So how can you say that Python is fast,
when C code is doing all the work.

Well gee. *All* of the functions exposed in builtins *and* in built-in
modules are also written in C. So are all the methods of builtin types and
all the hidden functions (some exposed in the C API), including the
compilation and interpretation. So how can anyone even talk about the
speed of Python, when C code is doing all the work, whether quickly or
slowly!

[and in another post]
I just think that the Numeric package is not the best example
of the speed of Python itself.

But what is 'Python' itself? I think you are making a false distinction.
Numerical Python and other scientific code driving C and Fortran functions
was a, if not the, killer app for Python when I learned it about 7 years
ago. It was so important to the early success of Python, such as it was,
that the slice object was added just for its use.

Terry J. Reedy
 
Brian Kelley

This is my "straw poll" question:
Do you spend a "significant" amount of time actually optimizing your
Python applications? (Significant is here defined as "more than five
percent of your time", which is for example two hours a week in a
40-hour work week.)

Yes and No :) I find that optimizing algorithms is a lot more
beneficial than optimizing code. Let me give a small example. I have
written a chemoinformatics engine (http://frowns.sourceforge.net/) and
one of its features is substructure searching, that is, finding whether a
graph is embedded in another graph. This is an NP-complete problem. A
company recently compared their technique in terms of speed and
correctness to frowns. According to them, frowns was 99% correct
but 1000x slower. (Why they were marketing their system, which
was built over 5+ man-years, against mine, which was built over 3 months,
I will never understand.)

Now, 1000x is a *lot* slower. However, when used in practice in a
database setting, my system has a quick mechanism that can reject false
matches very quickly. This is standard practice for chemistry databases
by the way. All of a sudden the 1000x difference becomes almost
meaningless. For a given search across 300000+ compounds, my system
takes 1.2 seconds and theirs takes 25 minutes. Using my prefiltering
scheme their system takes 0.7 seconds. Now my code didn't change at
all, only the way it was used changed.
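
The screen-then-verify pattern can be sketched as follows. This is not the frowns implementation: molecules are reduced to plain strings of element symbols, the cheap screen is an element-count comparison, and `toy_match` is a stand-in for a real substructure matcher. All names are hypothetical.

```python
from collections import Counter


def could_contain(query_counts, target_counts):
    """Cheap necessary condition: the target needs at least as many atoms
    of each element as the query. Passing it does not prove a match."""
    return all(target_counts[el] >= n for el, n in query_counts.items())


def search(query, targets, expensive_match):
    """Run expensive_match only on targets that survive the cheap screen."""
    q_counts = Counter(query)
    hits = []
    expensive_calls = 0
    for target in targets:
        if not could_contain(q_counts, Counter(target)):
            continue  # rejected without running the costly test
        expensive_calls += 1
        if expensive_match(query, target):
            hits.append(target)
    return hits, expensive_calls


def toy_match(query, target):
    """Stand-in for the expensive test: contiguous substring check."""
    return query in target


targets = ["CCO", "CCN", "OCC", "CCCC", "NCCO"]
hits, calls = search("CO", targets, toy_match)
print(hits, calls)
```

The matcher itself is untouched; only the number of times it runs changes, which is the point made above.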

I could, of course, generate an example that takes me much longer but
the average case is a whole lot better. My system is free though, so my
users tend not to mind (or quite honestly, expect) as much :)

Brian
 
Lothar Scholz

Samuel Walters said:
For numerical processing, C is the right tool,

Definitely not, you don't want a pointer language when using numerical
processing: use Fortran.
 
Sean 'Shaleh' Perry

Yes or no answers suffice, but feel free to follow up with a paragraph
qualifying your answer (or quantifying it!). :)

not any longer.

As I learned on this and the tutor list, writing code in Pythonic style tends
to also result in code being fast enough. Most of my early problems resulted
from trying to write C in Python.
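
As an illustration of the difference (hypothetical functions, not taken from any post in this thread), here is the same work written index-by-index and then with built-ins. In CPython the built-in versions run their loops in C, so they are usually the faster of the two, though the gap depends on the data.

```python
def total_length_c_style(words):
    """Index-driven loop, as one might write it coming from C."""
    total = 0
    i = 0
    while i < len(words):
        total = total + len(words[i])
        i = i + 1
    return total


def total_length_pythonic(words):
    """Let sum() and map() do the looping."""
    return sum(map(len, words))


def join_c_style(words):
    """Build the result by repeated string concatenation."""
    result = ""
    for i in range(len(words)):
        if i > 0:
            result = result + ","
        result = result + words[i]
    return result


def join_pythonic(words):
    """str.join builds the result in a single call."""
    return ",".join(words)


words = ["ab", "cde", ""]
print(total_length_pythonic(words), join_pythonic(words))
```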
 
Tim Delaney

Peter Hansen said:
This is my "straw poll" question:

Do you spend a "significant" amount of time actually optimizing your
Python applications? (Significant is here defined as "more than five
percent of your time", which is for example two hours a week in a
40-hour work week.)

I have to say that I do, but I'm also dealing with datasets up to about
500MB in the worst case - but about 10-20MB in the normal case.

In most cases, the optimisations are things like only doing a single pass
over any file, etc. A naive prototype often doesn't scale well when I need
to deal with large datasets. Often using psyco will be sufficient to get me
over the hurdle (see the psyco thread) but sometimes not.
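
A small sketch of the single-pass idea, assuming a file with one number per line. The `summarize` name and the sample data are illustrative only. Every statistic is accumulated while the file is streamed once, instead of re-reading it for each one.

```python
import io


def summarize(lines):
    """Collect count, total, and maximum in one pass over an iterable of lines."""
    count = 0
    total = 0.0
    largest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = float(line)
        count += 1
        total += value
        if largest is None or value > largest:
            largest = value
    return count, total, largest


# An open file object can be passed in the same way as this StringIO.
sample = io.StringIO("3\n10\n\n4.5\n")
print(summarize(sample))
```

Because the function only iterates, memory use stays flat no matter how large the file is.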

Tim Delaney
 
Skip Montanaro

QOTW perhaps?

Sam> I read the benchmark and I think it doesn't measure python in its
Sam> target area. That's like taking a world-class marathon runner and
Sam> wondering why he doesn't compete well in a figure-skating event.

Skip
 
Samuel Walters

|Thus Spake Lothar Scholz On the now historical date of Fri, 09 Jan 2004
21:29:56 -0800|
Definitely not, you don't want a pointer language when using numerical
processing: use Fortran.

Hmm. I feel misunderstood. I'm going to try to clarify, but if I'm the
one doing the misunderstanding, feel free to give me a good old-fashioned
usenet style bitchslapping back to the stone age.

First off: Truth in advertising.
I know very little about numeric processing, and even less about Fortran.
It's true that my background is in mathematics, but in *pure* mathematics
where pointer-based languages tend to be helpful, not hurtful. I chose
pure mathematics precisely because it eschews the grubby sort of shortcuts
that applied mathematics uses. In other words, I didn't have the proper
sort of mathematical intuition for it, so I chose pure, where my kind of
intuition works well. (In the end, this was to my detriment. All the
interesting problems are in applied math!)

As I see it, when one considers which language is best for one's needs,
one considers a couple of things:

1) Does the language have the semantics I want.
2) Does the language have the primitives I need.
3) Can I *easily* build any missing or suboptimal primitives.

One would assume that Fortran has the proper semantics for numerical
processing because it seems to have been wildly successful for a long
period of time. It would appear that Python has the proper semantics for
numerical processing because a significant number of people are using it
for that, and they'd be using something else if Python caused them too
many headaches.

Fortran naturally comes with the primitives for numerical processing,
because numerical processing is its stated goal. ("FORmula TRANslation")
Python doesn't seem to have the native and optimal primitives for
numerical processing, so that leads to point three.

Whether one uses Fortran, Python, or any other language, all primitives
are eventually implemented in either C or assembly. At some point or
another, we end up scraping bare metal and silicon to get our answers.
The question then becomes, "How easily can I extend this language to fit
my needs." NumPy is evidence that at least a few people said "Easily
enough." I don't know how extensible Fortran is, but I would guess "very"
since I've heard of it being applied in many domains other than numerical
processing. (OpenFirmware, for example.)

So, I guess that my point is that C might not be the right language for
doing numerical processing, but it seems the right language for
implementing the primitives of numerical processing. Those primitives
should, of course, be designed in such a manner that their behaviors are
not muddied by pointer issues.

Moving on:
I think Python's strength flows from the three criteria for choosing a
language. Its semantics seem to naturally fit the way a programmer
thinks about problems. All the algorithmic primitives are there for
naturally expressing one's self easily. Where the primitives don't exist,
it's easy to bind outside primitives into the system seamlessly. One of
the joys of Python is that C extension libraries almost never feel bolted
on. They feel like an intimate part of the language itself. Part of that
is the blood, sweat and tears of the library implementors, but much of it
is also the elegance of Python.

As far as the straw-poll goes, I think it's a good question to ask, and
that the answer is important, but we also need to figure out where else we
can ask this question. The problem with asking such a question solely on
c.l.p is that everyone here has either decided that optimization in python
isn't enough of an issue to bother them, or hasn't made up their
mind yet. Those who have decided that optimization in python is a problem
have already gone elsewhere. Perhaps a better question to ask is "Who has
decided that Python is too slow for their needs, what prompted that
decision and are the issues they had worth addressing?"

Sam Walters.
 
Rainer Deyke

Samuel said:
So, I guess that my point is that C might not be the right language
for doing numerical processing, but it seems the right language for
implementing the primitives of numerical processing.

The issue with C is that it is too slow for implementing those primitives
(in part due to pointer aliasing issues). Fortran is considerably faster.
 
Samuel Walters

|Thus Spake Skip Montanaro On the now historical date of Sat, 10 Jan 2004
07:50:09 -0600|
QOTW perhaps?

Sam> I read the benchmark and I think it doesn't measure python in its
Sam> target area. That's like taking a world-class marathon runner and
Sam> wondering why he doesn't compete well in a figure-skating event.

Skip

*garsh*

I feel flattered. *blush*

You know, I sadly spent quite a bit of time debating which simile to use
there. I wandered around the house wondering what to put there.

Some rejected ideas:
"It's like asking Kasparov why he didn't win the weight-lifting
competition."

"That's like asking a world-class marathon runner why he doesn't compete
well in a weight-lifting competition."

"That's like asking a world-class weightlifter why they didn't do well in
the figure skating competition." (I almost used this one because it
conjures images of a burly Russian weight-lifter floundering on ice
skates. Very Monty-Python-esque.)

I chose the one I did in case I needed to later state that "Both figure
skating and marathon running are aerobic sports, but that doesn't mean
that the skills involved are the same."

..

Now, I feel compelled to justify my statement.

Let's look closely at the benchmarks and try to figure out if there's a
good reason why we fell down where we did.

We did poorly at Math, and competitively at I/O.

I'm reminded of Antoine de Saint-Exupéry saying "A designer knows he has
achieved perfection not when there is nothing left to add, but when there
is nothing left to take away." While not part of the Zen of Python, this
seems to be an unstated principle of Python's design. It seems to focus
on the bare minimum of what's needed for elegant expression of algorithms,
and leave any extravagances to importable libraries.

Why, then, are integers and longs part of Python itself, and not part of a
library? Well, we need such constructs for counters, loops and indexes.
Both range and xrange are evidence of this. Were it not for this, I
daresay that we'd have at least argued the necessity of keeping these
things in the language itself.

Floats are a pragmatic convenience, because it's nice to be able to throw
around the odd floating point number when you need to. Trig functions are
housed in a separate library and notice that we didn't do too shabby there.

I/O is one of our strengths, because we understand that most programs are
not algorithmically bound, but rather I/O bound. I/O is a big
bottle-neck, so we should be damn good at it. The fastest assembly
program won't do much good if it's always waiting on the disk-drive.

Perhaps our simplicity is the reason we hear so many Lisp'ers vocally
complaining. While more pragmatic than Lisp, Python is definitely edging
into the "Lambda Calculus Zone" that Lisp'ers have historically been the
proud sole-occupants of. After all, until Python, when one wanted a
nearly theoretical programming experience, one either went to C/Assembly
(Turing Machine Metaphor) or Lisp (Lambda Calculus Metaphor.)

Python is being used in so many introductory programming courses for the
very reason that it so purely fits the way a programmer thinks, while
still managing to be pragmatic. It allows for a natural introduction to
some of the hardest concepts: Pointers/Reference, Namespaces, Objects and
Legibility. Each of these concepts is difficult to learn if you are first
indoctrinated into an environment without them. In my attempts to learn
C++, I initially felt like I was beating my head up against a wall trying
to learn what an object was and why one would use them. I have since
observed that people coming from a strongly functional programming
background have the same experience, while those with no functional
programming dogma in them find objects quite a natural concept. The same
thing is true of the other concepts I mentioned. If you have them, it's
easy to work without them. If you don't, you'll flounder trying to pick
them up. Think about how easy it is to pick out a C programmer from their
Python coding style.

The one important concept I didn't mention is message-passing. This is an
important, but much less used concept. It is the domain of Smalltalk and
Ruby. I've looked some at Ruby, and lurk their Usenet group. From what I
can tell, Ruby takes almost the same philosophy as Python, except where we
think namespaces are a honking great idea, they think message-passing is a
honking great idea. The nice thing about message-passing is that if you
have all the other concepts of OO down, message passing seems natural and
is not terribly difficult to "fake" when it's the only missing OO
primitive. This is why C++, while not a message-based OO language, is
used so often in GUI's, an inherently message-based domain. This is also
why we have such a nice, broad choice of GUI toolkits under Python despite
lacking a message primitive.


Well, I've blathered enough on this topic.
I hope, at least, that I've said something worthwhile.
Though, I doubt I've said anything that hasn't been said better before.

Caffeine, Boredom and Usenet are a dangerous mix.

Sam Walters.
 
Samuel Walters

|Thus Spake Rainer Deyke On the now historical date of Sun, 11 Jan 2004
06:46:50 +0000|
The issue with C is that it is too slow for implementing those
primitives (in part due to pointer aliasing issues). Fortran is
considerably faster.

I stand corrected.

Please help me to understand the situation better.

I went digging for technical documents, but thus far haven't found many
useful ones. It seems everyone but me already understands pointer
aliasing models, so they might discuss them, but they don't explain them.
I am limited in part by my understanding of compilers and also by my
understanding of Fortran. Here is what I have gathered so far:

Fortran lacks a stack for function calls. This promotes speed, but
prevents recursive functions. (Most recursive functions can efficiently be
written as loops, though, so this shouldn't be considered a hindrance.)

Fortran passes all arguments by reference. (This is the peppiest way to do
it, especially with static allocation)

Fortran 77 lacks explicit pointers and favors static allocation. This
allows for the compiler to apply powerful automatic optimization.

Fortran 90 added explicit pointers, but required them to only be pointed
at specific kinds of objects, and only when those particular objects are
declared as targets for pointers. This allows the compiler to still apply
powerful automatic optimizations to code. I'm a bit hazy as to whether
Fortran 90 uses static or dynamic allocation, or a combination of both,
and whether it permits recursion.

These pointers not only reference location, but also dimension and stride.
Stride is implicit in C pointer declarations (by virtue of compile-time
knowledge of the data type pointed to) but dimension is not.
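
Python has no pointers, but the standard-library `memoryview` carries a similar descriptor (location plus shape and strides), which makes the idea easy to inspect. This is only an analogy to Fortran 90 pointers, not a model of how a Fortran compiler represents them.

```python
from array import array

data = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
item = data.itemsize                 # bytes per double on this platform

flat = memoryview(data)              # 1-D view over all six values
every_other = flat[::2]              # same memory, every second element
grid = flat.cast("B").cast("d", shape=[2, 3])  # same memory seen as 2 x 3

# shape is the dimension; strides is the byte step along each axis.
print(flat.shape, flat.strides)
print(every_other.shape, every_other.strides, every_other.tolist())
print(grid.shape, grid.strides, grid.tolist())
```

None of the three views copies the data; they differ only in the shape and stride information attached to the same block of memory.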

Fortran's extensions for parallel programming have been standardized, and
the language itself makes it easy to decide how to parallelize procedures
at compile time. Thus, it is especially favored for numeric computation on
big iron with lots of parallelism.

Now, for C:

Because of dynamic allocation on the stack and the heap, there is no
compile-time knowledge of where a variable will live, which adds an extra
layer of reference for even static variables. This also invalidates many
of the optimizations used by Fortran compilers.

C lacks many of the fundamental array handling semantics and primitives
that Fortran programs rely on. Implementing them in C is a real PITA.

C memory allocation is just plain goofy in comparison to Fortran.

To sum up:
Fortran sacrifices generality and dynamism for compile-time knowledge
about data, and deeply optimizes based on that knowledge.

C sacrifices speed for the sake of generality and dynamism.

Please correct me or help me flesh out my ideas. Please don't skimp on
the low-level details, I've done my fair share of assembly programming, so
what I don't understand, I'll probably be able to find out by googling a
bit.

Some other interesting things I found out:

There are two projects that allow interfacing between Python and Fortran:
F2Py
http://cens.ioc.ee/projects/f2py2e/
PyFortran
http://sourceforge.net/projects/pyfortran

Fortran amply supports interfaces to C and C++

Fortran is compiled. (Doh! and I thought it was interpreted.)

There are lots of debates on whether C++ will ever be as fast as Fortran.
The consensus seems to be "Only if you use the right compiler with the
right switches and are insanely careful about how you program." IOW: don't
bother, just use Fortran if you want to do numeric processing.

Well, there's another language to add to my list of languages to learn. It
seems to be "The Right Tool" for a great many applications, it interfaces
well with other languages, and it's extremely portable. Chances are, I'll
end up using it somewhere somehow someday. Now. To find some Fortran
tutorials.

Thanks in advance for any of your knowledge and wisdom you are willing to
confer upon me.

Sam Walters.
 
Ganesan R

Samuel Walters said:
I/O is one of our strengths, because we understand that most programs are
not algorithmically bound, but rather I/O bound. I/O is a big
bottle-neck, so we should be damn good at it. The fastest assembly
program won't do much good if it's always waiting on the disk-drive.

Actually, Python is much slower than Perl for I/O. See the thread titled
"Python IO Performance?" in groups.google.com for a thread started by me on
this topic. I am a full time C programmer but do write occasional
Python/Perl for professional/personal use.

To answer the original question about what percentage of my time I spend
optimizing my Python programs - probably none. However I did switch back to
using Perl for most of my text processing needs. For one program that was
intended to look up patterns in a gzipped word list, performance of the
original python version was horribly slow. Instead of rewriting it in Perl,
I simply opened a pipe to zgrep and did post processing in python. This
turned out to be much faster - I don't remember how much faster, but I
remember waiting for the output from the pure python version while the
python+zgrep hybrid results were almost instantaneous.
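
A rough sketch of the two approaches, assuming a `zgrep` executable is available on PATH. The function names are hypothetical and this is not the original program. Note that grep and Python's `re` module use different regular-expression dialects, so the two functions are only interchangeable for patterns that mean the same thing in both.

```python
import gzip
import re
import shutil
import subprocess


def grep_gz_pure(path, pattern):
    """Decompress and match line by line inside Python."""
    rx = re.compile(pattern)
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if rx.search(line)]


def grep_gz_pipe(path, pattern):
    """Let zgrep do the scanning; Python only post-processes the matches."""
    if shutil.which("zgrep") is None:
        raise RuntimeError("zgrep not found on PATH")
    proc = subprocess.run(
        ["zgrep", "-e", pattern, path],
        capture_output=True,
        text=True,
    )
    if proc.returncode not in (0, 1):  # grep exits with 1 when nothing matches
        raise RuntimeError(proc.stderr)
    return proc.stdout.splitlines()
```

Either function returns a list of matching lines, so any post-processing in Python stays the same whichever one produced them.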

Ganesan
 
John J. Lee

Samuel Walters said:
|Thus Spake Rainer Deyke On the now historical date of Sun, 11 Jan 2004
06:46:50 +0000| [...]
The issue with C is that it is too slow for implementing those
primitives (in part due to pointer aliasing issues). Fortran is
considerably faster.

I stand corrected.

Please help me to understand the situation better.

I went digging for technical documents, but thus far haven't found many
useful ones. It seems everyone but me already understands pointer
aliasing models, so they might discuss them, but they don't explain them.
[...]

(haven't read all your post, so sorry if I'm telling you stuff you
already know)

Pointer aliasing is just the state of affairs where two pointers refer
to a single region of memory. Fortran compilers have more information
about aliasing than do C compilers, so can make more assumptions at
compilation time.
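
A small illustration of why aliasing matters, written in Python with lists standing in for arrays (the `shift_add` function is made up for this example). The same loop gives different results depending on whether its two arguments are the same object, so a compiler that cannot rule out overlap has to re-read memory after every store instead of keeping values in registers or reordering the loop.

```python
def shift_add(dst, src):
    """Set dst[i] = src[i - 1] + 1 for every i >= 1."""
    for i in range(1, len(src)):
        dst[i] = src[i - 1] + 1


a = [0, 0, 0, 0]
out = [0, 0, 0, 0]
shift_add(out, a)   # distinct lists: every read sees the original values
print(out)          # [0, 1, 1, 1]

b = [0, 0, 0, 0]
shift_add(b, b)     # aliased: each write feeds the next read
print(b)            # [0, 1, 2, 3]
```

Fortran's rules let the compiler assume that procedure arguments which get modified do not overlap like this, whereas a C compiler generally has to allow for it unless told otherwise.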

Have you tried comp.lang.fortran, comp.lang.c++, comp.lang.c, etc?

http://www.google.com/[email protected]&rnum=3

http://tinyurl.com/2v3v5


John
 
David M. Cooke

At some point said:
Whether one uses Fortran, Python, or any other language, all primitives
are eventually implemented in either C or assembly. At some point or
another, we end up scraping bare metal and silicon to get our answers.
The question then becomes, "How easily can I extend this language to fit
my needs." NumPy is evidence that at least a few people said "Easily
enough." I don't know how extensible Fortran is, but I would guess "very"
since I've heard of it being applied in many domains other than numerical
processing. (OpenFirmware, for example.)

You're confusing Fortran with Forth, which is a stack-based language,
much like Postscript, or RPL used on HP 48 calculators.

These days, I doubt Fortran is used for anything but numerical processing.
 
Dan Bishop

Samuel Walters said:
|Thus Spake Rainer Deyke On the now historical date of Sun, 11 Jan 2004
06:46:50 +0000|
Samuel Walters wrote:
[Fortran is faster than C.]
....
I went digging for technical documents, but thus far haven't found many
useful ones. It seems everyone but me already understands pointer
aliasing models, so they might discuss them, but they don't explain them.
I am limited in part by my understanding of compilers and also by my
understanding of Fortran. Here is what I have gathered so far:

Fortran passes all arguments by reference. (This is the peppiest way to do
it, especially with static allocation)

Btw, for some early compilers, this was the case even with literals,
which meant that code like

CALL FOO(4)
PRINT *, 4

could print something other than 4.

...I'm a bit hazy as to whether
Fortran 90 uses static or dynamic allocation, or a combination of both,

You can use both, at least for arrays.

and whether it permits recursion.

Fortran 90 does permit recursion, although you have to explicitly
declare functions as "recursive".

Now, for C: ....
C lacks many of the fundamental array handling semantics and primitives
that Fortran programs rely on. Implementing them in C is a real PITA.

This is one of my least favorite things about C.

C memory allocation is just plain goofy in comparison to Fortran.

And even worse in comparison to Python ;-)
 
Matthias

Peter Hansen said:
This is my "straw poll" question:

Do you spend a "significant" amount of time actually optimizing your
Python applications? (Significant is here defined as "more than five
percent of your time", which is for example two hours a week in a
40-hour work week.)

I was working on an image processing application and was looking for a
quick prototyping language. I was ready to accept a 10-fold decrease
in execution speed w.r.t. C/C++. With python+psyco, I experienced a
1000-fold decrease.

So I started re-writing parts of my program in C. Execution speed now
increased, but productivity was as low as before (actually writing the
programs directly in C++ felt somewhat more natural). Often it
happened that I prototyped an algorithm in python, started the
program, implemented the algorithm in C as an extension module and
before the python algorithm had finished I got the result from the
C-algorithm. :-(

I've tried numerics, but my code was mostly not suitable for
vectorization and I did not like the pointer semantics of numerics.

So my answer to the question above is NO, I don't spend significant
time optimizing python code as I do not use python for
computationally intensive calculations any more. My alternatives are
Matlab and (sometimes) Common Lisp or Scheme or Haskell.
 
Frithiof Andreas Jensen

As I see it, when one considers which language is best for one's needs,
one considers a couple of things:

1) Does the language have the semantics I want.
2) Does the language have the primitives I need.
3) Can I *easily* build any missing or suboptimal primitives.

True.

One would assume that Fortran has the proper semantics for numerical
processing because it seems to have been wildly successful for a long
period of time.

That, in my opinion, is wrong: Fortran is successful because it was there
first!

There exists a very large set of actively supported and proven libraries,
NAG f.ex., which nobody will ever bother to port to another language just
for the sake of it, and Fortran has been around for so long that it is well
understood how best to optimise and compile Fortran code. It is easy enough
to link with NAG if one needs to use it.

Fortran naturally comes with the primitives for numerical processing,
because numerical processing is its stated goal. ("FORmula TRANslation")

....or maybe the name sounded cool ;-)

Whether one uses Fortran, Python, or any other language, all primitives
are eventually implemented in either C or assembly. At some point or
another, we end up scraping bare metal and silicon to get our answers.

Exactly - Fortran in itself does not do anything that another language
cannot do as well. It is just that Fortran is better understood
when applied to numerical processing than other languages, because more
"numerics" people have used it than any other language.

On DSP architectures, f.ex., I doubt that one would get better performance
using Fortran than with the C/C++ tools that DSPs usually ship with,
because DSPs were "born" when C/C++ was hot.

A lot of real, serious DSP work is done in Matlab - thus skipping the issue
of language choice and getting right on to getting the job done. This is good
IMO.
 
Lothar Scholz

Samuel Walters said:
I know very little about numeric processing, and even less about Fortran.
It's true that my background is in mathematics, but in *pure* mathematics
where pointer-based languages tend to be helpful, not hurtful.

Okay, it seems that you don't know a lot about compiler writing.

A C compiler knows only a little about the context, so it must
always assume that data inside a member can be referenced from
another place via an aliased pointer.

Fortran does not have this problem, so a lot of optimizations can be
done and values can be held in registers for a much longer time,
resulting in much greater speed.

Remember that on supercomputers a 25% speed enhancement (the margin by
which a good Fortran compiler beats C) can mean a few million dollars of
saved hardware costs. The coding time is not important compared to the
running time. So real hard numerics are always done in Fortran.

GNU Fortran is a stupid project because it translates the Fortran code
to C.


Python for hardcore numerics, even with PyNumerics, is simply a very
bad solution.
 
Peter Hansen

Ganesan said:
Actually, Python is much slower than Perl for I/O. See the thread titled
"Python IO Performance?" in groups.google.com for a thread started by me on
this topic. I am a full time C programmer but do write occasional
Python/Perl for professional/personal use.

To answer the original question about what percentage of my time I spend
optimizing my Python programs - probably none. However I did switch back to
using Perl for most of my text processing needs. For one program that was
intended to look up patterns in a gzipped word list, performance of the
original python version was horribly slow. Instead of rewriting it in Perl,
I simply opened a pipe to zgrep and did post processing in python. This
turned out to be much faster - I don't remember how much faster, but I
remember waiting for the output from the pure python version while the
python+zgrep hybrid results were almost instantaneous.

I didn't consider this sort of thing in my poll, but I'd have to say you
actually *are* optimizing your Python programs, even if you did it by falling
back on another language...

-Peter
 
Peter Hansen

Paul said:
Yes, absolutely.


Sometimes I'll take the time to implement a fancy algorithm in Python
where in a faster language I could use brute force and still be fast
enough. I'd count that as an optimization.

I would count it as optimization as well, which is why I qualified my
comment with "regardless of implementation language". Clearly in your
case that clause does not apply.

-Peter
 
