Python advocacy in scientific computation


Michael Tobis

Someone asked me to write a brief essay regarding the value-add
proposition for Python in the Fortran community. Slightly modified to
remove a few climatology-related specifics, here it is.

I would welcome comments and corrections, and would be happy to
contribute some version of this to the Python website if it is of
interest.

===

The established use of Fortran in continuum models such as climate
models has some benefits, including very high performance and
flexibility in dealing with regular arrays, backward compatibility with
the existing code base, and the familiarity with the language among the
modeling community. Fortran 90 and later versions have taken many of
the lessons of object oriented programming and adapted them so that
logical separation of modules is supported, allowing for more effective
development of large systems. However, there are many purposes to which
Fortran is ill-suited which are increasingly part of the modeling
environment.

These include: source and version control and audit trails for runs,
build system management, test specification, deployment testing (across
multiple platforms), post-processing analysis, run-time and
asynchronous visualization, distributed control and ensemble
management. To achieve these goals, a combination of shell scripts,
specialized build tools, specialized applications written in several
object-oriented languages, and various web and network deployment
strategies have been deployed in an ad hoc manner. Not only has much
duplication of effort occurred, a great deal of struggling up the
learning curves of various technologies has been required as one need
or another has been addressed in various ad hoc ways.

A new need arises as the ambitions of physical modeling increase: the
rapid prototyping and testing of new model components. As the number of
possible configurations of a model increases, both unit testing and
integration testing become more expensive and more difficult.

Fortunately, there is Python. Python is a very flexible language that
has captured the enthusiasm of commercial and scientific programmers
alike. The perception of Python programmers coming from almost any
other language is that they are suddenly dramatically several times
more productive than previously, in terms of functionality delivered
per unit of programmer time.

One slogan of the Python community is that the language "fits your
brain". Why this might be the case is an interesting question. There
are no startling computer science breakthroughs original to the
language. Rather, Python aficionados will claim that the language
combines the best features of such various languages as Lisp, Perl,
Java, and Matlab. Eschewing allegiance to a specific theory of how to
program, Python's design instead offers the best practices from many
other software cultures.

The synergies among these programming modes are in some ways harder to
explain than to experience. The Python novice may nevertheless observe
that a single language can take the place of shell scripts, makefiles,
desktop computation environments, compiled languages to build GUIs, and
scripting languages to build web interfaces. In addition, Python is
useful as a wrapper for Fortran modules, facilitating the
implementation of true test-driven design processes in Fortran models.
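
A minimal sketch of what such a test-driven workflow might look like. The
routine relax_step below is a hypothetical pure-Python stand-in for a
Fortran subroutine that would in practice be exposed to Python by a
wrapper tool such as f2py; its name, signature, and smoothing formula are
assumptions made for illustration and do not come from any real model.

```python
import unittest


def relax_step(field, alpha):
    """Hypothetical stand-in for a wrapped Fortran routine.

    Applies one explicit smoothing pass to the interior points of a
    1-D field and leaves the two boundary values untouched.
    """
    out = list(field)
    for i in range(1, len(field) - 1):
        out[i] = field[i] + alpha * (field[i - 1] - 2.0 * field[i] + field[i + 1])
    return out


class RelaxStepTest(unittest.TestCase):
    """Tests are written against the Python-visible interface only."""

    def test_uniform_field_is_unchanged(self):
        self.assertEqual(relax_step([3.0] * 5, 0.25), [3.0] * 5)

    def test_boundaries_are_held_fixed(self):
        result = relax_step([1.0, 0.0, 0.0, 2.0], 0.25)
        self.assertEqual((result[0], result[-1]), (1.0, 2.0))

    def test_single_spike_diffuses(self):
        result = relax_step([0.0, 0.0, 1.0, 0.0, 0.0], 0.25)
        self.assertEqual(result, [0.0, 0.25, 0.5, 0.25, 0.0])
```

Saved as a module, the tests can be run with "python -m unittest". Because
they exercise only the Python-level call, the same tests should in
principle keep working if the stand-in is later replaced by a compiled
routine with the same interface.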

Another Python advocacy slogan is "batteries included". The point here
is that (in part because Python is dramatically easier to write than
other languages) there is a very broad range of very powerful standard
libraries that make many tasks which are difficult in other languages
astonishingly easy in Python. For instance, drawing upon the standard
libraries (no additional download required) a portable webserver
(runnable on both Microsoft and Unix-based platforms) can be
implemented in seven lines of code. (See
http://effbot.org/librarybook/simplehttpserver.htm ) Installation of
pure Python packages is also very easy, and installation of mixed
language products with a Python component is generally not
significantly harder than a comparable product with no Python
component.
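
As an illustration of the "batteries included" claim, here is a sketch of
such a server in current Python 3, where the relevant standard-library
module is http.server (the page linked above describes the older
SimpleHTTPServer module). The function name make_file_server and the
default port are assumptions chosen for this example.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_file_server(port=8000):
    """Return a server that exposes the current directory over HTTP.

    Passing port=0 asks the operating system to pick a free port.
    """
    return HTTPServer(("", port), SimpleHTTPRequestHandler)
```

Calling make_file_server().serve_forever() then serves the files in the
current working directory until the process is interrupted. This is a
convenience for local use, not a hardened production server.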

Among the Python components and Python bindings of special interest to
scientists are the elegant and powerful matplotlib plotting package,
which began by emulating and now surpasses the plotting features of
Matlab, SWIG, which allows for runtime interoperability with various
languages, f2py which specifically interoperates with Fortran, NetCDF
libraries (which cope with NetCDF files with dramatically less fuss
than the standard C or Fortran bindings), statistics packages including
bindings to the R language, linear algebra packages, various
platform-specific and portable GUI libraries, genetic algorithms,
optimization libraries, and bindings for high performance differential
equation solvers (notably, using the Argonne National Laboratory
package PETSc). An especially interesting Python trick for runtime
visualization in models that were not designed to support it, pioneered
by David Beazley's SWILL, embeds a web server in your model code.
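
The general idea of an embedded status server can be sketched with the
standard library alone. This is not SWILL's API; ToyModel,
start_status_server, and the JSON field names are hypothetical and exist
only to show a model loop and a web server sharing one process.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyModel:
    """Hypothetical stand-in for a long-running simulation."""

    def __init__(self):
        self.step_count = 0
        self.mean_value = 0.0
        self._lock = threading.Lock()

    def step(self):
        # Advance the toy state; a real model would do its physics here.
        with self._lock:
            self.step_count += 1
            self.mean_value = 0.5 * self.mean_value + 1.0

    def snapshot(self):
        # Copy the state under the lock so readers see a consistent view.
        with self._lock:
            return {"step": self.step_count, "mean": self.mean_value}


def start_status_server(model, port=0):
    """Serve the model's current state as JSON from a background thread."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(model.snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep request logs out of the model's own output

    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The model keeps calling step() in its main loop while a browser or a
script polls the port reported by server.server_address. The lock is
there because the request handler runs in a different thread from the
model loop.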

See especially http://starship.python.net/~hinsen/ScientificPython/ and
http://scipy.org as good starting points to learn about scientific uses
of Python.

mt
 

mrmakent

Nicely done. But now for a couple of small nits:

other language is that they are suddenly dramatically several times
more productive

'suddenly dramatically several times' seems a bit redundantly
repetitively excessive, don't you think?

Among the Python components and Python bindings of special interest to
scientists are the elegant and powerful matplotlib plotting package,
which began by emulating and now surpasses the plotting features of
Matlab, SWIG, which allows for runtime interoperability with various
languages, f2py which specifically interoperates with Fortran, NetCDF
libraries (which cope with NetCDF files with dramatically less fuss
than the standard C or Fortran bindings), statistics packages including
bindings to the R language, linear algebra packages, various
platform-specific and portable GUI libraries, genetic algorithms,
optimization libraries, and bindings for high performance differential
equation solvers (notably, using the Argonne National Laboratory
package PETSc).

As the length of the sentence built up, and the innumerable commas
passed by, my brain exploded. I'd suggest turning this into a bullet
list.
 

Cameron Laird

.
.
.
Among the Python components and Python bindings of special interest to
scientists are the elegant and powerful matplotlib plotting package,
which began by emulating and now surpasses the plotting features of
Matlab, SWIG, which allows for runtime interoperability with various
languages, f2py which specifically interoperates with Fortran, NetCDF
libraries (which cope with NetCDF files with dramatically less fuss
than the standard C or Fortran bindings), statistics packages including
bindings to the R language, linear algebra packages, various
platform-specific and portable GUI libraries, genetic algorithms,
optimization libraries, and bindings for high performance differential
equation solvers (notably, using the Argonne National Laboratory
package PETSc). An especially interesting Python trick for runtime
visualization in models that were not designed to support it, pioneered
by David Beazley's SWILL, embeds a web server in your model code.

See especially http://starship.python.net/~hinsen/ScientificPython/ and
http://scipy.org as good starting points to learn about scientific uses
of Python.

mt

Lovely; putting a copy here is a great service to others.

I want a few subtle changes. I applaud the slogan about how Python
encompasses the best of Matlab and Java (among others); I like to
think that'll get through. In that vicinity would be a good place,
if practical, to work in mention that:
A. Python is *excellent* for long-lasting and/or group work;
B. Python's licensing is friendly;
C. It's a real language, and therefore generalizes far better than
   Matlab; and
D. It has an unrivaled span of practicality, so that learning it enables
   a researcher to tackle a wide variety of software tasks.

You touch on these matters, but I think that section might be
propitious for promoting them, perhaps along with

E. Python's ease of learning and successful record in the hands of
   children, scientists, and other casual practitioners.

Also, my instinct is to underline that this stuff is REAL. David
Beazley was winning awards with his scientific Python-Fortran
marriage back in the '90s. Perhaps your audience doesn't need so
much convincing on that point ...
 

Juho Schultz

Michael said:
Someone asked me to write a brief essay regarding the value-add
proposition for Python in the Fortran community. Slightly modified to
remove a few climatology-related specifics, here it is.

Thank you - this was very good reading.

I would welcome comments and corrections, and would be happy to
contribute some version of this to the Python website if it is of
interest.

A slight broadening of the perspective could show another advantage:
Python is also used for data processing, at least in astronomy. Modeling
and processing the data in the same environment is very practical. You can
spend more time on modeling and processing the critical sections of the
data, which may depend on model parameters and sampling (which is often
incomplete and uneven). You also avoid wasting CPU cycles on modeling
things that are not in the data.

A theorist may be perfectly happy with Fortran, and an observer could do
his stuff with simple scripts. But if they need to work together, Python
is a very good option.
 

Georg Brandl

Michael said:
Someone asked me to write a brief essay regarding the value-add
proposition for Python in the Fortran community. Slightly modified to
remove a few climatology-related specifics, here it is.

Great text. Do you want to put it onto a Wiki page at wiki.python.org?

Georg
 

beliavsky

I have posted your essay in a thread "Python for Fortran programmers"
in comp.lang.fortran since it is written in part for a Fortran
audience, and since you are more likely to get critical (but hopefully
constructive) comments there.
 

Peter Tillotson

Hi,

Like it - an area that doesn't come out strongly enough for me is
Python's ability to drop down to and integrate with low level
algorithms. This allows me to optimise the key bits of design in
Python very quickly, and then if I still need more poke I can drop down
to low level programming languages. Optimise design, not code, unless I
really need to.

To be fair the same is at least partly true for Java ( though supporting
JNI code scares me ) but my prototyping productivity isn't as high.

The distributed / HPC packages may also be worth noting - PyMPI and
PyGlobus.

p
 

tooper

Maybe I'd also emphasize the nice COM interface that allows your wrapped
Fortran to be made available in your Excel macros in a snap. It happens
that Fortran programmers/users tend to be poor Office users except for
Excel, which they master at an unbelievable level...
My own best low-work/high-user-satisfaction result ever is just this:
wrapping a LOWTRAN call to make it usable from Excel. Half an hour of
work, and 100+ users 2 days later!
 

sturlamolden

Michael Tobis skrev:

Being a scientist, I can tell you that you're not getting it right. If
you speak computer science or business talk, no scientist is going to
listen. Let's just see how you argue:
These include: source and version control and audit trails for runs,
build system management, test specification, deployment testing (across
multiple platforms), post-processing analysis, run-time and
asynchronous visualization, distributed control and ensemble
management.

At this point, no scientist will understand any longer what the heck you
are talking about. All have stopped reading and are busy doing
experiments in the laboratory instead. Perhaps it sounds good to a CS
geek, but not to a busy researcher.

Typically a scientist needs to:

1. do a lot of experiments

2. analyse the data from experiments

3. run a simulation now and then

Thus, we need something that is "easy to program" and "runs fast
enough" (and by fast enough we usually mean extremely fast). The tools
of choice seem to be Fortran for the older professors (you can't teach
old dogs new tricks) and MATLAB (perhaps combined with plain C) for the
younger ones (that would e.g. be yours truly). Hiring professional
programmers is usually futile, as they don't understand the problems
we are working with. They can't solve problems they don't understand.

What you really need to address is something very simple:


Why is Python a better Matlab than Matlab?


The programs we need to write typically fall into one of three
categories:

1. simulations
2. data analysis
3. experiment control and data acquisition

(those are words that scientists do know)

In addition, there are 10 things you should know about scientific
programming:

1. Time is money. Time is the only thing that a scientist cannot afford
to lose. Licensing fees for Matlab are not an issue. If we can spend
$1,000,000 on specialised equipment we can pay whatever Mathworks or
Lahey charges as well. However, time spent programming is an issue.
(As is time spent learning a new language.)

2. We don't need fancy GUIs. GUI coding is a waste of time we don't
have. We don't care if Python has fancy GUI frameworks or not.

3. We do need fancy data plotting and graphing. We do need fancy
plotting and graphing that is easy to use - as in Matlab or S-PLUS.

4. Anything that has to do with website development or enterprise class
production quality control is crap that we don't care about.

5. Versioning control? For each program there is only one developer and
a single user or a handful of users.

6. The prototype is the final version. We are not making software for a
living, we are doing research.

7. "My simulation is running too slowly" is the number ONE complaint.
Speed of execution is an issue, regardless of what computer science
folks try to tell you. That is why we spend a disproportionate amount of
time learning to vectorize Matlab code.

8. "My simulation is running out of memory" is the number TWO complaint.
Matlab is notorious for leaking memory and fragmenting the
heap.

9. What are algorithms and data structures? Very few of us know how to
use a data structure more complicated than an array. That is why we like
Matlab and Fortran so much.

10. We are novice programmers. We are not passionate programmers. We
take no pride in our work. The easier the hack, the better. We don't care if
we are doing OOP or not. However, we do hate complicated APIs or APIs
that look funny. We are used to seeing sin(x) in our calculus textbooks
and because of that we don't find Math.Sin(x) particularly elegant --
even though Math.Sin(x) is more OOP and sin(x) clutters the global
namespace.


Now please go ahead and tell me how Python can help me become a better
scientist. And try to steer clear of the computer science buzzwords
that don't mean anything to me.

Thanks!

Sturla Molden
(neuroscience PhD)
 

Terry Hancock

1. Time is money. Time is the only thing that a scientist
cannot afford to lose. Licensing fees for Matlab are not an
issue. If we can spend $1,000,000 on specialised equipment
we can pay whatever Mathworks or Lahey charges as well.
However, time spent programming is an issue. (As is time
spent learning a new language.)

"that man speaks for himself!" ;-)

Seriously, this depends on the lab. If you're working for
a monster pharmaceutical corp or on a military contract on
"applied" science (meaning there is a definite payback
expected), then you likely have money to burn. People
working in an academic or non-profit lab on "unsexy"/"pure"
science likely don't.

Remember that site-licensing usually works on some kind of
"per seat" basis (even if you are lucky enough *not* to have
a "license server" that constantly tracks usage in order to
deny service if and when N+1 users try to use the system,
the site fee is still based on the number
of expected users). The last science facility I worked at
was in considerable debt to a proprietary scientific
software producer and struggling to pay the bills. The
result was that they had fewer licenses than they wanted
and many people simply couldn't use the software when they
wanted.

I'm not sure what happened in the end, because I left for
unrelated reasons before all of that got sorted out, but
Python (with a suitable array of add-ons) was definitely on
the short-list of replacement software (and partly because I
was trying to sell people on it, of course).

In fact, if I had one complaint about Python, it was the
"with a suitable array of add-ons" caveat. The proprietary
alternative had all of that rolled into one package (albeit
glopped into one massive and arcane namespace), whereas
there was no "Python Data Language" or whatever that
would include all that in one named package that everyone
could recognize (I suppose SciPy is trying to achieve that).

For similar reasons, Space Telescope Science Institute
decided to go full tilt into python development -- they
created "numarray" and "pyraf", and they are the ones paying
for the "chaco" development contract.

Which brings up another point -- whereas proprietary
software (and stuff written using it, like the IDL astronomy
library) can leave you with an enormous investment in stuff
you can't use, free software development can often be just
as cheap, and you get to keep what you make.

At one point, I was seriously thinking about trying to write
some kind of translator to convert those IDL libs into
python libs (quixotic of me?).

So why rent when you can own?

Scientists certainly do understand all that bit about
"seeing further" because you're "standing on the shoulders
of giants". With proprietary software, the giants keep
getting shot out from under you, which tends to make things
a bit harder to keep up with.

Cheers,
Terry
 

Alex Martelli

Terry Hancock said:
In fact, if I had one complaint about Python, it was the
"with a suitable array of add-ons" caveat. The proprietary
alternative had all of that rolled into one package (albeit
glopped into one massive and arcane namespace), whereas
there was no "Python Data Language" or whatever that
would include all that in one named package that everyone
could recognize (I suppose SciPy is trying to achieve that).

I believe the Enthought distribution of Python (for Windows, with a Mac
version planned) is trying to move exactly in that direction, by
packaging up everything and a half (while of course leaving a reasonable
assignment of namespaces from the pieces it's packaging!-). However,
maintaining such a distro, and making it available for a wider variety
of platforms, are heavy, continuing tasks -- unless either firms, such
as Enthought, or volunteers, commit to such tasks, they won't "just
happen".


Alex
 

Michael Tobis

There is a range of folks doing scientific programming. Not all of them
are described correctly by your summary, but many are. The article is
aimed not at them, but rather at institutions that develop engineered
Fortran models using multipurpose teams and formal methods. I
appreciate your comments, because I see that there should be another
article aimed at desktop programmers.

One of the things python addresses best is the division of labor, where
the subtle concepts are available to those who need them and hidden
from those who don't need them. From what I understand of your work
(and what I have seen of the work of two other neuroscientists,
actually) Python would be a good choice for you.

That said, the level of computational skill in many scientists is
alarming. Why do we expect to spend six semesters learning mathematics
and expect to pick up computing "on the side"? It baffles me. Frankly,
saying "I don't need version control" sounds to me no less foolish than
saying "I don't need logarithms". (Perhaps you don't, but someday soon
you will.)

"Speed of execution is an issue, regardless of what computer science
folks try to tell you." strikes me as nothing short of hallucinatory.
No informed person says that speed is never an issue, and a great deal
of effort is spent on speed. Where do you suppose your Fortran
compiler came from in the first place?

For someone without legacy code to worry about, fussing with Fortran
for single-user one-off codes strikes me as a weak choice. If you are
hitting Matlab's performance or memory limits, you should take the time
to learn something about computation, not because you are idle, but
because you are busy. Or if you prefer, because your competitors will
be learning how to be more productive while you put all your efforts
into coping with crude tools.

The peculiar lack of communication between computer scientists and
application scientists is real; but I believe the fault is not all on
one side. The fact that you have a PhD does not prove that you know
everything you need to know, and I strongly recommend you reconsider
this attitude. For one thing, you misjudged which side of the divide I
started on.

Michael Tobis
(While I dislike credentialism on usenet, I will reply in kind. I hold
a Ph.D. in geophysical fluid dynamics.)
 

Brian Blais

sturlamolden said:
Typically a scientist needs to:

1. do a lot of experiments

2. analyse the data from experiments

3. run a simulation now and then

Unless you are a theorist! In that case, I would order this list backwards.

1. Time is money. Time is the only thing that a scientist cannot afford
to lose. Licensing fees for Matlab are not an issue. If we can spend
$1,000,000 on specialised equipment we can pay whatever Mathworks or
Lahey charges as well. However, time spent programming is an issue.
(As is time spent learning a new language.)

Another person stated that they don't have infinite funds, as implied here. I would
add that, in addition to one's own research, professors must also teach and advise.
I find it very helpful to be able to say to a student, "go download this, and here is
the code I wrote for the work I do". The price is often an impediment for getting
students into research. Often there are site licenses, but they don't work off campus.

2. We don't need fancy GUIs. GUI coding is a waste of time we don't
have. We don't care if Python has fancy GUI frameworks or not.

Again, for sharing ideas, GUIs are *necessary*. If you work with people who do less
programming than you, then you need to make an interface to your code that they can
understand. It doesn't have to be fancy, just functional.

3. We do need fancy data plotting and graphing. We do need fancy
plotting and graphing that is easy to use - as in Matlab or S-PLUS.

Here, I've found Python to be good, but not great. matplotlib (pylab) is a really
great thing, but is not as straightforward as plotting in Matlab. Either you have a
window which locks the process until you close it, or you use interactive mode, but
then the window's contents disappear if any other window is put on top (like your
shell) and have to be manually redrawn. This makes it far less convenient to deal
with in interactive mode.

4. Anything that has to do with website development or enterprise class
production quality control is crap that we don't care about.

I think it can be pitched as an alternative to shell-scripts, which is a nice economy
of concepts: the language you use for your scientific work, you can also use for your
OS work, and your tinkering.

7. "My simulation is running too slowly" is the number ONE complaint.
Speed of execution is an issue, regardless of what computer science
folks try to tell you. That is why we spend a disproportionate amount of
time learning to vectorize Matlab code.

Here, I would plug Pyrex like crazy. To me the Python/Pyrex combination is the
biggest selling point for converting my scientific Matlab code to Python.
Learning a new API is a drag, and I've found that SWIG is not particularly intuitive
(although convenient, if you have a lot of libraries already written). Pyrex seems
to get the best of all possible worlds: seamless use of python objects, and the
ability to do C-loops for speed, with no API. Making extensions this way is a real joy.

Now please go ahead and tell me how Python can help me become a better
scientist. And try to steer clear of the computer science buzzwords
that don't mean anything to me.


I have been using Matlab for nearly 10 years. My claim to no-fame is the neural
network simulator Plasticity (http://web.bryant.edu/~bblais/plasticity) which has
taken me years to write. I have some complaints about Matlab, but it has been a
useful tool. Some of my complaints are as follows:

1) Cost. I find that the marketing model for Matlab is annoying. They
nickel-and-dime you, with the base package (educational pricing) at $500 per
machine/operating system/user and then between $200-$500 *per* "toolbox", which adds
up really quick. I can't even buy 1 license for a dual boot, to have Matlab run on
a Linux partition and a Windows partition.

The cost impacts my use of Matlab in the classroom, and with research students.

2) License Manager. The license manager for Matlab is the most inconvenient program
I have ever dealt with. It is incredibly sensitive to the license file, and it is
nearly impossible to debug. This has made Matlab one of the hardest programs to
install, for me. The issue that impacts my productivity is the following: the
license key is tied to the network card, on eth0. Thus, if I upgrade my laptop, I
need to contact Mathworks for an updated license key. Also, occasionally, my
operating system decides to name my wireless card eth0, and my wired card eth1.
Nothing else is affected by this, but then I can't run Matlab!

3) Upgrade Version Hell. *Every* time Matlab has upgraded, my program has broken.
Usually something small, but still it is a real pain in the butt. Also, I have to
pay full price for the upgrade, or pay some fee continuously whether there is an
upgrade or not.

I have only been using Python for about 2 months, so I can't speak to some issues,
but what does Python offer me?

1) Free (as in free beer). I've elaborated on this above.

2) Free (as in free speech). I like the fact that I am not burdened by having my
projects tied to something proprietary.

3) Distribution ease. With py2exe, I can distribute on Windows systems which have no
python installed. That's a real plus!

4) Clean programming environment. For teaching, it is nice to use a language which
is so readable.

5) A huge number of built-in, or available, packages for nearly everything.

6) The ability to write portions of code in an optimized, as-fast-as-C, manner.

7) Relatively easy GUI frameworks

I'm sure there are other things, but that's the way I am thinking right now.
 

Robert Kern

sturlamolden said:
Michael Tobis skrev:

Being a scientist, I can tell you that you're not getting it right. If
you speak computer science or business talk, no scientist is going to
listen. Let's just see how you argue:

I see we've forgone the standard conventions of politeness and gone straight for
the unfounded assumptions. Fantastic.

At this point, no scientist will understand any longer what the heck you
are talking about. All have stopped reading and are busy doing
experiments in the laboratory instead. Perhaps it sounds good to a CS
geek, but not to a busy researcher.

Typically a scientist needs to:

1. do a lot of experiments

2. analyse the data from experiments

3. run a simulation now and then

Being a one-time scientist, I can tell you that you're not getting it right. You
have an extremely myopic view of what a scientist does. You seem to be under the
impression that all scientists do exactly what you do. When I was in the
geophysics program at Scripps Institution of Oceanography, almost no one was
doing experiments. The closest were the people who were building and deploying
instruments. In our department, typically a scientist would

1. Write grant proposals.

2. Advise and teach students.

3. Analyze the data from the last research cruise/satellite passover/earthquake.

4. Do some simulations.

5. Write a lot of code to do #3 and #4.

There are whole branches of science where the typical scientist usually spends a
lot of his time in #5. Michael Tobis is in one of those branches, and his
article was directed to his peers. As he clearly stated.

You are not from one of those branches, and you have different needs. That's
fine, but please don't call the kettle black.

Thus, we need something that is "easy to program" and "runs fast
enough" (and by fast enough we usually mean extremely fast). The tools
of choice seem to be Fortran for the older professors (you can't teach
old dogs new tricks) and MATLAB (perhaps combined with plain C) for the
younger ones (that would e.g. be yours truly). Hiring professional
programmers is usually futile, as they don't understand the problems
we are working with. They can't solve problems they don't understand.

I call shenanigans. Believe me, I would love it if it were true. I make my
living writing scientific software. For a company where half of us have science
degrees (myself included) and the other half have CS degrees, it would be great
advertising to say that none of those other companies could ever understand the
problems scientists face. But it's just not true.

Scientists are an important part of the process, certainly. They're called
"customers." Their needs drive the whole process. The depth and breadth of their
knowledge of the field and the particular problem are necessary to write good
scientific software. But it usually doesn't take particularly deep or broad
knowledge to write a specific piece of software. Once the scientist can reduce
the problem to a set of requirements, the CS guys are a perfect fit. That's what
a good professional programmer does: take requirements and produce software that
fulfills those requirements. They do the same thing regardless of the field. In
my company, everyone pulls their weight, even the person with the philosophy degree.

At that point, the CS skillset is perfectly suited to writing good scientific
software. Or at least, any given CS-degree person is no less likely to have the
appropriate skillset than a science-degree person. Frequently, they have a much
broader and deeper skillset that is actually useful to writing scientific
software. Most of the scientists I know couldn't write robust floating point
code to save their lives. Or their careers.

What you really need to address is something very simple:

Why is Python a better Matlab than Matlab?

The programs we need to write typically fall into one of three
categories:

1. simulations
2. data analysis
3. experiment control and data acquisition

(those are words that scientists do know)

In addition, there are 10 things you should know about scientific
programming:

1. Time is money. Time is the only thing that a scientist cannot afford
to lose. Licensing fees for Matlab are not an issue. If we can spend
$1,000,000 on specialised equipment we can pay whatever Mathworks or
Lahey charges as well. However, time spent programming is an issue.
(As is time spent learning a new language.)

2. We don't need fancy GUIs. GUI coding is a waste of time we don't
have. We don't care if Python has fancy GUI frameworks or not.

Uh, time is money? Fighting unusable interfaces, GUI or otherwise, is a waste of
resources. My brother works in biostatistics at the NIH. Every once in a while,
the doctors he works for will ask him to do a particular analysis which requires
him to use a particularly unusable piece of software. Every time, he has to
spend half a day setting up the problem. This is why he's the one who gets to do
it instead of the doctors.

Now, he's considering rewriting the program in Python with a GUI that will
essentially provide a Big Red Go Button (TM) so the doctors can do the analysis
in a fraction of the time it takes now.

3. We do need fancy data plotting and graphing. We do need fancy
plotting and graphing that are easy to use - as in Matlab or S-PLUS.

4. Anything that has to do with website development or enterprise class
production quality control is crap that we don't care about.

There are quite a few scientists who are managing gigantic amounts of data, and
run experiments/observations/what-have-you so large that they need
multi-institutional participation. Sharing that data in an efficient manner
*does* require good dynamic websites and enterprise class software backing it up.

There are more kinds of scientist in Heaven and Earth than are dreamt of in your
philosophy.

5. Version control? For each program there is only one developer and
a single user or a handful of users.

I used to think like that up until two seconds before I entered this gem:

$ rm `find . -name "*.pyc"`

Okay, I didn't type it exactly like that; I was missing one character. I'll let
you guess which.

This is one thing that a lot of people seem to get wrong: version control is not
a burden on software development. It is a great enabler of software development.
It helps you get your work done faster and easier even if you are a development
team of one. You can delete code (intentionally!) because it's no longer used
in your code, but you won't lose it. You can always look at your history and get
it again. You can make sweeping changes to your code, and if that experiment
fails, you can go back to what was working before. Now you can do this by making
copies of your code, but that's annoying, clumsy, and more effort than it's
worth. Version control makes the process easier and lets you do more interesting
things.

I would go so far as to say that version control enables the application of the
scientific method to software development. When you are in lab, do you say to
yourself, "Nah, I won't write anything in my lab notebook. If the experiment
works at the end of the day, only that result matters"?

6. The prototype is the final version. We are not making software for a
living, we are doing research.

I have lots of research code on my hard drive with decade-long changelogs that
give the lie to that statement. If the code is useful now, it will probably
still be useful in a few years. People will add to it, make suggestions, build
on your work.

This is how science is supposed to work. Practices which encourage this behavior
are good things for science.

7. "My simulation is running too slowly" is the number ONE complaint.
Speed of execution is an issue, regardless of what computer science
folks try to tell you. That is why we spend a disproportionate amount of
time learning to vectorize Matlab code.

8. "My simulation is running out of memory" is the number TWO complaint.
Matlab is notorious for leaking memory and fragmenting the
heap.

9. What are algorithms and data structures? Very few of us know how to
use a data structure more complicated than an array. That is why we like
Matlab and Fortran so much.

Yes, and this is why you will keep saying, "My simulation is running too
slowly," and "My simulation is running out of memory." All the vectorization you
do won't make a quadratic algorithm run in O(n log(n)) time. Knowing the right
algorithm and the right data structures to use will save you programming time
and execution time. Time is money, remember, and every hour you spend tweaking
Matlab code to get an extra 5% of speed is just so much grant money down the drain.
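
To make the point about algorithms concrete, here is a minimal sketch (not from the thread; the task and the names nearest_brute and nearest_sorted are invented for illustration). The task is to find, for each query value, the closest value in a set of samples. The first version compares every query against every sample. The second sorts the samples once and then binary-searches with the standard bisect module, which is the kind of improvement no amount of vectorizing the first version will deliver.

```python
import bisect

def nearest_brute(samples, queries):
    """Scan every sample for every query: O(len(samples) * len(queries))."""
    return [min(samples, key=lambda s: abs(s - q)) for q in queries]

def nearest_sorted(samples, queries):
    """Sort once, then binary-search each query: O((n + m) log n)."""
    ordered = sorted(samples)
    result = []
    for q in queries:
        i = bisect.bisect_left(ordered, q)
        # The nearest value is one of the (at most two) neighbours
        # of the insertion point.
        candidates = ordered[max(i - 1, 0):i + 1]
        result.append(min(candidates, key=lambda s: abs(s - q)))
    return result

samples = [5.0, 1.0, 9.0, 3.0]
queries = [0.0, 2.9, 6.0, 100.0]
print(nearest_sorted(samples, queries))  # [1.0, 3.0, 5.0, 9.0]
```

Both functions give the same answers here; when a query is exactly halfway between two samples they may pick different (equally near) values.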

That said, we have an excellent array object far superior to Matlab's.

http://numeric.scipy.org/

10. We are novice programmers. We are not passionate programmers. We
take no pride in our work. The easier the hack, the better. We don't care if
we are doing OOP or not. However, we do hate complicated APIs or APIs
that look funny. We are used to seeing sin(x) in our calculus textbooks
and because of that we don't find Math.Sin(x) particularly elegant --
even though Math.Sin(x) is more OOP and sin(x) clutters the global
namespace.

Now please go ahead and tell me how Python can help me become a better
scientist. And try to steer clear of the computer science buzzwords
that don't mean anything to me.

1. You will probably spend less time writing and running software.

2. If you play your cards right, more people will be able to use and improve
your software.
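
On the sin(x) complaint in item 10, a short sketch (added for illustration, not part of the original reply) of how Python's import statement lets a script choose either spelling. Only the names a script asks for are brought into its namespace, so textbook-style sin(x) does not require dumping an entire library into the global scope. The function names textbook and namespaced are hypothetical.

```python
import math               # namespaced spelling: math.sin(x)
from math import pi, sin  # textbook spelling: sin(x)

def namespaced(x):
    return math.sin(x)

def textbook(x):
    return sin(x)

print(textbook(pi / 2), namespaced(pi / 2))  # 1.0 1.0
```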

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 

Robert Kern

Brian said:
here, I've found python to be good, but not great. matplotlib (pylab) is a really
great thing, but is not as straightforward as plotting in Matlab. Either you have a
window which locks the process until you close it, or you do interactive mode, but
the window results disappear if any other window is put on top (like your shell), and
has to be manually redrawn. This makes it far less convenient to deal with in
interactive mode.

All I can say is that I've never seen anything like that behavior. Have you tried
using ipython in pylab mode?

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 

Brian Blais

Robert said:
That said, we have an excellent array object far superior to Matlab's.

http://numeric.scipy.org/

I'd like to ask, being new to python, in which ways is this array object far superior
to Matlab's? (I'm not being sarcastic, I really would like to know!)

I've heard similar things about matplotlib, about how it surpasses Matlab's graphics.
I haven't personally experienced this, but I'd like to know in which ways it is.



bb
 

sturlamolden

Robert said:
1. Write grant proposals.

2. Advise and teach students.

Sorry I forgot the part about writing grant applications. As for
teaching students, I have thankfully not been bothered with that too
much.


Yes, and this is why you will keep saying, "My simulation is running too
slowly," and "My simulation is running out of memory." All the vectorization you
do won't make a quadratic algorithm run in O(n log(n)) time. Knowing the right
algorithm and the right data structures to use will save you programming time
and execution time. Time is money, remember, and every hour you spend tweaking
Matlab code to get an extra 5% of speed is just so much grant money down the drain.

Yes, and that is why I use C (that is ISO C99, not ANSI C89) instead of
Matlab for everything except trivial tasks. The design of Matlab's
language is fundamentally flawed. I once wrote a tutorial on how to
implement things like lists and trees in Matlab (using functional
programming, e.g. using functions to represent list nodes), but it's
just a toy. And as Matlab's run-time does reference counting instead of
proper garbage collection, any data structure more complex than an array
is sure to leak memory (I believe Python also suffered from this at
some point). Matlab is not useful for anything except plotting data
quickly. And as for the expensive license, I am not sure it's worth it.
I have been considering a move to Scilab for some time, but it too
carries the burden of working with a flawed language.
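
On the parenthetical about Python: CPython does use reference counting, but since version 2.0 it has also shipped a cycle detector in the standard gc module, so structures that refer to each other are reclaimed. A minimal sketch of that behaviour follows (added for illustration; Node and cycle_demo are made-up names, and automatic collection is switched off only to make the before/after observation deterministic).

```python
import gc
import weakref

class Node:
    """A list-like node that can point at another node."""
    def __init__(self):
        self.other = None

def cycle_demo():
    """Return (alive_before, alive_after) for a two-node reference cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # a <-> b: reference counts never reach zero
        probe = weakref.ref(a)    # lets us observe a without keeping it alive
        del a, b                  # only the cycle itself holds the nodes now
        alive_before = probe() is not None
        gc.collect()              # the cycle detector finds and frees them
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()

print(cycle_demo())  # (True, False)
```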
 

Michael Tobis

$ rm `find . -name "*.pyc"`

Ouch. Is that a true story?

While we're reminiscing about bad typos and DEC, I should tell the
story about the guy who clobbered his work because his English wasn't
very strong.

Under RT-11, all file management was handled by a program called PIP.
For example to get a list of files in the current working directory you
would enter PIP *.* /LI . Well this fellow, from one of those countries
where our long "e" sound is their "i" sound, mournfully announced

"I vanted a deerectory so I typed 'PIP *.* /DE' "

That one is true.

mt
 

Robert Kern

Michael said:
Ouch. Is that a true story?

Yup. Fortunately, it was a small, purely personal project, so it was no huge
loss. It was enough for me to start using CVS on my small, purely personal
projects, though!

--
Robert Kern
(e-mail address removed)

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
