From Perl to Python: restructuring an HPC workflow

neurino

Since we need to restructure our daily workflow, I think it might be a
good idea to ask the Python community and hopefully start a thread
about the pros and cons.

We are a small group of people (approx. 10), each working separately on
our own projects (each employee manages approx. 2-3 projects). We
deal with high loads of data every day.

While the processing is accomplished with Fortran and C programs, mainly
on three systems (one cluster and two standalone IBM HPCs, 8852 and p770,
all managed by a Grid Engine), networking, pre/postprocessing, job
queue administration and numerical analysis have been accomplished with
Perl.

This workflow has worked flawlessly for at least 15 years. New
generations of employees have been given the Perl scripts and have
developed the tools further.

When I look at the current state of Perl, I can't see a bright future
ahead. Perl 6 is far from being a reliable solution, and the CPAN archive
is slowing down. My idea is to persuade my colleagues to move toward
Python-based solutions. But our concern is that, 3-4 years from
now, the tools we are going to develop must still be scalable,
maintainable, portable and high-performance.

We don't have any solid in-house know-how on Python. We would have to
start everything from scratch. Where do you see advantages and
drawbacks in switching from Perl to Python, given the work picture
above?

Thanks in advance for any opinions you might have.
 
rusi

Switching is always a con; see http://www.joelonsoftware.com/articles/fog0000000069.html
Assuming you have that under your belt:
- If Python is the way to go, asking on the scipy/numpy and ipython
lists may give you more specific answers.
- And if the 'rewrite-bug' has really got you, remember that if Perl
is old, C/Fortran are older.
There are options today for rewriting the whole system, such as
Haskell and Julia http://julialang.org/

WARNING: If the Spolsky warning above for Perl->Python is X units,
take it as 2X for Haskell and 4X for Julia!
 
Chris Angelico

I would recommend making sure the tools can all interoperate
regardless of language, and then you can change any one at any time.
Chances are that's already the case - working with stdin/stdout is one
of the easiest ways to do that, for instance. With a structure that
lets anyone use any language, you can then switch some of your things
to Python, and demonstrate the readability advantages (which would you
rather code in, pseudocode or line noise?). Make the switch as smooth
as possible, and people will take it when it feels right.

ChrisA
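
The stdin/stdout approach described above could be sketched roughly as follows: a small Python filter that reads whitespace-separated numbers on standard input and writes per-column means to standard output, so it can sit in a pipeline next to existing Perl, C or Fortran tools. The script and function names (column_means, main) and the input format are assumptions made for illustration, not anything taken from the poster's actual workflow.

```python
#!/usr/bin/env python3
"""Hypothetical pipeline filter: numbers in on stdin, column means out on stdout."""
import sys


def column_means(lines):
    """Return the mean of each column; blank lines and '#' comments are skipped."""
    sums, count = [], 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        values = [float(f) for f in fields]
        if not sums:
            # First data row fixes the expected number of columns.
            sums = [0.0] * len(values)
        if len(values) != len(sums):
            raise ValueError(f"expected {len(sums)} columns, got {len(values)}")
        sums = [s + v for s, v in zip(sums, values)]
        count += 1
    return [s / count for s in sums] if count else []


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write one line of column means to stdout."""
    means = column_means(stdin)
    if means:
        stdout.write(" ".join(f"{m:.6g}" for m in means) + "\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as column_means.py, it could be chained with an existing step, for example `perl legacy_step.pl < input.dat | python3 column_means.py > means.txt`, where legacy_step.pl stands in for any current tool. Because the only contract is text on stdin and stdout, either side of the pipe can be rewritten later without touching the other.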
 
rusi

What Chris says is fine in the technical sphere.
It seems to me, however, that your problems are as much human as
technical -- convincing entrenched old fogeys to change.
No, I don't have any cooked answers for that… You just need to keep your
eyes and ears open to see where you want a smooth, painless transition
and when you want to 'do it with a bang.'

If you look at some of the stuff here
http://blog.explainmydata.com/2012/07/expensive-lessons-in-python-performance.html
you may find that these packages do much of what you want (and add
matplotlib to the set).
This may add some pizzazz to your case.

Warning: In my experience, this can often backfire!
 
