Newbie needs to see a large project

John Fabiani

Hi everyone,
I have been checking out Python recently and have presented what little I know
to management. They have asked whether there are any large business apps (like an
accounting system) using Python and what the performance is like. We would
be running on Linux with a large database (maybe Postgres or MySQL) of 3-5
Gbytes. We would be moving from a Windows platform (FoxPro apps) to Linux
(Python). About 50 users total. Thanks in advance....
 
Jegenye 2001 Bt

John,

Check out www.zope.org
Zope + CMF + Plone + etc, etc. ... that's quite large..
Performance is ok. :)

Ok, you're really talking about databases. I seem to recall a monstrous
project where Python was used to process speeches in the Irish parliament..
... and Google uses Python heavily, too. They do have some big databases.
<wink>

Best,
Miklós
 
RobE

John said:
Hi everyone,
I have been checking out Python recently and have presented what little I
know to management. They have asked whether there are any large business apps
(like an accounting system) using Python and what the performance is
like. We would be running on Linux with a large database (maybe
Postgres or MySQL) of 3-5 Gbytes. We would be moving from a Windows
platform (FoxPro apps) to Linux (Python). About 50 users total. Thanks
in advance....

I'll say I'm no expert with databases, but I have played with a 5 GB
stock database on Postgres with python for my personal use. The comments
I would make are:

* Yes you can do it.
* The good thing (and the bad thing) is that most of the speed issues will
probably be on the database side. You need to think carefully about how
you set up the tables and indexes in your database, and you need enough
computing horsepower to handle the queries your users are going to
throw at it. Complicated queries on large databases can be very time
consuming, but it all depends on the number of records and the indexing you use.
* On the Python side--be aware that Python data storage is not
particularly efficient in terms of memory. As an example, in my stock
database an easy way to load up the daily data would have been to load
all 30000 symbols for the day at once and then send them to the
database--if you define this using classes, it's huge; if you do it
as a sequence of tuples, it's much smaller. Just keep this in
mind if you're handling a lot of data--how you store it in Python can matter.
You only need to consider this if you find you're using a lot of memory and
need to track down why.
* The other thing I would suggest--figure out where the real processing
needs to be done. If you're just handling small amounts of data with
the Python clients, then it should not be a problem. If it's large, then
you'll have to experiment--Python is still interpreted on the one hand; on
the other, for large amounts of data your queries (on the database side)
may still be the limiting factor. Also keep in mind that you
can extend Python with C code if you have to get more speed. There is also
"good" and "bad" Python code--so be creative.
* Python can be used for large programs--I've written a 10000 line
Python program without difficulty. The other two common scripting
languages out there are PHP and Perl. I don't know much about these, but
I've heard people say you'd never write large Perl code like this. PHP I
don't know...people do use it for web/db work a lot. Incidentally, the
Python active content web solution is Zope... but I'm not a web guy so I
don't know anything about it.
* I don't know how much you've done with databases--but a reminder:
have a backup plan. You can't just run tar over a running database and get
a restorable backup--you'll need to dump the database using whatever
procedure Postgres has for this and then back up the static dump.
* Incidentally: the reason I use Python myself is that it has a large
application range--it can do everything from quick tools to full-blown
applications to active web content. Pretty flexible.
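
The memory point in the list above (class instances versus a sequence of tuples) can be checked directly. This is only a rough sketch: the `Quote` record with `symbol`, `close` and `volume` fields is a made-up stand-in for one day's row per symbol, and it uses `tracemalloc`, which exists in current Python 3 but not in the Python versions discussed in this thread. Exact byte counts will vary by interpreter version.

```python
import tracemalloc


class Quote:
    """Hypothetical daily record stored as a plain class instance."""

    def __init__(self, symbol, close, volume):
        self.symbol = symbol
        self.close = close
        self.volume = volume


def build_objects(n):
    # One instance per symbol.
    return [Quote("SYM%05d" % i, 10.0 + i, 1000 + i) for i in range(n)]


def build_tuples(n):
    # Same data, stored as bare tuples.
    return [("SYM%05d" % i, 10.0 + i, 1000 + i) for i in range(n)]


def measure(builder, n=30000):
    """Return the bytes still allocated right after building n records."""
    tracemalloc.start()
    rows = builder(n)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del rows
    return current


if __name__ == "__main__":
    print("instances: %d bytes" % measure(build_objects))
    print("tuples:    %d bytes" % measure(build_tuples))
```

If instances are still preferred for readability, declaring `__slots__` on the class usually narrows the gap, since it removes the per-instance attribute storage that makes plain instances larger.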

Just some food for thought.

Rob
 
Dave Benjamin

Jegenye said:
Check out www.zope.org
Zope + CMF + Plone + etc, etc. ... that's quite large..
Performance is ok. :)

Depending on your definition of "ok"... Plone is nice, very tasteful,
aesthetically, but boy is it slow... plus, I had the hardest time
getting my template changes to take effect without restarting Zope. But
maybe that's all been fixed by now.

Dave
 
Steve Williams

John said:
Hi everyone,
I have been checking out Python recently and have presented what little I know
to management. They have asked whether there are any large business apps (like an
accounting system) using Python and what the performance is like. We would
be running on Linux with a large database (maybe Postgres or MySQL) of 3-5
Gbytes. We would be moving from a Windows platform (FoxPro apps) to Linux
(Python). About 50 users total. Thanks in advance....

I wrote a Data Warehouse Extract/Transform/Load utility for a client in
Python 1.5.2. On a nightly basis it processes 2-3 gigabytes of data
from about 100 legacy flatfiles into a DB2 ODS and then to the Data
Warehouse.

It's been running now without incident for over 2 years.

I'm just finishing an application for vocational schools (prospects,
students, enrollments, classes, counselors, agencies, ledger cards,
courses, modules, staff, attendance, locations, calendars, reporting,
etc, etc) using Python 2.2/2.3, wxPython and Firebird 1.0/kinterbasdb.
The users seem happy with it. It runs on Linux (Red Hat 8.0), Windows
95, Windows 98 and Windows XP.
 
Bruno Desthuilliers

RobE wrote:
(snip)
* Python can be used for large programs -- I've written a 10000 line
python program without difficulty.

Well... Even given the average Python/Any_low_level_language ratio, I
would not call 10 kloc a 'large' program.

Bruno
 
Rudy Schockaert

John said:
Hi everyone,
I have been checking out Python recently and have presented what little I know
to management. They have asked whether there are any large business apps (like an
accounting system) using Python and what the performance is like. We would
be running on Linux with a large database (maybe Postgres or MySQL) of 3-5
Gbytes. We would be moving from a Windows platform (FoxPro apps) to Linux
(Python). About 50 users total. Thanks in advance....

Quote from the www.python.org homepage:

"Python has been an important part of Google since the beginning, and
remains so as the system grows and evolves. Today dozens of Google
engineers use Python, and we're looking for more people with skills in
this language," said Peter Norvig, director of search quality at Google,
Inc.

Google may not be a large business app, but it surely is a huge database,
and the performance of Google is excellent. It's a pity they don't
mention what they use Python for.
 
Peter Hansen

Rudy said:
Quote from the www.python.org homepage:

"Python has been an important part of Google since the beginning, and
remains so as the system grows and evolves. Today dozens of Google
engineers use Python, and we're looking for more people with skills in
this language," said Peter Norvig, director of search quality at Google,
Inc.

Google may not be a large business app, but it surely is a huge database,
and the performance of Google is excellent. It's a pity they don't
mention what they use Python for.

This has been discussed in the past, I believe, so check the archives
for more. Basically (from memory only), Python is involved in the indexing
but *none* of the web stuff with which the user directly interacts. The
"excellent" performance comes apparently from nice C code, nothing more.

-Peter
 
Gerrit Holl

Rudy said:
Google may not be a large business app, but it surely is a huge database,
and the performance of Google is excellent. It's a pity they don't
mention what they use Python for.

Google is so large, they probably use everything. I have heard they use Ruby.
They surely use C. They probably use Java too, and PHP. And Perl. And other
languages...

Gerrit.
 
Nowan

John said:
Hi everyone,
I have been checking out Python recently and have presented what little I know
to management. They have asked whether there are any large business apps (like an
accounting system) using Python and what the performance is like. We would
be running on Linux with a large database (maybe Postgres or MySQL) of 3-5
Gbytes. We would be moving from a Windows platform (FoxPro apps) to Linux
(Python). About 50 users total. Thanks in advance....

================================
Great question.

I'm in the same situation. I'm an enterprise database guy who is
looking at building database-backed transactional web apps. Stuff that
might otherwise be done using ASP or JSP/Servlets.

I need a database-driven web app that might support a few hundred users.

From the many thoughtful responses, I'm afraid that a lot of folks
haven't built large apps, and haven't built sophisticated database apps.

10,000 LOC is not even a medium size app.

In a database-backed web site, the database is definitely not the
bottleneck if you are using Oracle or another enterprise class product.
Unless your database design is very complex, it is the
performance/design of your web/middleware.

I'm building my little demo app. I'm loving Python. Python rocks. Way
cooler than Perl. Way cooler than PHP. But the question looms ahead...
When I get finished, will it be fast enough to be a practical
alternative to ASP or JSP/Servlets?

Will I have to rewrite all the hard stuff in C?

Any performance thoughts at all are appreciated. Especially from anyone
who has deployed a significant database driven web app.

TIA
 
Dave Brueck

Nowan said:
================================
Great question.

I'm in the same situation. I'm an enterprise database guy who is
looking at building database-backed transactional web apps. Stuff that
might otherwise be done using ASP or JSP/Servlets.

I need a database-driven web app that might support a few hundred users.

From the many thoughtful responses, I'm afraid that a lot of folks
haven't built large apps, and haven't built sophisticated database apps.

I'd characterize a few hundred users as a small-to-barely-medium site,
depending largely on what exactly they do on that site. Would you agree?

Nowan said:
10,000 LOC is not even a medium size app.

LOC is misleading, although it's often tough to come up with a better metric.
What you really need is a metric of application functionality. With 10k lines
of Python you can get a ton of functionality, often enough to fall into the
"medium-sized app" category.

Using LOC though, 10k LOC Python ~= 70k-90k of Java, which again puts the app
at least on the lower end of medium-sized apps - certainly into the
non-trivial category and beyond the limited set of functionality you'd
traditionally associate with 10k LOC.

It would be much better if we could come up with a way to measure return on
investment, because Python and similar languages would measure up even better
IMO (as it would take into account lower up-front dev costs,
maintenance/bug-fixing costs, transfer-of-ownership costs, etc.).

Nowan said:
In a database-backed web site, the database is definitely not the
bottleneck if you are using Oracle or another enterprise class product.
Unless your database design is very complex, it is the
performance/design of your web/middleware.

It really is tough to speak in general terms because the performance really
depends on what the web site _does_ as I'm sure you're aware. If it's just
serving up mostly static pages then, yeah, the database will probably be a
non-issue. A company I consult for has a Zope site with an Oracle backend
that does statistical collection and reporting and management of a network -
the DB machine is often pegged and the Zope machine is often close to idle -
the DB schema isn't all that complex but the DB is most definitely the main
bottleneck, and by a long shot.

Nowan said:
I'm building my little demo app. I'm loving Python. Python rocks. Way
cooler than Perl. Way cooler than PHP. But the question looms ahead...
When I get finished, will it be fast enough to be a practical
alternative to ASP or JSP/Servlets?

Others may disagree, but I don't see very many compelling reasons to go with
ASP or JSP/servlets, especially if you expect the project to get large (with
ASP you'll start out ok, but suffer if the project gets large, with JSP
you'll suffer both now and then :) ), so I'd say Python is _always_ a
practical alternative to them.

PHP is a messy language, but can be pretty fast. I haven't used it enough to
say how it would do for a large website, but have heard some horror stories
about small feature addition jobs that turn into absolute death marches -
don't know if that is the norm or what.

Also, it doesn't hurt to have a good understanding of how much performance you
really truly _need_ - if you have a fast DB on the backend and a decent cache
in front, and if your site isn't in one of the top few percent of large sites
out there, and if your app is closer to a traditional site than a web-enabled
application, then sufficient performance may be less of an issue than, say,
time to deployment or cost of new features/maintenance.

One last thought that has probably already crossed your mind: don't
underestimate the blessing of living under the rule of Moore's Law! The
performance of one Python server component in our network was in fact CPU
bound (with network in 2nd place). I estimated that performance was still
about 30-40% above what we actually needed at the time (for one thing, we had
to put in two boxes just for redundancy purposes anyway), but the optimizer
in me wanted to boost performance anyway. Well, that was a year ago - I never
got around to it because of other things that actually did need work. Those
computers have been moved to the testing rack and in their places are boxes
with double the CPU power. Now I don't know if I'll ever get around to
optimizing that code because the reason to do so is much less compelling (and
therefore less interesting) now. What _is_ interesting though is that that
component went from design to production in about a month and has been in
production ever since - so the company got a lot of benefit by quick delivery
and the performance problems have yet to materialize. Obviously this won't be
true in all cases, but in this one that's how it worked out.

Just my two cents,
Dave
 
Alan Kennedy

My feeling is that if your management team are asking about
performance, they've got their focus wrong. When I'm involved with
product development (which is usually on the quality assurance and
testing side these days), my primary focus is correctness of
operation, not speed. What use is a fast accounting system if it comes
up with the wrong numbers?

Also, with accounting regulations changing, for example, you need a
system that is flexible and easy to adapt to those new regs. Months or
years later, when you return to code that you haven't seen for a while,
you have to get your head around what it is doing and be able to make
changes quickly (i.e. at low cost). Python, IMHO, is the
best possible language for this kind of readability and
maintainability.

Also, your management team have to understand that CPU cycles are
vastly cheaper than programmer cycles. Why spend ten times the
programming effort just to ensure that the system will run on a 1GHz
PC? Just spend the money on a 3GHz PC instead, and save thousands of
currency.units in programmer costs.

And that's not counting the cost of defects introduced into a system
because it's written in a language that requires 5 times more lines
than an equivalent python program. The number of lines of code is
directly proportional to the number of defects in the code. Each
defect comes at a cost, which too many companies fail to recognise and
fail to factor into their development costs. (Instead they scratch
their heads and wonder why their development projects keep going
over-time and over-budget: then they start spending lots of money on
management consultants :).

And as for the projected number of users (50), I'm hard pressed to
think of many business data-processing applications where a single CPU
running a python app would not be up to the task. And I'd be very
surprised if the performance, in a well designed python system, would
not match or exceed that of a FoxPro system. In an RDBMS-centric
system, the bottleneck is far more likely to be the RDBMS server anyway.
A day or two spent optimising a relational database (e.g.
normalisation, indexing, precompiling stored procs, etc) can often
achieve far greater performance improvements than spending that same
time optimising other code.
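
As a rough illustration of the indexing point, the sketch below asks the database how it plans to run the same query before and after an index is added. It uses the standard-library `sqlite3` module purely as a stand-in for Postgres or MySQL (which have their own `EXPLAIN` output), and the `ledger` table, its columns and the index name are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical accounting table: one row per ledger entry.
    conn.execute(
        "CREATE TABLE ledger (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO ledger (account, amount) VALUES (?, ?)",
        [("ACC%04d" % (i % 500), float(i)) for i in range(20000)],
    )
    sql = "SELECT SUM(amount) FROM ledger WHERE account = ?"

    # Without an index the planner has to scan the whole table.
    before = query_plan(conn, sql, ("ACC0042",))

    # With an index on the filtered column it can search instead.
    conn.execute("CREATE INDEX idx_ledger_account ON ledger (account)")
    after = query_plan(conn, sql, ("ACC0042",))

    total = conn.execute(sql, ("ACC0042",)).fetchone()[0]
    conn.close()
    return before, after, total


if __name__ == "__main__":
    before, after, total = demo()
    print("before index:", before)
    print("after index: ", after)
    print("total:", total)
```

The Python code is identical in both cases; only the database-side plan changes, which is why time spent on the schema often pays off more than time spent tuning the client.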

I highly recommend a read of "The Software Project Survival Guide", by
Steve McConnell.

http://www.construx.com/survivalguide/

And it's probably worth buying some copies for your management team as
well.

And if your management team needs some reassurance from someone with
experience of python, I'm sure there are python consultants in your
area who would be willing to provide decision support services.

HTH,
 
Jegenye 2001 Bt

Ok, let's omit Plone. ;)
That's the part I didn't get into... but looks nice indeed. <wink>

Miklós
 
nnes

John Fabiani said:
Hi everyone,
I have been checking out Python recently and have presented what little I know
to management. They have asked whether there are any large business apps (like an
accounting system) using Python and what the performance is like. We would
be running on Linux with a large database (maybe Postgres or MySQL) of 3-5
Gbytes. We would be moving from a Windows platform (FoxPro apps) to Linux
(Python). About 50 users total. Thanks in advance....

http://www.gnuenterprise.org

Maybe?
 
Cameron Laird

.
.
.
I'm in the same situation. I'm an enterprise database guy who is
looking at building database-backed transactional web apps. Stuff that
might otherwise be done using ASP or JSP/Servlets.

I need a database-driven web app that might support a few hundred users.

From the many thoughtful responses, I'm afraid that a lot of folks
haven't built large apps, and haven't built sophisticated database apps.

10,000 LOC is not even a medium size app.

In a database-backed web site, the database is definitely not the
bottleneck if you are using Oracle or another enterprise class product.
Unless your database design is very complex, it is the
performance/design of your web/middleware.

I'm building my little demo app. I'm loving Python. Python rocks. Way
cooler than Perl. Way cooler than PHP. But the question looms ahead...
When I get finished, will it be fast enough to be a practical
alternative to ASP or JSP/Servlets?
.
.
.
I was with you until the last question. My experience
is that JSP and even ASP provide only weak competition
for Pythonic Web frameworks, in terms of performance.
JSP in particular ... well, I'll just say that JSP isn't
*my* standard for performance.

I get your point that transactional systems can be very
serious and large, and require care. I do NOT agree
that middleware is always the bottleneck. I know of
several mission-critical systems throttled by raw data
retrieval. One in particular, a point-of-sale server,
is Oracle-backed. Maybe you have in mind some implicit
point about database retrieval being manageable in that
the system could, in principle, acquire bigger and
faster data-store hardware. I have complete confidence
that all the decision-makers on the projects I have in
mind would answer uniformly: "no way". Along with
other headaches, no one wants to go through Oracle
relicensing.
 
John J. Lee

Alan Kennedy said:
My feeling is that if your management team are asking about
performance, they've got their focus wrong. When I'm involved with
product development (which is usually on the quality assurance and
testing side these days), my primary focus is correctness of
operation, not speed. What use is a fast accounting system if it comes
up with the wrong numbers?
[...]

Well, since this is not a mass-market application, I'd guess they're
not thinking about hardware costs (at least, not if they've any sense
-- always a dangerous assumption), but rather about the risk of
getting 'so far and no further', because the application takes longer
than it needs to, even on the fastest hardware. Or, slightly more
realistically, the risk that you'll end up having to port to another
language.

Of course, we all know that this is rarely a problem, because there
are very few applications where rewriting more than a tiny fraction of
the code is necessary. The time savings of using Python simply
outweigh (by a big margin) the costs of having to rewrite a few bits.
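
Finding the "few bits" that might be worth rewriting is usually a profiling job rather than guesswork. Here is a minimal sketch using the standard-library `cProfile` and `pstats` modules; the functions `parse_line`, `checksum` and `process` are hypothetical stand-ins for an application's real work, with `checksum` made deliberately slow so it shows up at the top of the report.

```python
import cProfile
import io
import pstats


def parse_line(line):
    """Cheap step: split one comma-separated record."""
    return line.split(",")


def checksum(fields):
    """Deliberately slow step standing in for a real hot spot."""
    total = 0
    for field in fields:
        for ch in field:
            total = (total * 31 + ord(ch)) % 1000003
    return total


def process(lines):
    return [checksum(parse_line(line)) for line in lines]


def profile_report(lines, top=5):
    """Run process() under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    process(lines)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by time spent inside each function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    sample = ["alpha,beta,gamma,%d" % i for i in range(20000)]
    print(profile_report(sample))
```

Only the functions at the top of such a report are candidates for a C extension; everything else can stay in Python.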

Unfortunately, people (especially management) have caught the meme
that says "using multiple languages is bad". This has a very sensible
core, of course: a tower of babel of languages is very bad for code
maintenance. But it's an absurd misapplication of that idea to
compare, say, Python and C with Java and conclude that Java is better
because it's one language instead of two!

Hmm, I'm getting bored of writing these same things over and over.
Why don't business users understand these arguments, or believe
people's experiences? Risk-aversion doesn't really explain it, for
me: *all kinds* of software projects fail, including those that make
use of the most popular software and standards. The fact that
something is an extremely popular rather than just a popular way of
doing something is not a great way of estimating risk. The only
argument that makes much sense to me for not using Python in these
kinds of applications is (as discussed in a recent thread) that
something like Java has big commercial backing, unlike Python, so
library support is better. Jython solves that (if you believe it's a
general problem, which I don't -- it's a problem in particular cases),
but perhaps it does so at the cost of sacrificing the relatively
cheap, big speed improvements you can get from small C extensions? I
wonder if it would be possible to implement the Python C API with JNI
and the Jython API??


John
 
John J. Lee

[...useful discussion of frameworks, DBMSes, etc...]

Hmm, probably much more relevant and useful than my general rant about
Java vs. Python/C...


John
 
Alan Gauld

On Tue, 07 Oct 2003 19:13:38 +0100, Alan Kennedy wrote:

[Lots of valid points about maintainability snipped]
Also, your management team have to understand that CPU cycles are
vastly cheaper than programmer cycles.

But here I disagree; it depends entirely on the deployment scale.
If an app requires a PC upgrade for, say, 5000 users, that's easily
$2.5 million US, which buys a lot of programmer cycles. Most of
the projects I work on tend to require dedicated server resources,
and with a Sun F15K coming in at around $1 million, the hardware
and network links nearly always outweigh the programmer cost.

On small-scale projects using cheap PC-based servers
(inc. Linux etc.), the programmer costs of course start to
come to the fore, but that's not always the case.

Alan Kennedy said:
[...] PC? Just spend the money on a 3GHz PC instead, and save thousands of
currency.units in programmer costs.

Yes, but if it needs a 90 x 1GHz Sun box... Or even a blade
computing solution with 50-100 blade servers...

[ more good stuff on maintenance issues]

And as for the projected number of users (50), I'm hard pressed to
think of many business data-processing applications where a single CPU
running a python app would not be up to the task.

I'm inclined to agree, except we don't know how much background
batch work is involved. Some of our apps have few users, but they
simply set up the data, which then gets crunched for 30-40 hours
of solid CPU time...

Alan Kennedy said:
[...] surprised if the performance, in a well designed python system, would
not match or exceed that of a FoxPro system.

And here we really get to it. Yes, I agree, in this particular
case, if it runs OK on FoxPro then Python shouldn't be an issue.

Alan Kennedy said:
A day or two spent optimising a relational database (e.g.
normalisation, indexing, precompiling stored procs, etc) can often
achieve far greater performance improvements than spending that same
time optimising other code.

Here too I heartily concur. The two biggest performance issues in
any distributed system will likely be the RDBMS and the network
connectivity.

Alan G.
Author of the Learn to Program website
http://www.freenetpages.co.uk/hp/alan.gauld
 
Alan Gauld

John J. Lee said:
Unfortunately, people (especially management) have caught the meme
that says "using multiple languages is bad". This has a very sensible
core, of course: a tower of babel of languages is very bad for code
maintenance.

I have to disagree on this one, I've never heard this particular
meme before.

But every project I've ever worked on used multiple languages
- C/C++, VB, SQL, Awk, Perl, ksh
all on a single project, and that's pretty standard.

On mainframes it would read:
- COBOL, SQL, JCL, REXX, FOCUS

I'd say the average was about 4 languages per project, with the
maximum being 12 (which really was getting to be too many to be
honest!)

But a single language approach doesn't make sense on any
significant project; it just takes too long to write all the
tools, test harnesses etc. in low level languages.

John J. Lee said:
[...] compare, say, Python and C with Java and conclude that Java is better
because it's one language instead of two!

Absolutely.

Why don't business users understand these arguments, or believe
people's experiences?

Because they believe high-powered consultancies, like Gartner,
Forrester etc... rather than their own people!

And because they live in hope of finding the silver bullet that
will cut their IT spend...

Alan G
Author of the Learn to Program website
http://www.freenetpages.co.uk/hp/alan.gauld
 
