Why do Pythoneers reinvent the wheel?

Stefano Masini

What is pythonutils ?
=====================
ConfigObj - simple config file handling
validate - validation and type conversion system
listquote - string to list conversion
StandOut - simple logging and output control object
pathutils - for working with paths and files
cgiutils - cgi helpers
urlpath - functions for handling URLs
odict - Ordered Dictionary Class

Fuzzyman, your post reminded me of something I can't stop thinking
about. Please don't take this as a critique of your work. I place
myself on the same side as you.
I just wanted to share this thought with everybody who has an opinion about it.

I wonder how many people (including myself) have implemented their own
versions of such modules, at least once in their pythonic life. I
indeed have my own odict (even same name! :). My own pathutils
(different name, but same stuff). My own validate... and so forth.

This is just too bad.
There are a few areas where everybody seems to be implementing their
own stuff over and over: logging, file handling, ordered dictionaries,
data serialization, and maybe a few more.
I don't know what the ultimate problem is, but I think there are 3 main reasons:
1) poor communication inside the community (mhm... arguable)
2) lack of a rich standard library (I heard this more than once)
3) python is such an easy language that the "I'll do it myself" evil
side lying hidden inside each one of us comes up a little too often,
and prevents us from spending more time researching what's available.
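For readers who haven't written one: in the Python versions current at the time, a plain dict did not keep its keys in insertion order, so odict-style classes were a typical home-grown utility. Below is a minimal sketch of the usual approach, a dict for lookups plus a list recording insertion order. `SimpleODict` is a hypothetical name; this is not the API of Fuzzyman's odict or of any other package.

```python
class SimpleODict:
    """Mapping that remembers the order in which keys were first inserted.

    Hypothetical sketch: a dict for fast lookups plus a list for order.
    """

    def __init__(self, pairs=()):
        self._data = {}
        self._order = []
        for key, value in pairs:
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self._data:        # only new keys extend the order
            self._order.append(key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]              # raises KeyError for missing keys
        self._order.remove(key)          # O(n), acceptable for a sketch

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(list(self._order))   # iterate over a copy of the order

    def items(self):
        return [(key, self._data[key]) for key in self._order]


d = SimpleODict([("b", 2), ("a", 1)])
d["c"] = 3
d["b"] = 20          # updating an existing key keeps its original position
del d["a"]
print(d.items())     # [('b', 20), ('c', 3)]
```

Later Python versions retired this particular wheel: `collections.OrderedDict` was added in Python 2.7 and 3.1, and since Python 3.7 plain dicts are guaranteed to preserve insertion order.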

It seems to me that this tendency is hurting python, and I wonder if
there is something that could be done about it. I once followed a
discussion about placing one of the available third party modules for
file handling inside the standard library. I can't remember its name
right now, but the discussion quickly became hot with considerations
about the module not being "right" enough to fit the standard library.
The points were right, but in some sense it's a pity because by being
in the stdlib it could have had a lot more visibility and maybe people
would have stopped writing their own, and would have begun using it.
Then maybe, if it was not perfect, people would have begun improving
it, and by now we would have a solid feature available to everybody.

mhm... could it be a good idea to have two versions of the stdlib? One
stable, and one testing, where stuff could be thrown in without being
too picky, in order to let the community decide and improve?

Again, Fuzzyman, your post was just the excuse to get me started. I
understand and respect your work, not least because you put in the
remarkable effort of making it publicly available.

That's my two cents,
stefano
 
Michael Amrhein

Stefano said:
[snip]

Did you take a look at PyPI (http://www.python.org/pypi)?
At least you'd find another odict ...
;-) Michael
 
djw

Stefano said:
I don't know what the ultimate problem is, but I think there are 3 main reasons:
1) poor communication inside the community (mhm... arguable)
2) lack of a rich standard library (I heard this more than once)
3) python is such an easy language that the "I'll do it myself" evil
side lying hidden inside each one of us comes up a little too often,
and prevents us from spending more time researching what's available.

I think, for me, the most important reason is that the stdlib version
of a module doesn't always completely fill the requirements of the
project being worked on. That's certainly why I wrote my own, much
simpler, logging module. In this case, it's obvious that the original
author of the stdlib logging module had different ideas about how
straightforward and simple a logging module should be. To me, this just
demonstrates how difficult it is to write good library code - it has to
try and be everything to everybody without becoming overly general,
abstract, or bloated.
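As a point of comparison (this is not djw's module), here is roughly how much setup the stdlib logging module needs for a simple "LEVEL: message" logger. It writes to an in-memory stream only so the result is easy to inspect; the helper name `make_simple_logger` is hypothetical.

```python
import io
import logging


def make_simple_logger(name, stream, level=logging.INFO):
    """Return a logger that writes 'LEVEL: message' lines to `stream`.

    Hypothetical helper. Calling it twice with the same name would attach
    a second handler, so real code would check logger.handlers first.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)                   # drop records below `level`
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False                 # don't also pass records to the root logger
    return logger


buf = io.StringIO()
log = make_simple_logger("demo", buf)
log.info("starting")
log.debug("not shown")                       # below INFO, filtered out
print(buf.getvalue(), end="")                # INFO: starting
```

Whether those few lines count as "straightforward and simple" is exactly the judgment call described above: the logger/handler/formatter split is what makes the module general, and also what a small hand-rolled logger can leave out.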

-Don
 
Stefano Masini

Did you take a look at PyPI (http://www.python.org/pypi)?
At least you'd find another odict ...

Oh, yeah. And another filesystem abstraction layer... and another xml
serialization methodology... :)
PyPI is actually pretty cool. If I had to vote for something going
into a "testing" stdlib, I'd vote for PyPI.

You see, that's my point, we have too many! :)

stefano
 
Stefano Masini

I think, for me, the most important reason is that the stdlib version
of a module doesn't always completely fill the requirements of the
project being worked on. That's certainly why I wrote my own, much
simpler, logging module. In this case, it's obvious that the original
author of the stdlib logging module had different ideas about how
straightforward and simple a logging module should be. To me, this just
demonstrates how difficult it is to write good library code - it has to
try and be everything to everybody without becoming overly general,
abstract, or bloated.

That's very true. But...
...there are languages (ahem... did I hear somebody say Java? :) that
make it so hard to write code, that one usually prefers using whatever
is already available even if this means adopting a "style" that
doesn't quite match his expectations.
To me, it is not clear which is best: a very comfortable programmer
with a longer todo list, or an uncomfortable programmer with a short
todo list.
So far, I've always struggled to be in the first category, but I'm
amazed when I look back and see how many wheels I reinvented. But
maybe it's just lack of wisdom. :)

stefano
 
Tim Daneliuk

Stefano Masini wrote:

I wonder how many people (including myself) have implemented their own
versions of such modules, at least once in their pythonic life. I
indeed have my own odict (even same name! :). My own pathutils
(different name, but same stuff). My own validate... and so forth.

As someone who implemented their own configuration mini-language
with validation, blah, blah, blah (http://www.tundraware.com/Software/tconfpy/)
I can give you a number of reasons - all valid for different people at
different times:

1) The existing tool is inadequate for the task at hand and OO subclassing
is overrated/overhyped to fix this problem. Even when you override
base classes with your own stuff, you're still stuck with the larger
*architecture* of the original design. You really can't subclass
your way out of that, hence new tools to do old things spring into
being.

2) It's a learning exercise.

3) You don't trust the quality of the code for existing modules.
(Not that *I* have this problem :p but some people might.)
 
Stefano Masini

As someone who implemented their own configuration mini-language
with validation, blah, blah, blah (http://www.tundraware.com/Software/tconfpy/)

Well, a configuration mini language with validation and blahs is not
exactly what I would call _simple_... :) so maybe it doesn't even fit
into my idea of testing-stdlib, or "quick and dirty" section of the
manual (see my other post).
But certainly it would be worth mentioning in the list of available
solutions under the subsection "Configuration files handling".

1) The existing tool is inadequate for the task at hand and OO subclassing
is overrated/overhyped to fix this problem. Even when you override
base classes with your own stuff, you're still stuck with the larger
*architecture* of the original design. You really can't subclass
your way out of that, hence new tools to do old things spring into
being.

That's true, but usually only when the original design is too simple
compared to the complexity of the problem. Instead, a very general
solution can usually be subclassed to easily handle a simpler problem.
You still have to actually understand the general and complex design
in order to be able to write subclasses, so maybe one can be tempted
to punt on it, and write their own simple solution. But in this case it
would just be enough to propose a few solutions in the testing-stdlib:
a) one simple implementation for simple problems, easy to understand,
but limited.
b) one complex implementation for complex problems,
c) one simplified implementation for simple problems, easy to
understand, but subclassed from a complex model, that leaves room for
more understanding and extension just in case one needs more power.
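A minimal sketch of option (c), assuming the problem is simple `key = value` configuration text: a general reader whose steps are all overridable hooks (option b), and a small subclass that only fixes a few choices. `ConfigReader` and `SimpleConfigReader` are hypothetical names; this is not ConfigObj, tconfpy, or the stdlib's ConfigParser.

```python
class ConfigReader:
    """General model (option b): each parsing step is a hook to override."""

    comment_prefixes = ("#", ";")
    separator = "="

    def parse(self, text):
        result = {}
        for raw in text.splitlines():
            line = self.strip_line(raw)
            if not line:                 # blank or comment line
                continue
            key, value = self.split_line(line)
            result[self.convert_key(key)] = self.convert_value(key, value)
        return result

    def strip_line(self, raw):
        line = raw.strip()
        return "" if line.startswith(self.comment_prefixes) else line

    def split_line(self, line):
        key, sep, value = line.partition(self.separator)
        if not sep:
            raise ValueError(f"missing {self.separator!r} in line: {line!r}")
        return key.strip(), value.strip()

    def convert_key(self, key):
        return key

    def convert_value(self, key, value):
        return value


class SimpleConfigReader(ConfigReader):
    """Simplified front end (option c): lower-case keys, int-looking values
    become ints; comment handling and line splitting are inherited."""

    def convert_key(self, key):
        return key.lower()

    def convert_value(self, key, value):
        try:
            return int(value)
        except ValueError:
            return value


text = """
# sample
Name = demo
Port = 8080
"""
result = SimpleConfigReader().parse(text)
print(result)   # {'name': 'demo', 'port': 8080}
```

The subclass gets comment handling and line splitting for free. It also shows the limit raised earlier in the thread: a subclass can only change what the base class exposes as hooks, so a format that isn't line-oriented key/value text would still call for a different design.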

I fully understand the difficulty of reusing code, as it always forces
you to a learning curve and coming to compromises. But I've also
wasted a lot of time reinventing the wheel and later found stuff I
could have happily lived with if I had only known.

2) It's a learning exercise.

Well, so we might as well learn a little more and rewrite os.path, the
time module and pickle. Right? :)

3) You don't trust the quality of the code for existing modules.
(Not that *I* have this problem :p but some people might.)

That's a good point, but it really boils down to being a wise
programmer on one side, being able to discern the Good from the Bad,
and an active community on the other side, able to provide good
solutions and improve them.
If either one is missing, then a lot of bad stuff can happen, and we
can't really make community decisions based on the assumption that
programmers won't be able to understand, or that the community won't
be able to provide. So we might as well assume that we have good
programmers and an active community.
Which I think is true, by the way!
So, let's talk about a way to more effectively present available
solutions to our good programmers! :)

cheers,
stefano
 
Tim Daneliuk

Stefano said:
Well, a configuration mini language with validation and blahs is not
exactly what I would call _simple_... :) so maybe it doesn't even fit

It's actually not *that* complicated. Then again, the code is not
as elegant as it might be.

That's true, but usually only when the original design is too simple
compared to the complexity of the problem. Instead, a very general
solution can usually be subclassed to easily handle a simpler problem.
You still have to actually understand the general and complex design
in order to be able to write subclasses, so maybe one can be tempted
to punt on it, and write their own simple solution. But in this case it

The problem is that for a lot of interesting problems, you don't know
the "generic" big-picture stuff until you've hacked around at small
specific examples. This is one of the deepest flaws in the gestalt of
OO, IMHO. Good OO requires just what you suggest - an understanding of
generics, specific applications, and just what to factor. But in the
early going of new problems, you simply don't know enough. For the
record, I think Python is magnificent both in allowing you to work
quickly in the "poking around" stage of things, and then later to create
the more elegant fully-featured architectures.

One other point here: In the commercial world, especially, software
tends to be a direct reflection of the organization's *processes*.
Commercial institutions distinguish themselves from one another (in an
attempt to create competitive advantage) by customizing and tuning these
business processes - well, the successful companies do, anyway. For
example, Wal-Mart is really a supply chain management company, not a
consumer goods retailer. It is their supply chain expertise and IT
systems that have knocked their competitors well into 2nd place. And
here's the important point: These distinguishing business processes are
unique and proprietary *by intent*. This means that generic software
frameworks are unlikely to serve them well as written. I realize this is
all at a level of complexity above what you had in mind, but it's easy
to forget that a significant portion of the world likes/needs/benefits
from things that are *not* particularly generic. This is thus reflected
in the software they write.

Well, so we might as well learn a little more and rewrite os.path, the
time module and pickle. Right? :)

I'm not deeply committed to that level of education at the moment :p

So, let's talk about a way to more effectively present available
solutions to our good programmers! :)

Grappa?
 
Kay Schluehr

Tim said:
1) The existing tool is inadequate for the task at hand and OO subclassing
is overrated/overhyped to fix this problem. Even when you override
base classes with your own stuff, you're still stuck with the larger
*architecture* of the original design. You really can't subclass
your way out of that, hence new tools to do old things spring into
being.

Although I think you are completely wrong in principle, there is some
truth in your statement: refactoring a foreign, ill-designed tool that
nevertheless provides some nice functionality but was not meant to be
extended by third-party developers is often harder than writing a nice,
even if inextendable, tool of your own. That's independent of the
language, although I tend to think that C and Python programmers are
more alike in their crude pragmatism than Java or Haskell programmers
(some might object that it is a bit unfair to equate Java and Haskell
programmers, because no one ever claimed that the latter need
code-generators and no intelligence to do their work).

Kay
 
Stefano Masini

frameworks are unlikely to serve them well as written. I realize this is
all at a level of complexity above what you had in mind, but it's easy
to forget that a significant portion of the world likes/needs/benefits
from things that are *not* particularly generic. This is thus reflected
in the software they write.

In my opinion this has more to do with the open source vs.
proprietary debate, which I wouldn't like to talk about, since it's
somewhat marginal.

What I was pointing out is well summarized in the subject: Why do
Pythoneers reinvent the wheel?
Reinventing the wheel (too much) is Bad for both the open source
community and industry. It's bad for development in general. I got the
feeling that in the specific case of Python the ultimate reason for
this tendency is also the same reason why this language is so much
better than others for, say, fast prototyping and exploration of new
ideas: it's simple.

So, without taking anything out of python, I'm wondering if a richer
and less formal alternative standard library would help form a
common ground from which programmers could start in order to build
better and reinvent less.

If such an aid to _general_ problem solving is indeed missing (I might
be wrong) from the current state of python, I don't really think the
reason is related to industry. I would look for reasons elsewhere,
like it being difficult to reach effective decisions
in an open source community, or something like this. I can certainly
see the challenge of who and how should decide what goes in the
library, and what not.

stefano
 
Tim Daneliuk

Kay said:
Although I think you are completely wrong in principle, there is some
truth in your statement: refactoring a foreign, ill-designed tool that
nevertheless provides some nice functionality but was not meant to be
extended by third-party developers is often harder than writing a nice,
even if inextendable, tool of your own.

It has nothing to do with being "ill designed", though that too would
pose a (different) problem. It has to do with the fact that all
real-world tools are a tradeoff between pragmatism and generic elegance.
This tradeoff yields a tool/module/library/program with some POV about
what problem it was solving. If the problem you wish to solve is not in
that same space, you can inherit, subclass and do all the usual OO
voodoo you like, you're not going to get clean results.

On a more general note, for all the promises made over 3 decades about
how OO was the answer to our problems, we have yet to see quantum
improvements in code quality and productivity even though OO is now "the
thing" everyone is supposed to subscribe to. In part, that's because it
is profoundly difficult to see the most generic/factorable pieces of a
problem until you've worked with it for a long time. Once you get past
the "a mammal is an animal" level of problems, OO starts to
self-destruct pretty quickly as the inheritance hierarchies get so
complex no mere mortal can grasp them all. This is exactly Java's
disease at the moment. It has become a large steaming pile of object
inheritance which cannot be completely grokked by a single person. In
effect, the traditional problem of finding algorithms of appropriate
complexity gets transformed into a "what should my inheritance hierarchy
be" problem.

IMHO, one of Python's greatest virtues is its ability to shift paradigms
in mid-program so that you can use the model that best fits your problem
space. IOW, Python is an OO language that doesn't jam it down your
throat, you can mix OO with imperative, functional, and list processing
coding models simultaneously.
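A small illustration of that mixing (not from the thread; the `Order` records are made up): one script using a class, a list comprehension, a functional fold, and an imperative loop side by side, each where it reads most naturally.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Order:                 # OO: a small record type
    customer: str
    amount: float


orders = [Order("ann", 30.0), Order("bob", 12.5), Order("ann", 7.5)]

# List processing: a comprehension filters and projects in one step.
big = [o.customer for o in orders if o.amount > 10]

# Functional: fold the amounts together with reduce and a lambda.
total = reduce(lambda acc, o: acc + o.amount, orders, 0.0)

# Imperative: an explicit loop that mutates a dictionary.
per_customer = {}
for o in orders:
    per_customer[o.customer] = per_customer.get(o.customer, 0.0) + o.amount

print(big, total, per_customer)
```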

In my view, the doctrinaire, indeed religious, adherence to OO purity
has harmed our discipline considerably. Python was a nice breath of
fresh air when I discovered it exactly because it does not have this
slavish commitment to an exclusively OO model.
 
Tim Daneliuk

Stefano said:
In my opinion this has more to do with the open source vs.
proprietary debate, which I wouldn't like to talk about, since it's
somewhat marginal.

I think the point I was trying to make was there are times when
a generic factoring of reusable code is unimportant since the code
is so purpose-built that doing a refactoring makes no sense.

What I was pointing out is well summarized in the subject: Why do
Pythoneers reinvent the wheel?
Reinventing the wheel (too much) is Bad for both the open source
community and industry. It's bad for development in general. I got the

I don't share your conviction on this point. Reinventing the wheel
makes the wheel smoother, lighter, stronger, and rounder. Well,
it *can* do this. Of far greater import (I think) is whether
any particular implementation is fit to run across a breadth of
platforms. To me, a significant benefit of Python is that I am
mostly able to program the same way across Unix, Windows, Mac
and so on.
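Path handling is a simple example of that portability. The standard library carries both the POSIX and the Windows path rules (`posixpath` and `ntpath`, both importable on any platform), and `os.path` is whichever one matches the running system, so code written against `os.path` picks up the native separator automatically.

```python
import ntpath      # Windows path rules, importable on any platform
import os
import posixpath   # Unix/macOS path rules, importable on any platform

parts = ("data", "2005", "config.ini")

print(posixpath.join(*parts))   # data/2005/config.ini
print(ntpath.join(*parts))      # data\2005\config.ini
print(os.path.join(*parts))     # same call, native separator (os.sep)
```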

If such an aid to _general_ problem solving is indeed missing (I might
be wrong) from the current state of python, I don't really think the
reason is related to industry. I would look for reasons elsewhere,
like it being difficult to reach effective decisions
in an open source community, or something like this. I can certainly
see the challenge of who and how should decide what goes in the
library, and what not.


This is too abstract for me to grasp - but I admit to being old and feeble ;)

I think what you see today in the standard library are two core ideas:
1) Modules that are more-or-less pass-through wrappers for the common
APIs found in Unix and 2) Modules needed commonly to "do the things that
applications do" like manipulate data structures or preserve active
objects on backing store. If what you want here is for everyone to agree
on a common set of these and stick exclusively to them, I think you will
be sorely disappointed. OTOH, if someone has a better/faster/smarter
reimplementation of what exists, I think you'd find the community open
to embracing incremental improvement. But there is always going to be
the case of what happened when I wrote 'tconfpy'. The existing
configuration module was nice, but nowhere near the power of what I
wanted, so I wrote something that suited me exactly (well ... sort of,
'tconfpy2' is in my head at the moment). If the community embraced
it as a core part of their work, I'd be delighted (and surprised), but
I don't need for that to happen in order for that module to have value
to *me*, even though it does not displace the existing stuff.
 
A.M. Kuchling

Well, so we might as well learn a little more and rewrite os.path, the
time module and pickle. Right? :)

And in fact people have done all of these:
os.path: path.py (http://www.jorendorff.com/articles/python/path/)
time: mxDateTime, the stdlib's datetime.
pickle: XML serialization, YAML.
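To give a feel for what an object-oriented alternative to os.path looks like: the standard library itself later gained `pathlib` (Python 3.4), an object-oriented path API in a similar spirit to path.py, though not path.py's own API. The comparison uses the POSIX flavours (`posixpath`, `PurePosixPath`) so it behaves the same on every platform; the file name is made up.

```python
import posixpath
from pathlib import PurePosixPath

raw = "/srv/app/conf/settings.ini"

# os.path style: plain strings passed through module-level functions.
folder = posixpath.dirname(raw)
stem, _ext = posixpath.splitext(posixpath.basename(raw))
backup = posixpath.join(folder, stem + ".bak")

# pathlib style: one path object with attributes and methods.
p = PurePosixPath(raw)
backup_obj = p.with_suffix(".bak")

print(folder, p.parent)   # /srv/app/conf /srv/app/conf
print(backup)             # /srv/app/conf/settings.bak
print(backup_obj)         # /srv/app/conf/settings.bak
```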
So, let's talk about a way to more effectively present available
solutions to our good programmers! :)

PEP 206 (http://www.python.org/peps/pep-0206.html) suggests assembling an
advanced library for particular problem domains (e.g. web programming,
scientific programming), and then providing a script that pulls the relevant
packages off PyPI. I'd like to hear suggestions of application domains and
of the packages that should be included.
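A minimal sketch of the "script that pulls the relevant packages" idea, assuming a hand-maintained catalogue of domains. Everything in `DOMAINS` is a placeholder rather than a recommendation, and the sketch uses today's `pip` (which postdates this thread) as the installer; the hypothetical `install_command` function only builds the command and does not run it.

```python
import sys

# Placeholder catalogue: domain and package names are hypothetical,
# not a real or recommended selection.
DOMAINS = {
    "web": ["example-web-framework", "example-templating"],
    "science": ["example-arrays", "example-plotting"],
}


def install_command(domain):
    """Return the argv that would install every package listed for `domain`."""
    try:
        packages = DOMAINS[domain]
    except KeyError:
        known = ", ".join(sorted(DOMAINS))
        raise ValueError(f"unknown domain {domain!r}; known domains: {known}") from None
    # Run pip via the current interpreter so packages land in its environment.
    return [sys.executable, "-m", "pip", "install", *packages]


cmd = install_command("web")
print(" ".join(cmd))
# To actually install: import subprocess; subprocess.run(cmd, check=True)
```

The hard part, as the replies point out, is not the script but deciding who fills in the catalogue and on what criteria.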

--amk
 
Fuzzyman

Michael said:
Stefano said:
[snip..]
Did you take a look at PyPI (http://www.python.org/pypi)?
At least you'd find another odict ...

Oh right. Where?

I remember when I started coding in Python (about two years ago) in one
of my first projects I ended up re-implementing some stuff that is in
the standard library. The standard library is *fairly* big - but the
'Python blessed' modules idea sounds good.

I've often had the problem of having to assess multiple third party
libraries/frameworks and decide which of several alternatives is going
to be best for me - without really having the information on which to
base a decision (and nor the time to try them all out). Web templating
and web application frameworks are particularly difficult in this area.

If a module is in the standard library then *most* developers will
*first* use that - and only if it's not suitable look for something
else.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python
 
Martin P. Hellwig

Stefano Masini wrote:
<cut reinventing wheel example>

Although I'm not experienced enough to comment on python stuff itself I
do know that in general there are 2 reasons that people reinvent the wheel:
- They didn't know of the existence of the first wheel
- They have different roads
Those reasons can even be combined.

The more difficult it is to create a new wheel the bigger the chance is
that you:
- Search longer for fitting technologies
- Adapt your road
 
Paul Boddie

A.M. Kuchling said:
PEP 206 (http://www.python.org/peps/pep-0206.html) suggests assembling an
advanced library for particular problem domains (e.g. web programming,
scientific programming), and then providing a script that pulls the relevant
packages off PyPI. I'd like to hear suggestions of application domains and
of the packages that should be included.

I'm not against pointing people in what I consider to be the right
direction, but PEP 206 seems to be quite the lobbying instrument for
people to fast-track pet projects into the standard library (or some
super-distribution), perhaps repeating some of the mistakes cited in
that document with regard to suitability. Meanwhile, in several areas,
some of the pressing needs of the standard library would remain
unaddressed if left to some kind of Pythonic popularity contest; for
example, everyone likes to dish out their favourite soundbites and
insults about DOM-based XML APIs, but just leaving minidom in the
library in slow motion maintenance mode whilst advocating more
"Pythonic" APIs doesn't help Python's interoperability with (or
relevance to) the wider development community.
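For readers who haven't used both styles, here is the same lookup written against minidom's W3C DOM interface and against ElementTree, the more "Pythonic" API that joined the standard library later (Python 2.5). The XML snippet is made up for illustration. The DOM version reads much like DOM code in other languages, which is the interoperability point being made; the ElementTree version is shorter.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

DOC = "<config><item name='a'>1</item><item name='b'>2</item></config>"


def items_with_dom(text):
    """W3C DOM style: explicit node lookups and child text nodes."""
    dom = xml.dom.minidom.parseString(text)
    return [(el.getAttribute("name"), el.firstChild.data)
            for el in dom.getElementsByTagName("item")]


def items_with_etree(text):
    """ElementTree style: elements expose attributes and text directly."""
    root = ET.fromstring(text)
    return [(el.get("name"), el.text) for el in root.iter("item")]


print(items_with_dom(DOC))     # [('a', '1'), ('b', '2')]
print(items_with_etree(DOC))   # [('a', '1'), ('b', '2')]
```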

The standard library is all about providing acceptable solutions so
that other people aren't inclined or forced to write their own. Every
developer should look at their repertoire of packages and consider
which ones wouldn't need to exist if the standard library had been
better. For me, if there had been a decent collection of Web
application objects in the standard library, I wouldn't have created
WebStack; if I didn't have to insist on PyXML and then provide patches
for it in order to let others run software I created, I wouldn't have
created libxml2dom.

PEP 206 is an interesting idea but "dangerous" because as a PEP it
promotes a seemingly purely informational guide to some kind of edict,
and (speaking from experience) since a comprehensive topic guide to any
reasonable number of packages and solutions is probably too much work
that no-one really wants to do anyway, the likelihood of subjective
popularity criteria influencing the selection of presented software
means that the result may be considerably flawed. Although I see that a
common trend these days is to form some kind of narrow consensus, hype
it repeatedly and, in the name of one cause, to push another agenda
entirely, all whilst ignoring the original problems that got people
thinking in the first place, I am quite sure that as a respected Python
contributor this was not a goal of yours in writing the PEP. However,
we should all be aware of the risks of picking favourites, even if the
level of dispute around those favourites is likely to be much lower for
some packages than for others.

This overly harsh criticism really brings me to ask: what happened to
the maintenance and promotion of the python.org topic guides? Or do
people only read PEPs these days?

Paul
 
Aahz

IMHO, one of Python's greatest virtues is its ability to shift paradigms
in mid-program so that you can use the model that best fits your problem
space. IOW, Python is an OO language that doesn't jam it down your
throat, you can mix OO with imperative, functional, and list processing
coding models simultaneously.

In my view, the doctrinaire, indeed religious, adherence to OO purity
has harmed our discipline considerably. Python was a nice breath of
fresh air when I discovered it exactly because it does not have this
slavish commitment to an exclusively OO model.

+1 QOTW
 
Dennis Lee Bieber

On a more general note, for all the promises made over 3 decades about
how OO was the answer to our problems, we have yet to see quantum

OO goes back /that/ far? (2 decades, yes, I might even go 2.5
decades for academia <G>). My college hadn't even started "structured
programming" (beyond COBOL's PERFORM statement) by the time I graduated
in 1980. Well, okay... Smalltalk... But for most of the "real world", OO
became a known concept with C++ mid to late 80s.

--
 
Bengt Richter

Stefano Masini wrote:
<cut reinventing wheel example>

Although I'm not experienced enough to comment on python stuff itself I
do know that in general there are 2 reasons that people reinvent the wheel:
- They didn't know of the existence of the first wheel
- They have different roads

- They want the feeling that they are in the same league as the original inventor ;-)
Those reasons can even be combined.

The more difficult it is to create a new wheel the bigger the chance is
that you:
- Search longer for fitting technologies
- Adapt your road
- Think more carefully about ego satisfaction cost/benefit vs getting the job done ;-)

Regards,
Bengt Richter
 
Tim Daneliuk

Dennis said:
OO goes back /that/ far? (2 decades, yes, I might even go 2.5
decades for academia <G>). My college hadn't even started "structured
programming" (beyond COBOL's PERFORM statement) by the time I graduated
in 1980. Well, okay... Smalltalk... But for most of the "real world", OO
became a known concept with C++ mid to late 80s.

OO ideas predate C++ considerably. The idea of encapsulation and
abstract data types goes back to the 1960s IIRC. I should point
out that OO isn't particularly worse than other paradigms for
claiming to be "The One True Thing". It's been going on for
almost a half century. I've commented on this previously:

http://www.tundraware.com/Technology/Bullet/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,983
Messages
2,570,187
Members
46,747
Latest member
jojoBizaroo

Latest Threads

Top