"jl" == jl post@hotmail com said:
jl> I remember when I was new to Perl and I wanted a function that was
jl> similar to glob() but worked recursively. From "Learning Perl" I knew
jl> to use the File::Find module, but when I read its perldoc, I was very
jl> confused
File::Find uses a "callback" pattern, and if you hadn't
been familiar with that from other programming knowledge,
you *would* indeed be confused by File::Find.
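For anyone who hasn't met the callback pattern, a minimal sketch of a recursive glob() built on File::Find may help. The helper name find_matching is made up for illustration and is not part of File::Find; only find(), $_ and $File::Find::name come from the module.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not part of File::Find): return every plain file
# under $dir whose base name matches $pattern, like a recursive glob().
sub find_matching {
    my ($dir, $pattern) = @_;
    my @found;

    # find() walks the directory tree and calls this anonymous sub (the
    # "callback") once per entry. Inside the callback, $_ holds the
    # entry's base name and $File::Find::name holds its full path,
    # including the starting directory.
    find(
        sub {
            push @found, $File::Find::name if -f $_ && /$pattern/;
        },
        $dir
    );

    my @sorted = sort @found;
    return @sorted;
}
```

Under these assumptions, a call such as find_matching('.', qr/\.pl\z/) would return the paths of the .pl files in the current directory and everything below it. The point of the pattern is that find() decides when to call your code, rather than your code looping over a list that find() hands back.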
By "new to Perl", you probably also mean "fairly new to
programming".
Not really, no. I was unfamiliar with the new things that were
introduced to me by learning Perl. I remember reading somewhere that
Perl is a lot like C, but at the same time not a lot like C. This
could be considered to be a contradiction, but its point is that,
although C programmers find that Perl may have a "C" type of flow to
it, it's still not C and has some definite differences that the
programmer needs to keep in mind. Hence, when C programmers migrate to
Perl for the first time, they are almost guaranteed to feel disoriented
somewhat as they find themselves somewhere that looks like their home
turf, but really isn't.
Just a few days ago I came across an appropriate example. A C/C++
programmer (who had been programming in C for decades and C++ for at
least a decade) came to me for help with Perl. He had trouble
accessing a specific element in an array.
When I looked at his code, I saw that he was trying to access a
specific element with code like:

    print location($ndx);

I explained to him that, as in C, those parentheses made that piece
of code a function call (instead of an array look-up) and that, as in C,
array elements are accessed by placing square brackets around the index
(instead of parentheses). I also explained that, unlike C, a dollar
sign had to be placed in front of "location".
When we corrected these errors and saw that the Perl program
compiled, he told me that the documentation specifically had an example
of parentheses being used. I told him he was probably looking at an
example of a function call and mistook it for an array. He denied
that, and went back to the documentation.
To my surprise, the documentation he was looking at was a perldoc
(this pleased me because almost nobody I introduce perldocs to actually
turns to them when needed). However, he looked at the sample code again
and, to his surprise, the sample code in the perldocs was written
as:

    print $location[$ndx];

just like I had said it was supposed to be written.
He muttered something like, "I could have sworn it was written the
other way." Now, what was strange was that a seasoned C/C++ programmer
should make that mistake. It's no stretch to believe that a BASIC
programmer could make that mistake, but someone who's been programming
in C since before Perl was invented? How could he make a mistake like
that, considering that the array syntax for Perl is almost exactly like
it is in C?
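To make the difference concrete, here is a small sketch that puts the two forms side by side. The array contents and the location() subroutine are invented for illustration; his actual code was not like this.

```perl
use strict;
use warnings;

my @location = ('lobby', 'lab', 'roof');   # hypothetical data
my $ndx = 1;

# Square brackets index the array @location. The element itself is a
# scalar, so it is written with a leading dollar sign.
print $location[$ndx], "\n";    # prints "lab"

# Parentheses make this a call to a subroutine named location(),
# which has nothing to do with the array @location.
sub location {
    my ($i) = @_;
    return "location() called with $i";
}
print location($ndx), "\n";     # prints "location() called with 1"
```

Note that an array and a subroutine can share a name without conflict, which is part of why the parenthesized form gives no hint that the wrong thing is being accessed.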
I attribute it to disorientation. It's going to happen somewhat no
matter how familiar someone (new to Perl) is with other programming
languages. Sure, having good college professors and studying hard
helps a lot, but some amount of disorientation (no matter how small)
will always be experienced by a programmer learning a new concept.
This disorientation can even leak into areas that the programmer
already had a firm grasp of (like array syntax). It's an odd
phenomenon, but I know it exists.
So how do we eliminate this disorientation completely? Frankly, I
believe it's not possible to eliminate it entirely. This
disorientation is a fact of life, something we have to live with
because we're human. We can work to reduce it somewhat, but the fact
that everyone is different means that everyone will stumble across
different things.
I love the "perldoc perltrap" documentation because it seeks to shed
light on where people can trip up (and get disoriented) and clearly
explain why and what the correct mode of thinking is. Of course,
because we're all different, that perldoc can never ever be complete,
but at least it addresses some of the more common issues that plague
new (and even experienced) Perl programmers.
But that's not the fault of File::Find or even the documentors
there. If we had to explain what a list is every time a function
returned a list, the docs would balloon to the point where
they'd be far too heavy to carry.
Very true. You could even say that, since programming can be used
to solve open-ended problems, an infinite amount of documentation
is necessary, since there are, in theory, an infinite number of
programming concepts. I'm not even going to attempt to answer the
question "How much documentation is enough?" because that question has
a very gray answer, and will differ from person to person. (The
answers given by the same person can even vary if the question is asked
several months apart.)
But nevertheless, that's one of those issues that is significant to
me (even though many (or even most) programmers don't give it much
thought). For example, if 20% of Perl programmers got confused about a
list being returned in one particular function, should we re-explain
what a list is? Probably not, but a one sentence reminder designed to
clear up potential confusion might not be out of line, especially if
it's stated near text that has confused a significant number of people
in the past.
I think what you're arguing for is more hand-holding for beginners.
In a way, yes. I've never really been opposed to hand-holding, but
I've noticed that some people are adamantly against it.
I think what we keep saying is it's all there, but maybe a little
rough for beginners to find. But that's why there's a market for paid
consultants and paid books and paid trainings. If you can't
understand readily what the free sources are saying to the experts,
you hire someone to bridge that to your level. This is the efficiency
of Open Source at its heart... the developers can write docs to other
developers (taking a lot of shortcuts), and the rest of us can get
paid to translate that to beginner-speak.
I agree with that. However, I've continually been encouraged to
write code that anyone (familiar with that programming language, of
course) can understand. I happen to mostly agree with that statement.
Unfortunately, a common attitude I see is:
   "I always write clear code. If I can't understand someone else's
   code [and that can include lists of lists, or even hashes] then that
   code is bad code and should be rewritten to be less esoteric."
To me, this attitude has a major fallacy: it assumes that what is clear
to one programmer is clear to all programmers, and what is unclear to
one programmer is esoteric code.
I could hardly disagree more with that attitude. One thing I
find myself reminding programmers is: "Code is always clearest to the
programmer who wrote it. Keep in mind that, no matter how clear and
understandable your code is to you, no other programmer will think it's
as clear."
I also ask programmers to look back at the code they've written and
try to identify areas that might trip up future maintainers. If
they've found an area, either re-write it so that the chances of
confusion are decreased or, if that's not possible, add a comment or
two explaining the intent of the code and what its purpose is.
But if the programmer won't identify that any areas of their code
might be difficult to understand (claiming that their code is clear
enough and that only bad programmers would be confused), I remain
skeptical of their claim.
I see I've gone off on a tangent again. Well, permit me to tie
things together:
I've noticed that certain programmers have certain passions. A
friend of mine's passion is to remove duplicate code where possible and
to re-write classes so that they adhere to real Object-Oriented
concepts (instead of being classes that are only half-way there, which
I've seen a lot of). My personal programming passion (and the reason
for these posts) is to not have blocks of code (and especially whole
files) that have unclear intentions. In other words, if I can't tell
what the purpose of a block of code is by looking at it, it should
either be re-written or have comments added to it (to smooth out the
rough parts, so to speak).
Of course, what is clear and what is not is different for every
programmer, but at least an effort can be made to clarify one's code
(another saying of mine is "Always write code so that it can be
understood and maintained by someone not quite as intelligent as you").
But how far must you clarify your code? This has been a subject of
much debate (made even more heated by the fact that some people think
that every piece of code they write is clear -- despite the fact that
no maintainers seem to think so), so I've taken the stance that a Perl
programmer should be responsible for knowing, at a minimum, whatever is
covered in the O'Reilly "Learning Perl" book (the Llama book). If they
don't know at least 90% of the material covered in there, they
shouldn't be programming in Perl.
If you do write Perl code, don't bother explaining the concepts
found in that book. But if you come up with new programming concepts
(there are an infinite number of them, after all), explain them. Don't
just assume that a decent programmer will be able to understand it just
as you did when you wrote it in your program. The programmer who wrote
the code has the benefit of the thought processes required to think up
the new programming concept, and will therefore perceive the code to
be quite clear -- but the maintainer (unless he/she has the ability to
read other people's minds) won't have that benefit and will perceive
the code as being significantly more convoluted.
A neat thing about programming is that a programmer, when asked to
write a program to accomplish a certain task, will pull an idea "out of
the ether" (so to speak) to create a program that's never been written
before. The fact that this new program has never been seen before in
all of human history automatically gives it a disadvantage in having it
understood by others.
The programmer who wrote this brand-new program has an exceptional
understanding of how the program works. It is my opinion that it is
his/her duty to not assume that others have the same level of clarity,
but to review the code and identify "problem spots." Once identified,
the programmer should either eliminate them (by rewriting them into a
common convention (such as one covered in the Llama book)) or, if
that's not possible, impart the knowledge needed in the form of
comments. The comments don't necessarily have to explain everything
(they could just point to existing documentation).
But leaving code devoid of comments is a trap: sure, simple code
doesn't necessarily need it, but when code becomes more complex, it's
practically guaranteed that maintainers won't understand it without
hours of study (and even with study there's always a good chance that
the code will be misunderstood, leading to errant and contradictory
code).
An example (the last one, I promise): I once came across code that
had four nested for-loops (there were actually five loops, but one of
them wasn't nested). I could understand what two of the loops were
doing, but I couldn't figure out the other three loops. Unfortunately,
there were no comments explaining the purpose of the other loops,
leaving that to be divined by the maintainers (namely, me). Well, I
found out who originally wrote the code, printed out the code in
question, and brought it over to the original programmer, asking him
what the code was doing.
I don't think the programmer even knew the code was written by him,
as his irritated response was, "You think I can understand some code
that just happens to be brought up to me?"
Unfortunately, this attitude is all too common. Personally (and
this is where many people disagree with me), I think that the intent of
all blocks of code should be immediately apparent. Either they should
follow a well-known convention, like:

    my ($filename, $type, $outputFilename) = @_;

(believe it or not, not all Perl programmers understand what that line
of code does... this is unfortunate as they should know) or they should
explain their purpose with the help of comments. And not every
programming concept has a convention (there are an infinite number of
programming concepts, after all), so every program should have
clarifying comments, unless the program is so simple that it probably
already exists elsewhere (or if you're literally writing it at the
command line with the -e switch!).
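For anyone unsure what the convention line above does, here is a minimal sketch. The subroutine name describe_job and its return string are hypothetical; only the first line of the body is the convention itself.

```perl
use strict;
use warnings;

# Hypothetical subroutine: the variable names mirror the line quoted above.
sub describe_job {
    # @_ holds the arguments the caller passed. The list assignment
    # copies the first three into named lexical variables, in order.
    my ($filename, $type, $outputFilename) = @_;

    return "convert $filename ($type) to $outputFilename";
}

print describe_job('report.txt', 'text', 'report.html'), "\n";
# prints "convert report.txt (text) to report.html"
```

Because nearly every subroutine that takes arguments opens this way, the line needs no comment of its own. It is the code after it that may.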
I'll end it here with the thought that we are all unique and
therefore have different levels of understanding of different
programming concepts. Hand-holding isn't necessarily a bad thing,
especially if more than 20% of maintainers will need it.
(If you got this far, thanks for reading!)
-- Jean-Luc Romano