At the risk of getting off on a tangent, I'm wondering if you read
that chapter carefully, or the book from which the example was taken.
The requirements for the "regular expression package" were, and I
quote:
"I suggested to Rob that we find the smallest regular expression
package that would illustrate the basic ideas while still recognizing
a useful and nontrivial class of patterns. Ideally, the code would fit
on a single page."
I read that passage. However, I'd encountered "regular expressions" in
1971 when I read "Formal Languages and Their Relation to Automata" by
Hopcroft and Ullman, before I had access to a computer. I regard myself
as privileged to have started with the mathematics before the wide
availability of computers, since I had to figure more out for myself.
In that book, a "regular expression" was not a regular expression
unless it was the complete, full-function notation, because it was
REQUIRED to be able to denote any regular set of strings. What happened
IMO was that Kernighan, Pike et al. seized upon the concept, stole it
from their Princeton/Bell Labs coworkers, and used it to designate a
hack: the unix grep horror.
I analyze the problems that result in a blog post, "Regular
Expressions and 'The Horror'":
http://spinoza1111.wordpress.com/2009/05/17/regular-expressions-and-the-horror/.
Basically, the gap between what mathematicians mean by regular
expressions and the horror they became (with MS-DOS file-ID patterns
being just one example) was created by the way justification and
discourse work in our society.
People, especially in the USA, believe that things such as programs can be
justified. Kernighan tried to justify Pike's code by saying it "was"
"sort of" a regex processor ("regular expression package that
would ... recognize [some] useful ... patterns") and wasn't it great,
he went on to say, that Rob did it in an hour.
The problem is that "regular expression" already has a meaning, and
Rob's "regular expression package that would ... recognize [some]
useful ... patterns" was NOT a regular expression recognizer. In a
university class, where the assignment was "develop a regular
expression package that parses and applies regular expressions", Rob
would have received an F.
Dijkstra would not have approved. But even in my own graduate
coursework, in a department that was focused primarily on making
people employable in practical data processing in Chicago, the
department chair gave us the complete rules, not a partial hack of the
rules. Programming the COMPLETE rules gives us satisfaction and
utility, for life is short but art is long. Programming a part of them
in ten minutes is jerking off. The code is porn.
That is: people here, and elsewhere in programming, do violence to
problems and this is blessed as "hacking". But mathematicians don't
change the meaning of pre-existing notions to get their work done fast
and impress some thug of a boss. The professional ethics that have
resulted are on exhibit here. Any nonsense (a "linked list" of copies
as the only solution, failure to find %s) is presented as a solution,
and any objections are first met with "but I did it in ten minutes",
followed up with Fascistic personal abuse that is the stock in trade
of the corrupt.
Robert Oppenheimer said that physicists learned evil after the atomic
bomb. Computer scientists learned evil the day Algol died, because
their work should have remained subordinated to university
mathematics, and after Algol it no longer was.
Why?
NOT because university mathematicians or guys like Dijkstra are
Platonic-perfect beings. It is because thanks to unix, computer
science discourse became a trivial subset of corporate-speak.
What's wrong with corporate-speak?
It's "hegemonic" in the sense of Gramsci, a guy who in Mussolini's
prisons tried to explain why Italians were deluded by mass media into
thinking the (originally socialist) Mussolini was their friend.
It has no outside. What was that movie where Jim Carrey can't get
outside of a TV show?
Any math-based objection to silly code became, in the 1970s and 1980s,
in my experience, a non-starter without a "business case". For
example, I found it worthwhile to fix bugs in Bell Northern Research's
compiler that I had discovered on my own whilst making changes needed
by the engineers. But I had to do so on my own time, since no
"business case" had been made. The compiler support team in Ottawa had
been brutally disbanded in 1980 because, in the view of management, it
was "wasting time" by incrementally improving the compiler, removing
bugs and extending its capabilities. The result was my job, since
they'd thrown the baby out with the bathwater.
My engineering case was "it is intrinsically dangerous that the
compiler references uninitialized variables when the last 'end' is
missing, I know how to fix this, so I should fix it: this is because
the compiler, a large and complex critical system that the field
engineers need every day to solve problems for our customers, is
operating in this case outside of its design specifications. It's not
supposed to do that, and I predict trouble down the line".
But this wasn't a "business case", since (1) I could not predict the
nature of the "trouble" and even more, (2) I couldn't show how this
impacted profits.
Remarkably, this situation was repeated in the Columbia disaster of
2003, when the Space Shuttle broke apart on re-entry. NASA engineers said
that the Columbia was operating outside its design window when it shed
foam on takeoff, for the same reason your car is operating outside its
design window when you drive with a busted taillight to the Kennedy
Space Center. They were told that this was (to use Rumsfeld's term) a
known unknown and not to worry, since in NASA's "businesslike"
"management by objectives" and cost-centered regime instituted under
Reagan, everything had to have a "business case".
I then read Habermas, a German philosopher. Unlike an American, he
lacked the trust that it will all come out right on launch day or
re-entry as long as we have faith in Jesus and obey superiors.
Instead, Habermas made a detailed study of German debates in
coffee-houses under the somewhat enlightened reign of Frederick the
Great to discover that people could engage in a common search for
truth, and from these common searches for truth, utility resulted in
some cases, but utility was NOT the goal.
Habermas said that there are two forms of discourse: civil discourse,
a common and disinterested search for truth that stops at nothing
(hey, there's a bug, it references uninitialized storage when there is
no balancing 'end' keyword, let's fix it, wow, cool), and commercial
discourse, the clerk's discourse, in which his remit is to solve only
a specific problem and then wait for further instructions.
I realized, to get back to Beautiful Code, that Kernighan had fallen
victim to the fact that programming no longer feels it can look
outside to a disinterested discourse, that of mathematics. But once
you form a closed world of commercial discourse, the main axiom of
self-interest automatically implies corruption: changing the meaning
of words to get it done and please the boss, and here, acting like a
complete asshole when your work is questioned...staying inside the
worst type of commercial discourse, the attack on competence, in
preference to seeking guidance in mathematical "civil discourse".
Then later, confirmation that the requirements didn't include a "full
or true regular expression parser", by which I suspect you mean
something like Perl or .NET regex matching:
No, I mean regular expressions as independently discovered in the
1940s by McCulloch, von Neumann et al.
"This is quite a useful class; in my own experience of using regular
expressions on a day-to-day basis, it easily accounts for 95 percent
of all instances. In many situations, solving the right problem is a
big step toward creating a beautiful program. Rob deserves great
credit for choosing a very small yet important, well-defined, and
extensible set of features from among a wide set of options."
While I might question how much he actually uses regular expressions
if the set of [<literal>, .(period), ^, $, *] makes up 95% of his
usage, the context is important, and it's obvious that a "full and
true regular expression parser" was never claimed in The Practice of
Programming, as evidenced by this quote from that section:
"For full regular expressions, with parentheses and alternatives, the
implementation must be more sophisticated, but can use some of the
techniques we'll talk about later in this chapter."
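To be clear about how small the feature set in question is: the whole
matcher fits on a page of C. The sketch below is my reconstruction
from memory of code along the lines the chapter describes, not a
quotation of the published code, so take the details as approximate. A
literal character matches itself, '.' matches any character, '^' and
'$' anchor the match, and '*' means zero or more of the single
preceding character.

/* Sketch of a matcher for just that feature set: literals, '.', '^',
   '$', and '*' applied to a single preceding character.  Reconstructed
   from memory, not quoted from The Practice of Programming. */
static int matchhere(const char *regexp, const char *text);
static int matchstar(int c, const char *regexp, const char *text);

/* match: search for regexp anywhere in text */
int match(const char *regexp, const char *text) {
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text);
    do {                                  /* must try even if text is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: does regexp match at the beginning of text? */
static int matchhere(const char *regexp, const char *text) {
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* matchstar: does c*regexp match at the beginning of text? */
static int matchstar(int c, const char *regexp, const char *text) {
    do {                                  /* '*' matches zero or more of c */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

So match("ab*c$", "xabbbc") returns 1, while match("^b.*", "abc")
returns 0. Nothing in it knows about parentheses, alternation,
character classes, or escapes; even a pattern as simple as (a|b)*abb is
inexpressible. That omission is precisely the point in dispute.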
"I'm inclined to side with Kernighan on this one."
Not me.
"When viewed in context, the code is indeed a beautiful piece of work
for what it does. You seem to be critiquing it in terms of production
code rather than a rudimentary implementation of regular expressions
in a chapter section describing how regular expressions are a
simplifying notation."
Nope. Actually, I could have written a complete parser using the truly
beautiful and simple syntax of ALL regular expressions presented in my
automata class by the head of the department.
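To be concrete about what "ALL regular expressions" means: expressions
built from single symbols, union ('|'), concatenation, the Kleene
star, and parentheses for grouping. A complete recognizer for that
class also fits on a page. The sketch below is one way to do it, using
Brzozowski derivatives; it is my illustration, not anything from
Kernighan's book nor necessarily the construction we used in class,
and it assumes its own little grammar, never simplifies intermediate
expressions, never frees memory, and makes no claim to efficiency.

/* Sketch: a recognizer for the full textbook class of regular expressions
 * (union '|', concatenation, Kleene star '*', parentheses, single symbols)
 * via Brzozowski derivatives.  No memory is freed and no simplification is
 * done: this illustrates the mathematics, not a production library. */
#include <stdio.h>
#include <stdlib.h>

enum kind { EMPTY, EPSILON, SYM, UNION, CONCAT, STAR };

typedef struct re {
    enum kind k;
    char c;             /* the symbol, for SYM nodes  */
    struct re *l, *r;   /* children; STAR uses only l */
} re;

static re *node(enum kind k, char c, re *l, re *r) {
    re *n = malloc(sizeof *n);
    n->k = k; n->c = c; n->l = l; n->r = r;
    return n;
}

/* nullable: does the language of r contain the empty string? */
static int nullable(const re *r) {
    switch (r->k) {
    case EPSILON: case STAR: return 1;
    case UNION:   return nullable(r->l) || nullable(r->r);
    case CONCAT:  return nullable(r->l) && nullable(r->r);
    default:      return 0;                    /* EMPTY, SYM */
    }
}

/* deriv: the Brzozowski derivative of r with respect to the symbol a */
static re *deriv(re *r, char a) {
    switch (r->k) {
    case SYM:    return node(r->c == a ? EPSILON : EMPTY, 0, NULL, NULL);
    case UNION:  return node(UNION, 0, deriv(r->l, a), deriv(r->r, a));
    case CONCAT:
        if (nullable(r->l))
            return node(UNION, 0, node(CONCAT, 0, deriv(r->l, a), r->r),
                        deriv(r->r, a));
        return node(CONCAT, 0, deriv(r->l, a), r->r);
    case STAR:   return node(CONCAT, 0, deriv(r->l, a), r);
    default:     return node(EMPTY, 0, NULL, NULL);   /* EMPTY, EPSILON */
    }
}

/* match: the whole of s is in L(r) iff r is still nullable after taking
 * the derivative by each symbol of s in turn */
static int match(re *r, const char *s) {
    for (; *s; s++)
        r = deriv(r, *s);
    return nullable(r);
}

/* Recursive-descent parser for the assumed grammar:
 *   expr ::= term ('|' term)*      term ::= factor*
 *   factor ::= atom '*'*           atom ::= '(' expr ')' | symbol   */
static const char *p;
static re *parse_expr(void);

static re *parse_atom(void) {
    if (*p == '(') {
        p++;
        re *r = parse_expr();
        if (*p == ')') p++;
        return r;
    }
    return node(SYM, *p++, NULL, NULL);
}

static re *parse_factor(void) {
    re *r = parse_atom();
    while (*p == '*') { p++; r = node(STAR, 0, r, NULL); }
    return r;
}

static re *parse_term(void) {
    re *r = node(EPSILON, 0, NULL, NULL);
    while (*p && *p != '|' && *p != ')')
        r = node(CONCAT, 0, r, parse_factor());
    return r;
}

static re *parse_expr(void) {
    re *r = parse_term();
    while (*p == '|') { p++; r = node(UNION, 0, r, parse_term()); }
    return r;
}

int main(void) {
    p = "(a|b)*abb";
    re *r = parse_expr();
    printf("%d\n", match(r, "aababb"));   /* 1: the whole string matches */
    printf("%d\n", match(r, "aabab"));    /* 0: it does not */
    return 0;
}

The main function parses (a|b)*abb, a pattern the five-operator subset
above cannot even express, and tests two strings against it. The point
is not that this is beautiful C; the point is that the complete
mathematical notion is small enough that there was never any need to
redefine it.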
Look, I don't want to offend Kernighan. I met him at Princeton, he
remembers me with neutrality if not fondness (probably not the latter)
and he's responded to my emails recently. I shall ask him to weigh in
here, but don't bet on it.
If you call Pike's crude solution "beautiful", you're corrupting
people and producing "programmers" who GET LAID OFF AT THIRTY, and who
cannot find another job or learn new languages on their own, and who
wind up saying "Welcome to Costco, I love you, welcome to Costco, I
love you" for the rest of their lives.
We need discourse other than the loud mouth of money: the health
insurance crisis is being created by the greed of insurance companies.