Since this got some discussion on [...], a reply from a professional
linguist:
Begin forwarded message:
From: Arnold M. Zwicky <[email protected]>
Date: Wed Sep 17, 2003 7:11:44 PM US/Pacific
To: "Paul Ralston" <[email protected]>
Subject: Re: Fw: On the value of using spell checker
The phaomnnehil pweor of the hmuan mnid...
in the past week, dozens of (different) versions of this message have
been circulating. it has been much discussed in essentially all the
language-related mailing lists, newsgroups, and websites. no
connection to anyone at cambridge university has been discovered. so
far, its earliest appearance seems to have been on a translators'
mailing list.
there is a general suspicion that no actual research has been done
here, just a "demo" from texts like the one you forwarded.
by the way, lots of the versions have typos in them! for example, the
second word in your version is "phaomnnehil", which lacks one "e" and
has an extra "h". it looks like someone was doing the letter
transpositions by hand, rather than using a random-transposition
scheme, which is what any actual researcher would do.
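
A random-transposition scheme of the kind mentioned above could be sketched roughly as follows. This is only an illustration, not anything used to produce the circulating texts: the function names `scramble_word` and `is_valid_scramble` are hypothetical, and the sketch assumes each word is a plain run of letters with no attached punctuation.

```python
import random


def scramble_word(word, rng=random):
    """Randomly reorder the interior letters, keeping first and last fixed.

    Assumption: `word` is a bare alphabetic token. Words of three or
    fewer letters have at most one interior letter, so they come back
    unchanged.
    """
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)  # in-place random permutation of the interior
    return word[0] + "".join(middle) + word[-1]


def is_valid_scramble(original, scrambled):
    """Check that `scrambled` is a pure rearrangement of `original`
    with the first and last letters left in place."""
    return (
        len(original) == len(scrambled)
        and original[0] == scrambled[0]
        and original[-1] == scrambled[-1]
        and sorted(original) == sorted(scrambled)
    )
```

A check like `is_valid_scramble("phenomenal", "phaomnnehil")` returns False, since the letters no longer match, which is the kind of slip a hand-made transposition can introduce and a mechanical one cannot.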
there are several effects at work here. one is a well-known effect
that the beginnings and ends of chunks of linguistic stuff are
especially attended to. (in spoken language, also the most-stressed
syllable.) another is the great redundancy of language, whether in
spoken or written form. still another is the fact that if you preserve
the first and last letters of an orthographic word, then one-, two-,
and three-letter words are unaffected; but these little words are
powerful cues to the structure of sentences and the nature of the words
around them. (and four-letter words have at most one transposition, in
the middle, so they are usually very easy to recover. about half the
words in this -- rather academic -- message have four or fewer letters
in them. in less academic writing, more than half the words are
essentially instantly recognizable.)
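
The share of short words in a text is easy to estimate. The sketch below is a rough illustration only: `short_word_share` is a hypothetical helper, and it assumes that a "word" is any unbroken run of ASCII letters, which is cruder than a real tokenizer.

```python
import re


def short_word_share(text, max_len=4):
    """Return the fraction of words in `text` with at most `max_len` letters.

    Assumption: words are runs of A-Z/a-z; digits, hyphens and
    apostrophes act as separators. Returns 0.0 for text with no words.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    short = sum(1 for w in words if len(w) <= max_len)
    return short / len(words)
```

With `max_len=3` this counts the words that a first-and-last-letter scramble cannot change at all; with `max_len=4` it also counts the words that have only one possible transposition.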
finally, there really is a power of the human mind at work here, namely
our ability to use general knowledge, knowledge of the structure of our
language, and information from the discourse context to interpret what
has just gone before and to predict what is likely to come next. this
allows us to unconsciously correct slips of the tongue, to fill in
material lost in noise or inattention, and to manage other wonderful
feats of comprehension (which is not perfect, but pretty damn good).
the saliency of the first and last, the huge redundancy of language,
and the active (rather than passive) and context-dependent nature of
language understanding are well-established ideas in
linguistics/psychology. they'd combine to predict the "result"
reported on. they also predict that if the text is structurally
difficult, has unfamiliar vocabulary, has lots of long words (suppose
we put the vowels in a bunch, alphabetically, then the consonants,
ditto, so that "phenomenal" becomes "paeeohmnnl", which is a total
stumper unless you have the discourse context), and/or is not
particularly coherent, the whole thing will grind to a halt, no matter
how carefully you preserve first and last letters.
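
The harder rearrangement described above (interior vowels bunched alphabetically, then interior consonants, ditto) can be sketched like this. The name `bunch_interior` is hypothetical, and the sketch assumes that only a, e, i, o, u count as vowels and that the word is a bare run of letters.

```python
VOWELS = set("aeiou")  # assumption: "y" is treated as a consonant


def bunch_interior(word):
    """Keep the first and last letters; put the interior vowels first
    in alphabetical order, then the interior consonants in
    alphabetical order."""
    if len(word) <= 3:
        return word  # nothing to rearrange
    interior = word[1:-1]
    vowels = sorted(c for c in interior if c.lower() in VOWELS)
    consonants = sorted(c for c in interior if c.lower() not in VOWELS)
    return word[0] + "".join(vowels) + "".join(consonants) + word[-1]


print(bunch_interior("phenomenal"))  # paeeohmnnl
```

The first and last letters are preserved just as in the circulating texts, so any extra difficulty comes from the ordering of the interior alone.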
this, of course, *could* be studied, though i'm pretty sure no one has.
arnold