In comp.lang.javascript message
<53e60b5e-ddec-4b97-aa14-64c31f883159@j19g2000yqk.googlegroups.com>,
Sat, 31 Oct 2009 10:08:13, VK
It should be expected in many (but not all) situations.
Contrary to popular belief, browsers are *not* able to open text
or graphics files. What they are able to do - as part of their extended
functionality - is to recognize some file types other than HTML and to
wrap them on the fly into predefined HTML templates so as to display them
in the browser window. In particular, for text/plain files they
use the template
<HTML>
<HEAD></HEAD>
<BODY>
<PRE> text file content goes here </PRE>
</BODY>
</HTML>
with the exact tags' case (upper or lower) being browser dependent.
I wrote "getting, ..., a string", not "saw in a window".
Fram being a reference to an iframe recently loaded from a simple *.txt
file, the code
DIR = Fram.contentDocument.body
DIR = DIR.textContent || DIR.innerText // is latter needed? Yes, IE8
alert(DIR) // for VK
directly shows in the alert window plain text, not preceded by anything
using angle-brackets, for MS IE 8, Firefox 3.0.15, Opera 10.01, Safari
4.0.3, and Chrome 3.0. The <localhost> shown by Opera, and the
JavaScript shown by Safari, are parts of the alerts, not of their
contents.
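
For anyone wanting to repeat that test, here is a minimal sketch of the same read wrapped as a function. It assumes the iframe has finished loading and is same-origin. The name getFrameText is hypothetical, and the contentWindow.document fallback is an assumption for browsers lacking contentDocument, not something tested in this thread.

```javascript
// Hypothetical helper: return the plain text of a loaded, same-origin iframe.
function getFrameText(frame) {
  // contentDocument is the standard property; contentWindow.document is
  // tried as a fallback (assumption: only older browsers need it).
  var doc = frame.contentDocument ||
            (frame.contentWindow && frame.contentWindow.document)
  var body = doc.body
  // Prefer textContent where it exists, else innerText (needed for IE8).
  return typeof body.textContent == "string" ? body.textContent : body.innerText
}
```

The typeof test, rather than ||, is a small design choice: an empty file gives an empty string for textContent, which || would treat as missing.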
This way the text you "see" is in effect the content of a single <pre>
element necessarily altered from the "as it is on disc" to be placed
into this tag. For instance all less-than and greater-than signs will
be converted to the corresponding named HTML entities. The fact that
you were getting so far "by using innerHTML ..., a string which agrees
visually with the content of a TXT file" suggests that so far you were
lucky in not having any problematic characters in your .txt files.
"getting by using innerHTML" is not the same as "getting directly as
innerHTML". IIRC, most browsers wrapped with <pre> and one put rather
more at the top. When I was using innerHTML, I easily removed those by
RegExp.
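
A rough sketch of that kind of RegExp clean-up, for illustration only. It assumes the browser wrapped the file in a single <pre> element, possibly with attributes and possibly with other markup ahead of it, and that only the three entities below need decoding. The name unwrapPre is hypothetical; this is not the code I used at the time.

```javascript
// Hypothetical helper: recover the file text from body.innerHTML of a
// text/plain document, assuming one <pre> wrapper supplied by the browser.
function unwrapPre(html) {
  var text = html
    .replace(/^[\s\S]*?<pre[^>]*>/i, "") // drop everything up to and including the opening <pre ...>
    .replace(/<\/pre>\s*$/i, "")         // drop the closing tag and any trailing whitespace
  // Decode the entities introduced by serialisation; &amp; is done last
  // so that "&amp;lt;" becomes "&lt;" and not "<".
  return text.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&")
}
```

Where textContent or innerText is available, reading that instead avoids the need for any of this.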
This is OT to the discussed FAQ topic but an interesting problem per
se. I am thinking of moving it into a separate thread, or you may do it
yourself. I have a rather similar request for ggNoSpam, in order to give
users an ability to adjust the regexp spam filter even with zero
knowledge of regular expressions. The abstract task description would
be:
"Given an array of strings with a minimum of 2 and a maximum of 10
elements, find the shortest common word in these strings. If no such
common character sequence is found, then try to find the biggest subset
of strings having a common word".
"word" is understood in regexp terms. To avoid "rush answers" with
common words like "a" or "the" articles let's define that the shortest
common word must be no shorter than 4 characters.
I've changed my mind about whether, for the present, I want to do that.
It would certainly increase efficiency, though perhaps not noticeably.
But doing that and the changes which would necessarily be associated
with it would be an impediment to extending capability in a direction
which may be possible and useful.
If such a thread is started, I'll participate, if anything worth writing
occurs to me.
var AoS = ["aaa bbb ccc ddd ccc bbb ddd", "bbb zzz ggg", "banana"]
var J, A, K, T, Obj = {}, Z = 0, Word // Word declared, not left as an implicit global
J = AoS.length
while (J--) { T = {} // for each string
A = AoS[J].split(/\W+/) // make array of words
K = A.length ; while (K--) if (A[K]) T[A[K]] = 1 // no internal dupes; skip "" from split
for (K in T) Obj[K] ? Obj[K]++ : Obj[K] = 1 // ...
// ... if entry exists, increment, else create entry value 1
}
for (K in Obj) if (Obj[K]>Z) { Z = Obj[K] ; Word = [K, Z] } // keep first most-popular word
Then Word[0] appears in Word[1] of the strings, and no word appears in
more than Word[1] of them.
The last line can be amended by making the test >= and complicating what
follows, to list all words of the highest popularity and not just the
first one found.
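
One possible shape for that amendment, as an untested-in-browsers sketch rather than a finished routine. The function name mostCommonWords and the minLength parameter are hypothetical; minLength is there only so that the four-character minimum from the task description can be applied, and it defaults to 1 (no filtering beyond skipping empty strings).

```javascript
// Hypothetical helper: list every word that appears in the largest number
// of strings. minLength is an assumed optional parameter (default 1).
function mostCommonWords(strings, minLength) {
  var counts = {}, best = 0, words = [], seen, parts, i, j, w
  var has = Object.prototype.hasOwnProperty
  minLength = minLength || 1
  for (i = 0; i < strings.length; i++) {
    seen = {}                              // words already counted for this string
    parts = strings[i].split(/\W+/)
    for (j = 0; j < parts.length; j++) {
      w = parts[j]
      if (w.length < minLength || has.call(seen, w)) continue
      seen[w] = 1
      counts[w] = has.call(counts, w) ? counts[w] + 1 : 1
    }
  }
  for (w in counts) {
    if (!has.call(counts, w)) continue
    if (counts[w] > best) { best = counts[w]; words = [w] } // new maximum: restart the list
    else if (counts[w] == best) words.push(w)               // tie: extend the list
  }
  return { words: words, count: best }
}
```

Called with the AoS array above it should report "bbb" with a count of 2; called as mostCommonWords(AoS, 4) it considers only words of four or more characters. The hasOwnProperty tests guard against words such as "constructor" colliding with inherited properties.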
By using another variable
VERY SLIGHTLY TESTED (uses technique of LINXCHEK).