Peter said:
On Aug 21, 1:51 pm, Randy Webb wrote:
Peter Michaux said the following on 8/21/2007 12:33 PM:
On Aug 21, 8:41 am, The Magpie wrote:
Martin Mücke wrote:
I am looking for a good javascript obfuscator [snip]
In my opinion, there are no "good obfuscators".
Ummm, because there aren't any?
What are your criteria for a good obfuscater?
In principle an obfuscator is something that renders something
else more obscure (Note that the question asks for an obfuscator
and not a minimiser, so presumably minimisation is not the/a
point). The criteria for what would represent a good obfuscator would
depend on why an increase in obscurity is desired. There would
be some standard of 'sufficiently' obscure, to satisfy the
assumed need, and a good obfuscator would be one
that achieved, or exceeded, that level of obscurity.
Does it necessarily have to deal with every possible bit of
code that could go in script tags?
Or just deal with the subset of code that you write?
That is quite a strange question. The code that any individual
may write may include anything that javascript allows.
I was suggesting the individual is self-restricting to a subset of
what can go between script tags. This is a justifiable decision and
writing a processor for a subset may be substantially easier.
I always worry about the notion of (self or externally) imposed blanket
restrictions on what may be done in javascript. For a start we see too
many questions on this group that take the form "I have tied my own
hands and now I need to know how to do the things that became impossible
when I tied my own hands", which don't tend to get what the OP might
consider 'good' answers.
There certainly are some things that everyone should be avoiding, and
the recent illustration of the non-obvious/counterintuitive consequences
of the use of the - with - statement in javascript goes a long way
towards justifying a general assertion that the - with - statement
"should not be used in javascript". But then there is a justifiable use
of the - with - statement, to deliberately augment the scope chain of a
dynamically created function object. The need to do that is rare
(pushing exceptional) but it does exist. Would you want to sacrifice
that possibility just because the rest of the time the - with -
statement "should not be used in javascript"?
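As a rough sketch of that rare case (and only a sketch; - makeScoped - and the property names are made up for the illustration), a dynamically created function can have an object pushed onto its scope chain like this:

```javascript
// makeScoped is a hypothetical helper, not something from the thread.
// Code created with the Function constructor is non-strict unless its body
// says otherwise, so - with - is available inside it even when the calling
// code is strict.
function makeScoped(scopeObject, bodySource) {
  var factory = new Function(
    "scope",
    "with (scope) { return function () { " + bodySource + " }; }"
  );
  return factory(scopeObject);
}

var settings = { greeting: "hello", name: "world" };
var greet = makeScoped(settings, "return greeting + ' ' + name;");

console.log(greet()); // identifiers resolved through the added scope object
settings.name = "group";
console.log(greet()); // later changes to the scope object are still seen
```

The inner function resolves - greeting - and - name - against the object every time it is called, which is precisely the behaviour that no renaming tool can see by reading the source text.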
The - with - statement, along with - eval - (which is itself subject to
similar injunctions on its use) are the two constructs that can really
mess-up poor obfuscation/minimisation methods.
An illustration of - eval -'s role came to my attention earlier in the
week. For some reason (curiosity as to how they had achieved their
latest spectacular f**k-up) I took a look at one of the javascript files
imported by Google groups pages. The file is -
g2_common-1cb0e454e081b7db50b59661d9bad546.js - (though I suspect that
all the characters after the dash change regularly (they look like a
session ID)), and that file has been 'minimised' using a technique that
has attempted to transform local Identifier names to shorter (one
character) names (a good idea for a minimiser, though a sub-optimal
implementation here). In that code I found (and it is quite easy to find
as it appears twice) a - Function.prototype.apply - emulation (a
subject I have written about many times in the past) which is presumably
there to allow support on pre-JScript 5.6 IE browsers (which means mostly
IE 5 to 5.5, as that is when ActiveX became scriptable). The code reads
(re-wrapped):-
| if(!Function.prototype.apply){
| Function.prototype.apply = function(a,b){
| var c=[],e,g;
| if(!a)a=ra;
| var f=b||[];
| for(var h=0;h<f.length;h++){
| c[h]= "args["+h+"]"
| }
| g="oScope.__applyTemp__.peek()("+c.join(",")+");";
| if(!a.__applyTemp__){
| a.__applyTemp__=[]
| }
| a.__applyTemp__.push(this);
| e=eval(g);
| a.__applyTemp__.pop();
| return e
| }
| }
It is reasonably obvious what that code is trying to do (so much for
obscurity) and it is pretty obvious that the process of minimisation had
broken it, by transforming the Identifiers - oScope - and - args - into
one character equivalents but not doing so inside the strings that will
be - eval -ed. And of course a minimiser should not be messing with the
contents of string literals, but what it should be doing is recognising
the - eval - use and not trying to transform any Identifiers in the
(lexically) containing scopes.
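The failure mode can be reproduced in a few lines. This is a reduced illustration, not the Google code; the function names and the - base - property are invented for the example:

```javascript
// Before renaming: the eval-ed string refers to the local names
// oScope and args, which exist in the containing scope.
function callOriginal(oScope, args) {
  return eval("oScope.base + args[0]");
}

// After naive renaming: the parameters have become a and b, but the
// string literal was (rightly) left alone, so its names no longer resolve.
function callMinified(a, b) {
  return eval("oScope.base + args[0]");
}

// Small harness: returns the result, or the name of the error thrown.
function tryCall(fn) {
  try {
    return fn({ base: 1 }, [2]);
  } catch (e) {
    return e.name;
  }
}

console.log(tryCall(callOriginal));
console.log(tryCall(callMinified));
```

The first call evaluates normally; the second throws a ReferenceError (assuming no global - oScope - happens to exist), which is what the renamed - apply - emulation will do whenever it is actually called.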
Though a better solution would be observing that you only need to use -
eval - in a truly general - apply - emulation, while no real script
contexts are truly general so there was never a need to use - eval -
here in the first place. Still, if Google's script authors are not with
it enough to notice that they have included the - apply - emulation
twice in the same script I suppose it is a bit unrealistic to expect
them to understand what they are doing or why they are doing it.
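For what it is worth, a non-general emulation without - eval - might look something like the following. It is a sketch under stated assumptions: - applyEmulated - is a made-up stand-alone helper (not a drop-in for - Function.prototype.apply -), - thisObj - is assumed to be an extensible object, and three arguments is assumed to be the most the surrounding script ever passes:

```javascript
// Hypothetical helper: calls fn with thisObj as the - this - value and the
// elements of args as arguments, without using eval or the native apply.
function applyEmulated(fn, thisObj, args) {
  var a = args || [];
  var result;
  // Temporarily make fn a method of thisObj so that a normal method
  // call sets the - this - value.
  thisObj.__applyTemp__ = fn;
  try {
    switch (a.length) {
      case 0: result = thisObj.__applyTemp__(); break;
      case 1: result = thisObj.__applyTemp__(a[0]); break;
      case 2: result = thisObj.__applyTemp__(a[0], a[1]); break;
      case 3: result = thisObj.__applyTemp__(a[0], a[1], a[2]); break;
      default:
        throw new Error("applyEmulated: more than 3 arguments not handled");
    }
  } finally {
    // Remove the temporary property even if fn throws.
    delete thisObj.__applyTemp__;
  }
  return result;
}

var counter = { n: 1 };
console.log(applyEmulated(function (x, y) { return this.n + x + y; }, counter, [2, 3]));
```

Nothing in it is built from strings at runtime, so there is nothing for an Identifier-renaming minimiser to get wrong.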
I think your answer makes my point. Not everything that goes between
script tags is JavaScript. IE conditional comments are not
JavaScript. If you are willing to limit to JavaScript 1.6 then the
current Rhino based solutions could likely be very robust and if
they have bugs then they are surely fixable.
If conditional comments were a rare, unknown or unexpected feature of
web authoring then you would be right. But we know, if only from
observation, that that is not the case. It may be a pain to take
something like Rhino, for which a comment is just a comment, and make it
see when a comment is a conditional comment (especially when Microsoft
is not going to make the conditional comment algorithms public), but
that is a job that has to be done before the end result is going to be
of general use.
If by JavaScript you mean anything that can go in between
script tags then that would be a very hard target for
obfuscation.
You switch between obfuscation and minimisation at the drop of a hat.
Discussions can drift
Yes, but where has this one drifted to? Are we talking about code
minimisation or code obfuscation at this point?
Really it is a related question since resistance to one
may be connected to another and often the same
piece of software does both.
A degree of obfuscation may be a likely side-effect of minimisation, but
it is not the goal of minimisation, while minimisation is only a
possible side effect of obfuscation. There is, for example, a heavily
marketed commercial script obfuscator that attempts to increase
obscurity by transforming Identifiers into sequences of characters that
look numeric (upper case - o - and lower case - L - followed by decimal
digits), the result increases obscurity (at least in the absence of
syntax highlighting) but often does nothing to reduce code size (on its
own).
Every obfuscater I've looked at
tries to use small replacement identifiers in order to reduce
size as well as obscure.
That is not true of many obfuscators I have looked at, and attributing
the motivation of code size reduction to the transformations performed
by an obfuscator would be questionable. If software is designed to
obscure, the primary motivation for any actions it takes is likely to
be increased obscurity, not minimisation.
So the goals of obfuscation and
minification can be implementationally intertwined.
A level of the one can be the side effect of the other, in both
directions.
Minification allows for documentation to be inserted inline in
source code. I think this is very good reason to use minification.
So is "minification" (which implies an attempt to achieve an extreme of
small-ness) the only thing that will prevent commented source code from
being distributed with its comments? Obviously not.
It is much more likely that documentation is kept
up-to-date when the documentation is staring the
developer in the face.
Well, you would think so, but reality is disappointing in that area.
I'm including it. However not specifically as an effort to test the
obfuscater.
<snip>
Where would you attribute the fault in the Google code I posted above?
It is being distributed to client browsers, and it is broken. It may
have been broken by the attempt at minimisation, but it was a failure to
run proper QA tests on the post-minimisation version that has resulted
in its being distributed in its broken state.
I think that a traditional lexing and parsing of the code, just as
the final interpreter will, makes it more likely the obfuscater has
a chance at success.
Yes, that is fairly necessary if minimisation is to go beyond removing
non-significant white space and comments. On the other hand, removing
non-significant white space and comments will invariably have a big
impact on code size (and a side-effect on relative obscurity). While
interpreting faces the problem that some javascript is (or may be)
interpreted at runtime.
If the software explains its limitations then the developer
can easily decide if it is appropriate.
Yes, at least when they know enough to make the judgment.
Software with reasonable and declared limitations can
still be useful for the general public.
The "general public"? What do the general public have to do with browser
scripting?
It is reasonable that you can't send JScript 5.7 to a
JavaScript 1.6 obfuscater.
But it would not be reasonable if you could not send ECMAScript to a
browser supporting either (that is, after all, the point of having a
standardised language).
This brings up something I don't like about most
JavaScript libraries.
Only most?
They build hype by saying they are good for everything
but of course they have many limitations.
Any library claiming to be "good for everything" would be a bad library
for that reason alone, as there are no real world contexts where
everything is needed. (Consider that the project I am working on at
the moment now has a client-side code base of over 100,000 lines and it
certainly does not need to do 'everything'; for example, there is
currently no code at all that attempts to manipulate user selections.)
This actually made me very bitter when I first started
learning Rails and Prototype had the official stamp of
approval.
Does it still, or are the mistakes of the past being corrected?
I was very disappointed when I
learned how poor this approved code actually was.
Seeing what was being done with it I was not at all surprised that the
people making Rails design decisions were also 'approving' poor script
code.
If a compiler is something that takes source program in and
produces source program out (perhaps in a different language)
then they are all compilers.
No, some minimisers work only at the source text level. They may take in
what is effectively a program, and then output what is still effectively a
program, but they are not compilers.
Like many choices, it is a balance of value judgments.
Preferably informed judgments.
I don't think the QA argument stands up that well in general
because if the obfuscater is good then related problems are
so rare they are outweighed by the perceived benefits.
And Google groups think they are supporting JScript < 5.6 but in reality
they are not, and they don't know they are not because they did not do
the QA.
The fact that most obfuscaters have been awful amounts to
problems in particular rather than a general fault with
the idea.
<snip>
The problem with obfuscators is exactly the same problem it has always
been; they do not achieve a sufficient level of
(non-trivially-reversible) obscurity to provide anything resembling
'protection' ("moderate protection" or otherwise). The formatting and
non-significant white space can be restored with the simplest of tools,
the DOM and browser object model interactions cannot be subject to
Identifier modification because that will break them, the names of
javascript object properties cannot be modified unless you have
simultaneous access to all the code in the entire system when
obfuscating because you could not guarantee that other code will not
want to refer to those property names, the global identifiers cannot be
modified for the same reasons (under the same circumstances), and that
just leaves the local parameters, inner function declaration and
variable names amenable to modification (and only if they are in scopes
where - with - and - eval - are never used).
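That last condition is one a tool can at least check conservatively. The following is a deliberately crude sketch (the name - mayRenameLocals - is invented, and a real tool would work from a parse tree rather than a regular expression):

```javascript
// Hypothetical conservative check: given the source text of an outermost
// function (including any inner functions), report whether its local
// Identifiers could be considered for renaming. Any appearance of the
// words eval or with, even inside a string or comment, vetoes renaming.
function mayRenameLocals(functionSource) {
  return !/\b(?:eval|with)\b/.test(functionSource);
}

console.log(mayRenameLocals("function f(a) { return a + 1; }"));
console.log(mayRenameLocals("function f(a) { return eval('a'); }"));
console.log(mayRenameLocals("function f(o) { with (o) { return x; } }"));
```

It errs on the side of refusing: a false veto only costs a few bytes, while a missed - eval - or - with - yields broken code.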
And what happens when you change those local Identifiers? Their
meaningful names are gone, but their names were maybe not meaningful to
someone who did not understand the author's native language to start
with, and as each Identifier was only meaningful within a confined
lexical scope anyway it does not take much looking to work out their
roles in their context without ever knowing what (possibly) meaningful
names they may have had. Indeed, if a local variable needed a name that
was any more than unique in its scope then no non-English
speaking/reading programmer could ever learn anything from most of the
code examples that exist. We would live in a world where software
authoring was virtually restricted to the English speaking nations, and
that is just not the case.
Minimisation and obfuscation are not the same thing. Where the goal is
obfuscation the outcome will not justify the effort. Minimisation may
have a different cost-benefit, but Google are doing a good job of
illustrating that the cost side of the equation may not always be
properly perceived.
Richard.