JavaScript code mangler / scrambler / ... khm, more than

A

Andrew

Hi!

I hope I can get close to Sun and avoid the flames... ;)

First of all, I have read and fully understand numerous postings on
obfuscation. Also, I understand and mostly agree with this:
http://jibbering.com/faq/obfuscate.html
I have also personally successfully decoded JS that was encoded with
http://www.javascript-source.com/, http://www.stunnix.com/ and
http://www.semdesigns.com/Products/LanguageTools/ECMAScriptTools.html.
I haven't even checked the numerous others I have found because they
look even worse. So I know how futile the resistance is. ;)

BUT: I am looking for something that would NOT (only) do the basic
comments stripping / names renaming, but also changed the (apparent,
of course) flow of the program. Sth. that would convert:
-----
sqrt=function(x) {
return(x*x);
}
-----
to sth. like this:
-----
b=function(v) {
return(a(-v));
}
a=function(x) {
if (x<4)
{
a=x+3;
b=x-7;
c=(x%a)+4;
return((b-c)*(b-c))
}
else
return(b(v));
}
-----
(no, I haven't written any unit tests or indeed run it, so it might
not be correct - but you get the idea)

I don't care much about the size of the JS file or about the execution
time (though it must be O(n), of course). To unscramble such code you
would in general need an execution optimizer, which is far from
trivial. Or you could spend a LOT of time on that 30k (original size)
js file to decode it. :)

Are there any tools out there that are capable of doing this? Writing
such code by hand is... well, difficult. ;) But writing such tools
should be doable. Anyone found something like this yet?

Please: refrain from "don't do it" posts. I just want to know what can
be done.

Thanks!
 
D

Doug Miller

Hi!

I hope I can get close to Sun and avoid the flames... ;)

First of all, I have read and fully understand numerous postings on
obfuscation. [snip]

Please: refrain from "don't do it" posts. I just want to know what can
be done.

While you may have *read* "numerous postings on obfuscation" it is clear that
you do not *understand* them. The *only* way to effectively conceal the
workings of your scripts from prying eyes is to put them on your server.
*Anything* that's delivered client-side can be read by anyone. Since it must
be de-obfuscated by the client, it can be de-obfuscated by anyone with access
to the client (i.e. anyone who clicks on View | Source and takes the time to
read the code).
 
L

Lasse Reichstein Nielsen

Hi!

I hope I can get close to Sun and avoid the flames... ;)

First of all, I have read and fully understand numerous postings on
obfuscation. [snip]

Please: refrain from "don't do it" posts. I just want to know what can
be done.

While you may have *read* "numerous postings on obfuscation" it is clear that
you do not *understand* them. The *only* way to effectively conceal the
workings of your scripts from prying eyes is to put them on your server.
*Anything* that's delivered client-side can be read by anyone. Since it must
be de-obfuscated by the client, it can be de-obfuscated by anyone with access
to the client (i.e. anyone who clicks on View | Source and takes the time to
read the code).

I take it you didn't read the message you are replying to.
The idea was to complexify the algorithm by introducing extra operations that
cancel out in the end. The code is never de-obfuscated. It is executed directly
in the obfuscated state.
The code will obviously be less efficient (it does more work to get the
same result), but it's also harder to read an reason about. Not impossible,
only harder.

I don't think anything like that has been made already.

Whether to use it would depend on a complex evaluation taking into
account the extra development complexity of doing the obfuscation, the
extra runtime of the obfuscated code, the importance of hiding the
original code (what would it cost us if someone else had the code) and
the expected threat of someone actually trying to get to the original
code. (I personally think any informed evaluation of that kind would
end in rejecting the idea).

/L 'Still don't think there's code that's worth protecting like that'
 
J

Jorge

(...)
BUT: I am looking for something that would NOT (only) do the basic
comments stripping / names renaming, but also changed the (apparent,
of course) flow of the program. Sth. that would convert:
-----
sqrt=function(x) {
  return(x*x);}

-----
to sth. like this:
-----
b=function(v) {
  return(a(-v));}

a=function(x) {
  if (x<4)
  {
    a=x+3;
    b=x-7;
    c=(x%a)+4;
    return((b-c)*(b-c))
  }
  else
    return(b(v));}

Awesome idea (!). scrambler(code) ought to be !== scrambler(code) :
the output ought to be as random as possible.
 
D

Doug Miller

[email protected] (Doug Miller) said:
Hi!

I hope I can get close to Sun and avoid the flames... ;)

First of all, I have read and fully understand numerous postings on
obfuscation. [snip]

Please: refrain from "don't do it" posts. I just want to know what can
be done.

While you may have *read* "numerous postings on obfuscation" it is clear that
you do not *understand* them. The *only* way to effectively conceal the
workings of your scripts from prying eyes is to put them on your server.
*Anything* that's delivered client-side can be read by anyone. Since it must
be de-obfuscated by the client, it can be de-obfuscated by anyone with access
to the client (i.e. anyone who clicks on View | Source and takes the time to
read the code).

I take it you didn't read the message you are replying to.
Incorrect.
The idea was to complexify the algorithm by introducing extra operations that
cancel out in the end. The code is never de-obfuscated. It is executed directly
in the obfuscated state.[/QUOTE]

Yes, I understand that. What both you and the OP overlook is that the issue is
still the same: the entire source code is still clearly visible, and its
operation is obvious to anyone who wishes to take the time to explore it.
The code will obviously be less efficient (it does more work to get the
same result), but it's also harder to read an reason about. Not impossible,
only harder.

Exactly so. And thus useless.
 
S

Scott Sauyet

BUT: I am looking for something that would NOT (only) do the basic
comments stripping / names renaming, but also changed the (apparent,
of course) flow of the program. Sth. that would convert:
-----
sqrt=function(x) {
  return(x*x);}

-----
to sth. like this:
-----
b=function(v) {
  return(a(-v));}

a=function(x) {
  if (x<4)
  {
    a=x+3;
    b=x-7;
    c=(x%a)+4;
    return((b-c)*(b-c))
  }
  else
    return(b(v));}

Others can try to tell you how wrong-headed the idea seems (and I
agree with them.) What I want to point out is just how difficult the
process would be. First of all, your source transformations would
presumably need to be one-way, or someone could simply reverse them to
get your code; even if the exact sequence of transformations applied
was randomized, I'm guessing it would be relatively easy to
iteratively test some reversals to see if they simplify the code,
repeating until the code seems simple enough.

And then testing that you haven't broken anything would likely be a
nightmare. The transformed code would almost certainly have its own
artifacts that need to be checked for edge cases.

But you can't even supply a simple working example here. When I try b
(10) in the above, I get 324. Imagine what would go into making
something that's general enough to be useful and could be tested well
enough that it's unlikely to break the scripts it transforms.

-- Scott
 
T

Thomas 'PointedEars' Lahn

Andrew said:
BUT: I am looking for something that would NOT (only) do the basic
comments stripping / names renaming, but also changed the (apparent,
of course) flow of the program. Sth. that would convert:
-----
sqrt=function(x) {
return(x*x);
}
-----
to sth. like this:
-----
b=function(v) {
return(a(-v));
}
a=function(x) {
if (x<4)
{
a=x+3;
b=x-7;
c=(x%a)+4;
return((b-c)*(b-c))
}
else
return(b(v));
}
-----

IIUC, Rice's theorem precludes that from being accomplished.


PointedEars
 
A

Andrew

Wow, guys... Though I sense the underlying criticism (he, he... ;) I
must say... I'm impressed.

@Doug: thank you for your opinion. :)

@Lasse: that is exactly what I had in mind, nicely put!

I agree, of course one needs to assess if the extra cost is worth it
(of course, there is the "fun" factor in this equation too ;), but
first I wanted to know if there is a tool to do that automatically. It
seems to me it shouldn't be _too_ difficult to write one.

@Jorge: The idea is not mine - it comes from a contest where you had
to manually (as in paper & pencil) decipher what some program does and
simplify it as much as possible.

@Scott:
Yes, that would be another challenge - how to write a program that
deciphers such mangled code. :)
I would say that this is what code optimizers are doing, actually -
trying to optimize our "mangled" code.

As stated, I wrote that example out of my head in less than 2 minutes
- and it shows. Working example would be this:
-----
function b(v) {
return(a(-v));
}

function a(x) {
if (x>7)
{
var a=x+3;
var d=x-7;
c=(a%x)+4;
return((d+c)*(d+c))
}
else
return((x>0)?(x*x):(b(x)));
}

// unit test:
for (x=-100; x<100;x+=0.3)
{
if ( Math.abs(a(x)-x*x) > 0.000001 )
{
alert('Oops! '+a(x)+'!='+(x*x));
break;
}
}
if (x>=100)
alert('Test passed.');
-----
(and it still took me less than 10 minutes while making something
correct and very difficult to read).

But that is beside the point. Such process should of course be
automated and very well tested if it is to be successful.


@PointedEars:
IIUC, Rice's theorem precludes that from being accomplished.

Could you please elaborate on that? What exactly does it preclude and
why? Surely you don't mean that the code can't be made less readable?

Thank you all for sharing!
 
T

Thomas 'PointedEars' Lahn

Andrew said:
@PointedEars [et al.]:

Please don't, this is not a chat or bulletin board on the Web; it is a
thread-based discussion medium, a newsgroup, instead. Keep attribution
lines that are generated when you post a followup to each separate posting,
in which you should only refer to that posting and optionally its precursors
(as quoted).

Could you please elaborate on that? What exactly does it preclude and
why? Surely you don't mean that the code can't be made less readable?

STFW. According to Wikipedia:

,-<http://en.wikipedia.org/wiki/Rice's_theorem>
|
| In computability theory, *Rice's theorem* states that, for any non-trivial
| property of partial functions, there is no general and effective method to
| decide whether an algorithm computes a partial function with that
| property. Here, a property of partial functions is called trivial if it
| holds for all partial computable functions or for none, and an effective
| decision method is called general if it decides correctly for every
| algorithm. The theorem is named after Henry Gordon Rice, and is also known
| as the Rice-Myhill-Shapiro theorem after Rice, John Myhill, and Norman
| Shapiro.

AIUI, that means in a nutshell that you cannot devise a general function
that computes another function that computes the same result for the same
input (a trivial property) as a third function.

Of course an intelligent being, using the approach of informed reasoning
(trial and error), would be able to do that to a certain extent, but an
algorithm-driven machine usually cannot. This is directly connected to the
equally undecidable halting problem which states that a machine cannot
decide if another machine (or itself) holds on a given input (that would be
a trivial property, too).

That does not apply to obfuscation as a whole (so code *can* be made less
readable, if that would be desirable to begin with), but ISTM it does apply
to your approach at obfuscation. And on closer inspection of the Wikipedia
article which uses the very same example in the proof, I think I just did
your homework in theoretical computer science.


PointedEars
 
A

Andrew

@PointedEars [et al.]:
Please don't, this is not a chat or bulletin board on the Web; it is a
thread-based discussion medium, a newsgroup, instead.  Keep attribution

Will do my best, thanks for pointing it out to me.
STFW.  According to Wikipedia:

DUSMA. (== Don't Use So Many Acronyms)
,-<http://en.wikipedia.org/wiki/Rice's_theorem>
|
| In computability theory, *Rice's theorem* states that, for any non-trivial
| property of partial functions, there is no general and effective methodto
| decide whether an algorithm computes a partial function with that
| property. Here, a property of partial functions is called trivial if it
| holds for all partial computable functions or for none, and an effective
| decision method is called general if it decides correctly for every
| algorithm. The theorem is named after Henry Gordon Rice, and is also known
| as the Rice-Myhill-Shapiro theorem after Rice, John Myhill, and Norman
| Shapiro.

AIUI, that means in a nutshell that you cannot devise a general function
that computes another function that computes the same result for the same
input (a trivial property) as a third function.

Ah, but my function is not a black box. The program has full control
over inner workings of the function, so this theorem (AIUI == as I
understand it) does not apply to this case.

I have read that exact Wikipedia article - I was asking for further
clarification because nothing in the theorem applies to this case
(IMHO of course).
Of course an intelligent being, using the approach of informed reasoning
(trial and error), would be able to do that to a certain extent, but an
algorithm-driven machine usually cannot.  This is directly connected tothe
equally undecidable halting problem which states that a machine cannot
decide if another machine (or itself) holds on a given input (that would be
a trivial property, too).

I was thinking more of a way to replace known constructs with mangled
versions (randomly chosen from a set of predefined options), not of a
way for an algorithm to produce other algorithms automatically. That
would be difficult / impossible?, I agree.
That does not apply to obfuscation as a whole (so code *can* be made less
readable, if that would be desirable to begin with), but ISTM it does apply
to your approach at obfuscation.  And on closer inspection of the Wikipedia
article which uses the very same example in the proof, I think I just did
your homework in theoretical computer science.

Well, not really. ISTM (== it seems to me) that the proof only proves
that it is not possible to make a program that would say without doubt
that the mangled function really does what we think it does.
While that would be interesting for testing purposes, it is not
essential for the mangling process I have described.

Anyway, it looks like there is no such tool, and as I am not willing
to invest the time to develop it - so we'll just have to minimize the
JS sources and forget about it. ;)

Again, thank you all for answers, appreciate it.
 
P

Peter Michaux

DUSMA. (== Don't Use So Many Acronyms)

I've asked him many times to type out his acronyms in full as a
courtesy to readers. Like pressing tab in his editor after an acronym
to expand in full would kill him. Probably takes less time than the
time for him to look up the appropriately uncommon, terse acronym just
to confuse readers in the first place. But I guess that is the
"point".

Merry Christmas, Ears.

Peter
 
T

Thomas 'PointedEars' Lahn

Andrew said:
[Thomas 'PointedEars' Lahn wrote:]
@PointedEars [et al.]:
Please don't, this is not a chat or bulletin board on the Web; it is a
thread-based discussion medium, a newsgroup, instead. Keep attribution

Will do my best, thanks for pointing it out to me.

Yet you removed the attribution line again. (I *know* that you removed it
because Google Groups includes it by default.)
DUSMA. (== Don't Use So Many Acronyms)

RTFM, HTH.
,-<http://en.wikipedia.org/wiki/Rice's_theorem>
|
| [...]

AIUI, that means in a nutshell that you cannot devise a general function
that computes another function that computes the same result for the same
input (a trivial property) as a third function.

Ah, but my function is not a black box.

That is exactly the point.
The program has full control over inner workings of the function,

No, it has not.
so this theorem (AIUI == as I understand it) does not apply to this case.

It does.
I was thinking more of a way to replace known constructs with mangled
versions (randomly chosen from a set of predefined options), not of a
way for an algorithm to produce other algorithms automatically.

That is the same. In order to choose one randomly (i.e., programmatically)
from a set of predefined options, either the result is so trivial that it is
easily decoded, such as

function f(x)
{
return g(x);
}

function g(x)
{
return Math.pow(x, h(1));
}

function h(x)
{
return 2 * x;
}

or you would need to *decide* programmatically (i.e., using an algorithm)
whether the "predefined option" applies to the algorithm that should be
obfuscated. According to Rice's theorem, that cannot be done.

In addition, any new call that would increase complexity, thus make it
harder to decipher the whole, would decrease efficiency considerably: memory
efficiency because each new function requires a new Function object and a
stack, and runtime efficiency because creating this object and setting up
the stack takes time.
Well, not really. ISTM (== it seems to me) that the proof only proves
that it is not possible to make a program that would say without doubt
that the mangled function really does what we think it does.

No. What matters is that it is not possible to make a program that would
say without doubt that the mangled function returns the same than the
original function for any given input (here: argument). IOW, it cannot be
computed.


PointedEars
 
L

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
AIUI, that means in a nutshell that you cannot devise a general function
that computes another function that computes the same result for the same
input (a trivial property) as a third function.

I don't think you can conclude that from Rice's theorem.

It may not be possible to detect, in general, any non-trivial property
of a given partial function, but that doesn't preclude creating a
different (syntax for) a function that behaves the same. You just can't
tell a-priori what the behavior is for all inputs.

Trivial example is eta expansion:
f |-> function(x) { return f(x); }

You can argue whether the latter is a *different* function from "f",
but in this case it's not important. The desired result is a function
that is extensionally equivalent to the original, but with a more
convoluted syntax.

/L
 
T

Thomas 'PointedEars' Lahn

Lasse said:
I don't think you can conclude that from Rice's theorem.

It may not be possible to detect, in general, any non-trivial property
of a given partial function, but that doesn't preclude creating a
different (syntax for) a function that behaves the same.

Not for an intelligent being, that is. But that was not what was asked.
You just can't tell a-priori what the behavior is for all inputs.

That is exactly the point of Rice's theorem, and the reason why what the OP
wants cannot be done.
Trivial example is eta expansion:
f |-> function(x) { return f(x); }

You can argue whether the latter is a *different* function from "f",
but in this case it's not important. The desired result is a function
that is extensionally equivalent to the original, but with a more
convoluted syntax.

Your example function, when called, would recurse (call itself) until the
stack memory runs out. (So yes, it holds, but not on a Turing machine ;-))

I fail to see how this could disprove anything.


PointedEars
 
A

Andrew

Thomas 'PointedEars' Lahn said:
Not for an intelligent being, that is.  But that was not what was asked..

Actually, this might be the point - the set of possible
transformations _would_ be done by an intelligent being (the developer
preparing the mangling system). Only the choice and application of the
transformations would be done by an algorithm (the mangler program).
That is exactly the point of Rice's theorem, and the reason why what the OP
wants cannot be done.



Your example function, when called, would recurse (call itself) until the
stack memory runs out.  (So yes, it holds, but not on a Turing machine ;-))

I think you misunderstood the syntax - "|->" is not the same as "=".

Enjoy!
 
S

Scott Sauyet

function b(v) {
return(a(-v));
}

function a(x) {
if (x>7)
{
var a=x+3;
var d=x-7;
c=(a%x)+4;
return((d+c)*(d+c))
}
else
return((x>0)?(x*x):(b(x)));
}

But how would you tell your super-obfuscator about the boundary
conditions?

// a(36028797018963969) = 1.2980742146337066e+33
// 36028797018963969 * 36028797018963969 = 1.298074214633707e+33
// a difference of 288230376151711740

So for the value x = 36028797018963969, not only is

a(x) != x * x,

and

Math.abs(a(x) - x * x) > 0, (much greater!)

In other words, even for your simple, hand-tuned function, there are
values for which it does not match the intended function. Perhaps
this input value is irrelevant to your code, but do you have to
indicate to the tool the possible values for every function?

I think you're underestimating the difficulty of creating such a
tool. Moreover, even if you could build it, it would only stop casual
viewers from understanding your code. Anyone who understands
Javascript could, with some perseverance, be able to weed through this
nonsense to understand the core of your programs.

In other words, while I think this is an interesting abstract
discussion, I don't think there are any real chances of building a
tool that does what you want.

-- Scott
 
T

Thomas 'PointedEars' Lahn

Andrew said:
Actually, this might be the point - the set of possible
transformations _would_ be done by an intelligent being (the developer
preparing the mangling system). Only the choice and application of the
transformations would be done by an algorithm (the mangler program).

You are still missing the point.
I think you misunderstood the syntax - "|->" is not the same as "=".

That is true, but it does not matter.


PointedEars
 
T

Thomas 'PointedEars' Lahn

Scott said:
In other words, while I think this is an interesting abstract
discussion, I don't think there are any real chances of building a
tool that does what you want.

And even if there were, it would not be desirable to use in production code,
for with greater complexity comes less efficiency.


PointedEars
 
L

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
Not for an intelligent being, that is. But that was not what was asked.

Nor for a computer.
That is exactly the point of Rice's theorem, and the reason why what the OP
wants cannot be done.

No, Rice's theorem states that there is no total computable function on
the set of algorithms that determines any non-trivial property of the
partial function computed by the algorithm.

Here "algorithms" are programs and the partial functions computed by
algorithm is the runtime behavior of the programs. The theorem simply
says that no non-trivial static analysis can be correct on all
programs (i.e., there is no way to find the runtime behavior of any
program - except to run in at risk non-termination).

The theorem says nothing about rewriting of programs at the syntax
level. It is quite possible to, programmatically, rewrite a program
into another program with the same behavior *without knowing the
behavior*. After all, that's what compilers do all the time: rewrite a
program into another program with the same runtime behavior.
Your example function, when called, would recurse (call itself) until the
stack memory runs out. (So yes, it holds, but not on a Turing machine ;-))

Ah, my bad. My ASCII art wasn't self-explanatory :)
The "|->" arrow should represent the "maps-to" symbol (LaTeX \mapsto,
seems to be Unicode code point 0x21a6).

The was to rewrite a function expression into the eta-expanded form.
The expression is extensionally equivalent to the original, and the
program has the same runtime behavior as the original.
(In Javascript, with variadic functions, eta-expansion gets a little
more convoluted).
I fail to see how this could disprove anything.

It's a rewrite of a Javascript expression that preserves its behavior.
As I read what you have written, Rice's theorem says that this should
not be possible. I claim that it says no such thing.

/L
 
A

Andrew

In other words, even for your simple, hand-tuned function, there are
values for which it does not match the intended function.  Perhaps
this input value is irrelevant to your code, but do you have to
indicate to the tool the possible values for every function?

I think you're underestimating the difficulty of creating such a
tool. Moreover, even if you could build it, it would only stop casual
viewers from understanding your code. Anyone who understands
Javascript could, with some perseverance, be able to weed through this
nonsense to understand the core of your programs.

I am beginning to think that the example was not such a good idea...
yes it's flawed. It is an EXAMPLE, not proof of concept. You can do
better if you wish. :)

So, to recap:
- of course it is possible to do it, at least to some extent (still
not buying the Rice's theorem having to do anything with this - as
Lasse pointed out, compilers do that all the time)
- of course this is not the protection against determined hacker (much
better than name replacement though)
- of course there will be performance penalty
In other words, while I think this is an interesting abstract
discussion, I don't think there are any real chances of building a
tool that does what you want.

Well, that might be true. But not because of all the things others
have written in these posts, but because JS is such a difficult
language for this task. I am no way an expert in JS, but from what I
have seen (closures come to mind) there are some concepts that make it
really hard to do this kind of thing. Not impossible, but difficult.

Not sure it would be a bad business idea though - seeing how
obfuscators sell for $60-$250 when they are not even remotely
successful. If someone is capable of doing it, of course. I would have
bought a copy right now, but of course, it would have to be so good
that a capable programmer would have difficulties deducting what the
code does.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,999
Messages
2,570,243
Members
46,836
Latest member
login dogas

Latest Threads

Top