anyone interested in decompilation

Walter Roberson

QuantumG said:
We generate C. :) I want to talk about decompilers. If you don't
want to talk about decompilers, don't reply to a thread that is clearly
about decompilers.

"Silence indicates consent" -- in other words, if we did not point
out that we consider the topic inappropriate here, then your assumption
would be that we don't mind.
 
Default User

QuantumG said:
What does that tell you about moderated newsgroups? Similarly, can
you infer what result the topic police have on unmoderated newsgroups?

On this one, they have very good results. That's why most of the posts
are on-topic. Yeah, there are dumbos like you, but they tend to be a
small minority.

Groups have topicality. Smart people recognize that and respect it. The
topic police will let everyone know that the thread is not appropriate,
and that often scuttles most of the discussion. The chucklehead (that's
you) finds that he either has to go to the correct group or has
his threads turn into this.

So the question becomes, do you want to have good, relevant discussion
of your chosen topic? If so, you don't want to try it here, because it
won't happen. You'll need to move to the proper group.




Brian
 
Default User

QuantumG wrote:

Gee, I'm sorry. Please don't kick me... oh wait, this isn't a
moderated newsgroup. I'll talk about whatever the hell I like.

Maybe, but most of us use real newsreaders, so we can killfile twits. Bye.

*plonk*




Brian
 
Kenny McCormack

QuantumG wrote:



Maybe, but most of us use real newsreaders, so we can killfile twits. Bye.

*plonk*

What's so absolutely precious is that these regs would have you believe
that they're all a bunch of high-powered, serious, educated "software
developers" (or whatever the term of fashion is these days), and yet
they act like a bunch of girls in this newsgroup (see above for what I
mean - this means you, "Mr. Brian Default User").

To put it another way, I remember reading somewhere along the way about
mobsters - and how the smart ones take the view that "if you're going to
kill someone, just do it - just walk up to the guy and do it. Don't
talk about it - don't threaten him ahead of time - don't be a girl about
it."

So, it should be with kill-filing - unless, as I suspect, the talking
about it is as far as it ever gets.
 
Keith Thompson

QuantumG said:
Can I suggest you move to comp.lang.c.moderated? That *is* what it is
for.

comp.lang.c and comp.lang.c.moderated have exactly the same purpose:
discussing the C programming language. They achieve this by different
means. In comp.lang.c.moderated, the moderator pre-filters all
postings. Here, there's no moderator, and we depend on posters to
show some common sense.

The disadvantage of comp.lang.c.moderated is that it's slow. The
comp.lang.c approach actually works pretty well, but it has the
disadvantage that we occasionally have to deal with trolls like you.

There are newsgroups where decompilers are topical. This is not one
of them. You are not welcome here, and you are making a fool of
yourself.
 
Martin Ambuhl

Igmar said:
That remarkable feature is called 'debugging'. You know, when you fire
up your debugger, it knows that at some point a certain variable
exists, and what it's called.

Against my better judgment, I will dip my toe into this completely
off-topic thread. Your "counterexample" suggests that this "decompiler"
is a remarkably useless thing. If one has an executable that contains the
symbolic information needed to know that a certain variable exists
and what it is named, then one has a pre-release copy of that
executable. That pre-release version should belong to people with
access to the source code, and those people have no use for this
"decompiler" at all.
 
Martin Ambuhl

QuantumG said:
Gee, I'm sorry. Please don't kick me... oh wait, this isn't a
moderated newsgroup. I'll talk about whatever the hell I like.

Yes, you will. And you will be ignored by almost everyone, and those
who imagine that your head actually emerges from your anus will be
disabused. The only thing that you have accomplished is to alienate
people needlessly. This does not fit with your role as a propagandist.
 
Walter Roberson

If one has an executable that contains the
symbolic information needed to know that a certain variable exists
and what it is named, then one has a pre-release copy of that
executable. That pre-release version should belong to people with
access to the source code, and those people have no use for this
"decompiler" at all.

I see several difficulties with those statements.

1) A binary for open-source code may have its symbols intact and belong
to people with access to the source code, and yet not necessarily be
pre-release.

2) You speak as if failure to strip the symbols from an executable
is Not Done, or is at least commercial suicide. A lot depends upon
market, commercial, and technical-support decisions.

Suppose, for example (a random example), that software to schedule subways
is sold: the vendor's technical support might not have access to the
live system (e.g., firewalls, or because it isn't on the public net),
but might be able to provide useful support in some cases by talking a
client technical person through, "okay, now tell the debugger to print
out sch7_overld". Running a simulation at a vendor's machine does NOT
always suffice to track weird combinations of circumstances.

3) Libraries frequently get shipped with global variables and
function names exposed in the symbol table (for linking purposes
if nothing else), though information about local variables might
not be present.

4) Unix "namelist" is considered to provide important access
to parts of the kernel. Some of the key kernel variables may even be
safely alterable on a live system (e.g., via SGI IRIX's "systune").
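Point 3 can be made concrete with a small C translation unit. The names `sch7_overld` (borrowed from the subway example), `sched_bump` and `clamp_nonnegative` are hypothetical, and so is the `int` type; the sketch only shows which identifiers have external linkage. On typical toolchains a tool such as `nm` would list the first two as global symbols, while the static helper is at most a local symbol; exactly what survives depends on the compiler, the linker and whether the binary is stripped.

```c
/* External linkage: the linker needs this name, so it is normally a
 * global symbol and, unless the binary is stripped, a debugger can
 * "print sch7_overld".  The int type is an assumption. */
int sch7_overld = 0;

/* Internal linkage: only this file can see it, so the compiler may
 * inline it and no global symbol is required for it. */
static int clamp_nonnegative(int v)
{
    return v < 0 ? 0 : v;
}

/* External linkage: an entry point other objects link against. */
int sched_bump(int delta)
{
    sch7_overld = clamp_nonnegative(sch7_overld + delta);
    return sch7_overld;
}
```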


(Not that any of these have anything to do with deliberate
off-topic posting.)
 
Igmar Palsenberg

Martin said:
Against my better judgment, I will dip my toe into this completely
off-topic thread. Your "counterexample" suggests that this "decompiler"
is a remarkably useless thing.

Somewhat true.

If one has an executable that contains the
symbolic information needed to know that a certain variable exists
and what it is named, then one has a pre-release copy of that
executable. That pre-release version should belong to people with
access to the source code, and those people have no use for this
"decompiler" at all.

Also true. I personally won't use such a thing.



Igmar
 
websnarf

QuantumG said:
Decompilation is the process of recovering human readable source code
from a program executable. Many decompilers exist for Java and .NET as
the program executables (class files) maintain much of the information
found in the source code. This is not true for machine code
executables however.

In recent years decompilation for machine code has moved from the
domain of crackpots and academic hopefuls to a number of real
technologies that are available to the general public. Decompilers for
machine code now exist which produce output that rivals disassemblers
as a tool for analysing programs for security flaws, malware or just
simply to see how something works. Full source code recovery that is
economically attainable will soon be a reality.

As it should be. While IdaPro is a great tool, it's still too much of a
pain in the butt doing this sort of thing by hand. And there are not
many of us who can do it.

The legal challenges posed by this technology differ from country to
country. As such, much research is being done in secret in countries
that prohibit some uses of the technology, whereas some research is
being done more publicly in countries that have laws which support the
technology (Australia, for example).

It's probably only illegal in the US and Japan (and maybe Canada).
Other countries obviously would like access to this "intellectual
property" which comes largely from the US. I think the whole point of
things like WIPO/WTO is to try to trick the rest of the world into
believing they should not simply steal "intellectual property" -- only
the most compliant nation states actually fall for this nonsense.

Boomerang is an open source decompiler written (primarily) by two
Australian researchers. Open source projects need contributors. If
you have an interest in decompilation, we'd like to hear from you.

I do, but mostly from an observational point of view.

We're not only interested in talking to programmers. The project
suffers from a lack of documentation, tutorials and community. There
are many tasks that can be performed by users with minor technical
knowledge.

So I looked at your page and through the examples. My question is
*which* compiler and .EXE outputs are you targeting? You are
declaring main as "int main (int argc, char ** argv, char ** envp)" which
I am pretty sure is GNU/UNIX-only. Is your plan to support a really
wide range of compilers?

The point being that WATCOM and Intel's compiler optimizations can
perform some pretty extreme code transformations. Intel does constant
propagation and function cloning, and for static functions WATCOM C/C++
just totally ignores function prologue/epilogue and may inline. You
*could* try to detect which compiler was used to compile the code,
however, it's possible to link with different libraries and compilers
than the original object code compiler.
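As a hedged illustration of why this complicates decompilation, consider a static helper that is only ever called with one constant argument. The function names are made up for this sketch, and whether any particular compiler inlines, propagates or clones here depends on that compiler and its flags; nothing below is specific to Intel or WATCOM.

```c
/* A static helper: the compiler is free to inline it, so in the binary
 * it may have no prologue, epilogue or symbol of its own. */
static int scale(int x, int factor)
{
    return x * factor;
}

/* Every call passes factor == 3.  With constant propagation the
 * optimized code can reduce to "return x * 3" (or an equivalent
 * shift/add sequence); a compiler that clones functions might instead
 * emit a specialised copy of scale() with factor fixed at 3. */
int triple(int x)
{
    return scale(x, 3);
}

int triple_sum(int a, int b)
{
    return scale(a, 3) + scale(b, 3);
}
```

A decompiler working from the optimized binary may see `triple` computing `x * 3` directly, with no trace that `scale` ever existed.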
 
websnarf

Martin said:
Against my better judgment, I will dip my toe into this completely
off-topic thread. Your "counterexample" suggests that this "decompiler"
is a remarkably useless thing. If one has an executable that contains the
symbolic information needed to know that a certain variable exists
and what it is named, then one has a pre-release copy of that
executable. That pre-release version should belong to people with
access to the source code, and those people have no use for this
"decompiler" at all.

You are making assumptions about how software is being delivered. For
example, the Mars Rover software almost certainly had symbolic
information in it as it was running.

A company that doesn't have proper source control measures, or that has
had disgruntled employees, might easily be in a position where a decompiler
would be handy to have.
 
Walter Roberson

My question is
*which* compiler and .EXE outputs are you targeting? You are
declaring main as "int main (int argc, char ** argv, char ** envp)" which
I am pretty sure is GNU/UNIX-only.

I'm not sure what you mean by "GNU/UNIX" in this context.

As a datapoint, the envp parameter is supported by SGI IRIX no
matter whether you are using SGI's MipsPro compilers or gcc or
SGI's older series of compilers, or the third-party commercial
compiler DCC that used to be available for IRIX.

It seems likely to me that the envp parameter is supported on
a variety of Linux systems; Linux is not UNIX.

Also, main with envp is allowed by at least some MS Windows compilers.
I find some matches when searching for main envp site:microsoft.com
but unfortunately my browser is acting up right now so I can't
post an example link right at the moment.
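For reference, a minimal C sketch of how code walks the environment that the three-argument form of main receives. The helper name `count_env_vars` is made up for this example. Note that this form of main is a common extension (the C standard lists it under common extensions in Annex J), not something every conforming implementation has to accept.

```c
#include <stddef.h>

/* Count the entries in a null-terminated array of "NAME=value" strings,
 * such as the envp argument of
 *     int main(int argc, char **argv, char **envp)
 * A null array pointer is treated as an empty environment. */
size_t count_env_vars(char **envp)
{
    size_t n = 0;
    if (envp == NULL)
        return 0;
    while (envp[n] != NULL)   /* the array ends with a null pointer */
        n++;
    return n;
}
```

In a program declared with the three-argument main one would call `count_env_vars(envp)`. Code that has to stay portable can use the standard `getenv()` for individual variables, or the POSIX `environ` variable to walk the whole list.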
 
Martin Ambuhl

You are making assumptions about how software is being delivered. For
example, the Mars Rover software almost certainly had symbolic
information in it as it was running.

That is not a useful counterexample. There is no reason to deliver the
Mars Rover software with the necessary symbolic information for either
debugging or "decompiling" unless the source code was available. To
suggest that somehow "decompiling" becomes a useful thing on the basis
of Mars Rover is, frankly, not credible. To hallucinate that
a) the Mars Rover software was delivered with the symbolic information
provided, and
b) the Mars Rover software was delivered but with its source code
available, and
c) the process of "decompilation" would be reliable enough, or timely
enough, to make it preferred to simply getting the source code
is to live in the fringes of sanity.
 
Walter Roberson

It's probably only illegal in the US and Japan (and maybe Canada).

Not that it matters to comp.lang.c, but extracting [see the
reference for full restrictions]
http://laws.justice.gc.ca/en/C-42/230536.html#Section-30.6

30.6 It is not an infringement [..]

a) make a single reproduction of the copy by adapting, modifying or
converting the computer program or translating it into another computer
language if the person proves that the reproduced copy is

(i) essential for the compatibility of the computer program with a
particular computer,

Thus in Canada, decompilation is not inherently illegal, since the
above process might involve decompilation.
 
Frederick Gotham

QuantumG posted:
Full source code recovery that is economically attainable will soon be a
reality.


The addition of two numbers yields 100.

What are the two numbers?

The project suffers from a lack of documentation, tutorials and
community.


It also suffers from a lack of "reality".
 
dcorbit

QuantumG said:
Decompilation is the process of recovering human readable source code
from a program executable. Many decompilers exist for Java and .NET as
the program executables (class files) maintain much of the information
found in the source code. This is not true for machine code
executables however.

In recent years decompilation for machine code has moved from the
domain of crackpots and academic hopefuls to a number of real
technologies that are available to the general public. Decompilers for
machine code now exist which produce output that rivals disassemblers
as a tool for analysing programs for security flaws, malware or just
simply to see how something works. Full source code recovery that is
economically attainable will soon be a reality.

The legal challenges posed by this technology differ from country to
country. As such, much research is being done in secret in countries
that prohibit some uses of the technology, whereas some research is
being done more publicly in countries that have laws which support the
technology (Australia, for example).

Boomerang is an open source decompiler written (primarily) by two
Australian researchers. Open source projects need contributors. If
you have an interest in decompilation, we'd like to hear from you.
We're not only interested in talking to programmers. The project
suffers from a lack of documentation, tutorials and community. There
are many tasks that can be performed by users with minor technical
knowledge.

For more information on machine code decompilation see the Boomerang
web site (http://boomerang.sourceforge.net/). For interesting
technical commentary on machine code decompilation, see my blog
(http://quantumg.blotspot.com/).

You want comp.compilers I think. This comes up once or so per year.

P.S.
You can't turn the DNA of a dead cow back into a cow. That sort of
thing only works in "Jurassic Park" movies.

When you want another cow, the best way to get one is to get a momma
cow and a daddy cow (sometimes known as 'bulls') and let them do their
business.

When you want to get your source code back, if you are using a compiled
language, the best thing is to restore from backup or pull from CVS.

I hope you succeed and make a workable decompiler, despite the known
impossibility of the general solution.

I also recommend that you stick to because that is
the arena where this sort of thing has ardent admirers.

Over here, in comp.lang.c we are not terribly interested in it. You
might say, "It's written in C!" but so is Microsoft Word, and
Microsoft Word is not topical here. You might say, "It outputs C as its
target language!" That would be doubly interesting if the input were
a COBOL program, but in any case, we don't care about that either.

Once you have it all working properly, I promise to give it a look.
Until then, don't go away mad -- just go away.
[If you know that a program was compiled by a particular compiler, I gather
it's possible to do pattern matching on the code idioms it uses to recover
more source than one might expect. And debug symbols help a lot. -John]
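The idiom matching John mentions, and the library-recognition patterns QuantumG describes elsewhere in the thread, both rest on byte-signature matching. A minimal C sketch, assuming a raw buffer of code bytes and a pattern in which `WILDCARD` stands for "any byte"; the function names are hypothetical and this is not Boomerang's implementation. The two sample patterns are the 32-bit x86 frame setup `push ebp; mov ebp, esp` in its two common encodings (GCC output commonly uses the first, Microsoft's compiler the second). Real tools use much longer signatures.

```c
#include <stddef.h>

#define WILDCARD (-1)   /* pattern entry that matches any byte */

/* push ebp ; mov ebp, esp  (two encodings of the same instructions) */
const int x86_frame_setup_89[] = { 0x55, 0x89, 0xE5 };
const int x86_frame_setup_8b[] = { 0x55, 0x8B, 0xEC };

/* Return 1 if pattern[0..plen) matches the code bytes starting at pos. */
static int match_at(const unsigned char *code, size_t len, size_t pos,
                    const int *pattern, size_t plen)
{
    if (pos > len || plen > len - pos)
        return 0;                       /* pattern would run off the end */
    for (size_t i = 0; i < plen; i++)
        if (pattern[i] != WILDCARD && pattern[i] != code[pos + i])
            return 0;
    return 1;
}

/* Return the offset of the first match, or -1 if there is none. */
long find_signature(const unsigned char *code, size_t len,
                    const int *pattern, size_t plen)
{
    for (size_t pos = 0; pos < len; pos++)
        if (match_at(code, len, pos, pattern, plen))
            return (long)pos;
    return -1;
}
```

Wildcards matter because operands such as stack-frame sizes or call targets change from function to function, while the surrounding opcodes stay fixed for a given compiler.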
 
Stephen Sprunk

The point being that WATCOM and Intel's compiler optimizations can
perform some pretty extreme code transformations. Intel does constant
propagation and function cloning, and for static functions WATCOM C/C++
just totally ignores function prologue/epilogue and may inline. You
*could* try to detect which compiler was used to compile the code,
however, it's possible to link with different libraries and compilers
than the original object code compiler.

I think we're all, except maybe for the OP, aware that it's impossible
to get the original source code back, just like using cloning cannot
get the original cow back from hamburger.

However, it is reasonable to think it's possible to generate source
code that can be recompiled to produce a binary that works at least as
well as the one you decompiled. The difference between the generated
source and the original source will vary depending on the extent of
optimizations (if any), presence of symbol information, amount of
preprocessing in the original, etc.

Just reaching this "round trip" goal would be a worthy exercise and
valuable for many purposes, some legal and some not.
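The "round trip" goal can be illustrated with a small C sketch: two functions whose text differs but whose behaviour is identical. The first stands in for the original source; the second is a made-up example of the kind of thing a decompiler might plausibly emit (generic names, a different loop form). Neither is output from Boomerang or any other real tool.

```c
#include <stddef.h>

/* "Original" source, as a programmer might have written it. */
int sum_scores(const int *scores, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += scores[i];
    return total;
}

/* Hypothetical "recovered" source: meaningful names are gone and the
 * loop is expressed with pointers, but it computes the same result,
 * which is all the round-trip goal requires. */
int proc_1(const int *param_1, size_t param_2)
{
    int local_1 = 0;
    const int *local_2 = param_1;
    const int *local_3 = param_1 + param_2;   /* one past the last element */
    while (local_2 < local_3) {
        local_1 += *local_2;
        local_2++;
    }
    return local_1;
}
```

Renaming `proc_1` and its locals back to something readable is the manual step QuantumG argues is tractable.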

S
 
QuantumG

Martin said:
That is not a useful counterexample. There is no reason to deliver the
Mars Rover software with the necessary symbolic information for either
debugging or "decompiling" unless the source code was available. To
suggest that somehow "decompiling" becomes a useful thing on the basis
of Mars Rover is, frankly, not credible.

Really, the way you guys are hung up on symbols, anyone would figure
you've never read source code written in French or German or whichever
natural language it is that you don't understand. Recreating sensible
symbols is not a problem people have trouble solving (for foreign
languages, or even asm code), so clearly it is not a problem they will
have solving for the output of a decompiler.

QuantumG
 
QuantumG

As it should be. While IdaPro is a great tool, it's still too much of a
pain in the butt doing this sort of thing by hand. And there are not
many of us who can do it.

Which can be a good thing for some of us, but it's still a waste of
time if it can be automated.

So I looked at your page and through the examples. My question is
*which* compiler and .EXE outputs are you targeting? You are
declaring main as "int main (int argc, char ** argv, char ** envp)" which
I am pretty sure is GNU/UNIX-only. Is your plan to support a really
wide range of compilers?

Yes. Boomerang is a general decompiler for machine code. There is
very little compiler specific code in it. Primarily, we're not
interested in decompiling the runtime library code so we use patterns
to recognise it and find the start of the code that is interesting.
So, for example, we can load ELF binaries that were compiled by GCC and
skip all the glibc runtime to find main(). Or we can load an EXE that
was compiled by Microsoft Visual C and skip the libc runtime to find
WinMain() or, for console apps, main(). From there we use all general
algorithms and try to assume as little as possible about what compiler
was used to make the binary, if any was used at all!

The point being that WATCOM and Intel's compiler optimizations can
perform some pretty extreme code transformations. Intel does constant
propagation and function cloning, and for static functions WATCOM C/C++
just totally ignores function prologue/epilogue and may inline. You
*could* try to detect which compiler was used to compile the code,
however, it's possible to link with different libraries and compilers
than the original object code compiler.

Yes, absolutely. We currently don't try to "uninline" anything, and if
two constants are combined by the compiler they will have to be
uncombined by the user after the decompiler has done its job.
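A hypothetical C example of the constant-combining point (all names are invented for this sketch): the named constants exist only in the source. In practice the compiler evaluates the expression at compile time, so the binary, and therefore any decompiled version of it, carries only the folded value.

```c
/* Source-level constants; none of these names reach the binary. */
#define HEADER_BYTES     16
#define RECORD_BYTES      4
#define RECORDS_PER_PAGE  2

/* An integer constant expression: compilers fold it to 24. */
int page_bytes(void)
{
    return HEADER_BYTES + RECORDS_PER_PAGE * RECORD_BYTES;
}

/* Roughly what a decompiler could recover: same value, no structure.
 * Splitting 24 back into 16 + 2 * 4 is the "uncombining" left to the
 * user. */
int page_bytes_recovered(void)
{
    return 24;
}
```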

QuantumG
 
MQ

Frederick said:
The addition of two numbers yields 100.

What are the two numbers?

10 + 10, or 11 + 1, ignoring identities and arithmetic equivalents of
the above.
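MQ's answer reads the numerals in base 2, where both sums come to four:

```latex
10_2 + 10_2 = 2 + 2 = 4 = 100_2 \qquad 11_2 + 1_2 = 3 + 1 = 4 = 100_2
```

Either way, the sum alone does not pin down the addends.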

MQ
 
