ASM => C

Unsolved Mysteries

Anyone know of a translator that converts an Intel Pentium assembly
listing into C? The quality of output code doesn't have to be great,
so long as it's accurate.
 
Gordon Burditt

Anyone know of a translator that converts an Intel Pentium assembly
listing into C? The quality of output code doesn't have to be great,
so long as it's accurate.

It is possible to write an emulator for a Pentium CPU in C.
From there, you can add

unsigned char memory[] = {
(code for program goes here)
(oh, yes, you probably have to throw in a copy of
the BIOS ROM and the OS, too)
};

and the emulator will run the code, and it's written in C.
(You will probably have to do something more specific about I/O
getting to a real device, and the emulator probably won't run
real-time).

Gordon L. Burditt
 
Unsolved Mysteries

(e-mail address removed) wrote...
Anyone know of a translator that converts an Intel Pentium assembly
listing into C? The quality of output code doesn't have to be great,
so long as it's accurate.

It is possible to write an emulator for a Pentium CPU in C.
From there, you can add

unsigned char memory[] = {
(code for program goes here)
(oh, yes, you probably have to throw in a copy of
the BIOS ROM and the OS, too)
};

and the emulator will run the code, and it's written in C.
(You will probably have to do something more specific about I/O
getting to a real device, and the emulator probably won't run
real-time).

Gordon L. Burditt

I am looking for something that more directly translates an .s (or
.asm) file into a .c file -- but that's an interesting observation
you've made.
 
E. Robert Tisdale

Something said:
Anyone know of a translator
that converts an Intel Pentium assembly listing into C?

In general, no.
C does not implement all Intel machine instructions.

Design information is discarded when C is converted to assembler.
More generally, this is a problem
when programmers fail to document their designs
*before* they implement them as C programs
because they never get around to documentation
after the design is working.
It is very hard to "reverse engineer" undocumented C code
because the original author's intent is unknown.
The quality of output code doesn't have to be great,
so long as it's accurate.

I used Google

http://www.google.com/

to search for

+"convert assembler to C"

and I found a couple of things that might interest you.
 
Kevin D. Quitt

Generally speaking, what you're asking cannot be done. Even assuming the
assembly was generated by a C compiler, it's still impossible in theory.
The general statement is "You can't make steak from a hamburger" -
information is destroyed in the compilation process; you cannot recover
the original code. Go to vivisimo and search for 'decompilation'.

You *can* generate a C program that does the equivalent of what the
assembly code does, although that depends on how strictly you define
'equivalent'. What you cannot do is create a C program that is guaranteed
to do all and only the things the original C program did. Information on
variable types can be lost. For example, a variable may be signed in the
C code, but the assembly gives no indication because there was nothing in
the C that actually made use of the sign. If you make it unsigned in the
decompilation, it's possible to get behaviour that is different from the
original program.

That being said, my rates are semi-reasonable.
 
Stephen Sprunk

Unsolved Mysteries said:
I am looking for something that more directly translates an .s (or
.asm) file into a .c file -- but that's an interesting observation
you've made.

Disassembly works because there is a 1:1 (or nearly so) correspondence
between machine and assembly code. There is no such correspondence between
assembly and C; there are an infinite number of C sources that could result
in the same assembly listing and vice versa.

So, if you want the original C source corresponding to a given assembly
file, you're completely out of luck. If you want _any_ C source that might
compile to a given assembly listing, you might have a chance of writing such
a program, but AFAIK none exists. Even with debug symbols (which aren't
guaranteed to exist), the C you end up with is unlikely to even
superficially resemble the original C program or even any C that a human is
likely to write.

S
 
Unsolved Mysteries

(e-mail address removed) wrote...
Disassembly works because there is a 1:1 (or nearly so) correspondence
between machine and assembly code. There is no such correspondence between
assembly and C; there are an infinite number of C sources that could result
in the same assembly listing and vice versa.

So, if you want the original C source corresponding to a given assembly
file, you're completely out of luck. If you want _any_ C source that might
compile to a given assembly listing, you might have a chance of writing such
a program, but AFAIK none exists. Even with debug symbols (which aren't
guaranteed to exist), the C you end up with is unlikely to even
superficially resemble the original C program or even any C that a human is
likely to write.

The mapping between some large number of C programs and a given
assembler listing is understood, and OK.

I don't care at all how close to the original program it is.
 
Eric Sosman

Unsolved said:
(e-mail address removed) wrote...



The mapping between some large number of C programs and a given
assembler listing is understood, and OK.

I don't care at all how close to the original program it is.

Then what's the purpose of creating "trashy" C source?
The value of a source file is that it can be read and
understood, then modified and recompiled to produce a new
program. If it's unreadable (or nearly so) it's also
unmodifiable (o.n.s.) -- so, what do you intend to do with
your cow made from hamburger?
 
Stephen Sprunk

Eric Sosman said:
Then what's the purpose of creating "trashy" C source?
The value of a source file is that it can be read and
understood, then modified and recompiled to produce a new
program. If it's unreadable (or nearly so) it's also
unmodifiable (o.n.s.) -- so, what do you intend to do with
your cow made from hamburger?

If nothing else, it makes a great project for undergrads ;)

Depending on how smart the decompiler is, it might do a reasonable job of
ferreting out calling conventions, flow control instructions, etc. With
debug information available, it could even get the function and variable
names (and types?) right. That's certainly more readable/modifiable for me
than what I get from a disassembler, but the usefulness is still low
compared to the original source.

S
 
Unsolved Mysteries

(e-mail address removed) wrote...
Then what's the purpose of creating "trashy" C source?

I said it doesn't have to be the original C. While I think we could
agree that there are many, many readable C programs that do the same
thing, your question implies otherwise.
The value of a source file is that it can be read and
understood, then modified and recompiled to produce a new
program. If it's unreadable (or nearly so) it's also
unmodifiable (o.n.s.) -- so, what do you intend to do with
your cow made from hamburger?

But if it's a readable C program, then your question is badly formed.
Nonetheless: C is more portable than ASM, last time I looked.
 
Eric Sosman

Unsolved said:
(e-mail address removed) wrote...
Unsolved said:
[...]
I don't care at all how close to the original program it is.

Then what's the purpose of creating "trashy" C source?

I said it doesn't have to be the original C. While I think we could
agree that there are many, many readable C programs that do the same
thing, your question implies otherwise.

No (or at any rate, I don't think so): I'm suggesting
that mechanical dis-compiling is likely to produce one of
the many possible *un*readable C sources for the object code.
But if it's a readable C program, then your question is badly formed.

If the output is readable, consider yourself either lucky
or an excellent reader ... In any case, questions are valid or
invalid on their premises, not on whatever the answer turns out
to have been.
Nonetheless: C is more portable than ASM, last time I looked.

It depends rather strongly on the C: you have but to lurk
on this newsgroup for a few days to see as many examples of
wildly non-portable C as you can stomach. Here's a plausible
example: somewhere in the object code you find instructions
that load a `double' register from one location and store
it to another. Your dis-compiler may well generate

*(double*)p = *(double*)q;

... which accurately reflects the object code. Portable?
By no means! What was *really* going on was

struct st { short s; int i; };
struct st x = { 42, 42 };
/* here come the instructions in question: */
struct st y = x;

... where the compiler decided to copy an eight-byte struct
by copying an eight-byte `double'. How portable is this?
Not very! The compiler has taken advantage of its own non-
portable knowledge in generating the code, as it is permitted
to do. Is the idea that sizeof(struct st) == sizeof(double)
portable? No, it is not. How about alignment: Is there any
guarantee that "alignof(struct st)" >= "alignof(double)"?
No, there is not. How about preservation of representation:
Is there any guarantee that loading something that might look
like a signalling NaN into a `double' register will preserve
its bit pattern for the subsequent store? No, there is not.

If you hope to dis-compile on machine A and re-compile
on machine B and get working code, you may well hope and your
hope may be rewarded, at least some of the time. But you would
be well-advised not to expect much ...
 
Unsolved Mysteries

(e-mail address removed) wrote...
Unsolved said:
(e-mail address removed) wrote...
Unsolved Mysteries wrote:
[...]
I don't care at all how close to the original program it is.

Then what's the purpose of creating "trashy" C source?

I said it doesn't have to be the original C. While I think we could
agree that there are many, many readable C programs that do the same
thing, your question implies otherwise.

No (or at any rate, I don't think so): I'm suggesting
that mechanical dis-compiling is likely to produce one of
the many possible *un*readable C sources for the object code.

OK. I guess there _would_ be more unreadable than readable resulting
sources, and purely mechanical rendering would be much more likely to
produce one of the former than one of the latter.
If the output is readable, consider yourself either lucky
or an excellent reader ... In any case, questions are valid or
invalid on their premises, not on whatever the answer turns out
to have been.


It depends rather strongly on the C: you have but to lurk
on this newsgroup for a few days to see as many examples of
wildly non-portable C as you can stomach.

Cripes. I'm 0-2 here. Point taken.
 
Mac

Anyone know of a translator that converts an Intel Pentium assembly
listing into C? The quality of output code doesn't have to be great,
so long as it's accurate.

Funny, I thought this was in the FAQ list, but when I went to look for it
I couldn't find it.

It probably should be in the FAQ list.

--Mac
 
Richard Bos

Unsolved Mysteries said:
(e-mail address removed) wrote...

The mapping between some large number of C programs and a given
assembler listing is understood, and OK.

For a pre-determined compiler, used on a pre-determined platform, with
pre-determined options, perhaps. If you don't know any of that, it isn't,
and when push comes to shove it cannot be understood.
I don't care at all how close to the original program it is.

Then you have a better chance at generating _something_, but still
hardly a chance of generating something legible, let alone maintainable.

Richard
 
Martijn

Unsolved said:
Anyone know of a translator that converts an Intel Pentium assembly
listing into C? The quality of output code doesn't have to be great,
so long as it's accurate.

I know there are decompilers that can generate C code, but the problem is
that they only catch simple things, such as loops. The resulting C
source is still far from portable, especially because escapes to assembler
are made throughout the source.

One big issue is the stub that a C compiler puts in front of every program
it generates. This stub may not (and probably will not) be recognized as
such, causing the decompiler to generate some strange code which is
completely redundant.

If what you are trying to do is reverse engineer a program by getting the C
equivalent for it, any C generator will help only so much; you still need to
understand the machine on which the original program was intended to
run, as well as some basic ASM.

If your goal is to generate a portable version of a piece of software, you
might as well forget about it and see if you can make a deal with the
original developer.

Good luck either way!
 
Ira Baxter

Anyone know of a translator that converts an Intel Pentium assembly
listing into C? The quality of output code doesn't have to be great,
so long as it's accurate.

Check out Software Migrations, Ltd. http://www.smltd.com/
They supply a service to do this.
I believe it to be very high quality, although I have not used it.
 
