Need an XML schema to store information in a language-neutral format

AViS

Hi,
I am building a language translator that must convert input from
source languages to a language-neutral format in XML. This XML must be
read by the target language translator, which produces the output in the
target language. I am thinking of using a hash map to handle
translations but am having trouble deciding on the schema in which the
XML must be stored.


The application must work as follows:

{C translator}    <--->  |     X M L      |  <--->  {VB translator}
int i;                   stored in                  Dim i as Integer
printf("%d",i);          neutral format             Print i

Proposed XML format:
<translate>
  <action index="1">i</action>
  <action index="2">i</action>
</translate>

The index attribute of the <action> tag refers to a hash table
that aids in translation, thus:
_______________________________________________________
| index | c                   | vb                    |
|=====================================================|
| 1     | int $token          | Dim $token as Integer |
| 2     | printf("%d",$token) | print $token          |
=======================================================


Are the XML format and translation method I propose sufficient? Please
assume that the conversion is 100% possible (meaning my translator
excludes C's asm, pointers, etc.).
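
To make the proposal concrete, here is a minimal sketch of the table-driven scheme in Python. The names TABLE, NEUTRAL and render are made up for illustration, and the templates are only the two rows of the table above (with a semicolon added to the C declaration); this is not a complete design.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical translation table, keyed by the index attribute of <action>.
# Each row holds one template per target language; $token is the placeholder.
TABLE = {
    1: {"c": Template("int $token;"), "vb": Template("Dim $token As Integer")},
    2: {"c": Template('printf("%d", $token);'), "vb": Template("Print $token")},
}

# The neutral document from the post, with quoted attribute values.
NEUTRAL = """<translate>
  <action index="1">i</action>
  <action index="2">i</action>
</translate>"""


def render(xml_text, target):
    """Expand every <action> into one line of the target language."""
    root = ET.fromstring(xml_text)
    lines = []
    for action in root.findall("action"):
        template = TABLE[int(action.get("index"))][target]
        lines.append(template.substitute(token=action.text))
    return lines


print("\n".join(render(NEUTRAL, "c")))
print("\n".join(render(NEUTRAL, "vb")))
```

A flat table like this only handles statements that map one-to-one; nested constructs such as expressions or blocks would need a tree-shaped representation instead.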
 
Stefan Ram

AViS said:
I am building a language translator, that must convert input from
source languages to a language neutral format in XML.

There is no language neutral format. Or - in other words:
A "language neutral format" is just another language.

AViS said:
I am thinking of using a hashed map to handle translations

Mentioning a low-level implementation detail such as hashing when
talking about a very high-level task seems inappropriate.

AViS said:
Is the XML format and translation method I propose sufficient.

Even XML is a low-level implementation detail when in fact you
should be talking about annotated trees or similar structures.

It will be possible for you to translate a small restricted
and controlled subset of both languages. More might be beyond
the capabilities of most individuals, though very gifted
programmers or organizations might be able to translate a
large part of both languages, but I expect this to be a huge
effort.
 
AViS

Stefan Ram said:
There is no language neutral format. Or - in other words:
A "language neutral format" is just another language.

"Language neutral" was meant in the sense that I do not want the
source language to be recognizable by looking at the XML. In other words,
if 'printf' in C is translated to 'X' in the XML, then an equivalent
'cout' in C++ must also be translated to the same 'X'.

Stefan Ram said:
Mentioning a low-level implementation detail as a hashing when
talking about a very high-level task seems inappropriate.

Sorry about the hash map. I just thought it would make more sense to
explain the index attribute of the XML <action> tag along with the hash
map.

Stefan Ram said:
Even XML is a low-level implementation detail when in fact you
should be talking about annotated trees or similar structures.

Can you explain more about "annotated trees or similar structures"? I
am not able to find much info on the net. Even if you can suggest some
sites, that will go a long way.

Stefan Ram said:
It will be possible for you to translate a small restricted
and controlled subset of both languages...

It is enough if it translates only a subset. However, I need to find a
way to store in the XML the functions held by a C++ class; when
translating from XML to C, the functions will be removed and the class
converted to a typedef struct.
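
As a rough illustration of that last requirement, the sketch below stores a class with its fields and methods in XML and, when emitting C, keeps only the fields. The element names (class, field, method), the sample Point class and the function to_c_struct are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical neutral description of a C++ class: fields plus one method.
CLASS_XML = """<class name="Point">
  <field name="x" type="int"/>
  <field name="y" type="int"/>
  <method name="length" returns="double"/>
</class>"""


def to_c_struct(xml_text):
    """Emit a C typedef struct from the fields; <method> elements are skipped."""
    cls = ET.fromstring(xml_text)
    lines = ["typedef struct {"]
    for field in cls.findall("field"):
        lines.append(f"    {field.get('type')} {field.get('name')};")
    lines.append(f"}} {cls.get('name')};")
    return "\n".join(lines)


print(to_c_struct(CLASS_XML))
```

Because the methods stay in the XML, a C++ back end could still emit them, while the C back end simply ignores them.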
 
Stefan Ram

AViS said:
"Language Neutral" was meant to be in the sense that I did not
want the source language to be recognized by looking at the XML,

This is easy: If any code, such as

printf( "%d", i );

is given, and I tell you that it was translated from a language
X, there is no way for you to find out what X is. So /every/
representation will fulfil this requirement.

AViS said:
in other words if 'printf' in c is translated to 'X' in the
XML then an equivalent 'cout' in c++ must also be translated
to the same 'X'

In general, equivalence between two programs is undecidable.

See »Equivalence Problem« in

http://www.cs.rochester.edu/u/nelson/courses/csc_173/computability/undecidable.html

However, for a restricted domain you might indeed succeed
in finding such a representation. One possibility would be
to translate the C++ into C, as early C++ compilers did.

AViS said:
Can you explicate more about "annotated trees or similar structures". I
am not able to find much info on the net. Even if you can suggest some
sites, that'll go a long way

Starting points might be

http://en.wikipedia.org/wiki/Abstract_syntax_tree
http://www.cse.iitk.ac.in/users/raj/cs335/WebNotes/lec17.html

An annotated tree is a tree with annotations, which might be
represented as attributes in XML. While "annotated tree" means
the information structure itself, an XML document is one way
to represent such an information structure using a text
document.
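
A small sketch of this distinction, using Python's standard xml.etree.ElementTree module: the tree is built in memory first, and the XML text is just one serialization of it. The element and attribute names (program, declare, print, variable, name, type) are invented for illustration and are not a proposed schema.

```python
import xml.etree.ElementTree as ET

# Annotated tree for "declare integer i, then print i".
# The annotations (name, type) are stored as XML attributes.
program = ET.Element("program")
ET.SubElement(program, "declare", {"name": "i", "type": "int"})
output = ET.SubElement(program, "print")
ET.SubElement(output, "variable", {"name": "i", "type": "int"})


def describe(node, depth=0):
    """Return one indented line per node: the tag followed by its annotations."""
    notes = " ".join(f"{key}={value}" for key, value in sorted(node.attrib.items()))
    lines = ["  " * depth + f"{node.tag} {notes}".rstrip()]
    for child in node:
        lines.extend(describe(child, depth + 1))
    return lines


# The same information structure, once as XML text and once as an outline.
xml_text = ET.tostring(program, encoding="unicode")
print(xml_text)
print("\n".join(describe(program)))
```

Parsing xml_text back with ET.fromstring yields an equivalent tree, which is the sense in which XML is only a representation of the structure.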

Maybe, to ask in this XML newsgroup, you should try to isolate
that part of your question that is directly related to the
language XML from the rest that deals with your algorithm, but
has nothing to do with XML.
 
Joseph Kesselman

It sounds like you're talking about an XML representation of an
Intermediate Language general enough to cover multiple source languages.
Your first step, therefore, is to find or design that IL; from there,
writing an XML rendering of it is straightforward.

I'd recommend reading any of the standard reference works on compiler
design as a starting point for picking your IL. Note that its required
characteristics are going to depend heavily on exactly what operations
you're going to want to perform against that representation.
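
A minimal sketch of that ordering, assuming a toy IL with only two node types: the IL is defined first as plain Python dataclasses, the XML rendering is derived from it mechanically, and each back end reads only the XML. Declare, Print, to_xml, emit_c, emit_vb and VB_TYPES are hypothetical names, and the printf format assumes every variable is an int.

```python
from dataclasses import dataclass, asdict
import xml.etree.ElementTree as ET


@dataclass
class Declare:
    name: str
    type: str  # neutral type name, e.g. "int"


@dataclass
class Print:
    name: str


def to_xml(program):
    """Render a list of IL nodes as XML: one element per node, fields as attributes."""
    root = ET.Element("program")
    for node in program:
        ET.SubElement(root, type(node).__name__.lower(), asdict(node))
    return ET.tostring(root, encoding="unicode")


VB_TYPES = {"int": "Integer"}  # assumed mapping from neutral to VB type names


def emit_c(xml_text):
    lines = []
    for node in ET.fromstring(xml_text):
        name = node.get("name")
        if node.tag == "declare":
            lines.append(f"{node.get('type')} {name};")
        elif node.tag == "print":
            lines.append(f'printf("%d", {name});')  # assumes an int variable
    return lines


def emit_vb(xml_text):
    lines = []
    for node in ET.fromstring(xml_text):
        name = node.get("name")
        if node.tag == "declare":
            lines.append(f"Dim {name} As {VB_TYPES[node.get('type')]}")
        elif node.tag == "print":
            lines.append(f"Print {name}")
    return lines


xml_text = to_xml([Declare("i", "int"), Print("i")])
print(emit_c(xml_text))
print(emit_vb(xml_text))
```

Nothing in the XML names C or VB, so a front end for another source language could produce the same document.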
 
AViS

Thanks Stefan and Joseph.
The IL in XML was intended to be my proof of concept for a much bigger
initiative.
I shall try to keep this thread updated with the latest.


Thanks again.
 
Jean-François Michaud

Stefan said:
This is easy: If any code, such as

printf( "%d", i );

is given, and I tell you that it was translated from a language
X, there is no way for you, to find out what X is. So /every/
representation will fulfil this requirement.

The idea of using an intermediate language might not be the best way to
go about it, but so what? Let the guy explore; he might find
interesting things and he surely will learn a lot.

Stefan said:
In general, equivalence between two programs is undecidable.

This whole thing seems fishy to me. If 2 languages are Turing complete,
then they can both represent everything that is representable by a
Turing machine which is everything that is computable. This means that
any program representation in the first language DOES have an
equivalent representation in the second language.

Knowing whether 2 given programs written in 2 different languages are
indeed functionally equivalent, if both languages are Turing complete, is
far from being a trivial problem, but it is possible.

Baloney. If the input and output subsets of each program are known for
both programs, then they can be compared to evaluate whether they are
functionally equivalent. The Equivalence Problem speaks of equivalence
in general terms (whatever that means (nothing in context if you ask
me)). The difficulty resides in our inability to track very complex
problems. They are not impossible to solve; they are simply too complex
to apprehend when taken as a whole upfront.
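
A small sketch of such a comparison over a known, finite input set. The two functions stand in for an original program and its translation; their names and the helper first_mismatch are made up for illustration. Note that the check only establishes agreement on the inputs actually tried.

```python
def original(n):
    """Stand-in for the source program: sum of 1..n computed with a loop."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def translated(n):
    """Stand-in for the translated program: closed-form sum of 1..n."""
    return n * (n + 1) // 2


def first_mismatch(f, g, inputs):
    """Return the first input on which f and g disagree, or None if there is none."""
    for x in inputs:
        if f(x) != g(x):
            return x
    return None


# On the intended input subset (non-negative n) the two programs agree.
print(first_mismatch(original, translated, range(0, 200)))
# Outside that subset they differ, which the restricted test cannot reveal.
print(first_mismatch(original, translated, range(-3, 5)))
```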

Of course the proof makes sure not to mention any specific languages.
The proof applies to a program that would compute equivalence for ANY 2
programs. No such program can exist in the first place. The guy isn't
trying to translate anything to everything else; he's writing a
translator that goes from one language to another. Quite challenging,
but not impossible.

[snip]

Regards
Jean-Francois Michaud
 
