An often asked question - document consistency

Geico Caveman

Hello,

I am a long time user of LaTeX on the Linux platform. I have episodically used
OpenOffice.org Writer and Microsoft Word 2003 (using Crossover Linux) to
satisfy a few people who insist on putting habit over quality.

However, now I am faced with a situation that is probably familiar to some
of you. I have a document that needs to be available as PDF, as LaTeX
source code (not the least for myself), and unfortunately, as a DOC file at
the same time. The first two are easy to arrange, and I have been using
pdflatex for years to produce high quality pdfs. The last is the problem.
For reasons that are obvious, and need not be discussed, doc is kind of a
stand alone format, refusing to play nice with anything else.

I have been looking at xml format as a possible way out of this mess. Is it
possible for me to convert LaTeX to xml (texml claims to do this), and then
have Microsoft Word 2003 read this somehow? I do fear that true to form,
Microsoft Office 2003 XML might be inconsistent in some fashion with the
output of that process (would be too standard otherwise for Word).

The other option seems to be to use mk4ht/oolatex to convert the document
to odt and then save as doc using OpenOffice.org. I do not like that
approach as I know from personal experience - OpenOffice.org's doc export
is not perfect, and becomes increasingly deficient for more complicated
documents. It's a miracle that the doc export works to the extent it does,
but it's not acceptable for my documents which are often very complicated.
Export to Microsoft Office 2003 XML has problems when Word 2003 sometimes
fails to read the documents generated.

Any suggestions (short of asking me to maintain two versions manually, one
in LaTeX, and the other in Word) would be very welcome.

Thanks.
 
Robert Heller

Hello,

I am a long time user of LaTeX on the Linux platform. I have episodically used
OpenOffice.org Writer and Microsoft Word 2003 (using Crossover Linux) to
satisfy a few people who insist on putting habit over quality.

However, now I am faced with a situation that is probably familiar to some
of you. I have a document that needs to be available as PDF, as LaTeX
source code (not the least for myself), and unfortunately, as a DOC file at
the same time. The first two are easy to arrange, and I have been using
pdflatex for years to produce high quality pdfs. The last is the problem.
For reasons that are obvious, and need not be discussed, doc is kind of a
stand alone format, refusing to play nice with anything else.

Correct: "doc is kind of a stand alone format, refusing to play nice
with anything else" -- this is the root of the problem.

I have been looking at xml format as a possible way out of this mess. Is it
possible for me to convert LaTeX to xml (texml claims to do this), and then
have Microsoft Word 2003 read this somehow? I do fear that true to form,
Microsoft Office 2003 XML might be inconsistent in some fashion with the
output of that process (would be too standard otherwise for Word).

Most likely this is the case. Just because MS-Word 2003 uses XML does
not mean very much.

The other option seems to be to use mk4ht/oolatex to convert the document
to odt and then save as doc using OpenOffice.org. I do not like that
approach as I know from personal experience - OpenOffice.org's doc export
is not perfect, and becomes increasingly deficient for more complicated
documents. It's a miracle that the doc export works to the extent it does,
but it's not acceptable for my documents which are often very complicated.
Export to Microsoft Office 2003 XML has problems when Word 2003 sometimes
fails to read the documents generated.

Any suggestions (short of asking me to maintain two versions manually, one
in LaTeX, and the other in Word) would be very welcome.

There really is no *good* (read: perfect and totally automated) way of
doing what you want. There are many reasons for this (and you seem to
be aware of all/most of them).
 
Oleg Paraschenko

Hello,

Hello,

I am a long time user of LaTeX on the Linux platform. I have episodically used
OpenOffice.org Writer and Microsoft Word 2003 (using Crossover Linux) to
satisfy a few people who insist on putting habit over quality.

However, now I am faced with a situation that is probably familiar to some
of you. I have a document that needs to be available as PDF, as LaTeX
source code (not the least for myself), and unfortunately, as a DOC file at
the same time.

In such cases, I use an OpenOffice.org (or Word) document as the source
document. I use styles strictly, without any manual formatting, so that

* I can save the document as raw XML; then
* an XSLT program converts the raw XML to XML with a logical
structure, and
* TeXML plus Consodoc produce LaTeX and PDF.
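
To illustrate the middle step of the list above, here is a minimal sketch in Python rather than XSLT. It assumes a made-up, heavily simplified "raw" XML in which each paragraph carries only a style name; the element names, the style names, and the `STYLE_TO_ELEMENT` mapping are all hypothetical, and real OpenOffice.org or Word XML is considerably more involved. The point is only that consistent style use is what makes the mapping to logical structure mechanical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the "raw" XML a word processor saves:
# every paragraph is a <p> whose only structural hint is its style name.
RAW = """<doc>
  <p style="Heading 1">Introduction</p>
  <p style="Text Body">First paragraph.</p>
  <p style="Heading 2">Details</p>
  <p style="Text Body">Second paragraph.</p>
</doc>"""

# Assumed mapping from style names to logical element names.
STYLE_TO_ELEMENT = {
    "Heading 1": "section-title",
    "Heading 2": "subsection-title",
    "Text Body": "para",
}


def to_logical(raw_xml: str) -> ET.Element:
    """Rebuild the document with element names derived from paragraph styles."""
    source = ET.fromstring(raw_xml)
    target = ET.Element("article")
    for p in source.iter("p"):
        style = p.get("style", "")
        if style not in STYLE_TO_ELEMENT:
            # Manual formatting or an unknown style cannot be mapped.
            raise ValueError(f"unmapped style: {style!r}")
        node = ET.SubElement(target, STYLE_TO_ELEMENT[style])
        node.text = p.text
    return target


logical = to_logical(RAW)
print(ET.tostring(logical, encoding="unicode"))
```

A paragraph with an unknown style raises an error instead of being guessed at, which mirrors the requirement that the source document use styles only.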
 
Peter Flynn

Geico said:
I am a long time user of LaTeX on the Linux platform. I have episodically
used OpenOffice.org Writer and Microsoft Word 2003 (using Crossover
Linux) to satisfy a few people who insist on putting habit over
quality.

A common position.

However, now I am faced with a situation that is probably familiar to
some of you. I have a document that needs to be available as PDF, as
LaTeX source code (not the least for myself), and unfortunately, as a
DOC file at the same time.

Are these documents you author yourself, or are they written
by someone else over whose software you have no control?

The first two are easy to arrange, and I have been using pdflatex for
years to produce high quality pdfs. The last is the problem. For
reasons that are obvious, and need not be discussed, doc is kind of a
stand alone format, refusing to play nice with anything else.

The .doc format is valueless, as you clearly understand. It is also now
obsolescent.

I have been looking at xml format as a possible way out of this mess.

Correct choice. Unfortunately the XML created by Word tends to be just
as valueless as .doc, as all it does is provide an XML-readable
expression of the visual appearance, unless you are rigorously using a
very carefully-designed stylesheet.

Is it possible for me to convert LaTeX to xml (texml claims to do
this),

Yes, but with some difficulty, and not a lot of reliability unless the
document structure and markup are very simple.

and then have Microsoft Word 2003 read this somehow?

Not at all easily.

I do fear that true to form,
Microsoft Office 2003 XML might be inconsistent in some fashion with the
output of that process (would be too standard otherwise for Word).

Word is capable of reading a non-WordML XML document but it requires a
Schema and some massaging. Not a path I would choose to tread.

The other option seems to be to use mk4ht/oolatex to convert the
document to odt and then save as doc using OpenOffice.org. I do not
like that approach as I know from personal experience -
OpenOffice.org's doc export is not perfect, and becomes increasingly
deficient for more complicated documents.

That about sums it up. Other wordprocessor conversions to Word format
have similar restrictions.

It's a miracle that the doc export works to the extent it does, but
it's not acceptable for my documents which are often very
complicated.

Ah. Can you give a minimal example?

Export to Microsoft Office 2003 XML has problems when Word 2003
sometimes fails to read the documents generated.

Any suggestions (short of asking me to maintain two versions
manually, one in LaTeX, and the other in Word) would be very
welcome.

The canonical solution to this is to author the documents in a suitable
document type in XML to start with, and then convert them to your output
targets using XSLT. Store the master in XML, generate what you need. In
general, I avoid using anything other than XML for the master version of
a document.

Transformation using XSLT to LaTeX is similar to transformation to HTML,
and not difficult to achieve. Making the generated LaTeX source look
pretty is more difficult (if that is a requirement), but if all you need
is LaTeX code that compiles without error, it's straightforward, modulo
the complexity of your document structure.
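
As a rough illustration of how small such a transformation can be, here is a minimal sketch written in Python rather than XSLT, so it is a stand-in for the approach and not the approach itself. The document type (`article`, `section`, `title`, `para`) is invented for the example, and mixed content such as inline emphasis inside a paragraph is deliberately not handled. It aims only at LaTeX that compiles, not LaTeX that looks pretty.

```python
import xml.etree.ElementTree as ET

# Hypothetical master document in an invented, minimal document type.
MASTER = """<article>
  <title>Costs &amp; benefits</title>
  <section>
    <title>Scope</title>
    <para>About 50% of the work is markup.</para>
  </section>
</article>"""

# Characters that must be escaped in ordinary LaTeX text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def to_latex(xml_text: str) -> str:
    """Walk the XML tree and emit a plain LaTeX article."""
    root = ET.fromstring(xml_text)
    lines = [
        r"\documentclass{article}",
        r"\title{" + escape(root.findtext("title", "")) + "}",
        r"\begin{document}",
        r"\maketitle",
    ]
    for section in root.findall("section"):
        lines.append(r"\section{" + escape(section.findtext("title", "")) + "}")
        for para in section.findall("para"):
            lines.append("")  # blank line starts a new paragraph
            lines.append(escape(para.text or ""))
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


latex = to_latex(MASTER)
print(latex)
```

Escaping character by character avoids re-escaping the backslashes that earlier replacements introduce, which is an easy mistake with chained string replacements.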

Transformation to WordML and similar is also possible but much more
complex, because there is a vast amount of redundancy to include, and
there are multiple ways of achieving the same result. As another writer
has asked, do these Word documents need to be editable? Without knowing
the level of complexity, it's hard to be more specific.

A shortcut which may work is to transform the document to very carefully
constructed XHTML with an embedded <style> header element, and then
rename the output file to end in .doc. Word is undiscriminating, and
seems to open such files in native (.doc) mode without complaint, but I
have only used this method for relatively simple documents.
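
A minimal sketch of that shortcut, again in Python rather than XSLT and using an invented document type (`article`, `section`, `title`, `para`): it writes XHTML with an embedded <style> element to a file whose name ends in .doc. The CSS rules and the file name `report.doc` are placeholders, and whether a given version of Word opens the result acceptably is not something the code can check.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import escape

# Hypothetical master document in an invented, minimal document type.
MASTER = """<article>
  <title>Costs &amp; benefits</title>
  <section>
    <title>Scope</title>
    <para>About 50% of the work is markup.</para>
  </section>
</article>"""

# Placeholder styling; a real stylesheet would be designed with care.
CSS = "h1 { font-size: 18pt; } h2 { font-size: 14pt; } p { font-family: serif; }"


def to_xhtml(xml_text: str) -> str:
    """Emit XHTML with an embedded <style> element in the header."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", ""))
    body = ["<h1>" + title + "</h1>"]
    for section in root.findall("section"):
        body.append("<h2>" + escape(section.findtext("title", "")) + "</h2>")
        for para in section.findall("para"):
            body.append("<p>" + escape(para.text or "") + "</p>")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head><title>" + title + "</title>\n"
        '<style type="text/css">' + CSS + "</style></head>\n"
        "<body>\n" + "\n".join(body) + "\n</body>\n</html>\n"
    )


# The content is XHTML; only the file name ends in .doc.
out_path = Path(tempfile.mkdtemp()) / "report.doc"
out_path.write_text(to_xhtml(MASTER), encoding="utf-8")
print(out_path)
```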

///Peter
XML FAQ: http://xml.silmaril.ie/
 
