Ruby -Word2002-XML


Useko Netsumi

I have a Word 2002 document with a table in it. Is there a way to use Ruby
to extract the table (5 fields), parse the content of the table, and
write it out to an XML file?

Thanks
 

James

Useko said:
I have a Word 2002 document with a table in it. Is there a way to use Ruby
to extract the table (5 fields), parse the content of the table, and
write it out to an XML file?

Well, the short answer is "Yes." You can use WIN32OLE to create an
instance of Word, then script it as you might using VBA.

require 'win32ole'
application = WIN32OLE.new('Word.Application')
# Do stuff with application

See http://homepage1.nifty.com/markey/ruby/win32ole/index_e.html
(WIN32OLE is also part of Ruby 1.8)

You'll need a decent reference to the Word DOM; I believe you should be
able to get that on msdn.microsoft.com.

Back in another life I wrote a bunch of VB/VBA that took a Word 2000
doc, walked the object model, and spit out XML.

I have no idea how much of the Word 2000 DOM has carried over to Word
2002. I seem to recall that there was a collection of tables, and then
some API for walking the rows and fields.
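
A rough sketch of that approach in Ruby (not the VB code mentioned in this post) might look like the following. It assumes Windows with Word installed and a table without merged cells. The helper names (xml_escape, clean_cell, table_to_xml, word_table_rows), the element names, and the file paths are made up for illustration. The Word calls used (Documents.Open, Tables, Rows, Cells, Range.Text) are from the Word object model, where cell text usually ends with a CR plus BEL end-of-cell marker that needs stripping.

```ruby
# Sketch only: the Word-facing part needs Windows with Word installed.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape characters that are special in XML text.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Drop Word's end-of-cell marker (CR + BEL) and surrounding whitespace.
def clean_cell(text)
  text.to_s.delete("\a").strip
end

# Build an XML string from rows (an array of arrays of cell text).
# field_names supplies one (hypothetical) element name per column.
def table_to_xml(rows, field_names)
  lines = ['<?xml version="1.0"?>', '<table>']
  rows.each do |cells|
    lines << '  <row>'
    field_names.zip(cells).each do |name, value|
      lines << "    <#{name}>#{xml_escape(clean_cell(value))}</#{name}>"
    end
    lines << '  </row>'
  end
  lines << '</table>'
  lines.join("\n") + "\n"
end

# Read the raw cell text of one table through OLE automation.
# table_index is 1-based, as in VBA.
def word_table_rows(doc_path, table_index = 1)
  require 'win32ole'
  word = WIN32OLE.new('Word.Application')
  word.Visible = false
  doc = word.Documents.Open(doc_path)
  rows = []
  doc.Tables(table_index).Rows.each do |row|
    cells = []
    row.Cells.each { |cell| cells << cell.Range.Text }
    rows << cells
  end
  rows
ensure
  doc.Close(false) if doc   # close without saving changes
  word.Quit if word
end

# Example use (path and field names are placeholders):
#   rows = word_table_rows('C:\\docs\\report.doc')
#   File.write('table.xml', table_to_xml(rows, %w[f1 f2 f3 f4 f5]))
```

Keeping the XML step separate from the OLE step means the conversion can be tried on plain arrays without Word being present.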

If you think it might help, the source code can be found at
http://www.jamesbritt.com/code/ProVb6XmlBookCode.zip

That zip holds a bunch of other zip files. Look at Word2Xml.zip, and
ConvertToXml.dot for the macros that call into VB code.

(Never thought *that* would be of any relevance here ... )

James
 

Josef 'Jupp' SCHUGT

Hi!

* Useko Netsumi; 2003-12-10, 23:57 UTC:
I have a Word 2002 document with a table in it. Is there a way to
use Ruby to extract the table (5 fields), parse the content of the
table, and write it out to an XML file?

I don't use Word 2002 (perhaps it is not available for Linux?) but I
assume it is some kind of text processor that does not use XML as its
internal format. The main problem is the format used, so the best
idea seems to be converting data to something useful:

- copy the table, paste it into a document of its own, export to CSV

- same, but export to some kind of SGML or XML (HTML for example)

- if 'Word 2002' happens to stand for 'Microsoft Word 2002' you may
try 'antiword'

Josef 'Jupp' SCHUGT
 
