Merging two Word documents with Ruby?


Denver Mike

I've got a bugger of a problem and I thought I'd toss it out there to
see if anyone can provide any guidance.

I'm working on an application that needs to merge two Microsoft Word
documents. However, the application will definitely run on a Linux
server, so Word won't be installed.

My only thought would be to use the new XML format -- maybe I can find a
way to merge two documents with those files.

Has anyone else had any experience merging Word documents in Ruby (and
Rails)? Any other experience in manipulating Word documents in other
ways?

Denver Mike
 

Graham

Several points:
- What do you mean by "merge"? Word documents have structure, and the
interleaving of lines or words would appear to make little sense.

- Unless your application and user base are new, you will have many
files that are NOT in the XML format. You would need to convert
them, which means having Word installed somewhere. Perhaps you could
reconsider your platform choice (to make the problem simpler), or, if
you have no pre-existing documents, reconsider your approach to make
Word unnecessary. Word can read a wide variety of document types
(including HTML), so perhaps this is another way to simplify your
problem.

More details required...
Graham
 

Edwin van Leeuwen

Denver said:
My only thought would be to use the new XML format -- maybe I can find a
way to merge two documents with those files.

The only way I see is to use OpenOffice. There must be a script
somewhere to run OpenOffice in batch conversion mode. That way you can
convert the .doc format to ODF. ODF is XML-based, so it should be mergeable.
Microsoft's XML-based format is not in use yet. The first Office
version that will support it is Office 12, which has not been released yet.
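
A minimal sketch of the batch conversion step in Ruby. It assumes an `soffice` binary that accepts the `--headless` and `--convert-to` options (as LibreOffice and recent OpenOffice builds do); the thread does not name a specific script, and the helper names and file names here are hypothetical.

```ruby
# Build (but do not run) a headless conversion command for one .doc file.
# Assumption: `soffice` is on the PATH and supports --headless/--convert-to.
def convert_command(doc_path, out_dir, format = "odt")
  ["soffice", "--headless", "--convert-to", format, "--outdir", out_dir, doc_path]
end

# Path where the converted file is expected to appear
# (same base name as the input, new extension).
def converted_path(doc_path, out_dir, format = "odt")
  File.join(out_dir, File.basename(doc_path, ".*") + ".#{format}")
end

# Example usage (hypothetical file names):
#   ok  = system(*convert_command("report.doc", "/tmp/out"))
#   odt = converted_path("report.doc", "/tmp/out") if ok
```

Passing the command to `system` as an array avoids shell quoting problems with file names that contain spaces.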
 

Paul Duncan

* Denver Mike ([email protected]) wrote:
[snipped]
I'm working on an application that needs to merge two Microsoft Word
documents. However, the application will definitely run on a Linux
server, so Word won't be installed.

There are the POI Ruby bindings, although I've never used them myself and
have no idea how good they are.

http://jakarta.apache.org/poi/poi-ruby.html

If that doesn't work, I'd try wv and then catdoc, in that order.

--
Paul Duncan <[email protected]> pabs in #ruby-lang (OPN IRC)
http://www.pablotron.org/ OpenPGP Key ID: 0x82C29562

 

Denver Mike

- What do you mean by "merge"? Word documents have structure, and the
interleaving of lines or words would appear to make little sense.

Thanks for your thoughts on this Graham. By "merge", I meant appending
one Word document to the end of another, but to make things more
complicated, I need to add text into the headings across the entire
document.
 

Edwin van Leeuwen

Denver said:
Thanks for your thoughts on this Graham. By "merge", I meant appending
one Word document to the end of another, but to make things more
complicated, I need to add text into the headings across the entire
document.

Microsoft Word has something called a master document. Maybe you could
add a master document that includes both files plus the extra headings. This
master document might be simple enough that you can actually reverse
engineer it. (Create one in Word once and just edit the parts you need
to edit with Ruby.)
 

Wilson Bilkovich

Thanks for your thoughts on this Graham. By "merge", I meant appending
one Word document to the end of another, but to make things more
complicated, I need to add text into the headings across the entire
document.

This can actually be extremely complex, because a named style (such as
'Body', 'Normal', or 'Heading 1') can (and will) have different
properties (fonts, colors, sizes, margins, encoding, etc.) in each of
the two documents. You will need to rename every style and style
reference in the second document in order to prevent the two from
colliding.
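
The renaming idea can be sketched on a deliberately simplified model. The hash layout below (a `:styles` table plus `:paragraphs` that reference styles by name) and the `Doc2_` prefix are hypothetical illustrations, not a real Word or ODF structure; an actual document would also have styles that inherit from other styles.

```ruby
# Hypothetical, simplified document model:
#   { styles:     { "Heading 1" => { font: "Arial" } },
#     paragraphs: [ { style: "Heading 1", text: "Intro" } ] }

# Return a copy of doc in which every style, and every reference to it,
# carries the given prefix.
def prefix_styles(doc, prefix)
  renamed = doc[:styles].keys.to_h { |name| [name, "#{prefix}#{name}"] }
  {
    styles: doc[:styles].to_h { |name, props| [renamed[name], props] },
    paragraphs: doc[:paragraphs].map do |para|
      para.merge(style: renamed.fetch(para[:style], para[:style]))
    end
  }
end

# Append second to first; the second document's styles are renamed first,
# so a 'Normal' in one cannot overwrite a 'Normal' in the other.
def append_document(first, second, prefix = "Doc2_")
  second = prefix_styles(second, prefix)
  {
    styles: first[:styles].merge(second[:styles]),
    paragraphs: first[:paragraphs] + second[:paragraphs]
  }
end
```

Renaming unconditionally is the blunt option; it keeps both sets of properties intact at the cost of duplicating styles that happened to be identical.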
 

Lei Wu

If you have a choice, don't use Word documents. Use the RTF format instead.
RTF files can be opened the same way as Word documents, but are a lot
easier to process.

Lei
 

Daniel Calvelo

If your documents are properly structured using styles (which is rare)
and they share the same styles (and I mean the *same* styles), you can
try to use OpenOffice in remote command mode: convert the .doc files into
.odt, parse the XML of both files, merge the XML, and
rebuild an .odt file, perhaps going through OOo again to get a .doc
back. But you will need to ensure that the styles are always converted
into something reliably identifiable.
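
The XML merge step might look roughly like the sketch below, which uses REXML to append the body of one ODF `content.xml` to another. It works on XML strings only: unpacking and repacking the .odt archive, and reconciling styles, are left out. The helper names are hypothetical, and the sketch assumes both documents declare the same `office` and `text` namespace prefixes.

```ruby
require "rexml/document"

# Depth-first search for the first element with the given prefixed name.
def find_element(node, prefixed_name)
  return node if node.expanded_name == prefixed_name
  node.elements.each do |child|
    found = find_element(child, prefixed_name)
    return found if found
  end
  nil
end

# Append every child element of the second document's <office:text>
# to the first document's <office:text>; returns the merged XML string.
def append_odf_body(first_xml, second_xml)
  first  = REXML::Document.new(first_xml)
  second = REXML::Document.new(second_xml)
  target = find_element(first.root, "office:text")
  source = find_element(second.root, "office:text")
  raise ArgumentError, "no office:text element found" unless target && source

  # Copy rather than move, so the second document is left untouched.
  source.elements.to_a.each { |el| target << el.deep_clone }
  first.to_s
end
```

This only holds together when both files really do share the same styles, as described above; otherwise the appended paragraphs would pick up the first document's definitions.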

FAO (the UN branch for food and agriculture) uses a template system
(thus forcing a set of styles) which is used to output RTF which is
converted into XML for storage. Are your documents existing legacy ones
or is this a new setup? If you're building it all, then you might
seriously consider using OpenOffice all the way.
 

Dave Howell

Thanks for your thoughts on this Graham. By "merge", I meant appending
one Word document to the end of another, but to make things more
complicated, I need to add text into the headings across the entire
document.

Does it still need to be a Word document when you're done? An entirely
different approach would be to use some kind of Word file display
program and make PDFs of the files, then chain the PDFs together. Do
the headers by slapping a white block over the existing headers and
writing a new header over them.
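
For the "chain the PDFs together" step, one option is Ghostscript's `pdfwrite` device. The sketch below only builds the command; it assumes a `gs` binary is installed, which the thread does not mention, and the helper name and file names are hypothetical.

```ruby
# Build (but do not run) a Ghostscript command that concatenates PDFs
# in the order given. Assumption: `gs` is on the PATH.
def concat_pdfs_command(inputs, output)
  raise ArgumentError, "need at least one input PDF" if inputs.empty?
  ["gs", "-dBATCH", "-dNOPAUSE", "-q",
   "-sDEVICE=pdfwrite", "-sOutputFile=#{output}", *inputs]
end

# Example usage (hypothetical file names):
#   system(*concat_pdfs_command(["first.pdf", "second.pdf"], "merged.pdf"))
```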

Personally, my approach would be to abandon the project as just too
messy for words. :)
 

Daniel Calvelo

OpenOffice.org can do the .doc to pdf conversion. I like your idea very
much, Dave. Maybe PostScript would be easier to fiddle with ex-post.
 

Dave Howell

OpenOffice.org can do the .doc to pdf conversion. I like your idea very
much, Dave. Maybe PostScript would be easier to fiddle with ex-post.

Probably. If you have a program that lets you overlay one PDF page on
another, then your best bet is to output a PDF page with your header in
it. (I'd probably use TeX, or maybe script OSX's TextEdit program, and
my copy of full Acrobat 4 for the page overlay.) The other alternative
would be to create (or have somebody create for you) an .eps with the
white box and a line of text in a program like Freehand or Illustrator.
If you pop open the .eps file in a text editor, you'll find it not too
difficult to programmatically replace the text, although you won't
easily be able to duplicate the kerning and other textual adjustments.
Have OpenOffice print to a postscript file, then figure out what you
can use as a page marker in order to embed the .eps in that file on
each page so that it comes after (and thus covers) the original
headers, if any. Then feed the modified .ps file into a PDF distiller.
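
Replacing the text inside the .eps programmatically could be sketched as follows. It assumes the template was saved with a known placeholder inside a PostScript string literal, for example `(HEADER_TEXT) show`; the placeholder and helper names are hypothetical. Parentheses and backslashes have to be escaped, because they are special inside PostScript strings.

```ruby
# Escape characters that are special inside a PostScript (...) string.
def ps_escape(text)
  text.gsub(/[\\()]/) { |char| "\\#{char}" }
end

# Substitute the escaped header text for the placeholder in an EPS template.
# The block form of gsub keeps backslashes in the replacement literal.
def fill_header(eps_template, placeholder, text)
  eps_template.gsub(placeholder) { ps_escape(text) }
end

# Example usage (hypothetical file names):
#   eps = fill_header(File.read("header.eps"), "HEADER_TEXT", "Draft (v2)")
#   File.write("header_filled.eps", eps)
```

As noted above, this swaps the characters only; kerning and other adjustments baked into the original file are not recomputed.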

That's what I'd try, I think.
 

hari

Hi guys,

I have a question. I hope you can help.

I need to build a utility that, when run, merges two MS Word
documents. I should then be able to print the merged document, with
options to "remove header" and "remove footer",
and the document should then print with the header/footer removed.

Help, please.
 
 
