Parsing Word Doc files

Regent

Hi, friends

I searched CPAN in vain for a module that can read (parse) MS Word .doc files. Can someone point me to one, or give me a hint about the structure of such files? Thanks!

Regent
 
C

Chris

Assuming you are (1) on *nix and (2) only want to use Perl to get the
text out of a Word document, try piping the output of 'antiword' into
Perl:

$ antiword microslop.doc | perl script.pl   # or...
$ antiword microslop.doc | perl -ne 'chomp; print "$.\t$_\n"'   # -n supplies the while (<>) loop
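
As a rough illustration of what script.pl in that pipeline might contain, here is a minimal sketch. It assumes antiword's text output separates paragraphs with blank lines and wraps long paragraphs over several lines, so it re-joins each paragraph onto one line. The paragraphs helper is hypothetical, not part of antiword or any CPAN module.

```perl
#!/usr/bin/perl
# script.pl: a minimal sketch of a consumer for antiword's text output.
# Assumption: paragraphs are separated by blank lines, and long
# paragraphs are wrapped over several lines.
use strict;
use warnings;

# Hypothetical helper: group lines into paragraphs. Consecutive
# non-blank lines are trimmed and joined with single spaces; one or
# more blank lines end the current paragraph.
sub paragraphs {
    my @paras;
    my @current;
    for my $line (@_) {
        chomp(my $copy = $line);    # work on a copy; @_ may hold constants
        if ($copy =~ /\S/) {
            $copy =~ s/^\s+|\s+$//g;
            push @current, $copy;
        }
        elsif (@current) {
            push @paras, join ' ', @current;
            @current = ();
        }
    }
    push @paras, join ' ', @current if @current;
    return @paras;
}

# When input is piped or redirected in (e.g. `antiword file.doc | perl
# script.pl`), print one numbered paragraph per line. Skipped when
# STDIN is an interactive terminal.
unless (-t STDIN) {
    my $n = 0;
    for my $para (paragraphs(<STDIN>)) {
        printf "%d: %s\n", ++$n, $para;
    }
}
```

Keeping the grouping logic in a sub that takes plain lines makes it easy to reuse the same code on text obtained some other way.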

Chris
 
M

Michele Ouellet

The format is proprietary and liable to change at any time. If you are
running under Windows, your best bet is probably to use OLE (for instance,
the Win32::OLE module in the ActiveState Perl distribution). You then need
to study the Word object model for the manipulations you want to perform;
VB code can generally be found and adapted to your needs.
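
A minimal sketch of the OLE route, assuming Windows with Word installed and the Win32::OLE module available. It opens the document in an invisible Word instance and reads the whole document's text through the Word object model (Documents.Open, Document.Content, Range.Text). The helper names doc_text and normalize_word_text are hypothetical, and Word generally expects a full path to Open.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain text of a Word document by
# automating Word itself. Needs Windows, Word and Win32::OLE; the module
# is loaded at run time so this file still compiles on other systems.
sub doc_text {
    my ($path) = @_;    # Word generally wants a full path here
    require Win32::OLE;

    # Start an invisible Word instance; 'Quit' tells Win32::OLE to call
    # $word->Quit when the object is destroyed.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die 'Cannot start Word: ', Win32::OLE->LastError, "\n";
    $word->{Visible} = 0;

    my $doc = $word->Documents->Open($path)
        or die "Cannot open $path: ", Win32::OLE->LastError, "\n";
    my $text = $doc->Content->{Text};    # Range.Text of the whole document
    $doc->Close(0);                      # 0 = wdDoNotSaveChanges
    return normalize_word_text($text);
}

# Hypothetical helper: Word's text uses "\r" for paragraph ends, "\x0B"
# for manual line breaks and "\r\x07" at the end of table cells; map all
# of these to ordinary newlines.
sub normalize_word_text {
    my ($text) = @_;
    $text =~ s/\r\x07/\n/g;      # table cell/row end marks (handled first)
    $text =~ s/[\r\x0B]/\n/g;    # paragraph marks and manual line breaks
    return $text;
}
```

Usage would be something like print doc_text('C:\\docs\\example.doc'); where the path is only a placeholder. Loading Win32::OLE with require inside the sub is a design choice so the text clean-up can be reused and tested on any platform.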

If you are not running under Windows, you might find something on
SourceForge, though not necessarily in Perl.

Michèle.