Matt Garrish
Not a question, but a solution.
I had to convert a number of Word files to SGML today while retaining the
font formatting (in the form of <b>, <i>, etc. tags). I know this can be
done by saving to HTML and cleaning up the other HTML garbage that gets spit
out, but these files were already nicely styled and the conversion program
was meant to take advantage of that.
I also couldn't write this search/replace as a VBA macro, because the
program is automatically launched by a service watching a specific directory
for files dropped into it (and I couldn't find any way to run a Word macro
via OLE; I could use $word->Run() from the command line, but it wouldn't
work when run by the service, even when I changed who the service was run
as).
The code snippet below should be self-explanatory, but it sure was a
headache trying to convert the data structures. Hope this comes in handy for
anyone trying to do the same. I also encourage anyone else to post to this
thread any examples of VBA -> Perl code they think others might find useful,
as I found there is really very little to work from (or feel free to point
out where my code can be optimized).
Original VBA macro to find bolded text:
    Dim rngSearch As Word.Range
    Set rngSearch = ActiveDocument.Content
    With rngSearch.Find
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = False
        .Text = ""
        .Replacement.Text = ""
        .Font.Bold = True
        Do While .Execute
            With rngSearch
                .InsertBefore "<b>"
                .InsertAfter "</b>"
                .Collapse wdCollapseEnd
            End With
        Loop
    End With
Perl equivalent:
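    use strict;
    use warnings;
    use Win32::OLE;
    use Win32::OLE::Const 'Microsoft Word';   # imports wdFindStop, wdCollapseEnd

    # $infile holds the path of the Word file to convert (set elsewhere)
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die Win32::OLE->LastError();
    my $doc = $word->Documents->Open($infile) or die Win32::OLE->LastError();
    my $range = $doc->Content();

    # match on formatting only: empty search text, bold font
    $range->{Find}->{Format} = 1;
    $range->{Find}->{Forward} = 1;
    $range->{Find}->{Wrap} = wdFindStop;
    $range->{Find}->{MatchWildcards} = 0;
    $range->{Find}->{Text} = '';
    $range->{Find}->{Replacement}->{Text} = '';
    $range->{Find}->{Font}->{Bold} = 1;

    while ( $range->{Find}->Execute() ) {
        # $range now covers the bold run that was just found
        $range->InsertBefore('<b>');
        $range->InsertAfter('</b>');
        $range->Collapse(wdCollapseEnd);   # continue searching after this run
    }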
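
Since the goal is <b>, <i>, etc., the same loop could probably be wrapped in a
sub and run once per font property. What follows is only a rough sketch under
some assumptions: the names tag_font_runs, tag_document and @FONT_TAGS are
made up for illustration, the Bold/Italic mapping is an assumption, and the
constants are loaded at run time with Win32::OLE::Const->Load instead of being
imported. I have not checked how the passes interact on text that is both bold
and italic.

```perl
use strict;
use warnings;

# Assumed mapping (not from the original post): Font property => tag name.
my @FONT_TAGS = ( [ Bold => 'b' ], [ Italic => 'i' ] );

# Hypothetical helper: wrap each run of text whose Font->{$prop} is set
# in <$tag>...</$tag>. $wd is a hash ref of Word constants.
sub tag_font_runs {
    my ($doc, $prop, $tag, $wd) = @_;
    my $range = $doc->Content;        # fresh range covering the whole document
    my $find  = $range->{Find};
    $find->ClearFormatting();         # drop criteria left from earlier searches
    $find->{Format}         = 1;
    $find->{Forward}        = 1;
    $find->{Wrap}           = $wd->{wdFindStop};
    $find->{MatchWildcards} = 0;
    $find->{Text}           = '';
    $find->{Font}->{$prop}  = 1;
    my $count = 0;
    while ( $find->Execute() ) {
        $range->InsertBefore("<$tag>");
        $range->InsertAfter("</$tag>");
        $range->Collapse( $wd->{wdCollapseEnd} );
        $count++;
    }
    return $count;                    # number of runs wrapped
}

# Hypothetical driver: open $infile in Word and tag every mapped property.
sub tag_document {
    my ($infile) = @_;
    require Win32::OLE;               # loaded lazily; needs Windows with Word
    require Win32::OLE::Const;
    my $wd   = Win32::OLE::Const->Load('Microsoft Word')
        or die "Cannot load Word constants";
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die Win32::OLE->LastError();
    my $doc  = $word->Documents->Open($infile)
        or die Win32::OLE->LastError();
    tag_font_runs($doc, @$_, $wd) for @FONT_TAGS;
    return ($word, $doc);             # keep $word alive while $doc is in use
}
```

Calling tag_document($infile) would return the Word application object along
with the document, so the caller can save or extract the text before Word is
shut down when $word goes out of scope.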
Matt