patrick.waldo
Hi all,
I'm trying to copy a bunch of Microsoft Word documents that contain
Unicode characters into UTF-8 text files. Everything works fine at
the beginning: the Word documents get converted, and new UTF-8 text
files with the same names get created. But when I then try to copy the
data, I keep getting "TypeError: coercing to Unicode: need string or
buffer, instance found". I'm probably copying the Word document
wrong. What can I do?
Thanks,
Patrick

import os, codecs, glob, shutil, win32com.client
from win32com.client import Dispatch

input = 'C:\\text_samples\\source\\*.doc'
output_dir = 'C:\\text_samples\\source\\output'
FileFormat = win32com.client.constants.wdFormatText

for doc in glob.glob(input):
    doc_copy = shutil.copy(doc, output_dir)
    WordApp = Dispatch("Word.Application")
    WordApp.Visible = 1
    WordApp.Documents.Open(doc)
    WordApp.ActiveDocument.SaveAs(doc, FileFormat)
    WordApp.ActiveDocument.Close()
    WordApp.Quit()

for doc in glob.glob(input):
    txt_split = os.path.splitext(doc)
    txt_doc = txt_split[0] + '.txt'
    txt_doc = codecs.open(txt_doc, 'w', 'utf-8')
    shutil.copyfile(doc, txt_doc)
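
For what it's worth, the last call in the second loop passes the file object returned by codecs.open to shutil.copyfile, which expects two path strings, and that mismatch is the likely source of the TypeError. Below is a minimal sketch of one way the copy step could be written instead: it keeps the paths as strings, decodes the text that Word saved, and re-encodes it as UTF-8. The helper names transcode_to_utf8 and convert_all are hypothetical, and the source encoding "cp1252" is only an assumption; the encoding Word actually writes for wdFormatText depends on the system and on the SaveAs options.

```python
import codecs
import glob
import os


def transcode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Read src_path as src_encoding and write the same text to dst_path as UTF-8.

    src_encoding="cp1252" is an assumption, not something the original
    script establishes; adjust it to whatever Word actually wrote.
    """
    # Both arguments are path strings; file objects never leave this function.
    with codecs.open(src_path, "r", src_encoding) as src:
        text = src.read()
    with codecs.open(dst_path, "w", "utf-8") as dst:
        dst.write(text)
    return dst_path


def convert_all(pattern, src_encoding="cp1252"):
    """Apply transcode_to_utf8 to every file matching pattern (hypothetical helper)."""
    written = []
    for doc in glob.glob(pattern):
        # Same naming rule as the original loop: swap the extension for .txt
        txt_path = os.path.splitext(doc)[0] + ".txt"
        written.append(transcode_to_utf8(doc, txt_path, src_encoding))
    return written
```

With the paths from the script above, this would be called as convert_all('C:\\text_samples\\source\\*.doc') after the Word conversion loop has finished.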