Help needed: Unicode and file format problem


Pekka Niiranen

Hi gurus,

I have stored an Excel97 table in an ASCII CSV file.
The table contains pairs of values: a string and its replacement.
I parse the CSV file and run "search and replace" on other
files with "sed" and "perl" scripts in a Cygwin environment.
Sometimes the replacements are non-ASCII strings
(Chinese characters, for example). This does not matter,
because I treat all files as lists of bytes, some of which
are replaced.
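For comparison, the parsing step alone might look like this in Python with the standard `csv` module. This is only a sketch: it assumes the table has exactly two columns (search string, replacement) and no header row, and that you know which code page the CSV was saved in. The helper name `load_pairs` and the sample data are hypothetical.

```python
import csv
import io

def load_pairs(data, encoding):
    """Parse two-column CSV bytes into (search, replacement) Unicode pairs.

    `encoding` must be the code page the CSV was saved in. This is an
    assumption the caller has to supply; the CSV itself does not record it.
    """
    text = data.decode(encoding)
    pairs = []
    # newline="" lets the csv module handle \r\n line endings itself.
    for row in csv.reader(io.StringIO(text, newline="")):
        if len(row) >= 2:  # skip blank or malformed lines
            pairs.append((row[0], row[1]))
    return pairs

# Hypothetical sample: ASCII search strings and one Chinese replacement,
# saved in the Simplified Chinese Windows code page (cp936).
sample = u"hello,world\r\nmiddle,\u4e2d\r\n".encode("cp936")
print(load_pairs(sample, "cp936"))
```

Decoding the whole file once, up front, means every string that comes out of `load_pairs` is already Unicode and can be used directly in a Unicode regular expression.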

Now, however, the file on which I run the search and replace has to
be in UTF-8 format. How can I run Unicode regular expressions in Python
when the CSV file contains ASCII mixed with some non-ASCII characters?
How can I work out the corresponding Unicode character from bare bytes?
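Bare bytes do not say which characters they stand for; you have to know (or assume) the code page they were written in and decode with it. The sketch below shows the same character encoded three ways, and that decoding with the matching codec recovers it. The choice of cp936 and Big5 is purely illustrative; which code page Excel97 actually used on a given machine is an assumption to verify.

```python
# U+4E2D, the Chinese character for "middle".
char = u"\u4e2d"

# The same character as bare bytes in three encodings.
# cp936 and big5 are example code pages, not necessarily what Excel97 used.
as_utf8 = char.encode("utf-8")   # what the UTF-8 target file needs
as_gbk = char.encode("cp936")    # Simplified Chinese Windows code page
as_big5 = char.encode("big5")    # Traditional Chinese

for raw, codec in [(as_utf8, "utf-8"), (as_gbk, "cp936"), (as_big5, "big5")]:
    # Decoding with the matching codec recovers the original character.
    decoded = raw.decode(codec)
    print(codec, raw, decoded == char)

# Decoding with the wrong codec fails (or silently yields other characters).
try:
    as_gbk.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong codec:", exc.reason)
```

Once the bytes are decoded, the resulting Unicode string can be re-encoded as UTF-8 with `.encode("utf-8")` when it is written out.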

In other words, I have to open the target file like this:
fileObj = codecs.open( "File_to_be_modified", "w", "utf-8" )
and then run a Unicode regular expression on it, where the replacements
I read are bytes that must be written out as UTF-8 strings.
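Once both the search strings and their replacements are Unicode strings, the search and replace itself no longer depends on where the text came from. A minimal sketch, assuming the pairs are literal strings rather than regular expressions (hence `re.escape`); the function name `apply_pairs` is hypothetical. Combining all search strings into one alternation, longest first, replaces every pair in a single pass, so a replacement is never re-matched by a later pair.

```python
import re

def apply_pairs(text, pairs):
    """Replace every search string in `text` with its paired replacement.

    Assumes `text` and all strings in `pairs` are already Unicode (str)
    and that search strings are literals, not regex patterns.
    """
    table = dict(pairs)
    # Longest keys first, so "abc" wins over its prefix "ab".
    keys = sorted((k for k in table if k), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile(u"|".join(re.escape(k) for k in keys), re.UNICODE)
    # Look up the matched text to find its replacement.
    return pattern.sub(lambda m: table[m.group(0)], text)

pairs = [(u"hello", u"\u4f60\u597d"), (u"world", u"\u4e16\u754c")]
result = apply_pairs(u"hello, world!", pairs)
# Encode only at the very end, when writing the UTF-8 target file.
print(result.encode("utf-8"))
```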

I could try to read directly from Excel into Python through the COM interface,
or try to create a Python COM service that is called from Excel, but
I would hate to do that. I could also try switching to Excel2000, which
supports UTF-8 as a save format, but there are other
issues (VBA code) involved.

-pekka-
 
?

Martin v. Löwis

Pekka said:
> In other words I have to open target file like this:
> fileObj = codecs.open( "File_to_be_modified", "w", "utf-8" )
> and then run Unicode regular expression to it, where read
> replacements are bytes that must be written out as UTF-8 strings.

You need to read the file into a Unicode string, perform the
replacement, then write it back out as UTF-8. Something like this:

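import codecs
import re

new_text = u"replacement"  # placeholder: the real replacement, as Unicode

regexp = re.compile(u"some_expression", re.U)
with codecs.open("File_to_be_read", "r", "utf-8") as infile, \
     codecs.open("File_to_be_written", "w", "utf-8") as outfile:
    for line in infile:
        # Each line arrives as a Unicode string decoded from UTF-8.
        outfile.write(regexp.sub(new_text, line))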
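To check the approach end to end, the sketch below writes a small UTF-8 file, runs the same read/substitute/write loop over it, and reads the result back as raw bytes. It uses Python 3's built-in `open(..., encoding="utf-8")`, which does the same decoding and encoding job as `codecs.open`; the file names, pattern and replacement are made up for the demonstration.

```python
import os
import re
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")   # hypothetical input file
    dst = os.path.join(tmp, "out.txt")  # hypothetical output file

    # Create a small UTF-8 input file (newline="" disables \r\n translation).
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write(u"hello world\nsecond line\n")

    # Hypothetical pattern and non-ASCII replacement for the demo.
    regexp = re.compile(u"world", re.UNICODE)
    new_text = u"\u4e16\u754c"

    # Decode on read, substitute on Unicode strings, encode on write.
    with open(src, "r", encoding="utf-8", newline="") as infile, \
         open(dst, "w", encoding="utf-8", newline="") as outfile:
        for line in infile:
            outfile.write(regexp.sub(new_text, line))

    # Read the output back as bytes to see the UTF-8 encoding.
    with open(dst, "rb") as f:
        raw = f.read()

print(raw)
```

Keeping input and output as separate files also avoids opening the file being modified in `"w"` mode, which would truncate it before it could be read.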

HTH,
Martin