Help needed: Unicode and file format problem

Pekka Niiranen · Sep 21, 2004

Hi gurus,

I have stored an Excel 97 table in an ASCII CSV file.
The table contains pairs of values: a string and its replacement.
I parse the CSV file and run "search and replace" on other
files with "sed" and "perl" scripts in a Cygwin environment.
Sometimes the replacements are non-ASCII strings
(Chinese characters, for example). This has not mattered so far
because I am treating all files as lists of bytes, some of
which are replaced.

Now, however, the file on which I run search and replace has to
be in UTF-8 format. How can I run Unicode regular expressions in Python
when the CSV file contains ASCII mixed with some non-ASCII characters?
How can I work out the corresponding Unicode characters from bare bytes?

In other words, I have to open the target file like this:
fileObj = codecs.open( "File_to_be_modified", "w", "utf-8" )
and then run a Unicode regular expression on it, where the replacements
read from the CSV are bytes that must be written out as UTF-8 strings.

I could try to read directly from Excel into Python through the COM interface
or try to create a Python COM service that is called from Excel, but
I would hate to do that. I could also try switching to Excel 2000, which
supports UTF-8 as a save format, but there are other
issues (VBA code) involved.

-pekka-

"Martin v. Löwis" · Sep 21, 2004

Pekka said:
In other words, I have to open the target file like this:
fileObj = codecs.open( "File_to_be_modified", "w", "utf-8" )
and then run a Unicode regular expression on it, where the replacements
read from the CSV are bytes that must be written out as UTF-8 strings.

You need to read the file into a unicode string, perform the
replacement, then write it back out as UTF-8. Something like this:

import codecs
import re

infile = codecs.open( "File_to_be_read", "r", "utf-8" )
outfile = codecs.open( "File_to_be_written", "w", "utf-8" )

regexp = re.compile("some_expression", re.U)
# new_text is the replacement, already decoded to a unicode string
for line in infile.readlines():
    line = regexp.sub(new_text, line)
    outfile.write(line)

infile.close()
outfile.close()

HTH,
Martin
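
On the part of the question about turning bare bytes into Unicode characters: the bytes only become characters once they are decoded with the encoding the CSV file was actually saved in, and that encoding is not stated in the thread. Below is a minimal sketch in Python 3 syntax, assuming purely for illustration that the file uses the GBK code page. The helper names load_replacements and apply_replacements and the sample data are hypothetical, not something from the posts above.

```python
import csv
import io
import re


def load_replacements(csv_bytes, encoding):
    """Decode raw CSV bytes and return a list of (search, replacement) pairs."""
    text = csv_bytes.decode(encoding)  # bytes -> Unicode string
    reader = csv.reader(io.StringIO(text, newline=""))
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def apply_replacements(text, pairs):
    """Replace every literal occurrence of each search string in text."""
    for search, replacement in pairs:
        # re.escape makes the search string match literally; the lambda keeps
        # backslashes in the replacement from being read as group references.
        text = re.sub(re.escape(search), lambda m, r=replacement: r, text)
    return text


# Hypothetical sample data: a two-row CSV saved in the assumed GBK code page.
csv_bytes = "NAME,名字\r\nCITY,城市\r\n".encode("gbk")

pairs = load_replacements(csv_bytes, "gbk")
result = apply_replacements("NAME lives in CITY", pairs)

# Encoding to UTF-8 happens only at the output boundary.
utf8_bytes = result.encode("utf-8")
```

If the wrong encoding is passed to decode(), the call either raises UnicodeDecodeError or silently yields the wrong characters, so it is worth checking a few known rows by eye. Note also that the pairs are applied one after another, so a later search string can match text inserted by an earlier replacement.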

