Unable to decode file written by C++ wostringstream

Yan Cheng CHEOK · Dec 23, 2010

Currently, I have the following text file (https://sites.google.com/site/yanchengcheok/Home/TEST.TXT?attredirects=0&d=1) written by C++ wostringstream.

What I want to do is write a Python script that accepts a request from the user's browser and sends the entire file back for the user to download. The downloaded file should be exactly the same as the original text file on the server.

The code is as follows:

import cgi

print "Content-Type: text/plain"
print "Content-Disposition: attachment; filename=TEST.txt"
print

filename = "C:\\TEST.TXT"
f = open(filename, 'r')
for line in f:
    print line

However, when I open the downloaded file, it is full of weird characters. I tried the 'rb' flag, but that doesn't work either.

Is there anything I have missed? What I want is for the file (TEST.TXT) that the client downloads through the above script to be exactly the same as the one on the server.
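For comparison, a byte-for-byte copy needs no decoding at all. The following is a minimal sketch, not a definitive CGI implementation: send_file_verbatim is a hypothetical helper name, the temporary file stands in for C:\TEST.TXT, and an in-memory BytesIO stands in for the response stream. io.open is used so the sketch behaves the same on Python 2.6+ and Python 3.

```python
import io
import os
import shutil
import tempfile

def send_file_verbatim(path, out):
    """Copy the file at `path` to the binary stream `out` without decoding."""
    with io.open(path, "rb") as src:
        shutil.copyfileobj(src, out)

# Hypothetical stand-in for C:\TEST.TXT: a short UTF-16LE payload.
payload = u" 0.000 1.500\n".encode("utf-16-le")
fd, path = tempfile.mkstemp()
os.close(fd)
with io.open(path, "wb") as f:
    f.write(payload)

# BytesIO plays the role of the binary output stream here.
out = io.BytesIO()
send_file_verbatim(path, out)
os.remove(path)

# The copied bytes should match the file contents exactly.
print(out.getvalue() == payload)
```

In a real CGI script the output stream would have to be binary as well. On Windows with Python 2, stdout is in text mode by default and expands every "\n" byte to "\r\n", which corrupts UTF-16 data even when the file is opened with 'rb'; msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY) switches it to binary. Note also that "print line" appends a newline of its own after each line.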

I also tried to specify the encoding explicitly.

import cgi

print "Content-Type: text/plain; charset=UTF-16"
print "Content-Disposition: attachment; filename=TEST.txt"
print

filename = "C:\\TEST.TXT"
f = open(filename, 'r')
for line in f:
    print line.encode('utf-16')

It doesn't work either. Here is a screenshot of the original text file (http://i.imgur.com/S6SjX.png) and of the file after it was downloaded from a web browser (http://i.imgur.com/l39Lc.png).

Is there anything I have missed?

Thanks and Regards
Yan Cheng CHEOK

jmfauth · Dec 23, 2010

Currently, I have the following text file (https://sites.google.com/site/yanchengcheok/Home/TEST.TXT?attredirect...) written by C++ wostringstream.

The encoding of the file is UTF-16LE. You should handle
this encoding when you *read* the file, not when
you display its content.
...     r = f.readlines()
...
>>> len(r)
94
>>> r[:5]
[u'\n', u' 0.000 1.500 3.000 0.526
0.527 0.527 0.00036 0.00109 1381.88
485.07\n', u' 0.000 1.500 3.000 1.084
1.085 1.086 0.00037 0.00111 1351.86
978.02\n', u' 0.000 1.500 3.000
1.166 1.167 1.168 0.00043 0.00130
1152.71 897.16\n', u' -3.000 0.000 3.000
-0.031 -0.029 -0.025 0.00158 0.00475
632.17 626.13\n']
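
The idea of decoding at read time can be sketched as follows. This is only an illustration under assumptions: the sample text and the temporary file are hypothetical stand-ins for TEST.TXT, and the file is assumed to carry no byte order mark. io.open is used because it accepts an encoding argument on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample data standing in for the contents of TEST.TXT.
sample = u"\n 0.000 1.500 3.000\n"

# Write the sample as raw UTF-16LE bytes (no BOM assumed).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "wb") as f:
    f.write(sample.encode("utf-16-le"))

# Decode while reading: io.open yields unicode text, not raw bytes.
with io.open(path, "r", encoding="utf-16-le") as f:
    lines = f.readlines()

os.remove(path)
print(repr(lines))
```

Once the lines are unicode objects, they can be re-encoded to whatever encoding the output side needs.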
jmf

Ulrich Eckhardt · Dec 23, 2010

Yan said:
Currently, I have the following text file
(https://sites.google.com/site/yanchengcheok/Home/TEST.TXT?attredirects=0&d=1)
written by C++ wostringstream.

Stringstream? I guess you meant wofstream, right? Anyway, the output encoding
of C++ iostreams is implementation-defined, so you can't assume that such
code is generally portable. If you want a certain encoding, you need to
tell the ofstream via the codecvt facet of its locale; a web search should
turn up more info on that.

If you have the data in memory and it is encoded as UTF-16 there (which is
what MS Windows uses for its wchar_t) then you could also use a plain
ofstream, open it with the binary flag and then simply write the memory to
a file.

In any case, you need to know the encoding in order to get the content into
a Python string or unicode object, otherwise you will only get garbage.
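
A small sketch of what "garbage" means here, assuming the data is UTF-16LE as suggested earlier in the thread. The sample line is hypothetical, and latin-1 is chosen only as an example of a wrong codec that never raises an error.

```python
# Hypothetical sample line; the real file content is not reproduced here.
text = u" 0.000 1.500 3.000\n"
raw = text.encode("utf-16-le")   # the bytes as they would sit on disk

wrong = raw.decode("latin-1")    # wrong codec: a NUL follows every character
right = raw.decode("utf-16-le")  # right codec: the original text comes back

print(repr(wrong[:6]))
print(repr(right[:6]))
```

With the wrong codec the result is twice as long as the original and interleaved with NUL characters, which is the kind of output that shows up as weird characters in an editor.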

Good luck!

Uli
