encoding problem

Jim Lawton

Hi,

.NET C# HttpHandler, plain HTML form in the browser.

GBP pound sign problem (I know I know - I *can* decode it, but I've got to
understand what and why I should be doing stuff)

I am uploading text data from a form. This data is either typed directly into a
textarea, or is a file stream originating from a .txt file (or some other basic
text file, e.g. from a Mac or Unix; of course I can't be sure at present that
it's only .txt).

The page encoding is :-
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

On arrival at the server the content encoding is, sure enough, UTF8.

Data entered via the textarea and read into a string is displayed correctly in
the debugger as pound signs (£).

Data uploaded as a file stream contains a single byte, 0xA3, for each GBP pound
sign.

I process the input stream like this :-

public static string StreamToString(Stream aStream)
{
    // Rewind and read the whole stream into a buffer of matching size.
    aStream.Position = 0;
    long i = aStream.Length;
    byte[] buffer = new byte[i];

    aStream.Read(buffer, 0, (int)aStream.Length);
    return BytesToUTF8String(buffer);
}

public static string BytesToUTF8String(byte[] Array)
{
    // Decode the bytes as UTF-8; this only works if the bytes really are UTF-8.
    Encoding utf8 = Encoding.UTF8;
    char[] utf8Chars = new char[utf8.GetCharCount(Array, 0, Array.Length)];
    utf8.GetChars(Array, 0, Array.Length, utf8Chars, 0);

    return new string(utf8Chars);
}

The resulting string contains nothing ...

If I use ASCII instead of UTF8, I get sensible text except that my GBP signs
come out as question marks (?).

If I use UTF7 I get an apparently OK decoding.

I am dubious about using UTF7 for no better reason than that it works. Is there
logic here? What should I be doing?
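For anyone following along, here is a rough sketch of how the bytes could be inspected. It assumes nothing about the upload itself; the class and method names (PoundSignBytes, IsValidUtf8, Hex) are made up for illustration. It relies on two facts: the pound sign is the single byte 0xA3 in ISO-8859-1 but the two bytes 0xC2 0xA3 in UTF-8, and a lone 0xA3 is not a valid UTF-8 sequence.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class PoundSignBytes
{
    // Returns true when the bytes decode cleanly as UTF-8.
    // A strict UTF8Encoding (throwOnInvalidBytes = true) throws on bad input.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Formats bytes as hex pairs separated by hyphens, e.g. "C2-A3".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

Calling Hex on Encoding.UTF8.GetBytes("\u00A3") and on Encoding.GetEncoding("iso-8859-1").GetBytes("\u00A3") shows the two different byte sequences side by side, and IsValidUtf8 gives a quick test of whether an uploaded file can be UTF-8 at all.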

Thanks,
Jim
 
bruce barker

it doesn't really matter what encoding you use for the page response, what's
important is the encoding used on the post from the browser. the browser
picks this (though often it will match). you should check the content-type
header the browser sends to determine the character set. for an html form
post (application/x-www-form-urlencoded) ISO-8859-1 is the default character
set, not utf8.

-- bruce (sqlwork.com)
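
The advice above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the one right way to do it: PostDecoding, CharsetFromContentType and DecodeBody are hypothetical names, and falling back to ISO-8859-1 when no charset is declared is an assumption taken from the reply above. An uploaded file often carries no charset at all, so the fallback is a guess rather than a guarantee.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class PostDecoding
{
    // Pulls the charset parameter out of a Content-Type value such as
    // "text/plain; charset=utf-8". Returns "" when none is declared.
    public static string CharsetFromContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return "";
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                return p.Substring("charset=".Length).Trim('"', ' ');
        }
        return "";
    }

    // Decodes the bytes with the declared charset, or with ISO-8859-1
    // (assumed default) when the charset is missing or not recognised.
    public static string DecodeBody(byte[] body, string contentType)
    {
        Encoding fallback = Encoding.GetEncoding("iso-8859-1");
        string charset = CharsetFromContentType(contentType);
        Encoding encoding = fallback;
        if (charset.Length > 0)
        {
            try
            {
                encoding = Encoding.GetEncoding(charset);
            }
            catch (ArgumentException)
            {
                encoding = fallback; // unknown charset name
            }
        }
        return encoding.GetString(body);
    }
}
```

With this, a body containing the single byte 0xA3 and no declared charset decodes to a pound sign, while the bytes 0xC2 0xA3 with charset=utf-8 also decode to a single pound sign.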


 
Jim Lawton

Thanks Bruce,

For anyone googling this topic in future, there's more in
dotnet.languages.csharp
Message-ID: <[email protected]>

cheers Jim
 
