J
Jon Maz
Hi,
I am working on a dotnet url rewriting mechanism that has to be able to deal
with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.
The problem is that some browsers will encode this url using utf8 & some
using ISO 8859 (I *think* those are the only two possibilities). For ISO
8859 I can use the built-in UrlDecode function, for utf8 I am using a
function I found on Google groups:
public static string Utf8ToString(string inputString)
{
byte[] utf8Bytes = new byte[inputString.Length];
for (int i=0; i < utf8Bytes.Length; i++)
{
utf8Bytes = (byte)inputString;
}
return Encoding.UTF8.GetString(utf8Bytes);
}
The problem is deciding *which* encoding the browser has used, and therefore
which decoding function I need to use. It seems that Mozilla-based browsers
use ISO 8859, whereas IE can use either, depending on a user-setting, and I
haven't looked at any other browsers yet.
As far as I know, the browser does NOT send anything in the headers that
tell you what url-encoding it is using, so I guess I need some way of
looking at the raw url and working out which encoding it's using.
Can anyone help me with writing a function to do this? The ideal would be a
GetEncoding(string testString) function, but I'd settle for a function
IsUtf8Encoded(string testString), on the grounds that if it *isn't* utf8, it
must be ISO 8859.
TIA,
JON
I am working on a dotnet url rewriting mechanism that has to be able to deal
with urls containing non-standard characters, eg
http://www.mysite.com/Télécharger.
The problem is that some browsers will encode this url using utf8 & some
using ISO 8859 (I *think* those are the only two possibilities). For ISO
8859 I can use the built-in UrlDecode function, for utf8 I am using a
function I found on Google groups:
public static string Utf8ToString(string inputString)
{
byte[] utf8Bytes = new byte[inputString.Length];
for (int i=0; i < utf8Bytes.Length; i++)
{
utf8Bytes = (byte)inputString;
}
return Encoding.UTF8.GetString(utf8Bytes);
}
The problem is deciding *which* encoding the browser has used, and therefore
which decoding function I need to use. It seems that Mozilla-based browsers
use ISO 8859, whereas IE can use either, depending on a user-setting, and I
haven't looked at any other browsers yet.
As far as I know, the browser does NOT send anything in the headers that
tell you what url-encoding it is using, so I guess I need some way of
looking at the raw url and working out which encoding it's using.
Can anyone help me with writing a function to do this? The ideal would be a
GetEncoding(string testString) function, but I'd settle for a function
IsUtf8Encoded(string testString), on the grounds that if it *isn't* utf8, it
must be ISO 8859.
TIA,
JON