query string encoding/decoding

Guest

I've run a few simple tests looking at how query string encoding/decoding gets handled in asp.net, and it seems like the situation is even messier than it was in asp... Can't say I think much of the "improvements", but maybe someone here can point me in the right direction...

First, it looks like asp.net will automatically read and recognize query strings encoded in utf8 and 16-bit unicode, only the latter is some mutant, non-standard encoding mechanism that only works in IIS (%u00f1 for example). This looks like it's the *only* way to decode querystrings. Too bad, 'cause browsers out there can encode them all kinds of different ways, and the way most will get done by default is windows-1252. At least in old asp, the lazy defaults for putting together your forms and the default behavior for most browsers would fit well. Seems like there's more active attention required in asp.net

Second, it no longer appears to be using the page's declared output encoding as a means of interpreting the input (both good and bad, i guess). This means if it runs into a character in the querystring that's *not* encoded utf-8 or mutant, it just drops that character out of the input. Period. No way to handle it. Accented spanish characters, for example, that most browsers are going to encode in 1252 (i.e. %f1 for ñ) just vanish from the asp.net environment

Third, in asp, when you have more than one value for a query string variable name, referencing the Request object gives you a collection. Now that collection has a toString method that makes a comma-separated list of the values but you *can* refer to each of the different values separately. In asp.net, the NameValueCollection mashes multiple values into a single comma-separated string so if your input has commas in it, well too bad

Fourth, in asp Request.QueryString gives you the original urlencoded bytes of the querystring, i.e. what you were sent. In asp.net, it's actually a re-urlencoding of the post-interpreted values, so you can't get out what you got in... This is most annoying when you get a querystring encoded in utf-8; in asp.net, referencing Request.QueryString returns you a value encoded in the mutant %uxxxx syntax instead.

I'm just getting my feet wet in asp.net, coming from an asp environment. Any pointers on how to handle query string issues better than what appears to be the default in asp.net? Seems like there are some steps backwards in asp.net

Thanks
-Mark
 
Steven Cheng[MSFT]

Hi Mark,

Thanks for posting in the community!
From your description, you're wondering about how ASP.NET treats the
request's querystring, which seems quite different from classic ASP.
As for the first two points you mentioned, I think they occur because the
querystring is encoded based on the client browser's codepage and then posted
to the server side. The server side will decode the querystring via the page's
codepage. If a codepage is not specified (on either the client browser or the
server-side page), each takes its default codepage; if the two are not equal,
the result we get may be incorrect.

As for the #3 you mentioned, I've searched MSDN and found that in
ASP.NET, if you want to get a multi-value querystring item, you first need to
use the QueryString.GetValues method. Here is the description in MSDN:
--------------------------------------------
If the item you are accessing contains exactly one value for the specified
key, you do not need to modify your code. However, if there are multiple
values for a given key, you need to use a different method to return the
collection of values. Also, note that collections in Visual Basic .NET are
zero-based, whereas the collections in VBScript are one-based.

For example, in ASP the individual query string values from a request to
http://localhost/myweb/valuetest.asp?values=10&values=20 would be accessed
as follows:

<%
'This will output "10"
Response.Write Request.QueryString("values")(1)

'This will output "20"
Response.Write Request.QueryString("values")(2)
%>

In ASP.NET, the QueryString property is a NameValueCollection object from
which you would need to retrieve the Values collection before retrieving
the actual item you want. Again, note the first item in the collection is
retrieved by using an index of zero rather than one:

<%
'This will output "10"
Response.Write (Request.QueryString.GetValues("values")(0))

'This will output "20"
Response.Write (Request.QueryString.GetValues("values")(1))
%>

In the case of both ASP and ASP.NET, the following code will behave
identically:

<%
'This will output "10", "20"
Response.Write (Request.QueryString("values"))
%>

----------------------------------------------------------
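To make the difference between the per-value view and the joined view concrete, here is a minimal sketch in Python rather than ASP.NET. It only assumes that reading a key directly gives a comma-joined string while a GetValues-style call gives one entry per value; the sample query with an encoded comma (`%2C`) is made up for illustration.

```python
from urllib.parse import parse_qs

query = "values=10&values=20"

# Keep each value as its own list element (the GetValues-style view).
values = parse_qs(query)["values"]

# The joined view mimics reading the key directly: one comma-separated string.
joined = ",".join(values)

# A value that itself contains a comma makes the joined form ambiguous:
# two values go in, but the joined string looks like three.
tricky = parse_qs("values=a%2Cb&values=c")["values"]
tricky_joined = ",".join(tricky)

print(values, joined)
print(tricky, tricky_joined)
```

Once the values are joined there is no way to tell which commas were separators, so the per-value list is the safer thing to work with when input can contain commas.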

As for the #4 point, I think such things as %uxxxx occur because the
querystrings are in the url and a url can only contain ISO-8859-1 charset
characters, so if it contains unicode, it will first be encoded and also
urlencoded (replacing some particular characters). I think we can certainly
get the values in the querystring as they were input at the client, as long as
we map the correct codepage between the client and server side.

In addition, here are some tech articles on migrating from ASP to ASP.NET:
#New ASP.NET Page Directives
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconnewpagedirectives.asp?frame=true

#Migrating to ASP.NET: Key Considerations
http://msdn.microsoft.com/library/en-us/dnaspp/html/aspnetmigrissues.asp?frame=true

#Migrating a Commerce Server Site from ASP to ASP.NET
http://msdn.microsoft.com/library/en-us/dncomsrv02/html/mscs_csnetmig.asp?frame=true

#Converting ASP to ASP.NET
http://msdn.microsoft.com/library/en-us/dndotnet/html/convertasptoaspnet.asp?frame=true

Hope these help.


Regards,

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)

Get Preview at ASP.NET whidbey
http://msdn.microsoft.com/asp.net/whidbey/default.aspx
 
Guest

Hi Steve..

First off, thanks for the pointer about GetValues(). That seems much closer to the old method than the NameValueCollection and makes it possible to work with querystrings more effectively.

> As for the first two points you mentioned, I think they occur because the
> querystring is encoded based on the client browser's codepage and then posted
> to the server side. The server side will decode the querystring via the page's
> codepage. If a codepage is not specified (on either the client browser or the
> server-side page), each takes its default codepage; if the two are not equal,
> the result we get may be incorrect.

Is the default code page in .aspx different than in .asp? Do you set the codepage differently in .aspx? I have a little sample .aspx page with the header
<%@Language="Jscript" CodePage=1252 EnableSessionState="False"%>
and yet asp.net does *not* seem to be decoding using the declared codepage. It only decodes utf-8 and the mutant utf-16 declarations. Period. Characters encoded in the querystring using the declared codepage just vaporize because asp.net is not decoding them properly.

This does pose some real practical problems, since most legacy pages don't go out of their way to declare codepages on either the client or the server side. In .asp, the default codepage is based on a system setting, usually windows-1252. This lazy default matches up well with 99% of the browsers out there, which are also set up to default to windows-1252 and the two can play together. Since asp.net seems only to decode utf-8 (and non-standard mutant), extra care seems to be necessary to get the client and server to play together reliably... An unnecessary potential gotcha for going to asp.net it seems

Having the client and server play well by default is especially important because:
a) there is no way to communicate what the codepage is for url encoding on GET
b) I.E. doesn't seem to send the client's codepage back up for the ride by default, even on POSTs. I wrote a couple of little sample posts. The form gets the charset both from the Content-Type header and a <Meta http-equiv="Content-type"> header in the html, and it does encode the form values in that codepage, but the http request on the form submission does *not* include any info to tell the server what codepage the post data is in (seems like a deficiency in IE, but probably not uncommon among browsers).

The fact that asp.net doesn't follow/respect any of the common settings like asp did seems like it creates unnecessary openings for bugs

On point #3, do you know off hand if asp.net works the same way as asp in that the QueryString collection doesn't get chopped up until there's a reference to it? Or is asp.net going to interpret the querystring whether or not the code references it?
> As for the #4 point, I think such things as %uxxxx occur because the
> querystrings are in the url and a url can only contain ISO-8859-1 charset
> characters, so if it contains unicode, it will first be encoded and also
> urlencoded (replacing some particular characters). I think we can certainly
> get the values in the querystring as they were input at the client, as long as
> we map the correct codepage between the client and server side.

I think you missed the point of #4. The point was that no, you *don't* get back what the client put in, and that seems undesirable and arbitrary. If the user input is
http://foo.com/test.aspx?query=añ
(the ñ is encoded with the utf-8 codepage)
<% Response.Write (Request.QueryString); %>
outputs
query=a%u00f1
(the post-interpreted value re-encoded using the non-standard mutant form instead of the original encoding). This is kinda like the xml assertion that one equivalent representation is just as good as another, but that's codified in the xml standard. The change in asp.net from asp just seems arbitrary and annoying (especially since it uses a syntax that is non-standard and only makes sense to certain versions of IE and IIS and nobody else). I guess I can still get at the real user input by looking at the rawUrl property instead. I haven't tried that yet.
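For anyone who has to hand such a re-encoded string to something that does not understand %uXXXX, here is a minimal sketch, in Python rather than ASP.NET, of rewriting those escapes as ordinary UTF-8 percent-escapes. The function name `normalize_u_escapes` is hypothetical, and the sketch assumes each %uXXXX is one UTF-16 code unit; a lone surrogate would raise a decode error.

```python
import re
from urllib.parse import quote

# One or more consecutive %uXXXX escapes (four hex digits each).
_U_RUN = re.compile(r"(?:%u[0-9a-fA-F]{4})+")

def normalize_u_escapes(query):
    """Rewrite runs of %uXXXX escapes as standard UTF-8 percent-escapes."""
    def repl(match):
        hex_units = match.group(0).replace("%u", "")
        # Treat the run as UTF-16 code units so surrogate pairs combine correctly.
        text = bytes.fromhex(hex_units).decode("utf-16-be")
        return quote(text, safe="")
    return _U_RUN.sub(repl, query)

print(normalize_u_escapes("query=a%u00f1"))
```

Ordinary %XX escapes are left alone, so a string that is already standard passes through unchanged.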


I know this encoding stuff is difficult. That's why we had to write our own COM objects to interpret the querystring in asp. Mostly we wrote them for two reasons:
1) we wanted, like other websites do, to let the page handle the querystring encoding based on user input (like google with a form value) so that people could encode things any which way and we'd still be able to read them.

2) I haven't tried this in asp.net yet, but at least in asp, there was also the unfortunate side-effect that Server.UrlEncode would only encode things in the page's output-encoding. If you have a multi-tier system where you need to construct urls to call the next tier, you can't always say the next tier will take urls in the encoding the last tier does. So we needed the extra flexibility to be able to urlencode in any codepage.
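The two needs above can be sketched roughly as follows, in Python rather than as a COM object. This is not the original implementation. The helper names are hypothetical, and so is the `ie` field that carries the sender's charset (modelled loosely on the "form value" idea in point 1).

```python
from urllib.parse import parse_qs, quote

def encode_for_tier(text, charset):
    """Percent-encode text using the charset the next tier expects."""
    return quote(text, safe="", encoding=charset)

def decode_query(raw_query, default_charset="utf-8"):
    """Decode a raw query string using the charset named in its own 'ie' field."""
    # A first pass with latin-1 never fails, so the ASCII 'ie' value can be read safely.
    probe = parse_qs(raw_query, encoding="latin-1")
    charset = probe.get("ie", [default_charset])[0]
    # The second pass decodes every value with the charset the sender declared.
    return parse_qs(raw_query, encoding=charset, errors="replace")

print(encode_for_tier("año", "windows-1252"))
print(encode_for_tier("año", "utf-8"))
print(decode_query("ie=windows-1252&q=a%f1o")["q"])
```

An unknown charset name in `ie` raises a LookupError in this sketch; a real page would want to validate it against a list of supported encodings first.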

For the most part, as I said, the codepage handling seemed to work pretty well by default in asp. There were limitations, but by default most pages worked pretty well with the world at large. Your default page would cover English and all latinate languages without extra effort. Asp.net, it seems, requires extra non-default effort to correctly handle anything that's not 7-bit ascii, and that seems like a step backwards. That's all I'm saying. Pardon me for saying so, but it doesn't seem like you've actually tried any of this stuff past the standard ascii input. If you want, I can send you my sample page and some sample queries to demonstrate what I'm saying.

I'll read through the reference pages, but after the first few there still doesn't appear to be anything to correct the impression I've gotten that asp.net is worse than asp in codepage issues.

Thanks
-Mark
 
Steven Cheng[MSFT]

Hi Mark,

Thank you for the response. Regarding this issue, I'll consult some
further experts and will update you as soon as possible. Also, I
think it would be helpful if you'd attach some sample pages for the issues
you've mentioned. Thanks.


Regards,

Steven Cheng
Microsoft Online Support

 
Guest

Hi Steve..

The example code is really rather simple:
<%@ Page Language="JScript" CodePage="1252" EnableSessionState="false" %>
<%@ Import Namespace="System.Collections.Specialized" %>
<%
// Walk every key in the query string and print each of its values.
var key, args : NameValueCollection;
args = Request.QueryString;
for (key = 0; key < args.AllKeys.Length; key++)
{
    var keys : String;
    keys = args.AllKeys[key];
    // GetValues returns one array entry per occurrence of the key.
    var val = Request.QueryString.GetValues(keys);
    Response.Write("<pre>" + keys + ": " + val + "\n<pre>type: " + typeof(val) + "\n");
    Response.Write(val.Length + " " + typeof(val) + "\n");
    if (val.Length > 0)
    {
        var i : int;
        for (i = 0; i < val.Length; i++)
            Response.Write("\t" + i + " " + val[i] + "\n");
    }
    Response.Write("</pre>\n");
}
// Finally, dump the whole collection as asp.net serializes it.
Response.Write(Request.QueryString + "<br>\n");
%>

I tweaked it from the original to use GetValues() as you suggested, which works fine for multiple values. All the page does is output the individual query string values that were passed into it

The important parts of the demonstration, though, are these:
1) note that the page *does* declare a codepage explicitly (not that it seems to matter) to windows-1252, which, if it were like asp, would/should be the default codepage anyway

2) queries that will demonstrate all of the problems in asp.net
(a little background: ñ = %f1 (latin-1, windows-1252 encoding) = %c3%b1 (utf-8 encoding) = %u00f1 (MS mutant utf-16 syntax))
http://localhost/qs.aspx?query=a%f1
http://localhost/qs.aspx?query=añ
http://localhost/qs.aspx?query=a%u00f1

The first url encodes the querystring año in the declared codepage of the program. In the output, you'll see that the ñ just gets vaporized (i.e. declared invalid on parsing). Problem #1 is that asp.net is not using the page's declared codepage for interpretation

The second url encodes the same query in utf-8. This page will show you that, despite the declared codepage, asp.net is going to read the querystring as a utf-8 encoding (at least it reads it properly). But this url also demonstrates problem #4 at the end with the Response.Write (Request.QueryString+"<br>\n");. It does *not* output the same value you got in. Instead it outputs mutant encoding. This is inconsistent with asp, where Request.QueryString will give you the raw bytes as you got them (without interpretation), but I suppose arguably consistent with serializing a NameValueCollection, post-interpretation. I'm feeling generous today, so I can acknowledge that if you put the same values into a string array and then tried to write out the .toString() of the array, you'd probably get the same result as asp.net currently produces. But it is still a deviation from asp. As I also said, I suppose I can grab the rawUrl property and do a split on the '?'.

The third url encodes the same query with the MS mutant utf-16 encoding, which, despite the declared codepage, asp.net reads just fine. This url only demonstrates that it will read utf-16 mutant no matter what your codepage is. Point #4 is not exactly demonstrated by this since getting MS mutant utf-16 out happens to be what you put in, in this instance. Even a broken clock is right twice a day, I guess
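The behaviour described for the three urls can be reproduced outside asp.net with a minimal Python sketch. It assumes only that the server decodes percent-escapes as utf-8 and drops bytes that are not valid utf-8. Both function names are hypothetical, and the fallback to windows-1252 is a heuristic that asp.net does not implement.

```python
from urllib.parse import unquote

# The same value, año minus the trailing o, in the three encodings from the post.
samples = ["a%f1", "a%c3%b1", "a%u00f1"]

def decode_strict_utf8(value):
    """Decode as UTF-8 only, dropping bytes that are not valid UTF-8."""
    return unquote(value, encoding="utf-8", errors="ignore")

def decode_with_fallback(value):
    """Try UTF-8 first, then fall back to windows-1252."""
    try:
        return unquote(value, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return unquote(value, encoding="windows-1252")

for s in samples:
    print(s, repr(decode_strict_utf8(s)), repr(decode_with_fallback(s)))
```

With utf-8 only, the %f1 form loses its ñ, which matches the "vaporized" character in the first url. Python's `unquote` does not know the %uXXXX syntax at all and leaves it as literal text, so that form needs separate handling. The fallback is only a guess: some windows-1252 byte sequences are also valid utf-8, so it can misread input.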

Thanks
-Mark
 
Earl Beaman[MS]

Hi Mark,

Here's some inline replies.

> I've run a few simple tests looking at how query string encoding/decoding
> gets handled in asp.net, and it seems like the situation is even messier
> than it was in asp... Can't say I think much of the "improvements", but
> maybe someone here can point me in the right direction...
> First, it looks like asp.net will automatically read and recognize query
> strings encoded in utf8 and 16-bit unicode, only the latter is some mutant,
> non-standard encoding mechanism that only works in IIS (%u00f1 for
> example). This looks like it's the *only* way to decode querystrings. Too
> bad, 'cause browsers out there can encode them all kinds of different ways,
> and the way most will get done by default is windows-1252. At least in old
> asp, the lazy defaults for putting together your forms and the default
> behavior for most browsers would fit well. Seems like there's more active
> attention required in asp.net.

This is due to responseEncoding/requestEncoding in the <globalization> element
in web.config. The default is UTF-8. You can modify this to windows-1252
if you want. UTF-8 is much better.
IE will encode the url (unless you type it directly into the
address bar) according to the following, in order: Response.Charset, Charset header,
then <meta> tag. Using the responseEncoding will send the Charset header.
If you simply type the address into the address bar, I believe that it is
encoded with the system codepage.

> Second, it no longer appears to be using the page's declared output
> encoding as a means of interpreting the input (both good and bad, i guess).
> This means if it runs into a character in the querystring that's *not*
> encoded utf-8 or mutant, it just drops that character out of the input.
> Period. No way to handle it. Accented spanish characters, for example,
> that most browsers are going to encode in 1252 (i.e. %f1 for ñ) just vanish
> from the asp.net environment.

It does use the declared output encoding, but this encoding is not where
you expect. It is in <globalization> in web.config.
If you do expect to get upper ascii characters, you can use the following
code in Global.asax in the Application_BeginRequest event:
protected void Application_BeginRequest(Object sender, EventArgs e)
{
    // Fires at the beginning of each request.
    // Take the query string as ASP.NET has re-encoded it.
    string str = Request.QueryString.ToString();
    string delimStr = "%";
    char[] delimiter = delimStr.ToCharArray();

    // Split on the first '%' to see whether any escape sequence is present.
    string[] split = str.Split(delimiter, 2);

    // No '%' found: rewrite the path to the raw, undecoded URL.
    if (split.Length <= 1)
    {
        System.Web.HttpContext.Current.RewritePath(Request.RawUrl.ToString());
    }
}

or
Dim str As String
str = Server.UrlPathEncode(Request.RawUrl)
System.Web.HttpContext.Current.RewritePath(str)

> Third, in asp, when you have more than one value for a query string
> variable name, referencing the Request object gives you a collection. Now
> that collection has a toString method that makes a comma-separated list of
> the values but you *can* refer to each of the different values separately.
> In asp.net, the NameValueCollection mashes multiple values into a single
> comma-separated string so if your input has commas in it, well too bad.

In ASP and ASP.NET, you get a collection when referencing the querystring
object.
The comment about the NameValueCollection is the exact same behaviour as in
asp.
Example: using the following querystring gives the following results:
http://kronicas26/Converting/P1.asp?id=5&id=9&test=7
ASP
Code:
<%
Response.Write Request.QueryString & "<BR>"
dim o
for each o in Request.QueryString
Response.Write "Key: " & o & " Value: " & Request.QueryString(o) & "<BR>"
next
%>

Output:
id=5&id=9&test=7
Key: id Value: 5, 9
Key: test Value: 7
ASP.NET
http://kronicas26/ssCS/webform1.aspx?id=5&id=9&test=7
Code:
Response.Write(Request.QueryString.ToString() + "<BR>");
foreach(string t in Request.QueryString.Keys)
{
Response.Write("Key :" + t + " Value: " + Request.QueryString[t] +
"<BR>");
}
Output:
id=5&id=9&test=7
Key :id Value: 5,9
Key :test Value: 7

This shows there is no difference in querystring handling between asp and
asp.net as far as the collection behaviour goes.


> Fourth, in asp Request.QueryString gives you the original urlencoded bytes
> of the querystring i.e. what you were sent. In asp.net, it's actually a
> re-urlencoding of the post-interpreted values, so you can't get out what
> you got in... This is most annoying when you get a querystring encoded in
> utf-8; referencing Request.QueryString returns you a value encoded in the
> mutant %uxxxx syntax instead in asp.net.

If you want to get the original querystring, use Request.RawUrl. This
property is not encoded, and is pulled straight from the aspnet_isapi.dll.
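The "take the raw url and split on '?'" idea is sketched below in Python rather than ASP.NET, assuming the raw url string is available exactly as the client sent it. The function names are hypothetical, and nothing here decodes the escapes, so the original bytes survive.

```python
from urllib.parse import urlsplit

def raw_query(raw_url):
    """Return the query portion of a URL exactly as it was sent, without decoding."""
    return urlsplit(raw_url).query

def raw_pairs(raw_url):
    """Split the untouched query into (name, value) pairs, still percent-encoded."""
    query = raw_query(raw_url)
    if not query:
        return []
    pairs = []
    for field in query.split("&"):
        name, _, value = field.partition("=")
        pairs.append((name, value))
    return pairs

print(raw_query("/test.aspx?query=a%c3%b1"))
print(raw_pairs("/webform1.aspx?id=5&id=9&test=7"))
```

Because the values stay percent-encoded, the page can then decode them with whatever charset it decides is correct, instead of relying on the server's default interpretation.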

> I'm just getting my feet wet in asp.net, coming from an asp environment.
> Any pointers on how to handle query string issues better than what appears
> to be the default in asp.net? Seems like there are some steps backwards in
> asp.net.

If you want things to behave like they did in asp, at least as far as
encoding issues go, do the following:
In web.config, change
<globalization
requestEncoding="utf-8"
responseEncoding="utf-8"
/>
to
<globalization
requestEncoding="windows-1252"
responseEncoding="windows-1252"
/>
And use Response.Charset and Session.CodePage. Windows-1252 is a single-byte
encoding scheme, and multibyte characters will pass through unchanged.

Thanks,
Earl Beaman
Microsoft, ASP.NET

This posting is provided "AS IS", with no warranties, and confers no
rights.
 
Earl Beaman[MS]

Hi Mark,

Thing is, in asp.net, you really don't use the codepage. You can, but it
is only there for compatibility.
Now, you use the System.Threading and System.Globalization namespaces to
create cultureInfo objects and set that as the current thread's
currentculture.

The main thing you are missing is the extra layer in asp.net. This is the
<globalization> element in web.config. This controls how input and output
are encoded. You can override it in web.config or on the page itself.
This is why you are seeing that only utf-8 strings are correctly deciphered.

The querystring gets chopped up and encoded automatically. I believe that
this also occurred in asp, i can confirm that in another post. I do know
that the servervariables collection behaves that way, but I don't believe
we have had any issues with the querystring.

It may seem like querystring handling is more difficult, but the
globalization story is 9000 times better than in asp. And when you
understand how things work, I believe you will see that things are better.

Here are some links that should help you out.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/gngrfglobalizationsection.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemglobalization.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemthreadingthreadclasscurrentculturetopic.asp

I will check this thread again to see if you have questions for me.

Thanks,
Earl Beaman
Microsoft, ASP.NET

This posting is provided "AS IS", with no warranties, and confers no
rights.
 
T Conti

Thanks for this posting. We are running into Globalization issues
with asp calls to ashx/aspx. Your last posting helped clear up some
questions about asp.net (and confirmed some fears).
 
