Character lost in POST submit

Pavils Jurjans

Hello,

I am experiencing weird behaviour in my ASP.NET project. The project
consists of a client side, which can be any environment (web page, EXE
application, etc.). The client sends an HTTP POST request with data to the
server, where an ASP.NET application handles the request and returns an
answer.

I have boiled all the code down to a very simple test case consisting of
three files: an HTML page that makes a runtime HTTP POST request
(available in IExplorer from JS5, and all versions of Mozilla) and calls
two supposed-to-be-identical scripts, one written in classic ASP, the other
in ASP.NET. The latter seems to lose every 0xE4 character in the incoming
POST data.

Please see the demo code here: http://www.s3.lv/demo/msnews.lostcharacter ,
you can also download the source code there.

I am now considering some mumbo-jumbo to handle the "%e4" characters
in some other way, which is a hassle of course, because it involves both
client- and server-side code adjustments. It would be nice to understand why
this is happening.

Regards,

Pavils
 
Joerg Jooss

Pavils said:
Hello,

I am experiencing weird behaviour in my ASP.NET project. The
project consists of a client side, which can be any environment
(web page, EXE application, etc.). The client sends an HTTP POST request
with data to the server, where an ASP.NET application handles the
request and returns an answer.

I have boiled all the code down to a very simple test case consisting
of three files: an HTML page that makes a runtime HTTP POST
request (available in IExplorer from JS5, and all versions of
Mozilla) and calls two supposed-to-be-identical scripts, one written in
classic ASP, the other in ASP.NET. The latter seems to lose every
0xE4 character in the incoming POST data.

First of all, what is 0xE4 supposed to be? A Unicode code point? A
character from an 8 bit character encoding?
Please see the demo code here:
http://www.s3.lv/demo/msnews.lostcharacter , you can also download
the source code there.

I am now considering some mumbo-jumbo to handle the "%e4"
characters in some other way, which is a hassle of course, because it
involves both client- and server-side code adjustments. It would be
nice to understand why this is happening.

Your test code claims to post UTF-8, but 0xE4 is not a valid byte
sequence in UTF-8. Thus, the ASP.NET UTF-8 decoder stops after "a=X". I
guess the reason why it works in ASP is that the ASP runtime ignores
the charset attribute and uses some default character encoding like
ISO-8859-1 or Windows-1252 for which 0xE4 is a valid character.
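
A minimal sketch of that difference, assuming a Node.js runtime (Buffer is Node-specific, and the variable names are made up for illustration). It takes the four bytes "a=X" plus a raw 0xE4 and decodes them once as ISO-8859-1 and once as UTF-8:

```javascript
// The POST body "a=X" followed by the single raw byte 0xE4.
const body = Buffer.from([0x61, 0x3d, 0x58, 0xe4]);

// ISO-8859-1 maps every byte to the code point with the same value,
// so 0xE4 becomes U+00E4.
const asLatin1 = body.toString("latin1");

// In UTF-8, 0xE4 announces a three-byte sequence. No continuation bytes
// follow, so Node substitutes U+FFFD (the replacement character).
const asUtf8 = body.toString("utf8");

console.log(JSON.stringify(asLatin1));
console.log(JSON.stringify(asUtf8));
```

How a decoder reacts to the invalid byte (substitute, drop, or stop) varies by implementation, so this only shows that the byte cannot come out as U+00E4 under UTF-8.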

Cheers,
 
Pavils Jurjans

Hi Joerg,

First of all, what is 0xE4 supposed to be? A Unicode code point? A
character from an 8 bit character encoding?

It's a character code for an Estonian character.
Your test code claims to post UTF-8, but 0xE4 is not a valid byte
sequence in UTF-8. Thus, the ASP.NET UTF-8 decoder stops after "a=X". I
guess the reason why it works in ASP is that the ASP runtime ignores
the charset attribute and uses some default character encoding like
ISO-8859-1 or Windows-1252 for which 0xE4 is a valid character.

This really shouldn't matter, because the sequence is URL-encoded anyway.

Pavils
 
Joerg Jooss

Pavils said:
Hi Joerg,



It's a character code for an Estonian character.

In what encoding? In Unicode, this character is "ä".
This really shouldn't matter, because the sequence is URL-encoded
anyway.

It does matter. URL encoding is a means to transport special characters
(or rather their code points) in URLs using safe character sequences.
Sender and receiver need to agree on how to map those sequences to the
real characters. There's no implicit or default character encoding when
using URL encoding.

If "ä" is the character you want to use, try %C3%A4 instead -- that's
the proper way to URL encode it based on UTF-8.
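
A short sketch of the two readings of "%E4", using only the standard JavaScript globals (the variable names are illustrative). encodeURIComponent and decodeURIComponent always work in terms of UTF-8, while the legacy unescape treats %XX as a code point below 256:

```javascript
// U+00E4 is written as an escape so the example does not depend on the
// encoding of this source file.
const aUmlaut = "\u00e4";

// UTF-8 based URL encoding yields two escaped bytes.
const utf8Encoded = encodeURIComponent(aUmlaut);

// A lone %E4 is not valid UTF-8, so the UTF-8 based decoder rejects it.
let utf8DecodeError = null;
try {
  decodeURIComponent("%E4");
} catch (err) {
  utf8DecodeError = err;
}

// The legacy unescape() reads %E4 as code point 0xE4 instead.
const legacyDecoded = unescape("%E4");

console.log(utf8Encoded);
console.log(utf8DecodeError instanceof URIError);
console.log(legacyDecoded === aUmlaut);
```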

Cheers,
 
Peter Morris [Droopy eyes software]

Hi

I once experienced problems with my website losing my £ pound sign on
postback.
 
Pavils Jurjans

Hi Joerg,
It does matter. URL encoding is a means to transport special characters
(or rather their code points) in URLs using safe character sequences.
Sender and receiver need to agree on how to map those sequences to the
real characters. There's no implicit or default character encoding when
using URL encoding.

It's nice to see a person here who knows how to talk about encodings and
their application in transfers; it's a true rarity.

I was somewhat blinded by the assumption that the only thing the server does
is decode the URL-encoded content and create a string from the acquired
character codes right away. Of course, a character conversion happens in the
middle! I checked code that I wrote for classic ASP two years ago, which did
just that: since classic ASP did not pick up any character encoding
information for the incoming POST data (nor is the browser obliged to send a
hint), I wrote code that reads Request.BinaryRead(), splits the received
content on the "=" and "&" characters, URL-decodes every key-value pair, and
finally applies the character encoding given in the function parameter to
get the correct Unicode string.
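
For illustration, here is roughly what that approach looks like. This is not the original classic ASP code; it is a sketch in JavaScript for Node.js, and the names percentDecodeToBytes and parseForm are hypothetical. The body is split on "&" and "=", each part is percent-decoded to raw bytes, and only then are the bytes turned into a string using the encoding passed in:

```javascript
// Percent-decode an ASCII form token into raw bytes ("+" means space).
function percentDecodeToBytes(token) {
  const bytes = [];
  for (let i = 0; i < token.length; i++) {
    const ch = token[i];
    const hex = token.slice(i + 1, i + 3);
    if (ch === "%" && /^[0-9a-fA-F]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else if (ch === "+") {
      bytes.push(0x20);
    } else {
      bytes.push(token.charCodeAt(i) & 0xff);
    }
  }
  return Buffer.from(bytes);
}

// Split the body into key-value pairs and decode each part with the
// character encoding the caller says the sender used.
function parseForm(body, encoding) {
  const result = {};
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    const key = percentDecodeToBytes(rawKey).toString(encoding);
    result[key] = percentDecodeToBytes(rawValue).toString(encoding);
  }
  return result;
}

console.log(parseForm("a=X%E4", "latin1"));
console.log(parseForm("a=X%C3%A4", "utf8"));
```

The same "%E4" input only comes out as "ä" when the caller names a single-byte encoding such as latin1, which is the agreement between sender and receiver that was missing.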

I decided to solve my %e4 problem by sending the server the code point
information directly:

var dataBody = "a=%u00e4";

This works on both classic ASP and ASP.NET, and I need not bother about
the applied character encodings.
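
To show what the %uXXXX form carries, here is a small sketch; decodePercentU is a hypothetical helper, not something either server exposes. The form is non-standard, but each %uXXXX names a UTF-16 code unit directly, so no byte-to-character mapping is involved:

```javascript
// Replace every %uXXXX with the UTF-16 code unit it names.
function decodePercentU(text) {
  return text.replace(/%u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const dataBody = "a=%u00e4";
const decoded = decodePercentU(dataBody);

// The legacy unescape() global understands the same notation.
const viaUnescape = unescape(dataBody);

console.log(decoded === "a=\u00e4");
console.log(viaUnescape === decoded);
```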

Thanks for opening my eyes,

Pavils
 
