Encoding values for sending to a web server

diegodpf1

I used to encode values before sending thru a GET or POST request like
this:


$var =~ s/([\W])/"%".uc(sprintf("%2.2x",ord($1)))/eg;
$var =~ s/%20/+/g;


But recently, I realized that my browser IE6.0 is doing it differently.
For instance:
"ã" after encoded by IE6.0 is "%C3%A3"
"ã" after encoded by my script is "%E3"

So, my question is:

How can I encode like my browser does? I've looked for an answer on
Google, but all I found were different ways to produce the same result
my script already does. I couldn't find a way to produce a result
equal to IE's.
 
Brian Wakem

I used to encode values before sending thru a GET or POST request like
this:


$var =~ s/([\W])/"%".uc(sprintf("%2.2x",ord($1)))/eg;
$var =~ s/%20/+/g;


But recently, I realized that my browser IE6.0 is doing it differently.
For instance:
"ã" after encoded by IE6.0 is "%C3%A3"
"ã" after encoded by my script is "%E3"

So, my question is:

How can I encode like my browser does? I've looked for an answer on
Google, but all I found were different ways to produce the same result
my script already does. I couldn't find a way to produce a result
equal to IE's.


use URI::Escape;
 
diegodpf1

Thanks Brian.

I have tried it:


use URI::Escape;

$query = "ã";
$query = uri_escape($query);
print "$query\n";


But I still got the result "%E3" whereas what I wanted was "%C3%A3".

I'm trying to access www.perldoc.com to learn more about URI::Escape, but
it seems to be temporarily down...
 
Matt Garrish

I'm trying to access www.perldoc.com to learn more about URI::Escape, but
it seems to be temporarily down...

Please quote context when replying to a usenet message.

As to perldoc.com, it disappeared a long time ago. If you want to see the
documentation for a module, check your hard drive (perldoc URI::Escape), go to
http://search.cpan.org or try http://perldoc.perl.org/ (although, since
URI::Escape is not a core module, you won't find its documentation at the latter).

Matt
 
Gunnar Hjalmarsson

I used to encode values before sending thru a GET or POST request like
this:

$var =~ s/([\W])/"%".uc(sprintf("%2.2x",ord($1)))/eg;
$var =~ s/%20/+/g;

But recently, I realized that my browser IE6.0 is doing it differently.
For instance:
"ã" after encoded by IE6.0 is "%C3%A3"
"ã" after encoded by my script is "%E3"

So, my question is:

How can I encode like my browser does?

Change from UTF-8 to e.g. ISO-8859-1, or do:

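use Encode;
# $var must hold decoded characters; encode() turns them into UTF-8 bytes
my $utf8_bytes = Encode::encode('UTF-8', $var);
# percent-escape every non-word byte, so "ã" comes out as %C3%A3
( $var = $utf8_bytes ) =~ s/([\W])/"%".uc(sprintf("%2.2x",ord($1)))/eg;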
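
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The helper name form_escape and its charset parameter are made up for illustration, and the set of characters left unescaped is an assumption (letters, digits and "_.~-"); browsers differ slightly on that set. The character is written as "\x{E3}" so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode a character string roughly the way a
# browser would for a form served in the given charset.
sub form_escape {
    my ($text, $charset) = @_;
    my $bytes = encode($charset, $text);     # characters -> bytes in that charset
    # Escape every byte outside the assumed "safe" set as %XX (uppercase hex).
    $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/eg;
    $bytes =~ s/%20/+/g;                     # form encoding writes a space as "+"
    return $bytes;
}

my $a_tilde = "\x{E3}";                      # the character "ã" (U+00E3)

print form_escape($a_tilde, 'UTF-8'), "\n";        # %C3%A3
print form_escape($a_tilde, 'ISO-8859-1'), "\n";   # %E3
print form_escape("a b", 'UTF-8'), "\n";           # a+b
```

The only difference between the two results for "ã" is the charset handed to encode(), which is the whole point of this thread: the escaping step is the same, the bytes being escaped are not.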
 
John Bokma

Thanks Brian.

I have tried it:


use URI::Escape;

$query = "ã";
$query = uri_escape($query);
print "$query\n";


But I still got the result "%E3" whereas what I wanted was "%C3%A3".

I guess that this has to do with the Content-Type of the document. IE
probably takes this into account when creating the encoded link.
 
Alan J. Flavell

Change from UTF-8 to e.g. ISO-8859-1,

If you're encoding in order to prepare a form submission to a server
(or more precisely, to a server-side process, such as maybe a CGI
script), then the usual arrangement is that the server(-side process)
expects the submission to be encoded according to the character
encoding with which the original form (html page) was sent out from
the server. So make sure that you know what that was.
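
To make the server-side view concrete, here is a rough sketch of how a receiving script might turn one submitted value back into characters. The helper name form_unescape is hypothetical, and $form_charset is an assumption standing for whatever encoding the form page was actually sent out with.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode one submitted form value.
# $form_charset is assumed to be the charset the HTML form was served with.
sub form_unescape {
    my ($value, $form_charset) = @_;
    $value =~ tr/+/ /;                                # "+" stands for a space
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;    # %XX -> raw byte
    return decode($form_charset, $value);             # bytes -> characters
}

my $from_utf8_form   = form_unescape('%C3%A3', 'UTF-8');
my $from_latin1_form = form_unescape('%E3',    'ISO-8859-1');

printf "U+%04X\n", ord($from_utf8_form);      # U+00E3
printf "U+%04X\n", ord($from_latin1_form);    # U+00E3

# Decoding with the wrong charset yields two characters instead of one "ã".
print length(form_unescape('%C3%A3', 'ISO-8859-1')), "\n";   # 2
```

Both submissions carry the same character, but only if the script decodes with the charset the browser used. The last line shows what a mismatch looks like.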

Even this is bad enough: as the Encode::Supported page ruefully
remarks:

| it is beyond the power of words to describe the way HTML browsers
| encode non-ASCII form data.

See http://perldoc.perl.org/Encode/Supported.html
below the sub-heading "Encoding Classification".

Anyhow - let's suppose that the server-side software has already been
designed to cope with the multifarious things that browsers do with
the expected encoding(s) - but responding with any other encoding than
the one that they expect is very likely to confuse the server-side
script. So take care before unilaterally switching the encoding
around.

Maybe this is obvious, but I thought it worth making the point in so
many words.

best
 
Gunnar Hjalmarsson

Alan said:
If you're encoding in order to prepare a form submission to a server
(or more precisely, to a server-side process, such as maybe a CGI
script), then the usual arrangement is that the server(-side process)
expects the submission to be encoded according to the character
encoding with which the original form (html page) was sent out from
the server. So make sure that you know what that was.

Well, the browser-generated URI-escaping, as described by the OP, made
me assume that it was UTF-8.

Alan said:
responding with any other encoding than
the one that they expect is very likely to confuse the server-side
script. So take care before unilaterally switching the encoding
around.

Right, and that wasn't my thought. Assuming that the web server's
default charset was set to UTF-8 (which I think is the default
configuration for e.g. Apache nowadays), you'd need to explicitly tell
the server to send an HTTP header along with the document in order to
set some other encoding. That's what I meant by "change from UTF-8 to
e.g. ISO-8859-1". Admittedly I could have been clearer. :)
 
Alan J. Flavell

Well, the browser generated URI-escaping, as described by the OP,
made me assume that it was UTF-8.

If that's confirmed, then utf-8 is indeed a good choice, because it
can represent every character which the client could want to submit.

On a positive and upbeat note for the New Year ;-) ...

Aside from possible client bugs - and many of those have been
corrected since the early days - that rather depressing assessment
which I quoted:

| It is beyond the power of words to describe the way HTML
| browsers encode non-ASCII form data.

- refers chiefly to browsers' attempts to cope with characters which
the user is attempting to submit, but which cannot be represented
directly in the currently-operative character encoding.

When utf-8 is the operative encoding, this shortcoming doesn't happen.
And, in recent years, most browsers have moved to support utf-8
encoding correctly in this regard (the last incapable browser of wide
importance was Netscape 4.*, and use of that has now faded pretty much
into insignificance).
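
A small sketch of the "cannot be represented" case, for illustration only: the helper name representable is made up, and the euro sign is just a convenient example of a character that exists in Unicode but not in ISO-8859-1. It relies on Encode's FB_CROAK check, which makes encode() die instead of silently substituting.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if every character of $text can be encoded
# in $charset without loss.
sub representable {
    my ($text, $charset) = @_;
    my $copy = $text;    # encode() may modify its input when a CHECK is given
    return eval { encode($charset, $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

my $euro = "\x{20AC}";   # EURO SIGN, not part of ISO-8859-1

print representable($euro, 'ISO-8859-1') ? "yes" : "no", "\n";   # no
print representable($euro, 'UTF-8')      ? "yes" : "no", "\n";   # yes
```

With a form served as ISO-8859-1 the browser has to improvise for such a character; with UTF-8 the question never comes up.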

Which is why, over the last couple of years, the various search engine
services' multilingual search pages (at google, altavista, etc.) have
all moved from offering support for queries in the user's choice from
a long selection of character encodings, to a single query page that's
sent out in utf-8 encoding and expects the query submission to come
back in that same encoding. This works well (with a suitable browser,
of course - what I'm saying is that I.M.E most recent-ish browser
versions /are/ now suitable, and the general adoption of utf-8 by the
search services proves that they think so too).

Gunnar said:
That's what I meant by "change from UTF-8 to e.g. ISO-8859-1".
Admittedly I could have been clearer. :)

Not to worry. I hope these notes will be useful to anyone who's
applying Perl in this kind of situation.

best
 
