Setting the encoding in the basic auth header

siddhartag

This is not strictly a Python question, but I'm hoping someone here
has come across a similar situation.

I have a Django app and I've protected some views with basic
authentication. The user can use any Unicode character in the username
and password fields. When this happens, the data is not properly
encoded by the browser before transmission. How can I get the browser
to encode the data as UTF-8 before sending it over? Is there some
header I need to send?

This is what I'm doing at the moment.

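from django.http import HttpResponse

def getAuthenticateResponse():
    # Reply with 401 and a Basic challenge so the browser prompts for credentials.
    response = HttpResponse()
    response.status_code = 401
    response.headers["WWW-Authenticate"] = 'Basic realm="Realm"'
    return response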

When I enter the character \xf1 (which is outside ASCII but within
ISO-8859-1) as the username:

Firefox 2.0 sends this as \xf1
IE 7 also sends this as \xf1
But the UTF-8 encoding is \xc3\xb1

If I enter the character U+0BA4 (TAMIL LETTER TA), which is outside
ISO-8859-1:

Firefox 2 sends this as \xa4 (seems to drop the high byte)
IE 7 sends this as ?

It seems that both browsers are using the ISO-8859-1 charset. Is there
any way I can get them to encode the data with UTF-8 instead?

Thanks for any help.
 
Martin v. Löwis

> When I enter the character \xf1 (which is outside ASCII but within
> ISO-8859-1) as the username:
>
> Firefox 2.0 sends this as \xf1
> IE 7 also sends this as \xf1
> But the UTF-8 encoding is \xc3\xb1
>
> If I enter the character U+0BA4 (TAMIL LETTER TA), which is outside
> ISO-8859-1:
>
> Firefox 2 sends this as \xa4 (seems to drop the high byte)
> IE 7 sends this as ?
>
> It seems that both browsers are using the ISO-8859-1 charset. Is there
> any way I can get them to encode the data with UTF-8 instead?

Looking at your results, the answer seems to be "no". They don't use
Latin-1; instead, they use Unicode and just drop the row (high) byte,
sending only the cell (low) byte, independent of whether the input was
Latin-1.
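
To make the byte values above concrete, here is a small Python sketch that reproduces them. The helper name low_byte is made up for illustration; it only models the behaviour described for Firefox 2 and is not taken from any browser's source.

```python
def low_byte(ch: str) -> bytes:
    """Keep only the low ("cell") byte of a code point, dropping the high ("row") byte."""
    return bytes([ord(ch) & 0xFF])

n_tilde = "\u00f1"   # LATIN SMALL LETTER N WITH TILDE
tamil_ta = "\u0ba4"  # TAMIL LETTER TA

# Below U+0100, dropping the high byte gives the same result as ISO-8859-1 encoding.
print(low_byte(n_tilde))               # b'\xf1'
print(n_tilde.encode("iso-8859-1"))    # b'\xf1'
print(n_tilde.encode("utf-8"))         # b'\xc3\xb1'

# Above U+00FF the two differ: ISO-8859-1 cannot encode the character at all,
# while dropping the high byte silently yields an unrelated byte.
print(low_byte(tamil_ta))              # b'\xa4'
print(tamil_ta.encode("utf-8"))        # b'\xe0\xae\xa4'
```

This is why the \xf1 case alone cannot distinguish "Latin-1" from "truncated Unicode"; only the U+0BA4 case does.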

RFC 2617 specifies userid as *TEXT, without ever specifying what TEXT
is. Most likely, the authors of that specification did not consider
encodings.

Regards,
Martin
 
Martin v. Löwis

> It seems that both browsers are using the ISO-8859-1 charset. Is there
> any way I can get them to encode the data with UTF-8 instead?

As a further follow-up, see

https://bugzilla.mozilla.org/show_bug.cgi?id=41489

They explain that *TEXT is defined in RFC 2616, which specifies
that characters outside ISO-8859-1 must be MIME-header-encoded
(RFC 2047). So U+0BA4 should be encoded as '=?utf-8?b?4K6k?=',
according to the specification. No browser currently implements that.
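
For reference, the encoded form quoted above can be reproduced in Python. This is only a sketch of the MIME encoded-word syntax (charset "utf-8", base64 "b" encoding); the function name encode_word_utf8 is hypothetical, and nothing here implies that a browser or server actually accepts this form in an Authorization header.

```python
import base64
from email.header import decode_header

def encode_word_utf8(text: str) -> str:
    """Build a MIME encoded-word using UTF-8 and base64 ("b") encoding."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?utf-8?b?" + payload + "?="

encoded = encode_word_utf8("\u0ba4")  # TAMIL LETTER TA
print(encoded)  # =?utf-8?b?4K6k?=

# Round trip through the standard library's encoded-word decoder.
raw, charset = decode_header(encoded)[0]
print(raw.decode(charset) == "\u0ba4")  # True
```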

Regards,
Martin
 
Siddharta


Wow, thanks a lot for the link. Just had a look at it. The thread runs
from 2000 to 2007!! 7 years!! What a complete mess :)

Guess I'll just have to enforce ASCII usernames and passwords for now.
Hope the IETF gets around to updating the spec soon. It's really weird
that you can't enter non-ASCII characters in a basic auth dialog.

Thanks again.
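
A minimal sketch of the ASCII-only fallback mentioned above, assuming the server receives the raw value of the Authorization header as a string. The function name parse_ascii_basic_auth is hypothetical and not part of Django; the idea is simply to reject any credentials whose decoded bytes are not pure ASCII.

```python
import base64

def parse_ascii_basic_auth(header_value: str):
    """Return (username, password) from an Authorization header value, or None
    if it is not well-formed Basic credentials made of ASCII characters only."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        raw = base64.b64decode(encoded.strip(), validate=True)
        text = raw.decode("ascii")  # fails on any byte >= 0x80
    except ValueError:
        # Covers invalid base64 (binascii.Error) and non-ASCII bytes
        # (UnicodeDecodeError); both are subclasses of ValueError.
        return None
    username, sep, password = text.partition(":")
    if not sep:
        return None  # no "user:password" separator present
    return username, password

ok = "Basic " + base64.b64encode(b"alice:secret").decode("ascii")
bad = "Basic " + base64.b64encode(b"\xf1:secret").decode("ascii")
print(parse_ascii_basic_auth(ok))   # ('alice', 'secret')
print(parse_ascii_basic_auth(bad))  # None
```

A view would answer a None result with the same 401 challenge as before, so a user who typed a non-ASCII name is simply prompted again.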
 
