Trying to set a cookie within a python script

Dave Angel

Νίκος said:
For example, is an 'a' char in iso-8859-1 stored differently from an 'a'
char in iso-8859-7 or an 'a' char in utf-8?
Nope, the ASCII subset is identical. It's the ones between 80 and ff
that differ, and of course not all of those. Further, some of the codes
that are one byte in 8859 are two bytes in utf-8.

You *could* just decide that you're going to hardwire the assumption
that you'll be dealing with a single character set that does fit in 8
bits, and most of this complexity goes away. But if you do that, do
*NOT* use utf-8.

But if you do want to be able to handle more than 256 characters, or
more than one encoding, read on.

Many people confuse encoding and decoding. A unicode character is an
abstraction which represents a raw character. For convenience, the first
128 code points map directly onto the 7 bit encoding called ASCII. But
before Unicode there were several other extensions to 256, which were
incompatible with each other. For example, a byte which might be a
European character in one such encoding might be a kata-kana character
in another one. Each encoding was 8 bits, but it was difficult for a
single program to handle more than one such encoding.

So along comes unicode, which is typically implemented in 16 or 32 bit
cells. And it has an 8 bit encoding called utf-8 which uses one byte for
the first 192 characters (I think), and two bytes for some more, and
three bytes beyond that.

You encode unicode to utf-8, or to 8859, or to ...
You decode utf-8 or 8859, or cp1252 , or ... to unicode
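
A minimal sketch of the two directions, written in Python 3 syntax as an assumption (the thread itself uses Python 2, where the text type is `unicode` and the 8-bit type is `str`). The sample word and variable names are illustrative only.

```python
# Text (code points) -> bytes is "encode"; bytes -> text is "decode".
text = "caf\u00e9"                       # 4 characters, the last one is e-acute

as_utf8 = text.encode("utf-8")           # 5 bytes: e-acute needs two bytes
as_latin1 = text.encode("iso-8859-1")    # 4 bytes: one byte per character
as_cp1252 = text.encode("cp1252")        # 4 bytes, same as Latin-1 for this word

# Decoding with the matching codec recovers the original text.
assert as_utf8.decode("utf-8") == text
assert as_latin1.decode("iso-8859-1") == text

# An 8-bit codec cannot encode characters outside its own repertoire.
try:
    "\u03b1".encode("iso-8859-1")        # Greek alpha is not in Latin-1
    greek_fits_latin1 = True
except UnicodeEncodeError:
    greek_fits_latin1 = False
```

The point of the sketch is only the direction of each call: `encode` always produces bytes and `decode` always produces text.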
I use Python 2.4 and never used the u prefix.
Then you'd better hope you never manipulate those literals. For example,
the second character of some international characters expressed in utf8
may be a percent symbol, which would mess up string formatting.
I still don't understand the difference between a 'string' and a
'string literal'.
A string is an object containing characters. A string literal is one of
the ways you create such an object. When you create it that way, you
need to make sure the compiler knows the correct encoding, by using the
encoding: line at beginning of file.
If I save a file as iso-8859-1 but use Greek characters in some of my
variables, then instead of telling the browser to change encoding and
saving the file as utf-8, can I just use the u prefix as in your
examples to save the variables as iso-8859-1?

The HTTP client sends a request to the HTTP server (Apache), Apache calls
the Python interpreter, and Python calls MySQL to handle SQL queries, right?

My question is: what is the difference between the Python script's output
and the web server's output to the HTTP client?
The web server wraps a few characters before and after your html stream,
but it shouldn't touch the stream itself.
Who is producing the HTML code? The Python output, or the Apache web
server after it receives the Python output?

See above.

I'm not sure what exactly they do just yet.

For example, if I say mymessage = "καλημέρα" and then I say
mymessage = u"καλημέρα", is the first one a Greek-encoded variable while
the second is a utf-8 one?
No, the first is an 8 bit copy of whatever bytes your editor happened to
save. The second is unicode, which may be either 16 or 32 bits per
character, depending on OS platform. Neither is utf-8.
So one script can be in some encoding and some parts of the script,
like the 2nd variable, can be in another?
mymessage = u"καλημέρα"

creates an object that is *not* encoded. Encoding is taking the unicode
stream and representing it as a stream of bytes, which may or may not have
more bytes than the original has characters.
============================
Also, can you please help me with my cookie problem: why is only the
else block executed each time, and never the if?

Here is the code:

if os.environ.get('HTTP_COOKIE') and cookie.has_key('visitor') == 'nikos':		# if visitor cookie exists
	# "From your next visit on I will count you as a visitor, incrementing the counter!"
	print "ΑΠΟ ΤΗΝ ΕΠΟΜΕΝΗ ΕΠΙΣΚΕΨΗ ΣΟΥ ΘΑ ΣΕ ΥΠΟΛΟΓΙΖΩ ΩΣ ΕΠΙΣΚΕΠΤΗ ΑΥΞΑΝΟΝΤΑΣ ΤΟΝ ΜΕΤΡΗΤΗ!"
	cookie['visitor'] = ( 'nikos', time() - 1 )		# this cookie will expire now
else:
	# "From now on I never saw you, I don't know you, I never heard of you! You will now be the invisible visitor!!"
	print "ΑΠΟ ΔΩ ΚΑΙ ΣΤΟ ΕΞΗΣ ΔΕΝ ΣΕ ΕΙΔΑ, ΔΕΝ ΣΕ ΞΕΡΩ, ΔΕΝ ΣΕ ΑΚΟΥΣΑ! ΘΑ ΕΙΣΑΙ ΠΛΕΟΝ Ο ΑΟΡΑΤΟΣ ΕΠΙΣΚΕΠΤΗΣ!!"
	cookie['visitor'] = ( 'nikos', time() + 60*60*24*365 )		# this cookie will expire in a year

How do I check whether the cookie is set, and why, once set, does it never get unset?
I personally haven't done any cookie code. If I were debugging this, I'd
factor out the multiple parts of that if statement, and find out which
one isn't true. From here I can't guess.

DaveA
 
Dotan Cohen

I don't understand your wording. Certainly the server launches the python
script, and captures stdout. It then sends that stream of bytes out over
tcp/ip to the waiting browser. You ask when does it become html ? I don't
think the question has meaning.

HTML is just plain text. So the answer to the question is that
ideally, the plain text that is sent to stdout would already be HTML.

print ( "<title>My Greek Page</title>\n" )
 
MRAB

Dave said:
Many people confuse encoding and decoding. A unicode character is an
abstraction which represents a raw character. For convenience, the first
128 code points map directly onto the 7 bit encoding called ASCII. But
before Unicode there were several other extensions to 256, which were
incompatible with each other. For example, a byte which might be a
European character in one such encoding might be a kata-kana character
in another one. Each encoding was 8 bits, but it was difficult for a
single program to handle more than one such encoding.
One encoding might be ASCII + accented Latin, another ASCII + Greek,
another ASCII + Cyrillic, etc. If you wanted ASCII + accented Latin +
Greek then you'd need more than 1 byte per character.

If you're working with multiple alphabets it gets very messy, which is
where Unicode comes in. It contains all those characters, and UTF-8 can
encode all of them in a straightforward manner.
So along comes unicode, which is typically implemented in 16 or 32 bit
cells. And it has an 8 bit encoding called utf-8 which uses one byte for
the first 192 characters (I think), and two bytes for some more, and
three bytes beyond that.
[snip]
In UTF-8 the first 128 codepoints are encoded to 1 byte.
 
Dave Angel

MRAB said:
In UTF-8 the first 128 codepoints are encoded to 1 byte.
Thanks for the correction. As I said, I wasn't sure. I wrote a utf-8 encoder
and decoder about a dozen years ago, and I remember that parts of it use the
top two bits specially. But I've checked now, and you're right, the
cutoff is 7f.

DaveA
 
Νίκος

A string is an object containing characters. A string literal is one of
the ways you create such an object. When you create it that way, you
need to make sure the compiler knows the correct encoding, by using the
encoding: line at beginning of file.


mymessage = "καλημέρα" <==== string
mymessage = u"καλημέρα" <==== string literal?

So, a string literal is one of the encodings I use to create a string
object?

Can a Python script file be encoded in iso-8859-7, meaning the file
contents are saved to disk as Greek ISO, while the value of this
variable, mymessage = u"καλημέρα", is saved as utf-8, or the
opposite?

Have the file saved as utf-8 but one variable's value in a Greek
encoding?

Encodings still give me headaches. I try to understand them as
different ways to store data on a medium.

Tell me something. Which encoding should I pick for my scripts, knowing
that they contain only English and Greek characters?
iso-8859-7 or utf-8, and why?

Can I save the string, let's say "Νίκος", in different encodings and still
have it print correctly in the browser?

ASCII = the standard English character set only, right?
The web server wraps a few characters before and after your html stream,
but it shouldn't touch the stream itself.

So the Python compiler using the cgi module is the one that
produces the HTML output, which is then sent straight to the web
server, right?

No, the first is an 8 bit copy of whatever bytes your editor happened to
save.

But since mymessage = "καλημέρα" is a string containing Greek
characters, why doesn't the editor save it as such?

It reminds me of variables and values, where if you say

a = 5 , a instantly becomes an integer variable
while
a = 'hello' , it instantly becomes a string variable

mymessage = u"καλημέρα"

creates an object that is *not* encoded.

Because it isn't saved by the editor yet? What state is this object
in before it gets encoded?
And does it get encoded the minute I tell the editor to save the file?
Encoding is taking the unicode
stream and representing it as a stream of bytes, which may or may not have
more bytes than the original has characters.

So what this line, mymessage = u"καλημέρα", does is tell the browser
that when it's time to save the whole file, it should save this string as
utf-8?

If yes, then if I were to save the above string in a Greek encoding, how
was I supposed to write it?

Also, if you use the 'coding line' at the beginning of the file, is there
any need for the u literal?
I personally haven't done any cookie code. If I were debugging this, I'd
factor out the multiple parts of that if statement, and find out which
one isn't true. From here I can't guess.

I did what you said and found out that both parts of the if condition
were always false; that's why the if block never got executed.

And it is always false because the cookie never gets set.

So can you please tell me why this line

cookie['visitor'] = ( 'nikos', time() + 60*60*24*365 ) # this cookie will expire in a year

never created a cookie?
 
Benjamin Kaplan

2010/8/3 Νίκος said:
mymessage = "καλημέρα"   <==== string
mymessage = u"καλημέρα"  <==== string literal?

Not quite. A literal is the actual string in the file, those letters
between the quotes:
"καλημέρα" <=== String literal (a literal value of the string/str type)
u"καλημέρα" <=== Unicode literal (a literal value of the Unicode
type. The bytes on the page will be converted to unicode using the
file's encoding)
mymessage <==== String (not literal, because it's a value)
So, a string literal is one of the encodings I use to create a string
object?

Can a Python script file be encoded in iso-8859-7, meaning the file
contents are saved to disk as Greek ISO, while the value of this
variable, mymessage = u"καλημέρα", is saved as utf-8, or the
opposite?

The compiler does not see u"καλημέρα" on the page. All it sees is the
bytes ['0x75', '0x22', '0xea', '0xe1', '0xeb', '0xe7', '0xec', '0xdd',
'0xf1', '0xe1', '0x22']

Now the compiler knows that the sequence 0x75 0x22 (Stuff) 0x22 means
to create a Unicode literal. So it takes those bytes ('0xea', '0xe1',
'0xeb', '0xe7', '0xec', '0xdd', '0xf1', '0xe1') and decodes them using
the page's encoding, in your case ISO-8859-7. At this point, they don't
have an encoding. They aren't bytes as far as you are concerned, they
are code points. Internally, they're stored as either UTF-16 or UTF-32
depending on how Python was compiled, but that doesn't matter. You can
treat them as if they are characters.
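
The decoding step described above can be imitated by hand. This is only a rough sketch in Python 3 syntax (an assumption; the thread uses Python 2), and it is not how the compiler is actually implemented. The names `raw`, `payload` and `code_points` are made up for illustration.

```python
# The eleven bytes listed above, as they would sit in an ISO-8859-7 source file.
raw = bytes([0x75, 0x22, 0xea, 0xe1, 0xeb, 0xe7, 0xec, 0xdd, 0xf1, 0xe1, 0x22])

# 0x75 0x22 is the u" prefix and the final 0x22 is the closing quote,
# so the payload of the literal is everything in between.
payload = raw[2:-1]

# Decoding with the file's declared encoding yields code points, not bytes.
text = payload.decode("iso-8859-7")
code_points = [hex(ord(ch)) for ch in text]

print(len(payload), len(text))
print(code_points)
```

After the decode, `text` holds eight characters whose code points are all in the Greek block, regardless of which byte encoding the file was saved in.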
Have the file saved as utf-8 but one variable's value in a Greek
encoding?

Sure you can. A unicode literal will always have the encoding of the
file. But a string is just a sequence of bytes (forget about the
characters that show up on the page for now). If you do
"\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1".decode('UTF-8')
Then Python will take that sequence of bytes and interpret them as
UTF-8. That will give you the same Unicode string you started out
with: u"καλημέρα"
Encodings still give me headaches. I try to understand them as
different ways to store data on a medium.

Tell me something. Which encoding should I pick for my scripts, knowing
that they contain only English and Greek characters?
iso-8859-7 or utf-8, and why?

Can I save the string, let's say "Νίκος", in different encodings and still
have it print correctly in the browser?

ASCII = the standard English character set only, right?

Yes.


So the Python compiler using the cgi module is the one that
produces the HTML output, which is then sent straight to the web
server, right?

No. They both are in whatever encoding your file is using. But the
first one will be interpreted as a sequence of bytes. the second one
will be interpreted as a sequence of characters. For a single-byte
encoding like ISO-8859-7, it doesn't make a difference. But if you
were to encode it in UTF-8, the first one would have a length of 16
(because the Greek characters are all 2 bytes) and the 2nd one would
have a length of 8.
But since mymessage = "καλημέρα" is a string containing Greek
characters, why doesn't the editor save it as such?

Because you don't save characters, you save bytes.

\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1 is
your String in UTF-8
\xea\xe1\xeb\xe7\xec\xdd\xf1\xe1 is that exact same string in ISO-8859-7

They are two different ways of representing the same characters
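
A short check of that claim, again assuming Python 3 syntax rather than the Python 2 used in the thread. The two byte strings are copied from the lines above; the variable names are illustrative.

```python
utf8_bytes = b"\xce\xba\xce\xb1\xce\xbb\xce\xb7\xce\xbc\xce\xad\xcf\x81\xce\xb1"
greek_bytes = b"\xea\xe1\xeb\xe7\xec\xdd\xf1\xe1"

# Each byte string must be decoded with the codec that produced it.
from_utf8 = utf8_bytes.decode("utf-8")
from_greek = greek_bytes.decode("iso-8859-7")

# Different bytes on disk, identical characters once decoded.
same_text = from_utf8 == from_greek
sizes = (len(utf8_bytes), len(greek_bytes), len(from_utf8))

print(same_text, sizes)
```

The byte counts differ (16 versus 8) while the character count is 8 either way, which is the length difference mentioned earlier in this reply.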

 
MRAB

Νίκος said:
On Aug 3, 21:00, Dave Angel wrote:
A string is an object containing characters. A string literal is one of
the ways you create such an object. When you create it that way, you
need to make sure the compiler knows the correct encoding, by using the
encoding: line at beginning of file.
[snip]
Tell me something. Which encoding should I pick for my scripts, knowing
that they contain only English and Greek characters?
iso-8859-7 or utf-8, and why?
This is easy to answer: UTF-8 with the:

# -*- coding: UTF-8 -*-

comment to tell Python that your script file is encoded in UTF-8.

I was once given a file in a language I don't know (translations for
display messages). Some of the text didn't look quite right. It took me
a while to figure out that it was written on a machine which used CP1250
and my machine used CP1252. If everybody used the same encoding then
such problems wouldn't occur, and UTF-8 can handle any characters which
are in Unicode: Latin, Greek, Cyrillic, Arabic, etc.
 
Νίκος

For the cookie problem, I have been trying for hours now and even this isn't
working:

========================================
cookie = Cookie.SimpleCookie()

if os.environ.get('HTTP_COOKIE') and cookie.has_key('visitor') == 'nikos': # if visitor cookie exists
    print "Cookie Unset"
    cookie['visitor'] = 'nikos'
    cookie['visitor']['expires'] = -1 # this cookie will expire now
else:
    print "Cookie is set!"
    cookie['visitor'] = 'nikos'
    cookie['visitor']['expires'] = 1000 # this cookie will expire in 1000 seconds
========================================

I tried in the IDLE environment as well, and for some reason, even with a
single number instead of the time() function, the cookie is never set,
because the print of

results in

None

:(
 
Dotan Cohen

2010/8/4 Νίκος said:
Encodings still give me headaches. I try to understand them as
different ways to store data on a medium.

Tell me something. Which encoding should I pick for my scripts, knowing
that they contain only English and Greek characters?
iso-8859-7 or utf-8, and why?

Always use UTF-8, every modern system supports it, and it will let you
use any arbitrary character that you need, such as maybe a smiley or a
Euro sign. You will avoid headaches with databases and files and all
sorts of other things that you don't yet expect. Declare it in the
HTTP header, and in the HTML meta tag.
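
As a rough illustration of declaring the encoding in both places, here is a sketch in Python 3 syntax (an assumption; the thread uses Python 2). The function name `build_response` and the page content are hypothetical, and the title is borrowed from the earlier reply.

```python
def build_response(body_text, charset="utf-8"):
    """Return the raw bytes of a CGI response that declares its charset twice."""
    html = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        # Declaration 2: the HTML meta tag.
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "<title>My Greek Page</title>\n"
        "</head><body>\n"
        f"<p>{body_text}</p>\n"
        "</body></html>\n"
    )
    # Declaration 1: the HTTP header, followed by the blank line that ends the headers.
    header = f"Content-Type: text/html; charset={charset}\r\n\r\n"
    # The bytes actually sent must be encoded with the charset that was declared.
    return header.encode("ascii") + html.encode(charset)


# "\u039d\u03af\u03ba\u03bf\u03c2" spells the name used as an example in this thread.
response = build_response("\u039d\u03af\u03ba\u03bf\u03c2")
```

In a real CGI script the returned bytes would be written to standard output in binary form, for example with `sys.stdout.buffer.write(response)`.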

Trust me, I maintain gibberish.co.il which specializes in encoding
problems. Just use UTF-8 everywhere and you will save a lot of
headaches.

Can I save the string, let's say "Νίκος", in different encodings and still
have it print correctly in the browser?

No.

ASCII = the standard English character set only, right?

Pretty much, plus the numbers, some symbols, and a few nonprinting
characters. Read here:
http://en.wikipedia.org/wiki/Ascii
 
Steven D'Aprano

I tried in the IDLE environment as well, and for some reason, even with a
single number instead of the time() function, the cookie is never set,
because the print of


result to

None


What happens if you open up a NEW xterm and do this?

echo $HTTP_COOKIE


Or, to put it another way... are you sure that the environment variable
is actually being set?
 
Dave Angel

Νίκος said:
mymessage = "καλημέρα" <==== string
mymessage = u"καλημέρα" <==== string literal?

So, a string literal is one of the encodings I use to create a string
object?
No, both lines take a string literal, create an object, and bind a name
to that object. In the first case, the object is a string, and in the
second it's a unicode-string. But the literal is the stuff after the
equals sign in both these cases.

Think about numbers for a moment. When you say
salary = 4.1

you've got a numeric literal that's three characters long, and a name
that's six characters long. When the interpreter encounters this line,
it builds an object of type float, whose value approximates 4.1,
according to the language rules. It then binds the name salary to this
object.
Can a Python script file be encoded in iso-8859-7, meaning the file
contents are saved to disk as Greek ISO, while the value of this
variable, mymessage = u"καλημέρα", is saved as utf-8, or the
opposite?
A given file needs to have a single encoding, or you're in big trouble.
So a script file is encoded by the text editor in a single encoding
method, which is not saved to the file (except indirectly if you specify
BOM). It's up to you to add a line to the beginning to tell Python how
to decode the file. One decoding for one file.
Have the file saved as utf-8 but one variable's value in a Greek
encoding?
Variables are not saved to source (script) files. Literals are in the file.
Encodings still give me headaches. I try to understand them as
different ways to store data on a medium.

Tell me something. Which encoding should I pick for my scripts, knowing
that they contain only English and Greek characters?
iso-8859-7 or utf-8, and why?
Depends on how sure you are that your program will never need characters
outside your greek character set. Remember Y2K?
Can I save the string, let's say "Νίκος", in different encodings and still
have it print correctly in the browser?

ASCII = the standard English character set only, right?

So the Python compiler using the cgi module is the one that
produces the HTML output, which is then sent straight to the web
server, right?

But since mymessage = "καλημέρα" is a string containing Greek
characters, why doesn't the editor save it as such?
Because the editor is editing text, not python objects. Its job is
solely to represent all your keystrokes in some consistent manner so
that they can be interpreted later by some other program, possibly a
compiler.
It reminds me of variables and values, where if you say

a = 5 , a instantly becomes an integer variable
while
a = 'hello' , it instantly becomes a string variable

Because it isn't saved by the editor yet? What state is this object
in before it gets encoded?
And does it get encoded the minute I tell the editor to save the file?
You're confusing timeframes here. Notepad++ doesn't know Python, and
it's long gone by the time the compiler deals with that line. In
Notepad++, there are no python objects, encoded or not.
So what this line, mymessage = u"καλημέρα", does is tell the browser
that when it's time to save the whole file, it should save this string as
utf-8?
No idea what you mean. The browser isn't saving anything; it doesn't
even get involved till after the python code has completed.
If yes, then if I were to save the above string in a Greek encoding, how
was I supposed to write it?

Also, if you use the 'coding line' at the beginning of the file, is there
any need for the u literal?
If you don't use the u literal, then don't even try to use utf-8. You'll
find that strings have the wrong lengths, and therefore subscripts and
formatting will sometimes fail in strange ways.
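
A small sketch of the length and subscript problem, written in Python 3 syntax as an assumption: there `bytes` plays the role of the Python 2 plain string and `str` plays the role of the `u` literal. The sample name and variable names are illustrative.

```python
text = "\u039d\u03af\u03ba\u03bf\u03c2"   # the five-letter Greek name from this thread
raw = text.encode("utf-8")                 # the same name as utf-8 bytes

char_count = len(text)    # counts characters
byte_count = len(raw)     # counts bytes: two per Greek letter here

# Slicing text keeps whole characters; slicing bytes can cut one in half.
first_three_chars = text[:3]
first_three_bytes = raw[:3]

try:
    first_three_bytes.decode("utf-8")
    slice_is_valid_utf8 = True
except UnicodeDecodeError:
    slice_is_valid_utf8 = False

# Width-based formatting pads by characters on text, which is what a reader expects.
padded = "%-8s|" % text
```

Working on the byte form gives a length of 10 instead of 5, and the three-byte slice is no longer valid utf-8, which is the kind of strange failure described above.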
I personally haven't done any cookie code. If I were debugging this, I'd
factor out the multiple parts of that if statement, and find out which
one isn't true. From here I can't guess.

I did what you said and found out that both parts of the if condition
were always false; that's why the if block never got executed.

And it is always false because the cookie never gets set.

So can you please tell me why this line

cookie['visitor'] = ( 'nikos', time() + 60*60*24*365 ) # this cookie will expire in a year

never created a cookie?
As I said, I've never coded with cookies. But to create a cookie, you
have to communicate with a browser, and that takes lots more than just
adding an item to a map. Further, your getenv() will normally give you
the state of the environment at the time your program was launched, so I
wouldn't expect it to change.

If I had to guess how cookies are done in CGI, I'd say that you probably
have to talk to the CGI server and terminate, and that afterwards
there'd be a new launch of your code, from which you could check that
environment variable to see the results of the cookie.
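
That guess matches how cookies travel: the script sends a Set-Cookie header in its response, and the browser returns the cookie in the HTTP_COOKIE variable on the next request. The following is a hedged sketch, not a drop-in fix for the code above. It assumes Python 3's `http.cookies` module (the thread uses the Python 2 `Cookie` module), and the function name `handle_visitor_cookie` is hypothetical.

```python
from http.cookies import SimpleCookie


def handle_visitor_cookie(environ):
    """Return (set_cookie_header, message) for one CGI request."""
    # Parse whatever the browser sent back; an empty string means no cookies.
    incoming = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    outgoing = SimpleCookie()

    # Compare the cookie's value, not the result of a membership test.
    if "visitor" in incoming and incoming["visitor"].value == "nikos":
        outgoing["visitor"] = "nikos"
        outgoing["visitor"]["expires"] = -1                  # in the past: browser drops it
        message = "Cookie unset"
    else:
        outgoing["visitor"] = "nikos"
        outgoing["visitor"]["expires"] = 60 * 60 * 24 * 365  # seconds from now
        message = "Cookie set"

    # output() renders a "Set-Cookie: ..." header line for the response.
    return outgoing.output(), message


# First visit: no cookie arrives, so one is issued.
first_header, first_message = handle_visitor_cookie({})
# Next visit: the browser sends the cookie back, so it is expired.
second_header, second_message = handle_visitor_cookie({"HTTP_COOKIE": "visitor=nikos"})
```

In a CGI script the header line would be printed before the Content-Type header and the blank line that ends the headers, with `os.environ` passed in as `environ`. Assigning to the cookie object alone sends nothing to the browser.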


DaveA
 
Dotan Cohen

Depends on how sure you are that your program will never need characters
outside your greek character set. Remember Y2K?

Don't forget that the Euro symbol is outside the Greek character set.
 
