Size of cookie and splitting strings

Andrew Poulos

If a cookie is limited to 4K, 4,096 bytes, in size, how can I tell if a
string will fit into one cookie (without requiring to be truncated/split
across more than one cookie)? Do I need to try to write it and then read
it back and see if it matches the original?

I'm happy to split the string but I've no idea how many characters 4K
may actually mean given that the page is in UTF-8.

Andrew Poulos
 
Evertjan.

Andrew Poulos wrote on 12 jul 2009 in comp.lang.javascript:
If a cookie is limited to 4K, 4,096 bytes, in size, how can I tell if a
string will fit into one cookie (without requiring to be truncated/split
across more than one cookie)?

If you know this requirement, why not measure the size of the string to be
used with myString.length?
Do I need to try to write it and then read
it back and see if it matches the original?

No, you do not need to do anything, but the idea is sound,
first test thoroughly.

I'm happy to split the string but I've no idea how many characters 4K
may actually mean given that the page is in UTF-8.

Again, you can test this.
Do not expect all browsers to behave [in the same way].

The page setting should not influence the Javascript test for the
myString.length value, methinks.
 
Andrew Poulos

Evertjan. said:
Andrew Poulos wrote on 12 jul 2009 in comp.lang.javascript:
If a cookie is limited to 4K, 4,096 bytes, in size, how can I tell if a
string will fit into one cookie (without requiring to be truncated/split
across more than one cookie)?

If you know this requirement, why not measure the size of the string to be
used with myString.length?
Do I need to try to write it and then read
it back and see if it matches the original?

No, you do not need to do anything, but the idea is sound,
first test thoroughly.

I'm happy to split the string but I've no idea how many characters 4K
may actually mean given that the page is in UTF-8.

Again, you can test this.
Do not expect all browsers to behave [in the same way].

The page setting should not influence the Javascript test for the
myString.length value, methinks.
How does the length of a string relate to the 4,096 bytes allowable in a
cookie. Will, for example, a string of 100 characters equal 100 bytes in
a cookie? Wikipedia tells me that UTF-8 encodes each character in 1 to 4
octets (8-bit bytes). Doesn't that mean a string 100 characters could be
as large as 400 bytes?

Andrew Poulos
 
Evertjan.

Andrew Poulos wrote on 12 jul 2009 in comp.lang.javascript:
How does the length of a string relate to the 4,096 bytes allowable in a
cookie. Will, for example, a string of 100 characters equal 100 bytes in
a cookie? Wikipedia tells me that UTF-8 encodes each character in 1 to 4
octets (8-bit bytes).

Bytes are 8 bits per definition.

[nibble 4 bits, word 16 bit, longword 32 bit]
Doesn't that mean a string 100 characters could be
as large as 400 bytes?

Ah yes, I see your point.

Perhaps the cookie is not the answer in your case.
Or be on the safe side by allowing a safe maximum length.
 
Thomas 'PointedEars' Lahn

Andrew said:
How does the length of a string relate to the 4,096 bytes allowable in a
cookie.

The cookie value is a string of characters. document.cookie yields all
name-value pairs that apply to the context document. So test the length of
document.cookie, taking into account possible delimiting whitespace, and the
length of the value to be stored in the cookie before you assign to
`document.cookie'.

Another or an additional way to determine if it worked is that you check the
value of document.cookie before and after assignment. If the values differ,
assignment was successful. Or maybe an exception is thrown, too.
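
A minimal sketch of that check, as a variation that reads the pair back rather than only comparing before and after (a plain comparison would also report failure when the same value was already stored). The helper name `trySetCookie` and the `doc` parameter are hypothetical; in a browser you would pass `document`, and the cookie name is assumed to be a plain token that needs no encoding.

```javascript
// Hypothetical helper: assign a cookie, then read document.cookie back
// and report whether the name=value pair actually arrived.
function trySetCookie(doc, name, value) {
  var pair = name + '=' + encodeURIComponent(value);
  doc.cookie = pair;
  // document.cookie yields the pairs joined by "; "
  var stored = doc.cookie.split('; ');
  return stored.indexOf(pair) !== -1;
}
```

Usage in a page would look like `trySetCookie(document, 'data', myString)`; a `false` result suggests the value was rejected or truncated and should be split.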

See also: said:
Will, for example, a string of 100 characters equal 100 bytes in
a cookie? Wikipedia tells me that UTF-8 encodes each character in 1 to 4
octets (8-bit bytes). Doesn't that mean a string 100 characters could be
as large as 400 bytes?

Yes, if cookies were stored in UTF-8. Since they are probably not (that
would not be backwards-compatible, and it would counteract their purpose),
it would not matter. Go figure.

FWIW, using Storage instead might help.
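
Assuming "Storage" here means the DOM Storage interface behind `localStorage` and `sessionStorage`, a minimal sketch of storing a long string that way could look like this. The helper name `saveToStorage` and its `storage` parameter are hypothetical; in a supporting browser you would pass `window.localStorage`.

```javascript
// Hypothetical helper: store a string through a Storage-like object
// (setItem/getItem) and confirm that it can be read back unchanged.
function saveToStorage(storage, key, value) {
  try {
    storage.setItem(key, value);
    return storage.getItem(key) === value;
  } catch (e) {
    // setItem may throw, for example when the quota is exceeded
    // or storage is disabled
    return false;
  }
}
```

Unlike cookies, values stored this way are not sent to the server with each request, so this only helps if the data is needed on the client alone.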


PointedEars
 
Evertjan.

Richard Cornford wrote on 12 jul 2009 in comp.lang.javascript:
I also see the point, but the situation is probably nowhere near as bad
as that. Unicode has gone from a 16 bit set to a 32 bit set and in doing
so it has included the 'characters' for many additional (but obscure)
languages, such as "Linear B" (which is, more or less, an alternative
character set for writing ancient Greek). However, the 16 bit Unicode
set included (at least the vast majority of) the characters used in
current languages. So the odds are in favour of most applications being
able to get by without needing more than 2 bytes per character when
encoding strings as UTF-8.

However there should be worst case coping.

One could perhaps test the string for total byte length?

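var bytes = 0, t;
for (var i = 0; i < myStr.length; i++) {
  t = myStr.charCodeAt(i);
  // Count one byte for every 8 bits needed to hold the code unit value
  // (1 for values below 256, otherwise 2); this is not the UTF-8 length.
  do {
    bytes++;
  } while ((t /= 256) >= 1);
}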
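
The loop above counts the bytes needed to hold each 16-bit code unit, which is not the same as the UTF-8 size. As a rough alternative, here is a sketch that counts UTF-8 bytes directly from the code unit values. The helper name `utf8ByteLength` is hypothetical, and it assumes that unpaired surrogates should be counted as 3 bytes (the size of the replacement character).

```javascript
// Hypothetical helper: number of bytes the string would occupy in UTF-8.
function utf8ByteLength(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;                       // ASCII
    } else if (code < 0x800) {
      bytes += 2;                       // U+0080 to U+07FF
    } else if (code >= 0xD800 && code <= 0xDBFF &&
               i + 1 < str.length &&
               str.charCodeAt(i + 1) >= 0xDC00 &&
               str.charCodeAt(i + 1) <= 0xDFFF) {
      bytes += 4;                       // surrogate pair: code point above U+FFFF
      i++;                              // skip the low surrogate
    } else {
      bytes += 3;                       // rest of the 16-bit range
    }
  }
  return bytes;
}
```

Note that a character from the 16-bit set can still take 3 bytes in UTF-8: `utf8ByteLength('\u1234')` is 3, while `'\u1234'.length` is 1.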
 
Andrew Poulos

Thomas said:
Andrew Poulos wrote:

Yes, if cookies were stored in UTF-8. Since they are probably not (that
would not be backwards-compatible, and it would counteract their purpose),
it would not matter. Go figure.

FWIW, using Storage instead might help.

The only google reference I could find was something about using
window.name to store data in. Is that what you're referring to?

Andrew Poulos
 
Bart Van der Donck

Andrew said:
How does the length of a string relate to the 4,096 bytes allowable in a
cookie. Will, for example, a string of 100 characters equal 100 bytes in
a cookie?

It equals 100 only if all characters in the string are ASCII, that is,
below code point 128. These are the following:
http://en.wikipedia.org/wiki/File:ASCII_full.svg
Wikipedia tells me that UTF-8 encodes each character in 1 to 4 octets
(8-bit bytes). Doesn't that mean a string 100 characters could be as
large as 400 bytes?

Yes, UTF-8 characters have a variable byte length (min. 1, max. 4).
But that doesn't matter here. A cookie goes into the HTTP-header and
may only contain ASCII characters, so the string *must* be encoded
into ASCII before it can be sent as a cookie. This should preferably
be done with a UTF-8 percent-encoding; see 'encodeURI()' in
javascript. Afterwards you could check whether the encoded result is
still shorter than 4096 bytes.

var str = 'café \u1234';
var enc_str = encodeURI(str);
var report = 'Original string: ' + str + '\n'
  + 'Original string length: ' + str.length + '\n'
  + 'Encoded string: ' + enc_str + '\n'
  + 'Encoded string length: ' + enc_str.length;
alert(report);
if (enc_str.length > 4096) {
  alert('cookie too long');
}
else {
  alert('cookie not too long');
}

'decodeURI()' is the opposite of 'encodeURI()' and can be used to
retrieve the original value from the cookie. In theory, you could also
use HTML num/char entities; or anything you like, as long as both
parties en/decode in the same manner AND as long as the cookie-string
is ASCII.

But percent-encoding would be the recommended way.
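
For the splitting part of the original question, here is a sketch of one possible approach: cut the string so that each piece stays within a budget once it is percent-encoded. The names `splitForCookies` and `maxEncodedLength` are hypothetical. It uses `encodeURIComponent()` and assumes an engine with `for...of` (newer than the browsers discussed in this thread); `encodeURIComponent()` throws a URIError on an unpaired surrogate.

```javascript
// Hypothetical helper: split str into chunks whose percent-encoded
// length does not exceed maxEncodedLength.
function splitForCookies(str, maxEncodedLength) {
  var chunks = [];
  var current = '';
  var currentLength = 0;
  // for...of iterates by code point, so surrogate pairs stay together
  for (var ch of str) {
    var encodedLength = encodeURIComponent(ch).length;
    if (encodedLength > maxEncodedLength) {
      throw new RangeError('budget too small for a single character');
    }
    if (currentLength + encodedLength > maxEncodedLength) {
      chunks.push(current);
      current = '';
      currentLength = 0;
    }
    current += ch;
    currentLength += encodedLength;
  }
  if (current !== '') {
    chunks.push(current);
  }
  return chunks;
}
```

Each chunk can then be encoded and written to its own cookie. The budget should probably be set somewhat below 4096, since the cookie name and any attributes are likely to count towards the limit as well.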

Hope this helps,
 
Thomas 'PointedEars' Lahn

Bart said:
var str = 'café \u1234';
var enc_str = encodeURI(str);

Will apparently work in JavaScript 1.8.1 (Firefox 3.5 & friends), but not in
all instances or implementations, because it is not necessarily a URI (this
one clearly is not). One should use encodeURIComponent() here instead ...
[...]
'decodeURI()' is the opposite of 'encodeURI()' [...]

.... and decodeURIComponent(), respectively.


PointedEars
 
Bart Van der Donck

Thomas said:
Will apparently work in JavaScript 1.8.1 (Firefox 3.5 & friends), but not in
all instances or implementations, because it is not necessarily a URI (this
one clearly is not).

But it's also clearly not a component of a URI.
One should use encodeURIComponent() here instead ...

It doesn't matter which of the two. Both encodeURI() and
encodeURIComponent() have exactly the same browser support [*]. The
only difference is that encodeURIComponent() will not encode ~!*()'
and encodeURI() will not encode ~!@#$&*()=:/,;?+' [**]. All those
characters are absolutely safe to store inside a cookie, so in this
case it makes no difference; as long as the same decoding instruction
is used.
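
One caveat may be worth testing before relying on this: `encodeURI()` leaves `;`, `,` and `=` unescaped, and in a cookie string the `;` separates the value from the attributes that follow. A small comparison, using a made-up sample value:

```javascript
// Sample value containing characters that have a meaning in cookie syntax
var value = 'a=1; b=2, c';

var viaURI = encodeURI(value);
// 'a=1;%20b=2,%20c'  (the ';' ',' and '=' survive)

var viaComponent = encodeURIComponent(value);
// 'a%3D1%3B%20b%3D2%2C%20c'  (all of them are escaped)
```

If the stored string can contain such characters, `encodeURIComponent()` with `decodeURIComponent()` looks like the safer pair.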

[*] http://pointedears.de/scripts/test/es-matrix/ - Mozilla:
javascript 1.5. & MS Jscript 5.5.
[**] http://xkr.us/articles/javascript/encode-compare/
 
Dr J R Stockton

In comp.lang.javascript message <4a599398$0$9769$5a62ac22@per-
qv1-newsreader-01.iinet.net.au>, Sun, 12 Jul 2009 17:41:08, Andrew
Poulos said:
If a cookie is limited to 4K, 4,096 bytes, in size, how can I tell if a
string will fit into one cookie (without requiring to be
truncated/split across more than one cookie)? Do I need to try to write
it and then read it back and see if it matches the original?


That must be safest, because it takes account of anything that may have
been forgotten, vary between systems, etc.

Also, the length of what is returned tells you where to start the next
substring.

Note that ECMA 262 Final Draft 5 says that charCodeAt returns a
non-negative integer less than 2^16. Current RFCs for cookies should define
the coding used.
 
