big binary string

Laurent Vogel

hello,

I'm writing a script which needs about 60 kbytes of text data.
Currently it looks like:
function1("small string1");
function2("small string2");
function1("small string3");
and so on. To make it load more quickly, I'm considering putting
all the data in one big string (something like:
1<small string1>2<small string2>1<small string3>...
) and compressing it. Using a naive compression approach, I can
shrink the 60 kB down to approximately 20 kB, plus 2 or 3 lines of
JavaScript code to decompress the stuff. Now the question is:

QUESTION:
---------

Is it okay to have a statement like:

mystring = "blablabla\
blablabla\
blablabla";

with 20 kilo bytes of binary blablabla ? What characters may I use in
the string? (can I use any bytes from 1 to 255, or should I restrict
to the printable subset of ISO latin 1) ?

Thanks for any answers,

Laurent Vogel
 
Lasse Reichstein Nielsen

Laurent Vogel said:
Is it okay to have a statement like:

mystring = "blablabla\
blablabla\
blablabla";

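Not portably. A string literal cannot contain a raw line break, and a
backslash at the end of a line is not part of ECMAScript 3 (line
continuation was only standardized in ECMAScript 5). You should do it as: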
var mystring = "blabla...bla"+
"blablabl.. .." +
"blablabl.. .." +
"blablabl.. .." +
"blablabl.. .."...
although with that many lines, pure concatenation is probably
inefficient (quadratic time complexity). I think a better solution is
to build an array and join it in one go:
var myarray = ["blabalbal...bla",
"blabablablabl",
"bablb",
...
"lbalblab"];
var mystring = myarray.join("");
with 20 kilo bytes of binary blablabla ?
What characters may I use in the string? (can I use any bytes from 1
to 255, or should I restrict to the printable subset of ISO latin 1)?

Good question. That depends, among other things, on the encoding used
to send the page. ECMAScript says:
SourceCharacter ::
any Unicode character

StringLiteral ::
" DoubleStringCharacters_opt "
' SingleStringCharacters_opt '

DoubleStringCharacters ::
DoubleStringCharacter DoubleStringCharacters_opt

DoubleStringCharacter ::
SourceCharacter but not double-quote " or backslash \ or LineTerminator
\ EscapeSequence

so any Unicode character should work, if you can send it using the
encoding you use.

I would restrict myself to, e.g., ISO Latin 1, and possibly the
printable subset. If you need codes outside of that, you can use
escapes: \xff (hex) or \377 (octal), or, for characters that have a
dedicated escape, that escape: \n. You probably don't want to use the
Unicode escape, e.g., \u21e7.
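
A minimal sketch of that escaping idea, assuming a hypothetical helper named toAsciiLiteral (it is not part of any library). It turns an arbitrary string into a double-quoted literal whose source text only uses characters 32 to 126; the \u branch is only there so that codes above 255 do not break, and is not needed for byte data.

```javascript
// Hypothetical helper: build a double-quoted string literal that
// contains only printable ASCII (codes 32-126) in its source text.
function toAsciiLiteral(str) {
  var out = '"';
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    var ch = str.charAt(i);
    if (ch === '"' || ch === "\\") {
      out += "\\" + ch;                      // quote and backslash need escaping
    } else if (code >= 32 && code <= 126) {
      out += ch;                             // printable ASCII passes through
    } else if (code <= 0xff) {
      out += "\\x" + (code < 16 ? "0" : "") + code.toString(16);
    } else {
      out += "\\u" + ("0000" + code.toString(16)).slice(-4);
    }
  }
  return out + '"';
}
```

Evaluating the resulting literal should give back the original string, so the escaped form can be pasted into a script sent in any ASCII-compatible encoding.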

/L
 
Dr John Stockton

JRS: In article <[email protected]>, seen
...
I can
shrink the 60 kb down to approximately 20 kb, plus 2 or 3 lines of
javascript code to decompress the stuff). Now the question is:
Is it okay to have a statement like:

mystring = "blablabla\
blablabla\
blablabla";

AIUI, a backslash at the end of a line is not part of the standard; it
may not always work.

with 20 kilo bytes of binary blablabla ? What characters may I use in
the string? (can I use any bytes from 1 to 255, or should I restrict

0 to 255 ? But you would need to escape " \ and CR/LF (& LS, PS, FF?),
at the very very least.
to the printable subset of ISO latin 1) ?

Using characters 33 to 126 should be safe. IIRC, MIME uses 64 of them,
storing 3 arbitrary bytes in 4 legible ones, and PostScript can use 85
of them, storing 4 in 5. No lesser expansion from fully-compressed
seems worth the effort.
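
As a rough illustration of the "3 arbitrary bytes in 4 legible ones" scheme, here is a minimal base64 sketch. The function names bytesToBase64 and base64ToBytes are made up for this example, and bytes are assumed to be plain arrays of numbers from 0 to 255.

```javascript
// The 64 characters used by MIME base64, all within codes 33-126.
var B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Pack each group of 3 bytes (24 bits) into 4 characters of 6 bits each.
function bytesToBase64(bytes) {
  var out = [];
  for (var i = 0; i < bytes.length; i += 3) {
    var has1 = i + 1 < bytes.length, has2 = i + 2 < bytes.length;
    var n = (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) |
            (has2 ? bytes[i + 2] : 0);
    out.push(B64.charAt(n >> 18),
             B64.charAt((n >> 12) & 63),
             has1 ? B64.charAt((n >> 6) & 63) : "=",   // "=" pads a short group
             has2 ? B64.charAt(n & 63) : "=");
  }
  return out.join("");
}

// Reverse step: 4 characters back into up to 3 bytes.
function base64ToBytes(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i += 4) {
    var c2 = text.charAt(i + 2), c3 = text.charAt(i + 3);
    var n = (B64.indexOf(text.charAt(i)) << 18) |
            (B64.indexOf(text.charAt(i + 1)) << 12) |
            ((c2 === "=" ? 0 : B64.indexOf(c2)) << 6) |
            (c3 === "=" ? 0 : B64.indexOf(c3));
    bytes.push((n >> 16) & 255);
    if (c2 !== "=") bytes.push((n >> 8) & 255);
    if (c3 !== "=") bytes.push(n & 255);
  }
  return bytes;
}
```

The encoded text is one third larger than the bytes it carries, which is the expansion referred to above.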

BUT :

Over all links other than dial-up and long-distance radio, 60kB will not
take very long; and, AIUI, the lower-speed links are generally data-
compressed by hardware. Probably, if you know the structure of your
data, you can compress more tightly; but I wonder whether you can make a
gain sufficient to justify the effort?


H'mmm - IIRC, "Basic English" can say anything that needs to be said,
generally, with 850 simple words; and for special topics 150 special
words. So if the text is so written, you could send a word-list plus
for each string the numbers for each word. Whether French can be so
treated I do not know.
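
A small sketch of that word-list idea, assuming the strings are plain words separated by single spaces. The names packWords and unpackWords are hypothetical, and no claim is made about how well this would compress real text.

```javascript
// Replace every word by its index in a shared word list.
function packWords(strings) {
  var words = [];                  // the word list, sent once
  var seen = Object.create(null);  // word -> index in `words`
  var encoded = [];
  for (var i = 0; i < strings.length; i++) {
    var parts = strings[i].split(" ");
    var nums = [];
    for (var k = 0; k < parts.length; k++) {
      var w = parts[k];
      if (!(w in seen)) {
        seen[w] = words.length;    // first time this word is met
        words.push(w);
      }
      nums.push(seen[w]);
    }
    encoded.push(nums);
  }
  return { words: words, encoded: encoded };
}

// Rebuild the original strings from the word list and the numbers.
function unpackWords(packed) {
  var out = [];
  for (var i = 0; i < packed.encoded.length; i++) {
    var nums = packed.encoded[i], parts = [];
    for (var k = 0; k < nums.length; k++) parts.push(packed.words[nums[k]]);
    out.push(parts.join(" "));
  }
  return out;
}
```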
 
Laurent Vogel

Thanks for the answers.
I don't quite understand why using ASCII 32 (space) in
strings should cause any problem. Just to be sure I will
restrict myself to the range 32-126.
As for whether compression can be worth the trouble,
frankly I don't know, but I maintain that I'm able to
shrink approximately 60k into 20k of compressed data (plus
three lines added for the decompression code). As an
example here is the decompressor together with the
(compressed) description of the compression format used.


<html><head></head><body><script>
/* here is the uncompressing routine */
function u(s){var n,i=0,g=function(){if((n=s.charCodeAt(i++)-32)>=64)
n=(n-64)*95+s.charCodeAt(i++)-32;},o,d="",j=0;for(;;){g();d+=s.substr(i, n);
i+=n;j+=n;g();if(0==(o=n))return d;g();while(n-- >0)d+=d.charAt(j++ -o);}}

/* and here is sample compressed data */
document.write(u([
"X<p>This is an example of compression for Javascript. The@0#mat`d$(\nvery s",
"i`f$%, and5$'best de^$5bed by the algorithm N%)mented in=%#\nuna\")?ng rout",
"ine:</p>\n\n<p>Conceptuall`n& P4Hg will read objects one at a time from \n`",
"`$'\"input\"au&Ealways append data to a \"buffer\". Wheaj&#jobbE$!d`x$ az% ",
"D() containsa`/!e`q&!.b-)(Now here`g$ bz- bP(<blockquote><pre>forever {\n ",
"b?&(<i>n</i>1$ av' 3(/ bytes verbatimbT& bS+% into1& az' `v,&offset`h% 7$'q",
"uit if:/$== 0aLK&copiedaJ& `i. @& aP'+current end`l$! fz$ ah, az2', allowe=",
"$(overlappeQ%&egionsb#%' (i.e.a9/$&lt;ag)%)\n}</cj$\"</c|+\"\n dO%Lumbers ",
"smaller than 64 are encoded as ASCII /$% 32 +af%!n[%*. Bigger \n0&!s[-!ugC%",
"3two printable ascii^'#s (c9% a!.1to \n126 included)ii& i]* i\\%-crude, but",
" itc,&#s a<& b1% i<0 a $%. Notg_$ hZ$ c_< j 3\"onj (4 provides run lengthbV",
"& `c$ kg$$freeh9*!Ol)$)urse, perb*%\"nchG% ff$(e poor wiJ$ a.$)ared withff$",
"!llC)-ors\nlike gzipbZ&%honorcX% af-2atio can be achiev`q$ jJ( b5+ j$' jC**",
"many repealf$*substringsb-& "].join("")));
</script></body></html>
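
For readers who find the three-line routine hard to follow, here is a spelled-out sketch of what it appears to do. The name uncompress and the variable names are chosen for this example only. Each round reads a literal length, copies that many characters verbatim, then reads an offset (0 means stop) and a length, and copies that many characters from earlier in the output.

```javascript
function uncompress(s) {
  var i = 0;      // read position in the compressed string
  var out = "";   // the output "buffer"

  // One printable character encodes 0..63; two characters encode more.
  function readNumber() {
    var n = s.charCodeAt(i++) - 32;
    if (n >= 64) n = (n - 64) * 95 + s.charCodeAt(i++) - 32;
    return n;
  }

  for (;;) {
    // 1. Copy a run of literal characters verbatim.
    var literalLength = readNumber();
    out += s.substring(i, i + literalLength);
    i += literalLength;

    // 2. Read the back-reference; an offset of 0 ends the data.
    var offset = readNumber();
    if (offset === 0) return out;
    var copyLength = readNumber();

    // 3. Copy character by character so overlapping regions work.
    var from = out.length - offset;
    for (var k = 0; k < copyLength; k++) out += out.charAt(from + k);
  }
}
```

For instance, the hand-made input "#abc#&" followed by two spaces should mean: 3 literal characters "abc", then go back 3 and copy 6, then an empty literal and offset 0 to stop.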


regards,

Laurent Vogel
-- remove "ima" and "ictor" to get my email address
 
Steve van Dongen

Laurent Vogel said:
Thanks for the answers.
I don't quite understand why using ASCII 32 (space) in
strings should cause any problem. Just to be sure I will
restrict myself to the range 32-126.
As for whether compression can be worth the trouble,
frankly I don't know, but I maintain that I'm able to
shrink approximately 60k into 20k of compressed data (plus
three lines added for the decompression code). As an
example here is the decompressor together with the
(compressed) description of the compression format used.


<html><head></head><body><script>
/* here is the uncompressing routine */
function u(s){var n,i=0,g=function(){if((n=s.charCodeAt(i++)-32)>=64)
n=(n-64)*95+s.charCodeAt(i++)-32;},o,d="",j=0;for(;;){g();d+=s.substr(i, n);
i+=n;j+=n;g();if(0==(o=n))return d;g();while(n-- >0)d+=d.charAt(j++ -o);}}

Have you done performance testing of compressing data like this vs.
using uncompressed data over various link speeds? I'd be very wary of
the above. String concatenation is one of the slowest operations you
can do in Javascript due to naive memory allocation, and when you
start concatenating large strings repeatedly in a loop... Well, let's
just say that we used to have some scripts that did that. We were
dealing with data sizes up in the 200-400K range and the script took
15 minutes to run. After reducing the number of string concatenations
where one of the two operands was very large the script only took 2.5
minutes. You need to test whether this is worthwhile. I believe John
is correct and it will not be.
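
One way to cut down on large-string concatenation in a routine of this kind is to collect the output in an array of characters and join it once at the end. The sketch below assumes the format described in the thread (literal length, literal, offset, copy length, with offset 0 as terminator). The name uncompressToArray is made up here, and whether this is actually faster has not been measured and will depend on the script engine.

```javascript
function uncompressToArray(s) {
  var i = 0;      // read position in the compressed string
  var out = [];   // one array element per output character

  // One printable character encodes 0..63; two characters encode more.
  function readNumber() {
    var n = s.charCodeAt(i++) - 32;
    if (n >= 64) n = (n - 64) * 95 + s.charCodeAt(i++) - 32;
    return n;
  }

  for (;;) {
    var k;
    // Literal run: push the characters instead of growing a string.
    var literalLength = readNumber();
    for (k = 0; k < literalLength; k++) out.push(s.charAt(i + k));
    i += literalLength;

    var offset = readNumber();
    if (offset === 0) return out.join("");   // single join at the end

    // Back-reference: indexed reads from the array, overlap allowed.
    var copyLength = readNumber();
    var from = out.length - offset;
    for (k = 0; k < copyLength; k++) out.push(out[from + k]);
  }
}
```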

Regards,
Steve
 
