JavaScript string value serialization to XML

Pavils Jurjans

Hello,

I have been developing an Ajax-style framework for a couple of years
now, and I am currently reworking some parts of it. I used to use JSON
for JavaScript value serialization/deserialization, but this approach
has some limitations when it comes to large data: the JavaScript on the
client side has too much load working with it. So I thought I'd rework
the serialization to use an XML format.

The idea is an as-short-as-possible, bare-bones serialization:

For boolean values:
<bool value="true"/>

For numeric values:
<num value="1234.567"/>

For string values:
<str value="ABC"/>

For arrays:
<arr>
<bool index="0" value="true"/>
<num index="1" value="777"/>
<str index="2" value="abc"/>
</arr>

For objects:
<obj>
<bool key="dataFieldA" value="true"/>
<num key="dataFieldB" value="777"/>
<str key="dataFieldC" value="abc"/>
</obj>

For classes:
<obj class="Person">
<str key="name" value="John Smith"/>
<str key="age" value="30"/>
</obj>


etc. You can see where I am heading.
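
The notation above could be produced by something like the following. This is only a minimal sketch, not the framework's actual code: the names toXml and escapeAttr are made up for illustration, the class="..." case is left out, and control characters such as String.fromCharCode(7) are not handled at all.

```javascript
// Hypothetical helper: escapes the characters that are special inside a
// double-quoted XML attribute value. Tab, LF and CR are written as
// character references so attribute-value normalization keeps them.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;")
    .replace(/\t/g, "&#9;")
    .replace(/\n/g, "&#10;")
    .replace(/\r/g, "&#13;");
}

// Hypothetical serializer for the element names proposed in the post.
// `attr` carries the extra index="..." or key="..." attribute, if any.
function toXml(value, attr) {
  attr = attr || "";
  if (typeof value === "boolean") {
    return "<bool" + attr + ' value="' + value + '"/>';
  }
  if (typeof value === "number") {
    return "<num" + attr + ' value="' + value + '"/>';
  }
  if (typeof value === "string") {
    return "<str" + attr + ' value="' + escapeAttr(value) + '"/>';
  }
  if (Array.isArray(value)) {
    var items = value.map(function (item, i) {
      return toXml(item, ' index="' + i + '"');
    });
    return "<arr" + attr + ">" + items.join("") + "</arr>";
  }
  if (value !== null && typeof value === "object") {
    var fields = Object.keys(value).map(function (key) {
      return toXml(value[key], ' key="' + escapeAttr(key) + '"');
    });
    return "<obj" + attr + ">" + fields.join("") + "</obj>";
  }
  throw new TypeError("Unsupported value: " + value);
}
```

For example, toXml([true, 777, "abc"]) gives the same elements as the array example, just without the line breaks.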

The problem I am stuck with is that in JavaScript, you can have a
perfectly legal string with fancy charcodes:

var str = String.fromCharCode(7);

The problem is that the following XML does not validate:

<str value="&#7;"/>

You can test this out online (MSIE):
http://www.jurjans.lv/dhtml/XMLparse.html

Right now, it seems that the only way out is to have another layer of
encoding/decoding, handled by JavaScript, when building/parsing the XML
data, something like

<str value="\x07"/>

It makes me somewhat sad, because it brings me back to
JavaScript-based string encoding/decoding (i.e., using string.replace),
which might be time and load consuming when handling large strings.
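
A backslash-style layer like this can at least be done with a single string.replace pass in each direction. The sketch below is only an illustration under some assumptions: the function names are invented, only backslashes and the control characters that XML 1.0 forbids below U+0020 are escaped, and other problem cases (U+FFFE, U+FFFF, lone surrogates) are ignored.

```javascript
// Escape backslashes plus the control characters below U+0020 that
// XML 1.0 does not allow (tab, LF and CR are legal and left alone).
function escapeControls(text) {
  return text.replace(/[\\\x00-\x08\x0B\x0C\x0E-\x1F]/g, function (ch) {
    if (ch === "\\") return "\\\\";
    var hex = ch.charCodeAt(0).toString(16);
    return "\\x" + (hex.length < 2 ? "0" + hex : hex);
  });
}

// Reverse the escaping: "\\" becomes a backslash, "\xNN" becomes the
// character with that code.
function unescapeControls(text) {
  return text.replace(/\\(\\|x([0-9a-fA-F]{2}))/g, function (match, body, hex) {
    return body === "\\" ? "\\" : String.fromCharCode(parseInt(hex, 16));
  });
}
```

Because the backslash itself is escaped, a string that already contains the literal text \x07 survives a round trip unchanged.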

Maybe someone has good ideas on how to handle these special charcode
values?

Regards,
Pavils
 
Pavils Jurjans

This doesn't solve the problem - your example doesn't parse either. The
problem is that &#7; isn't a valid character, so if I have such a
JavaScript string, I can't encode it in XML by simple means.
 
Martin Honnen

Pavils Jurjans wrote:

The idea is an as-short-as-possible, bare-bones serialization:

For boolean values:
<bool value="true"/>

For numeric values:
<num value="1234.567"/>

For string values:
<str value="ABC"/>

It is up to you to make up your own notation but you might want to look
into existing schema languages and/or into SOAP web services.

The problem I am stuck with is that in JavaScript, you can have a
perfectly legal string with fancy charcodes:

var str = String.fromCharCode(7);

The problem is that the following XML does not validate:

<str value="&#7;"/>

That is not a question of validation (where an XML instance document is
checked against a grammar defined by a DTD or schema); rather, it is a
well-formedness problem. With XML 1.0 that character is not allowed,
whether you use a numeric character reference or not.
With XML 1.0 you would need to encode such characters, e.g. base64
encode them.
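
As a rough illustration of that advice, a string can be tested against the XML 1.0 Char production and base64-encoded only when it fails. This is a sketch under assumptions: the function names are invented, and it relies on btoa, atob, TextEncoder and TextDecoder being available (current browsers and recent Node.js), which older script engines do not provide.

```javascript
// Matches any code point outside the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The "u" flag makes the regex work on code points, so a lone surrogate
// is reported as illegal while a valid surrogate pair is accepted.
var XML10_ILLEGAL =
  /[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/u;

function isXml10Safe(text) {
  return !XML10_ILLEGAL.test(text);
}

// Base64-encode the UTF-8 bytes of a string.
// Caveat: TextEncoder replaces lone surrogates with U+FFFD.
function toBase64(text) {
  var bytes = new TextEncoder().encode(text);
  var binary = "";
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Decode base64 back into a string, keeping a leading U+FEFF if present.
function fromBase64(encoded) {
  var binary = atob(encoded);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}
```

With these, isXml10Safe(String.fromCharCode(7)) is false and toBase64(String.fromCharCode(7)) is "Bw==". How the receiver learns that a value is base64 (an extra attribute, a different element name) is left open here.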
 
Pavils Jurjans

That is not a question of validation (where an XML instance document is
checked against a grammar defined by a DTD or schema); rather, it is a
well-formedness problem. With XML 1.0 that character is not allowed,
whether you use a numeric character reference or not.

True, I have been using the wrong term.

With XML 1.0 you would need to encode such characters, e.g. base64
encode them.

Ok, so that's the answer. Another layer of JavaScript-driven
encoding/decoding will be needed. I will try to find the most efficient
(speed-wise) way to do that in both directions. My application has
a form where the user can navigate through pages of record entries. When
he selects a page, an Ajax call is made to the server, and the response
contains HTML that must be injected into the appropriate DIV tag. The
thing is that this HTML is full of special characters, and currently
JavaScript has problems extracting it from the response XML - it's just
too slow.
 
Pavils Jurjans

Sorry, my suggestion will
do perfectly well for getting legal non-ascii characters into your xml, but
it won't let you put in control characters.

Truth be told, it makes no difference whether I store the string value
in an attribute or in tag content. It's a heated debate among different
XML apostles; my judgment is simply that a "string variable" should not
need nested XML elements, so I opt to place the value in the "value"
attribute. I wouldn't like it if, in an example such as
<str>some stuff<element>sdfsd</element></str>
the "element" tag were treated as part of the string value, because
it's actually part of the XML doc. Because of that, I want to store the
value in an attribute, where it's just not possible to create nested
tags. And, probably, I am right now entering the domain of heated
debate :)
 
Pavils Jurjans

encodeURIComponent/decodeURIComponent would probably be a good start for
this.

Now this is a fine suggestion! This function has completely escaped my
attention! It will probably do fine. The "bad" thing here is that it
will encode every non-ASCII Unicode character into at least 6 bytes,
although those characters would be completely Ok in a UTF-8 encoded XML
file. But those extra 4 bytes (6 bytes in encodeURIComponent encoded
form, versus only 2 in UTF-8 encoded form) for such a character are
probably a very small price to pay for the improved speed of
processing.
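
For illustration, here is roughly how encodeURIComponent/decodeURIComponent could wrap the "value" attribute. The names encodeStr and decodeStrValue are hypothetical. Note two limits of this sketch: encodeURIComponent leaves the apostrophe unescaped, so the attribute has to be delimited with double quotes, and it throws a URIError for a string containing a lone surrogate.

```javascript
// Build a <str/> element whose value is percent-encoded. The encoded text
// contains no control characters and none of & < ", so it can be placed
// directly inside a double-quoted attribute.
function encodeStr(text) {
  return '<str value="' + encodeURIComponent(text) + '"/>';
}

// Recover the original string from the attribute value read out of the DOM.
function decodeStrValue(attributeValue) {
  return decodeURIComponent(attributeValue);
}

var bell = String.fromCharCode(7);
var element = encodeStr('a<b & "c"' + bell);
// element is '<str value="a%3Cb%20%26%20%22c%22%07"/>'

// Size cost: a character that takes 2 bytes in UTF-8 becomes 6 characters,
// and one that takes 3 bytes becomes 9.
var twoByteLength = encodeURIComponent("\u0101").length;   // "%C4%81" -> 6
var threeByteLength = encodeURIComponent("\u20AC").length; // "%E2%82%AC" -> 9
```

Plain ASCII letters and digits pass through unchanged, so the overhead depends on how much of the string is punctuation, whitespace, or non-ASCII text.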

Thanks!

Regards,

Pavils
 
