XML compression

Ed Beroset

I have an XML file that I want to squeeze down as small as possible for
storage in an embedded device. I want it to still be a valid XML file
(and not something like a binary ASN.1 encoding of an XML file) but it
does not need to carry the long tags it currently has as long as I
create an XSLT which will put it back into the right form. What I had
in mind was something like this:

<original-xml-fragment>
<very-long-and-verbose-tag name="Long tag 1">
<more-information-is-stored-here name="stuff 1"/>
</very-long-and-verbose-tag>
<very-long-and-verbose-tag name="Long tag 2">
<more-information-is-stored-here name="stuff 2"/>
<valuable-additional-information name="foo"/>
</very-long-and-verbose-tag>
</original-xml-fragment>

I'm thinking of transforming it to this:

<o><v n="Long tag 1"><m n="stuff 1"/></v><v n="Long tag 2"><m n="stuff 2"/><v2 n="foo"/></v></o>

My question is: has anyone already written an XSLT that would
abbreviate tags in this kind of way AND generate the corresponding
"decoder" XSLT that would reconstitute the original? I have ideas
about how to do it using a procedural language, but I would like to do
it entirely with XSL transforms if I can.

The only part that I don't really know how to do is to automatically
generate short, unique abbreviations for each of the tags. I *could*
specify them all manually once, but I'd prefer an automatic solution to
simplify maintenance.

Ed
 
Joris Gillis

Hi,

I've created this little stylesheet that maps every unique node name (elements and attributes alike) to an abbreviation. It might be handy as an intermediate step towards a solution to your (very interesting, by the way) question.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- Index every element and attribute by its local name -->
<xsl:key name="name" match="*|@*" use="local-name()"/>

<xsl:template match="/">
<name-mapping>
<!-- Visit only the first node carrying each distinct name (Muenchian
     grouping); the union is processed in document order -->
<xsl:for-each select="//*[generate-id()=generate-id(key('name',local-name()))]|//@*[generate-id()=generate-id(key('name',local-name()))]">
<name>
<!-- Abbreviation derived from the position: a, b, ..., z, aa, ab, ... -->
<xsl:attribute name="s"><xsl:number value="position()" format="a"/></xsl:attribute>
<xsl:value-of select="local-name()"/>
</name>
</xsl:for-each>
</name-mapping>
</xsl:template>

</xsl:stylesheet>



Applied to your sample, this generates the following output:

<name-mapping>
<name s="a">original-xml-fragment</name>
<name s="b">very-long-and-verbose-tag</name>
<name s="c">name</name>
<name s="d">more-information-is-stored-here</name>
<name s="e">valuable-additional-information</name>
</name-mapping>

regards,
 
Joris Gillis

Hi, again

Given that two transformation steps are allowed, you can do this:
Run the stylesheet above on the verbose XML and save its output to a file named 'name-map.xml', in the same directory as the stylesheets below.

Then apply the following stylesheet to the verbose XML to reduce it:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- indent="yes" keeps the result readable; indent="no" gives a smaller file -->
<xsl:output method="xml" indent="yes"/>

<!-- Elements: replace the full name by its abbreviation from name-map.xml.
     document() resolves 'name-map.xml' relative to this stylesheet. -->
<xsl:template match="*">
<xsl:variable name="name" select="local-name()"/>
<xsl:element name="{document('name-map.xml')//name[.=$name]/@s}">
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

<!-- Attributes: same lookup, value copied unchanged -->
<xsl:template match="@*">
<xsl:variable name="name" select="local-name()"/>
<xsl:attribute name="{document('name-map.xml')//name[.=$name]/@s}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>

</xsl:stylesheet>


The reduced form will look like this:
<a>
<b c="Long tag 1">
<d c="stuff 1"/>
</b>
<b c="Long tag 2">
<d c="stuff 2"/>
<e c="foo"/>
</b>
</a>


And this stylesheet will expand it again to the original verbose form:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- Elements: reverse lookup, find the map entry whose s attribute is
     this short name and use its text as the full name -->
<xsl:template match="*">
<xsl:variable name="name" select="local-name()"/>
<xsl:element name="{document('name-map.xml')//name[@s=$name]}">
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

<!-- Attributes: same reverse lookup, value copied unchanged -->
<xsl:template match="@*">
<xsl:variable name="name" select="local-name()"/>
<xsl:attribute name="{document('name-map.xml')//name[@s=$name]}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>

</xsl:stylesheet>
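A possible extension, offered as a sketch rather than a tested implementation: the original question also asked for a generated "decoder" XSLT. Instead of looking names up in name-map.xml at run time, a stylesheet can take name-map.xml as its source document and write out a standalone decoder with one pair of templates per abbreviation. It uses xsl:namespace-alias so that literal axsl:* elements come out as xsl:* elements; the TransformAlias namespace URI follows the example in the XSLT 1.0 specification, and the name of the generated file is up to you.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">

  <!-- Literal axsl:* elements are written to the output as xsl:* -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Source document: name-map.xml from the name-mapping stylesheet -->
  <xsl:template match="/name-mapping">
    <axsl:stylesheet version="1.0">
      <axsl:output method="xml" indent="yes"/>
      <xsl:apply-templates select="name"/>
    </axsl:stylesheet>
  </xsl:template>

  <!-- The map does not record whether a name belonged to an element or
       an attribute, so emit one template for each case -->
  <xsl:template match="name">
    <!-- Element case, e.g. <b> becomes <very-long-and-verbose-tag> -->
    <axsl:template match="{@s}">
      <axsl:element name="{.}">
        <!-- attributes come first in document order, then children -->
        <axsl:apply-templates select="@*|node()"/>
      </axsl:element>
    </axsl:template>
    <!-- Attribute case, e.g. c="..." becomes name="..." -->
    <axsl:template match="@{@s}">
      <axsl:attribute name="{.}">
        <axsl:value-of select="."/>
      </axsl:attribute>
    </axsl:template>
  </xsl:template>

</xsl:stylesheet>
```

On literal result elements every attribute is an attribute value template, so match="{@s}" and name="{.}" are filled in from each map entry when the decoder is generated, while select="." and select="@*|node()" pass through unchanged. For the sample map, the entry for b yields templates matching b and @b that both recreate very-long-and-verbose-tag.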


I hope this is useful.

regards,
 
Ed Beroset

Joris Gillis wrote:
[big snip of useful, working XSLT]
I hope this is useful.

It's more than useful -- it's superb! Thanks very much. When I figure
out how to combine them into a single step, I'll post the result.

Ed
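One way the single-step encoder might look, as a sketch under an assumption: it needs an XSLT 1.0 processor that implements the EXSLT exsl:node-set() function (libxslt's xsltproc does). It builds the same name map as the first stylesheet in a variable and then applies the same renaming templates, so no intermediate file is written during encoding. The decoder above still reads name-map.xml, which the first stylesheet should reproduce when run on the same verbose input with the same processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="name" match="*|@*" use="local-name()"/>

  <!-- Same map as name-map.xml, built in memory. Global variables are
       evaluated with the source root as context, so // works here. -->
  <xsl:variable name="map-rtf">
    <xsl:for-each select="//*[generate-id()=generate-id(key('name',local-name()))]|//@*[generate-id()=generate-id(key('name',local-name()))]">
      <name>
        <xsl:attribute name="s"><xsl:number value="position()" format="a"/></xsl:attribute>
        <xsl:value-of select="local-name()"/>
      </name>
    </xsl:for-each>
  </xsl:variable>

  <!-- XSLT 1.0 cannot run path expressions on a result tree fragment,
       hence the EXSLT conversion to a node-set -->
  <xsl:variable name="map" select="exsl:node-set($map-rtf)/name"/>

  <!-- Elements: rename to the abbreviation -->
  <xsl:template match="*">
    <xsl:variable name="name" select="local-name()"/>
    <xsl:element name="{$map[. = $name]/@s}">
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Attributes: rename, keep the value -->
  <xsl:template match="@*">
    <xsl:variable name="name" select="local-name()"/>
    <xsl:attribute name="{$map[. = $name]/@s}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```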
 
