Histogram in XSLT 1.0

shaun roe

I should like to count the frequency of strings embedded in a longer
string, space separated. Specifically, I have:


<phiModule>
5 5 5 5 6 6 6 6 7 7 7
7 8 8 8 8 8 5 5 5 6 6
6 7 7 7 7 7 7 7 7 8 8
8 8 8 8 8 8 8 9 9 9 9
6 7 7 7 8 8 8 8 9 9 9
9 9 9 9 9 10 10 10 10 10 10
11 11 11 11 11 9 9 9 9 9 9
9 10 10 10 10 10 10 11 11 11 11
11 11 11 11 11 11 11 12 12 13 13
13 13 13 13 13 13
</phiModule>

And I should like to count the number of each phi value, eventually
outputting a text like:

Phi module 6 was hit 4 times.
(and so on for all the other phi values)

The phi values are limited to the range 0-51, but I don't know which phi
values will appear in a given file.

Has anyone tackled something like this? I have to use xslt 1.0, so
tokenize, grouping etc becomes a bit tedious...

cheers

shaun
 
xmlator

> I have to use xslt 1.0

Why?

> tokenize, grouping etc becomes a bit tedious...

Sure does. Kinda sounds like an arbitrary academic
homework assignment, in which case an arbitrary
academic solution should suffice.

I guess I would look into transforming the numeric
list into a set of XML nodes (e.g. "<Foo Phi='xx'/>")
and then for-each of the possible values of phi, just
emit a count of the nodes having the attribute of
the corresponding value.

Good luck,
Ron Burk
www.xmlator.com
 
Joseph Kesselman


Presumably because a 2.0 processor isn't available in the target
environment (not very surprising).

Since XSLT's string (as opposed to structural) manipulation capabilities
are relatively limited, I agree that the two-stage approach (convert it
into something XSLT can count easily, then count) may be simplest. That
second pass can be done in 1.0 without a separate styling pass with a
bit of help from the exslt nodeset extension function; this isn't
actually part of 1.0 but it is widely supported for exactly this sort of
two-pass solution.
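
A minimal sketch of that two-pass idea, assuming the processor supports the EXSLT exsl:node-set() function and that the values sit in the first phiModule element of the input. The template name "tokenize" and the intermediate <hit> elements are made up for illustration; they are not part of any library.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Pass 1: turn the space-separated text into <hit> elements
         (assumption: the data is in the first phiModule element) -->
    <xsl:variable name="hits-rtf">
      <xsl:call-template name="tokenize">
        <xsl:with-param name="text"
            select="concat(normalize-space(//phiModule), ' ')"/>
      </xsl:call-template>
    </xsl:variable>

    <!-- Convert the result tree fragment into a queryable node-set -->
    <xsl:variable name="hits" select="exsl:node-set($hits-rtf)/hit"/>

    <!-- Pass 2: visit the first occurrence of each value, in numeric order -->
    <xsl:for-each select="$hits[not(. = preceding-sibling::hit)]">
      <xsl:sort select="." data-type="number"/>
      <xsl:variable name="phi" select="string(.)"/>
      <xsl:value-of select="concat('Phi module ', $phi, ' was hit ',
                                   count($hits[. = $phi]), ' times.&#10;')"/>
    </xsl:for-each>
  </xsl:template>

  <!-- Hypothetical helper: emits one <hit> per token; expects $text to be
       whitespace-normalized and to end with a single space -->
  <xsl:template name="tokenize">
    <xsl:param name="text"/>
    <xsl:if test="contains($text, ' ')">
      <xsl:variable name="token" select="substring-before($text, ' ')"/>
      <xsl:if test="$token != ''">
        <hit><xsl:value-of select="$token"/></hit>
      </xsl:if>
      <xsl:call-template name="tokenize">
        <xsl:with-param name="text" select="substring-after($text, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The preceding-sibling test is quadratic in the number of tokens, which should be tolerable for lists of this size; a key-based lookup would scale better on larger inputs.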

Of course you're going to have to do a recursive parse pass to pull the
individual integers out of that text string and convert them. So another
approach would be to write the recursion to count them directly and
generate a report when it runs out of input. You know there's a limited
range, so you can have an explicit parameter for each value to carry the
count (so far) down through the recursion. Since this would be
tail-recursion, a good XSLT processor would be able to optimize it into
a tolerably efficient loop.
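
A rough sketch of counting directly in the recursion, with no extension functions. Note that this is a variant of the idea rather than exactly what is described above: instead of carrying one parameter per phi value, it walks the candidate values 0 to 51 and runs a single-counter recursion for each. The template names "report" and "count-token" and their parameters are invented for illustration, and the data is assumed to be in the first phiModule element.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="report">
      <!-- Normalize whitespace and add a trailing space so every token,
           including the last, is followed by exactly one space -->
      <xsl:with-param name="tokens"
          select="concat(normalize-space(//phiModule), ' ')"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Loops over the candidate values $phi .. $max and prints a line
       for each value that occurs at least once -->
  <xsl:template name="report">
    <xsl:param name="tokens"/>
    <xsl:param name="phi" select="0"/>
    <xsl:param name="max" select="51"/>
    <xsl:if test="$phi &lt;= $max">
      <xsl:variable name="hits">
        <xsl:call-template name="count-token">
          <xsl:with-param name="tokens" select="$tokens"/>
          <xsl:with-param name="wanted" select="$phi"/>
        </xsl:call-template>
      </xsl:variable>
      <xsl:if test="$hits &gt; 0">
        <xsl:value-of select="concat('Phi module ', $phi, ' was hit ',
                                     $hits, ' times.&#10;')"/>
      </xsl:if>
      <xsl:call-template name="report">
        <xsl:with-param name="tokens" select="$tokens"/>
        <xsl:with-param name="phi" select="$phi + 1"/>
        <xsl:with-param name="max" select="$max"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Tail-recursive count: $sofar carries the running total down
       through the recursion and is emitted when the input runs out -->
  <xsl:template name="count-token">
    <xsl:param name="tokens"/>
    <xsl:param name="wanted"/>
    <xsl:param name="sofar" select="0"/>
    <xsl:choose>
      <xsl:when test="contains($tokens, ' ')">
        <xsl:call-template name="count-token">
          <xsl:with-param name="tokens"
              select="substring-after($tokens, ' ')"/>
          <xsl:with-param name="wanted" select="$wanted"/>
          <!-- The comparison is numeric because $wanted is a number;
               number() maps true/false to 1/0 -->
          <xsl:with-param name="sofar"
              select="$sofar + number(substring-before($tokens, ' ') = $wanted)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$sofar"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

This rescans the string once per candidate value, trading some speed for a much shorter stylesheet than 52 explicit parameters would need. On a processor that does not optimize tail calls, very long inputs could still exhaust the recursion depth.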

Another fix, of course, would be to change whatever is generating this
data to produce it in a more XML/XSLT-friendly format in the first
place, avoiding the need for conversion or tokenizing.
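
To illustrate what a friendlier format might buy, here is a sketch that assumes a purely hypothetical layout with one <hit phi="..."/> element per hit (shown in the comment; the real generator may offer nothing like it). With the values already in nodes, the count reduces to ordinary key-based (Muenchian) grouping in plain XSLT 1.0:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input, invented for this example:
       <phiModule>
         <hit phi="5"/>
         <hit phi="5"/>
         <hit phi="6"/>
       </phiModule>
  -->

  <xsl:output method="text"/>

  <!-- Index every hit element by its phi attribute -->
  <xsl:key name="hit-by-phi" match="hit" use="@phi"/>

  <xsl:template match="/">
    <!-- Keep only the first hit of each distinct phi value -->
    <xsl:for-each select="//hit[generate-id() =
                                generate-id(key('hit-by-phi', @phi)[1])]">
      <xsl:sort select="@phi" data-type="number"/>
      <xsl:value-of select="concat('Phi module ', @phi, ' was hit ',
                                   count(key('hit-by-phi', @phi)),
                                   ' times.&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

No tokenizing and no extension function is needed in this case, since the key works directly on the source document.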
 
Dimitre Novatchev

Using FXSL 1 this is straightforward:

When this transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 exclude-result-prefixes="ext">

  <xsl:import href="strSplit-to-Words.xsl"/>

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <!-- Index the generated word elements by their string value -->
  <xsl:key name="kWordByVal" match="word" use="."/>

  <xsl:template match="/">
    <!-- Split the text of the document into word elements -->
    <xsl:variable name="vrtfwordNodes">
      <words>
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="/"/>
          <xsl:with-param name="pDelimiters"
                          select="', &#10;'"/>
        </xsl:call-template>
      </words>
    </xsl:variable>

    <!-- Convert the result tree fragment into a node-set -->
    <xsl:variable name="vwordNodes"
                  select="ext:node-set($vrtfwordNodes)"/>

    <!-- Switch the context to the temporary tree so that key() searches it -->
    <xsl:for-each select="$vwordNodes">
      <!-- Muenchian grouping: take the first word of each distinct value -->
      <xsl:for-each select="$vwordNodes/*/*[.]
                              [generate-id()
                              =
                               generate-id(key('kWordByVal',.)[1])
                              ]">
        <xsl:sort data-type="number"/>

        <xsl:value-of select=
          "concat('Phi module ', .,
                  ' was hit ',
                  count(key('kWordByVal',.)),
                  ' times&#10;'
                  )"
        />
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

is applied on your input document:

<phiModule>
5 5 5 5 6 6 6 6 7 7 7
7 8 8 8 8 8 5 5 5 6 6
6 7 7 7 7 7 7 7 7 8 8
8 8 8 8 8 8 8 9 9 9 9
6 7 7 7 8 8 8 8 9 9 9
9 9 9 9 9 10 10 10 10 10 10
11 11 11 11 11 9 9 9 9 9 9
9 10 10 10 10 10 10 11 11 11 11
11 11 11 11 11 11 11 12 12 13 13
13 13 13 13 13 13
</phiModule>

the wanted result is produced:

Phi module was hit 1 times
Phi module 5 was hit 7 times
Phi module 6 was hit 8 times
Phi module 7 was hit 15 times
Phi module 8 was hit 18 times
Phi module 9 was hit 19 times
Phi module 10 was hit 12 times
Phi module 11 was hit 16 times
Phi module 12 was hit 2 times
Phi module 13 was hit 8 times


Cheers,
Dimitre Novatchev
 
shaun roe

Thanks for the ingenious suggestions and solutions. I should explain the
context, which you may find interesting: I am working on the Silicon
Tracker for the ATLAS experiment at CERN. I am seriously considering
proposing XSLT (dare I say Ajax?) as a remote monitoring solution for
the experiment, the idea being that only the Firefox web browser would
be needed to see the results. An example of a cosmic ray event is here:

http://sroe.home.cern.ch/sroe/svg/combined.svg

(generated by XSLT)

Thus I am restricted to what might be achievable in Firefox, with or
without some scripting incorporated. The file I showed generates the
kind of event display I link to, so is not *really* ideal for
histogramming, but seems (at present) to be the only xml result file
available as an RPC request from our analysis program. I'm trying to
discover how much I can do with it. In particular, where SVG production
is not an option I want a text summary of the event.
I can see I should make an effort to get the more amenable result files
available on request via a web service, but your suggestions have given
me some ideas for working with what I've got...

cheers

shaun
 
