Help deciphering use of square brackets within translate function

johkar

To me, the inner translate function below would find any instance of
square brackets with a single space separating them and replace it
with a single underscore but that doesn't seem to be what is
happening. Do the square brackets have some sort of
significance...like a regular expression? I didn't think so since
they were quoted. I think the original developer's intent was to
replace spaces with underscores so that the element name syntax would
be valid XML.

However if the XML contains:

<rate>FCTR[   ]2</rate> it transforms into <FCTR2>0.04200</FCTR2>
without any underscore at all.

<xsl:variable name="tag"
 select="translate(translate(rateCode/text(),'[ ]','_'),'.','__')"/>
<xsl:element name="{$tag}">
   <xsl:value-of select="rateTag/text()"/>
</xsl:element>

Any insight into this would be appreciated.
 
johkar

johkar said:
To me, the inner translate function below would find any instance of
square brackets with a single space separating them and replace it
with a single underscore but that doesn't seem to be what is
happening.  Do the square brackets have some sort of
significance...like a regular expression?  I didn't think so since
they were quoted.  I think the original developer's intent was to
replace spaces with underscores so that the element name syntax would
be valid XML.

The second argument to translate() is interpreted as a set of
characters, not as an expression.  The first argument is scanned
character by character, and each character is tested to see if it
appears anywhere in the second argument.  If it does not appear in the
second argument, it's copied to the output string without change.  If
it does appear in the second argument at position N, then it's
replaced in the output by the character at position N of the third
argument.  If the third argument is less than N characters long,
the character is omitted from the output.
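
The rule described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not how any XSLT processor implements translate(); the function name xpath_translate is made up for this example.

```python
def xpath_translate(source: str, from_chars: str, to_chars: str) -> str:
    """Mimic the behavior of XPath 1.0 translate() (illustrative sketch)."""
    out = []
    for ch in source:
        pos = from_chars.find(ch)      # first occurrence in argument 2, or -1
        if pos == -1:
            out.append(ch)             # not in argument 2: copy unchanged
        elif pos < len(to_chars):
            out.append(to_chars[pos])  # replace with same position of argument 3
        # otherwise argument 3 is too short: the character is dropped
    return "".join(out)


print(xpath_translate("FCTR[   ]2", "[ ]", "_"))
```

Here "[" sits at position 1 of the second argument and is mapped to "_", while the blank and "]" sit at positions 2 and 3, where the one-character third argument has nothing to offer, so they are dropped.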

This behavior will be familiar to some people from analogous
functions and operations in Snobol, Spitbol, Rexx, and
IBM 360 assembler, as well as (I think) some other languages.

johkar said:
However if the XML contains:

<rate>FCTR[   ]2</rate>   it transforms into  <FCTR2>0.04200</FCTR2>
without any underscore at all.

<xsl:variable name="tag"
 select="translate(translate(rateCode/text(),'[ ]','_'),'.','__')"/>
<xsl:element name="{$tag}">
   <xsl:value-of select="rateTag/text()"/>
</xsl:element>

Any insight into this would be appreciated.

The inner call, to

    translate(rateCode/text(),'[ ]','_')

should translate left square bracket to underscore and omit any
blanks or right square brackets.  In your example, the element is
named rate, not rateCode, which means we cannot tell what the
first argument of the inner call to translate() is.  If the
XML actually contains <rateCode>FCTR[   ]2</rateCode>, then I
would expect the call to produce "FCTR_2".  The outer call is then

    translate("FCTR_2",'.','__')

and since no full stops appear in the first argument, the result
would be FCTR_2.  This is what you say you expected, but the
reasoning is rather different.
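
The two nested calls can be replayed with a small Python sketch. It assumes the input really is "FCTR[   ]2"; the helper make_table is hypothetical and only models the character-by-character mapping. Note that the third argument of the outer call, '__', is longer than the second; in XPath 1.0 the excess characters are ignored, so a full stop would become a single underscore, not two.

```python
def make_table(from_chars: str, to_chars: str) -> dict:
    """Build a mapping for str.translate() that follows the translate() rules."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # only the first occurrence in argument 2 counts
        # None tells str.translate() to delete the character
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return table


inner = "FCTR[   ]2".translate(make_table("[ ]", "_"))
outer = inner.translate(make_table(".", "__"))
print(inner, outer)

# Extra characters in argument 3 are ignored: one "." gives one "_".
print("A.B".translate(make_table(".", "__")))
```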

If there is a rateCode element in the vicinity, then the result
will depend on its content.

Are you sure you transcribed both the input and the XSLT
correctly here?

hth

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************

Thanks for the reply, I am still not quite getting why the rateCode is
getting transformed into FCTR2 without the underscore. You stated
that rateCode "should translate left square bracket to underscore and
omit any blanks or right square brackets"...why would you expect
that...just trying to understand.

<rateCode>FCTR[   ]2</rateCode>
<rateTag>0.04200</rateTag>

transforms into

<FCTR2>0.04200</FCTR2> without any underscore at all.


<xsl:variable name="tag"
 select="translate(translate(rateCode/text(),'[ ]','_'),'.','__')"/>
<xsl:element name="{$tag}">
   <xsl:value-of select="rateTag/text()"/>
</xsl:element>
 
johkar

Also, if rateCode has spaces in it, it is converted to:

<FCTR2>0.04200</FCTR2>
 
johkar

johkar said:
Thanks for the reply, I am still not quite getting why the rateCode is
getting transformed into FCTR2 without the underscore.  You stated
that rateCode "should translate left square bracket to underscore and
omit any blanks or right square brackets"...why would you expect
that...just trying to understand.

As I wrote in my previous note,

    The first argument is scanned character by character, and each
    character is tested to see if it appears anywhere in the second
    argument.  If it does not appear in the second argument, it's
    copied to the output string without change.  If it does appear in
    the second argument at position N, then it's replaced in the
    output by the character at position N of the third argument.  If
    the third argument is less than N characters long, the character
    is omitted from the output.

That was my attempt to explain why I would expect that.  If it
didn't help, let me try again.

The inner call to translate() has three arguments:

  (1) the string "FCTR[   ]2"
  (2) the string "[ ]"
  (3) the string "_"

The function constructs an output string by walking through the input
string character by character and possibly adding something to the
output string.

Character 1:  "F".

  Look for an "F" in argument 2.  Find none.
  Place "F" in the output string, which is now "F".

Character 2:  "C".

  Look for a "C" in argument 2.  Find none.
  Place "C" in the output string, which is now "FC".

Character 3:  "T".

  Look for a "T" in argument 2.  Find none.
  Place "T" in the output string, which is now "FCT".

Character 4:  "R".

  Look for an "R" in argument 2.  Find none.
  Place "R" in the output string, which is now "FCTR".

Character 5:  "[".

  Look for an occurrence of "[" in argument 2.  Find one
  at position 1.  Look up the character in position 1
  of argument 3; find "_".  Add that character ("_")
  to the output string, which is now "FCTR_".

Character 6:  " ".

  Look for an occurrence of " " in argument 2.  Find one
  at position 2.  Look up the character in position 2
  of argument 3; discover that there isn't one (argument
  3 is one character long).  So add nothing to the
  output string; it remains "FCTR_".

Character 7:  " ".

  Look for an occurrence of " " in argument 2.  Find one
  at position 2.  Look up the character in position 2
  of argument 3; discover that there isn't one (argument
  3 is one character long).  So add nothing to the
  output string; it remains "FCTR_".

Character 8:  " ".

  Look for an occurrence of " " in argument 2.  Find one
  at position 2.  Look up the character in position 2
  of argument 3; discover that there isn't one (argument
  3 is one character long).  So add nothing to the
  output string; it remains "FCTR_".

Character 9:  "]".

  Look for an occurrence of "]" in argument 2.  Find one
  at position 3.  Look up the character in position 3
  of argument 3; discover that there isn't one (argument
  3 is one character long).  So add nothing to the
  output string; it remains "FCTR_".

Character 10:  "2".

  Look for an occurrence of "2" in argument 2.  Find none.
  So add "2" to the output string, which is now "FCTR_2".

End of argument 1: return the output string, now "FCTR_2".

The details for characters 5 through 9 should make clear that what
translate() may be expected to do with second and third arguments of
"[ ]" and "_" is to replace "[" with "_" and delete " " and "]".
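
The same walk-through can be reproduced mechanically. The Python sketch below is only a model of the rule, with a made-up function name trace_translate; it records what happens to each character of argument 1 and what the output string looks like afterwards.

```python
def trace_translate(source: str, from_chars: str, to_chars: str) -> list:
    """Return one (index, char, action, output_so_far) tuple per input character."""
    output = ""
    steps = []
    for index, ch in enumerate(source, start=1):
        pos = from_chars.find(ch)          # -1 when ch is not in argument 2
        if pos == -1:
            action = "copied"
            output += ch
        elif pos < len(to_chars):
            action = "replaced"
            output += to_chars[pos]
        else:
            action = "dropped"             # argument 3 has no such position
        steps.append((index, ch, action, output))
    return steps


steps = trace_translate("FCTR[   ]2", "[ ]", "_")
for index, ch, action, output in steps:
    print(f"{index:2d}  {ch!r:5} {action:9} {output!r}")
```

Characters 1 to 4 and 10 are reported as copied, character 5 as replaced, and characters 6 to 9 as dropped.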

johkar said:
<rateCode>FCTR[   ]2</rateCode>
<rateTag>0.04200</rateTag>

transforms into

<FCTR2>0.04200</FCTR2> without any underscore at all.

<xsl:variable name="tag"
 select="translate(translate(rateCode/text(),'[ ]','_'),'.','__')"/>
<xsl:element name="{$tag}">
   <xsl:value-of select="rateTag/text()"/>
</xsl:element>

Interesting.  When I cut and paste your code fragment into a
stylesheet and run it on your input, both xsltproc and Saxon give me

 <FCTR_2>0.04200</FCTR_2>

not

 <FCTR2>0.04200</FCTR2>

Similar tests show that the XSLT processors in Safari, Opera,
and Firefox all produce "FCTR_2" not "FCTR2" from the input you
describe.

Two questions:  

(1) Are you sure that the code you quote is actually the code that is
producing the output you quote?  Try replacing

  <xsl:variable name="tag"
   select="translate(translate(rateCode/text(),'[ ]','_'),'.','__')"/>

with

  <xsl:variable name="tag" select="'hi_mom'"/>

to see if your output is still <FCTR2>0.04200</FCTR2> or changes to
<hi_mom>0.04200</hi_mom>.

(2) If your output does change, indicating that the code you
quote really is the code doing the work, then I become curious:
What XSLT processor are you using?

HTH

Michael Sperberg-McQueen


OK, the light bulb has finally come on thanks to your detailed
explanation. I appreciate the effort in your reply. I was a bit
confused about which of my tests produced FCTR2 without the
underscore. XMLSpy's default processor, Microsoft's MSXML, and Xalan
all produce FCTR_2 for the example given. The FCTR2 result came from
a separate test with multiple spaces using the same argument 2.
Sorry for the confusion, but I understand the how and why now. Thanks.
 
