Is this possible with XSLT?

dave_kajinsky

Hello all,

I would like to do something with XSLT. Perhaps I can get help from
this group? Here goes:

I've got 2 files: an ASCII file and a file that identifies tokens in
the ASCII file.

For example, the ASCII file could be:

This is just a simple text.

And the token file identifies some tokens in the ASCII file (but NOT
necessarily all of them), like this:

<tokens>
<token pos="6...7"/>
<token pos="14...14"/>
</tokens>

(note: always 3 dots between the numbers)

The first token refers to "is" (6th and 7th chars in text), the
second token to "a" (14th char in text).

Is it possible to write an XSLT style sheet that takes both files as
input and transforms them into:

<transformed_tokens>
<transformed_token text="is"/>
<transformed_token text="a"/>
</transformed_tokens>

Perhaps I'm pushing XSLT too far and it's impossible? Anyway, I prefer
to check by asking here.

-- dave
 
Joseph Kesselman

1) XSLT 1.0 expects XML files as input, and so won't operate directly on
an ASCII file. You might be able to kluge around that by referencing the
plaintext file as an External Parsed Entity from a front-end XML file,
but then the ASCII would have to obey all the rules for XML parsed
character content (in particular escaping < and > and & characters).

2) String processing in XSLT is possible, but that really isn't what
XSLT is set up for. You're likely to wind up having to write some
recursive logic for even fairly basic string-search-and-replace tasks,
since XSLT is nonprocedural and doesn't have simple character-scan
loops. (See the XSLT FAQ website for examples of how to do that kind of
task.)
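
To give an idea of the recursive logic mentioned in point 2, here is a minimal XSLT 1.0 sketch of a replace-all routine. The template name `replace-all` and its parameter names are made up for illustration, and the driver template simply runs it on the sample sentence from the question. Treat it as one possible shape for such a routine, not a canonical one.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical named template: replaces every occurrence of $from
       in $text with $to by recursing on the remainder of the string. -->
  <xsl:template name="replace-all">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <!-- Guard against an empty $from: contains() is always true for
           the empty string, so the recursion would never terminate. -->
      <xsl:when test="$from != '' and contains($text, $from)">
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <xsl:call-template name="replace-all">
          <xsl:with-param name="text" select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No further match: emit what is left unchanged. -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Illustrative driver: ignores the input document's content and
       replaces "simple" with "short" in the sample sentence. -->
  <xsl:template match="/">
    <result>
      <xsl:call-template name="replace-all">
        <xsl:with-param name="text" select="'This is just a simple text.'"/>
        <xsl:with-param name="from" select="'simple'"/>
        <xsl:with-param name="to" select="'short'"/>
      </xsl:call-template>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Run against any well-formed XML input, this should produce a `result` element containing "This is just a short text." The recursion stands in for the character-scan loop a procedural language would use.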

XSLT 2.0 improves matters slightly, but still it isn't really the right
tool for this task ... at least, not in most cases. (I can see where you
might want to use XSLT to do this if you're trying to do it in an
environment specialized for XSLT; I might write a stylesheet if I was
doing this on a Datapower appliance, for example.)

But in general, I would suggest that you write a bit of simple code in
your preferred programming language, using an off-the-shelf XML
serializer for that language to ensure that the output syntax is
correct, and write your own string tokenization and data extraction routine.
 
David Carlisle

> Is it possible to write an XSLT style sheet that takes both files as
> input and transforms them into:

text.txt:
This is just a simple text.


text.xml:
<!DOCTYPE text [<!ENTITY text SYSTEM "text.txt">]>
<text>&text;</text>


tokens.xml:
<tokens>
<token pos="6...7"/>
<token pos="14...14"/>
</tokens>


tokens.xsl:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>

<xsl:variable name="text" select="string(document('text.xml'))"/>

<xsl:template match="tokens">
<transformed_tokens>
<xsl:apply-templates/>
</transformed_tokens>
</xsl:template>


<!-- Each pos is "start...end": 1-based, inclusive character positions. -->
<xsl:template match="token">
<xsl:variable name="a" select="number(substring-before(@pos,'...'))"/>
<xsl:variable name="b" select="number(substring-after(@pos,'...'))"/>
<transformed_token text="{substring($text,$a,1+ $b - $a)}"/>
</xsl:template>


</xsl:stylesheet>


produces:
$ saxon tokens.xml tokens.xsl
<?xml version="1.0" encoding="utf-8"?>
<transformed_tokens>
<transformed_token text="is"/>
<transformed_token text="a"/>
</transformed_tokens>



Although I'd agree with Joseph that if your actual requirements are
anything much more complicated than this, XSLT 1.0 isn't really the
language of choice. (XSLT 2.0 may be, or may not, depending...)
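
For comparison, here is a rough sketch of how the same transformation might look in XSLT 2.0, where the standard `unparsed-text()` function reads the plain-text file directly, so the `text.xml` wrapper and the escaping of < and & are no longer needed. The file name `text.txt` and the UTF-8 encoding are assumptions (ASCII is a subset of UTF-8), and an XSLT 2.0 processor such as Saxon is required.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Read the plain-text file as a string. The relative name is
       resolved against the stylesheet's location; the file name and
       encoding are assumptions. -->
  <xsl:variable name="text" as="xs:string"
      select="unparsed-text('text.txt', 'UTF-8')"/>

  <xsl:template match="tokens">
    <transformed_tokens>
      <!-- Select only token elements, skipping whitespace text nodes. -->
      <xsl:apply-templates select="token"/>
    </transformed_tokens>
  </xsl:template>

  <xsl:template match="token">
    <!-- pos is "start...end": 1-based, inclusive character positions. -->
    <xsl:variable name="a" as="xs:integer"
        select="xs:integer(substring-before(@pos, '...'))"/>
    <xsl:variable name="b" as="xs:integer"
        select="xs:integer(substring-after(@pos, '...'))"/>
    <transformed_token text="{substring($text, $a, $b - $a + 1)}"/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the same `tokens.xml`, this should yield the two `transformed_token` elements with text "is" and "a".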
 
Dimitre Novatchev

> XSLT 2.0 improves matters slightly, but still it isn't really the right
> tool for this task ... at least, not in most cases. (I can see where you
> might want to use XSLT to do this if you're trying to do it in an
> environment specialized for XSLT; I might write a stylesheet if I was
> doing this on a Datapower appliance, for example.)



This task is trivial to do in XSLT.



In fact, it has been shown that many much more complicated text processing
tasks are performed quite efficiently using FXSL 2.0 and XSLT 2.0. The
solutions typically are compact and easy to understand. Just a few examples:



Spell-checking Othello at a speed of more than 500 words per second.

http://www.stylusstudio.com/xsllist/200504/post80170.html



Producing a concordance of the Bible (New Testament)

http://www.stylusstudio.com/xsllist/200511/post00560.html



Text justification.

http://www.thethameens.com/lists/xsl-list/archives/200504/msg01314.html

http://sourceware.org/ml/xsl-list/2001-12/msg00651.html



Finding anagrams.

http://dnovatchev.spaces.live.com/Blog/cns!44B0A32C2CCF7488!357.entry



Parser for LR(1) context-free languages.

http://dnovatchev.spaces.live.com/blog/cns!44B0A32C2CCF7488!367.entry





Cheers,

Dimitre Novatchev.
