Marking words in a text


Hvid Hat

Hello

How should I go about marking certain words in a text? I've got a list of
words:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="Mark.xsl"?>
<Words>
<Word>
<Acronym>XML</Acronym>
<Description>eXtensible Markup Language</Description>
</Word>
<Word>
<Acronym>SGML</Acronym>
<Description>Standard Generalized Markup Language</Description>
</Word>
<Word>
<Acronym>ISO</Acronym>
<Description>International Organization for Standardization</Description>
</Word>
</Words>

I want the words (acronyms) above to be marked with bold tags in the text
below:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="Words">
XML is a simple, very flexible text format derived from SGML (ISO 8879)
</xsl:template>
</xsl:stylesheet>

Can someone help me on my way? :)
 

Martin Honnen

Hvid said:
I want the words (acronyms) above to be marked within bold-tags in the text
below:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="Words">
XML is a simple, very flexible text format derived from SGML (ISO 8879)
</xsl:template>
</xsl:stylesheet>

That "text" is an XSLT stylesheet with output method="xml", so it is not
clear what you want to achieve. Do you want to take your acronym list
and transform it to HTML to be rendered in a browser?

That is possible with

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output method="html" indent="yes"/>

<xsl:template match="Words">
<html lang="en">
<head>
<title>List of Acronyms</title>
<style type="text/css">
dt { font-weight: bold; }
</style>
</head>
<body>
<dl>
<xsl:apply-templates select="Word"/>
</dl>
</body>
</html>
</xsl:template>

<xsl:template match="Word">
<dt>
<xsl:value-of select="Acronym"/>
</dt>
<dd>
<xsl:value-of select="Description"/>
</dd>
</xsl:template>

</xsl:stylesheet>
 

Peter Flynn

Hvid said:
Hello

How should I go about marking certain words in a text? I've got a list of
words:

If you mean you want to automate the application of markup to a
document, by matching each word against your list of acronyms, then it's
probably possible in XSLT (easier in XSLT 2.0 than in 1.0), but difficult when
you need to handle things like "in XML's model" where the "word" is not
delimited by spaces or markup boundaries. You'd have to use a recursive
template to isolate each word in turn and test it against your list,
which would be slow.

///Peter
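A rough sketch of that word-by-word recursion, written in Python rather than XSLT purely to show the logic. Each call peels one token off the front of the string, tests it against the list and recurses on the rest, as a recursive named template would. It assumes that a "word" is a run of ASCII letters and digits, so the apostrophe in "XML's" ends the word and "XML" still matches. The acronym set is copied from the Words list above, and the function name `mark` is made up for this example.

```python
import re

# Acronyms copied from the Words list in the question.
ACRONYMS = {"XML", "SGML", "ISO"}

# Assumption: a "word" is a run of ASCII letters/digits; any other run of
# characters (spaces, punctuation, the apostrophe in "XML's") is its own token.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")


def mark(text, words=ACRONYMS):
    """Wrap each listed word in <b>...</b>, one token per recursive call."""
    m = TOKEN.match(text)
    if m is None:  # empty string: nothing left, so the recursion stops
        return ""
    token = m.group(0)
    head = "<b>" + token + "</b>" if token in words else token
    return head + mark(text[m.end():], words)  # recurse on the remainder


print(mark("XML is a simple, very flexible text format derived from SGML (ISO 8879)"))
```

Every token costs one call, which is where the slowness comes from. Python would also hit its default recursion limit (about 1000 frames) on a long paragraph.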
 

Joseph J. Kesselman

Peter said:
You'd have to use a recursive
template to isolate each word in turn and test it against your list,
which would be slow.

Or have the stylesheet invoke an extension function written in a
language better suited to this task.

Personally, I think you should make this the author's responsibility.
Maybe use the (slow) find-words-and-tag-them as an authoring tool to
help them do so... but encourage them to use appropriate markup in the
first place rather than trying to reverse-engineer their text.
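To illustrate what such an extension function might do, here is a sketch in Python. The function name `mark_words` and its signature are hypothetical and do not belong to any processor's API. A single regular-expression pass with word boundaries replaces the per-word recursion. Longer entries are tried first, so a longer term wins over a shorter one that starts at the same position. The text between matches is XML-escaped because the result is a markup string.

```python
import re
from xml.sax.saxutils import escape


def mark_words(text, words):
    """Hypothetical extension function: wrap whole-word matches in <b>...</b>.

    Returns an XML fragment as a string; text between matches is escaped.
    """
    if not words:
        return escape(text)
    # Try longer entries first so that e.g. "New Zealand" beats "New".
    alternatives = sorted(words, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    parts, pos = [], 0
    for m in pattern.finditer(text):
        parts.append(escape(text[pos:m.start()]))          # text before the match
        parts.append("<b>" + escape(m.group(0)) + "</b>")  # the marked word
        pos = m.end()
    parts.append(escape(text[pos:]))                       # trailing text
    return "".join(parts)


print(mark_words("XML is a simple, very flexible text format derived from SGML (ISO 8879)",
                 ["XML", "SGML", "ISO"]))
```

If a string like this is written out with xsl:value-of, the tags will be escaped as `&lt;b&gt;` unless disable-output-escaping="yes" is used, and not every processor honours that. Returning nodes instead of a string avoids the problem.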
 

Hvid Hat

Peter Flynn wrote:
If you mean you want to automate the application of markup to a
document, by matching each word against your list of acronyms, then it's
probably possible in XSLT (easier in XSLT 2.0 than in 1.0), but difficult when
you need to handle things like "in XML's model" where the "word" is not
delimited by spaces or markup boundaries. You'd have to use a recursive
template to isolate each word in turn and test it against your list,
which would be slow.

I'm just playing around with XSLT to improve my skills, so performance is
not important. I'll give it a try, but if anyone can help me on my way, I'd
appreciate it :)

What if I wanted to mark up related words in some text? Say I wanted to mark
up country names consisting of several words, e.g. Faroe Islands, South Africa,
New Zealand, etc. Then I couldn't isolate each word in the text and make a
comparison. Would I have to use a mix of contains(), substring-before() and
substring-after()?
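One possible shape for the contains()/substring-before()/substring-after() approach, sketched in Python. The two helpers imitate the XPath 1.0 functions, and their names, like `mark_phrases`, are invented for this example. At each step the code finds the listed name that occurs earliest in the remaining text, preferring the longer name on a tie. It then emits the text before that name, wraps the name, and recurses on what follows. In XSLT 1.0, picking the "earliest" name would probably need an xsl:for-each over the names that pass contains(), sorted on string-length(substring-before(...)), keeping only the first.

```python
def substring_before(s, t):
    """Like XPath 1.0 substring-before(): "" when t does not occur in s."""
    i = s.find(t)
    return s[:i] if i >= 0 else ""


def substring_after(s, t):
    """Like XPath 1.0 substring-after(): "" when t does not occur in s."""
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""


def mark_phrases(text, phrases):
    """Wrap each occurrence of any (possibly multi-word) phrase in <b>...</b>."""
    # contains(text, p); empty phrases are skipped to avoid endless recursion.
    hits = [p for p in phrases if p and p in text]
    if not hits:
        return text
    # Earliest occurrence first; on a tie, prefer the longer phrase.
    first = min(hits, key=lambda p: (len(substring_before(text, p)), -len(p)))
    return (substring_before(text, first)
            + "<b>" + first + "</b>"
            + mark_phrases(substring_after(text, first), phrases))


countries = ["Faroe Islands", "South Africa", "New Zealand"]
print(mark_phrases("Flights from South Africa to New Zealand via the Faroe Islands", countries))
```

This is plain substring matching, so it would also tag the "New Zealand" inside "New Zealanders". Word boundaries are one of the details that make this hard to do reliably.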
 

Hvid Hat

Joseph J. Kesselman wrote:
Or have the stylesheet invoke an extension function written in a
language better suited to this task.

I've written a few small extension functions in C#. I thought about writing
an extension function to solve the problem. Any ideas on how to approach it?
Should I create a comma-separated list of the words, pass the word list and
the text to an extension function, and have the function mark up the words
and return the marked-up text? Is it possible to access the XML containing
the words from the extension function, so I could build a List<string> within
my extension function? Perhaps pass the XML containing the words as a
node-set or something. Does that make sense? :)

Personally, I think you should make this the author's responsibility.
Maybe use the (slow) find-words-and-tag-them as an authoring tool to
help them do so... but encourage them to use appropriate markup in the
first place rather than trying to reverse-engineer their text.

I agree. I would make it an authoring tool, but currently I'm just playing
around with XSLT to improve my skills.
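On the List<string> question, here is a sketch that uses Python's xml.etree.ElementTree as a stand-in for C#. It reads the Acronym values from the word-list document and returns the marked-up text as element nodes rather than as a string. The function names are hypothetical. How a node-set reaches an extension function, and what the function may return, depends on the XSLT processor's extension mechanism, and this sketch does not model that.

```python
import re
import xml.etree.ElementTree as ET

# The word list from the question (one line per Word element).
WORDS_XML = """<Words>
  <Word><Acronym>XML</Acronym><Description>eXtensible Markup Language</Description></Word>
  <Word><Acronym>SGML</Acronym><Description>Standard Generalized Markup Language</Description></Word>
  <Word><Acronym>ISO</Acronym><Description>International Organization for Standardization</Description></Word>
</Words>"""


def load_words(xml_text):
    """Collect the Acronym values, i.e. the List<string> from the question."""
    root = ET.fromstring(xml_text)
    return [a.text.strip() for a in root.iter("Acronym") if a.text]


def mark_as_nodes(text, words, wrapper="p"):
    """Return an element whose children are text runs and <b> elements.

    Nodes rather than a string, so the <b> tags are not escaped as &lt;b&gt;
    when the result is copied to the output.
    """
    root = ET.Element(wrapper)
    if not words:
        root.text = text
        return root
    alternatives = sorted(words, key=len, reverse=True)  # longest first
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    last, pos = None, 0  # last = most recent <b>; text after it goes in .tail
    for m in pattern.finditer(text):
        before = text[pos:m.start()]
        if last is None:
            root.text = before
        else:
            last.tail = before
        last = ET.SubElement(root, "b")
        last.text = m.group(0)
        pos = m.end()
    if last is None:
        root.text = text[pos:]
    else:
        last.tail = text[pos:]
    return root


words = load_words(WORDS_XML)
para = mark_as_nodes("XML is derived from SGML (ISO 8879)", words)
print(ET.tostring(para, encoding="unicode"))
```

Passing the word-list document itself, instead of a comma-separated string, keeps the Description available if you later want to emit it, for example as a title attribute.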
 

Joseph J. Kesselman

Hvid said:
What if I wanted to mark up related words in some text?

This is a programming problem first, then an XSLT problem. Figure out
how you would solve it in any other programming language, so you have
the problem well-formed and well-understood. Then figure out how to
solve it nonprocedurally. Then implement that in XSLT... or decide not
to do so, if it really isn't a problem well-suited to XSLT (as this may
not be.)
 

Peter Flynn

Hvid said:
I'm just playing around with XSLT to improve my skills, so performance is
not important. I'll give it a try, but if anyone can help me on my way, I'd
appreciate it :)

What if I wanted to mark up related words in some text? Say I wanted to mark
up country names consisting of several words, e.g. Faroe Islands, South Africa,
New Zealand, etc. Then I couldn't isolate each word in the text and make a
comparison. Would I have to use a mix of contains(), substring-before() and
substring-after()?

No, you'd pay someone to open the document in an XML editor and do it by
hand.

Really. If you want to apply reliable content markup on names (people,
places, things), it's a *human* task.

///Peter
 
