Converting Case - Umlauts?

jose.jeria

I use the following to convert uppercase to lowercase:

translate($queryString, 'ABCDE...', 'abcde...')

But how can I convert the case for umlauts (ö, å, ä, etc.)?
 
Martin Honnen

> I use the following to convert uppercase to lowercase:
>
> translate($queryString, 'ABCDE...', 'abcde...')
>
> But how can I convert the case for umlauts? öåä etc

Pretty much the same: translate replaces each character found in its
second argument with the character at the same index in its third
argument, so you simply need to list all the upper-case characters you
care about in the second argument and the corresponding lower-case
characters, in the same order, in the third argument, e.g. with global
variables

<xsl:variable
name="iso88591UpperCaseLetters"
select="ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ" />
<xsl:variable
name="iso88591LowerCaseLetters"
select="abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý" />

then use e.g.

translate($queryString, $iso88591UpperCaseLetters,
$iso88591LowerCaseLetters)
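
For anyone who wants to see the index-by-index rule in isolation, here is a minimal Python sketch that models the XPath 1.0 translate semantics. The function name xpath_translate is made up for illustration, and this is a model of the behaviour, not code taken from any XSLT processor. The two strings are copied from the variables above.

```python
def xpath_translate(value: str, from_chars: str, to_chars: str) -> str:
    """Model of XPath 1.0 translate(): map by position, first match wins."""
    mapping = {}
    for index, char in enumerate(from_chars):
        if char in mapping:
            continue  # a repeated character keeps its first mapping
        # A character with no counterpart in to_chars is removed (None).
        mapping[char] = to_chars[index] if index < len(to_chars) else None

    result = []
    for char in value:
        if char not in mapping:
            result.append(char)      # untouched: not listed in from_chars
        elif mapping[char] is not None:
            result.append(mapping[char])
    return "".join(result)


UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ"
LOWER = "abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý"

lowered = xpath_translate("ÄÖÅ Über", UPPER, LOWER)
```

Characters that are not listed in the second argument, such as the space, pass through unchanged, which is why only the letters you list are affected.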
 
Martin Honnen

Martin Honnen wrote:

> <xsl:variable
> name="iso88591UpperCaseLetters"
> select="ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ" />
> <xsl:variable
> name="iso88591LowerCaseLetters"
> select="abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý" />

Should be

<xsl:variable
name="iso88591UpperCaseLetters"
select="'ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ'" />
<xsl:variable
name="iso88591LowerCaseLetters"
select="'abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý'" />

of course.
 
Andreas Prilop

> name="iso88591UpperCaseLetters"
> select="'ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ'" />
>                                                           ^
> name="iso88591LowerCaseLetters"
> select="'abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý'" />
>                                                           ^

The multiplication sign (×) isn't exactly a letter.
However, "sharp s" and "y with diaeresis" are.
 
Alan J. Flavell

> The multiplication sign (×) isn't exactly a letter.

Granted...

> However, "sharp s" and "y with diaeresis" are.

What are you going to do with them then, in an ISO-8859-1 context? ;-)
 
Andreas Prilop

> What are you going to do with them then, in an ISO-8859-1 context? ;-)

When converting from lower case to upper case, "ß" becomes "SS".
"ÿ" might become "Y" without the diaeresis, because its upper-case
form "Ÿ" is not part of ISO-8859-1.

But this leads me to a more interesting ... err ... case:

In Greek, there are no accents when a word is written in capitals.
For example (I use romanization here):
"Ellás" has an accent on "alpha", whereas
"ELLAS" has no accent on "Alpha".
Therefore "Alpha" might be considered as an upper-case form
of "alpha with tonos".

Even the proper name "Álan" converts to "ALAN" in caps.
Therefore "Alpha" might be considered as an upper-case form
of "Alpha with tonos". Strange? Yes.

I cannot find anything about this in
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
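
To make the Greek case concrete, here is a small Python sketch. It assumes Python 3 with the standard unicodedata module, and the helper name caps_without_tonos is invented for illustration. As far as I know, Python's built-in upper() keeps the tonos, so the accent has to be removed in a separate step. The helper strips every combining acute accent, which would be wrong for languages that keep accents on capitals, so treat it as a sketch of the convention described above rather than a general case converter.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # what the tonos/acute becomes after NFD


def caps_without_tonos(word: str) -> str:
    """Upper-case a word, then drop acute accents (assumed convention)."""
    upper = word.upper()
    decomposed = unicodedata.normalize("NFD", upper)
    stripped = decomposed.replace(COMBINING_ACUTE, "")
    return unicodedata.normalize("NFC", stripped)


# "Ellás" in Greek script, written with escapes to avoid look-alike letters:
# Epsilon, lambda, lambda, alpha with tonos, final sigma.
ellas = "\u0395\u03bb\u03bb\u03ac\u03c2"

plain_upper = ellas.upper()               # keeps Alpha with tonos (U+0386)
greek_caps = caps_without_tonos(ellas)    # plain Alpha (U+0391)
latin_caps = caps_without_tonos("\u00c1lan")
```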
 
jose.jeria

Would it be possible to solve this issue using UTF-8? When I use UTF-8,
these characters appear as question marks.
 
Martin Honnen

Andreas said:

> The multiplication sign (×) isn't exactly a letter.

Right, I was simply too lazy to copy anything by hand from a list of
defined letters, so I generated those strings programmatically from
character codes. For use with the XPath translate function it does not
matter semantically: as long as the second and third arguments have the
same length and the × sign is at the same position in both, it is simply
mapped to itself and no conversion happens.

> However, "sharp s" and "y with diaeresis" are.

But with the XPath 1.0 translate function it is only possible to
translate one character into another, not one character into a sequence
of others, so for the ß to SS translation the suggested approach with
translate is not going to work.

I guess I just need to be more careful about naming my variables and not
have them reference a standard when the variable's use is not quite up
to the standard :).
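
As an illustration of both points, here is a Python sketch of one way such strings could be generated from character codes, and of why a one-to-one mapping cannot produce SS. It is not necessarily how the strings in this thread were produced. The code point range C0 to DD is an assumption read off the posted variables, and the variable names are made up.

```python
import string

# ASCII A-Z followed by the ISO-8859-1 code points 0xC0..0xDD,
# which include the multiplication sign at 0xD7.
upper_letters = string.ascii_uppercase + "".join(
    chr(code) for code in range(0xC0, 0xDE)
)
# Lower-casing leaves the multiplication sign as it is, so it ends up
# at the same index in both strings and maps to itself.
lower_letters = upper_letters.lower()

# A one-to-one table, comparable in spirit to XPath 1.0 translate.
to_upper = str.maketrans(lower_letters, upper_letters)

one_to_one = "straße".translate(to_upper)  # ß is not in the table
full_mapping = "straße".upper()            # Unicode full case mapping
```

With the table, ß is left untouched because a single character cannot be replaced by two. Python's upper() does expand it, which is the kind of one-to-many mapping translate cannot express.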
 
Shmuel (Seymour J.) Metz

In <[email protected]>, on 11/01/2005 at 04:24 AM,
(e-mail address removed) said:

> Would it be possible to solve this issue using UTF-8? When I use UTF-8,
> these characters appear as question marks.

Are you sure that you are using the correct octets for UTF-8? If each
character only takes one octet then you're probably storing the data
as ISO-8859-1 or -15, e.g.,

Mnemonic  Character  Octet (hex)
a"        ä          E4
e"        ë          EB
i"        ï          EF
o"        ö          F6
u"        ü          FC
A"        Ä          C4
E"        Ë          CB
I"        Ï          CF
O"        Ö          D6
U"        Ü          DC
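
A quick way to check which octets you actually have is sketched below in Python, using only the built-in codecs. It shows the single ISO-8859-1 octet from the table, the two octets UTF-8 uses for the same character, and what happens when the ISO-8859-1 octet is read as if it were UTF-8. The variable names are illustrative.

```python
text = "ä"

latin1_octets = text.encode("iso-8859-1")  # one octet: E4
utf8_octets = text.encode("utf-8")         # two octets: C3 A4

# Reading the single ISO-8859-1 octet as UTF-8 is invalid, so a lenient
# decoder substitutes U+FFFD, the replacement character that many
# programs display as a question mark.
misread = latin1_octets.decode("utf-8", errors="replace")
```

If your file shows one octet per accented character, it is not UTF-8, whatever the encoding declaration says.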

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
<http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the right
to publicly post or ridicule any abusive E-mail. Reply to domain
Patriot dot net user shmuel+news to contact me. Do not reply to
(e-mail address removed)
 
