Converting Case - Umlauts?

jose.jeria · Oct 26, 2005

I use the following to convert uppercase to lowercase:

translate($queryString, 'ABCDE...', 'abcde...')

But how can I convert the case for umlauts? öåä, etc.

Martin Honnen · Oct 26, 2005

Pretty much the same way. Each character in the second argument to
translate is replaced by the character at the same index in the third
argument, so you simply need to list all the upper-case characters you
care about in the second argument and the corresponding lower-case
characters, in the same order, in the third argument. For example,
define global variables:

<xsl:variable
name="iso88591UpperCaseLetters"
select="ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ" />
<xsl:variable
name="iso88591LowerCaseLetters"
select="abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý" />

then use e.g.

translate($queryString, $iso88591UpperCaseLetters,
$iso88591LowerCaseLetters)

jose.jeria · Oct 26, 2005

This doesn't work; I am using UTF-8.

http://www.jeria.net/XSLT/

Type in "ägy" and press submit; you will get a "Sablotron error on line
11: XML parser error 4: not well-formed (invalid token)" error.

The XML and XSLT files can be found here:
http://www.jeria.net/XSLT/xml/

jose.jeria · Oct 26, 2005

Oh, sorry, it works now; I changed the encoding to ISO-8859-1.

Thanks
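
One plausible reading of the error above is that the file was saved with ISO-8859-1 octets while the parser assumed UTF-8. The following Python sketch reproduces that class of failure with the standard library's expat-based parser; the element name q is hypothetical, and whether this matches the exact setup in the thread is an assumption.

```python
import xml.etree.ElementTree as ET

# "ä" stored as the single ISO-8859-1 octet 0xE4.
body = "<q>ägy</q>".encode("iso-8859-1")

# No encoding declaration: the parser assumes UTF-8 and rejects the lone 0xE4.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError as error:
    undeclared_ok = False
    print("parse failed:", error)

# Same octets, but the declaration now matches how the data was saved.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
parsed_text = ET.fromstring(declared).text
print(parsed_text)
```

Either fix should work in principle: declare the encoding the file really uses, or re-save the file in the encoding it declares.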

Martin Honnen · Oct 26, 2005

Martin Honnen wrote:

<xsl:variable
name="iso88591UpperCaseLetters"
select="ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ" />
<xsl:variable
name="iso88591LowerCaseLetters"
select="abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý" />

Should be

<xsl:variable
name="iso88591UpperCaseLetters"
select="'ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ'" />
<xsl:variable
name="iso88591LowerCaseLetters"
select="'abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý'" />

of course.

Andreas Prilop · Oct 26, 2005

name="iso88591UpperCaseLetters"
select="'ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝ'" />
                                                          ^
name="iso88591LowerCaseLetters"
select="'abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïðñòóôõö×øùúûüý'" />
                                                          ^

The multiplication sign (×) isn't exactly a letter.
However, "sharp s" and "y with diaeresis" are.

Alan J. Flavell · Oct 26, 2005

The multiplication sign (×) isn't exactly a letter.

Granted...

However, "sharp s" and "y with diaeresis" are.

What are you going to do with them then, in an ISO-8859-1 context? ;-)

Andreas Prilop · Oct 27, 2005

What are you going to do with them then, in an ISO-8859-1 context? ;-)

When converting from lower-case to upper-case, "ß" becomes "SS".
"ÿ" might become "Y" without accents in ISO-8859-1.

But this leads me to a more interesting ... err ... case:

In Greek, there are no accents when a word is written in capitals.
For example (I use romanization here):
"Ellás" has an accent on "alpha", whereas
"ELLAS" has no accent on "Alpha".
Therefore "Alpha" might be considered as an upper-case form
of "alpha with tonos".

Even the proper name "Álan" converts to "ALAN" in caps.
Therefore "Alpha" might be considered as an upper-case form
of "Alpha with tonos". Strange? Yes.

I cannot find anything about this in
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
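
As an illustration of the Greek case, this Python sketch shows that the default Unicode upper-case mapping, as applied by str.upper(), keeps the tonos, and that dropping it takes an extra step. The helper name upper_without_accents is hypothetical, and stripping every combining mark is a simplification (it would also remove a dialytika, which Greek capitals normally keep).

```python
import unicodedata


def upper_without_accents(word: str) -> str:
    """Upper-case a word, then drop combining marks such as the tonos."""
    decomposed = unicodedata.normalize("NFD", word.upper())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)


# "Ellás" in Greek letters, written with escapes to avoid encoding ambiguity.
word = "\u0395\u03bb\u03bb\u03ac\u03c2"

default_upper = word.upper()  # the accented capital U+0386 is kept
print(default_upper)
print(upper_without_accents(word))  # plain capital Alpha, no tonos
```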

jose.jeria · Nov 1, 2005

Would it be possible to solve this issue using UTF-8? When using UTF-8,
these characters appear as question marks.

Martin Honnen · Nov 1, 2005

Andreas said:

The multiplication sign (×) isn't exactly a letter.

Right, I was simply too lazy to copy anything by hand from a list of
defined letters, so I generated those strings programmatically from
character codes. For use with the XPath translate function it does not
matter semantically: as long as the second and third arguments have the
same length and the × sign is at the same position in both, no
conversion/translation happens for it.
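
Generating the two strings from character codes can be sketched as follows. This is a Python illustration, not the code used in the thread; it assumes the range 0xC0-0xDE and filters with isalpha(), so × is left out and Þ (which the strings above do not list) is included.

```python
# Upper-case letters in the ISO-8859-1 range 0xC0-0xDE; isalpha() skips × (0xD7).
upper_accented = "".join(
    chr(code) for code in range(0xC0, 0xDF) if chr(code).isalpha()
)
# Derive the lower-case list from the upper-case one so the order always matches.
lower_accented = upper_accented.lower()

print(upper_accented)
print(lower_accented)
```

Deriving one list from the other keeps both the same length, which is the property translate depends on.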

However, "sharp s" and "y with diaeresis" are.

But with XPath 1.0 translate it is only possible to translate one
character into another, not one character into a sequence of others, so
for the ß to SS translation the suggested approach with translate is not
going to work.
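
The one-to-many problem is easy to see outside XSLT. A small Python sketch, offered only as an illustration of the Unicode mappings involved: upper-casing ß yields two characters, and upper-casing ÿ yields a character that ISO-8859-1 cannot represent.

```python
# Python's str.upper() applies Unicode's full case mappings.
sharp_s_upper = "ß".upper()      # one character becomes two
y_diaeresis_upper = "ÿ".upper()  # U+0178, which ISO-8859-1 does not contain

print(sharp_s_upper, len(sharp_s_upper))
print(f"U+{ord(y_diaeresis_upper):04X}")

try:
    y_diaeresis_upper.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print("fits in ISO-8859-1:", fits_latin1)
```

A character-for-character table such as the one translate builds has no slot for either result.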

I guess I just need to be more careful when naming my variables and not
have them reference a standard when the variable's use is not quite up
to the standard.

Shmuel (Seymour J.) Metz · Nov 1, 2005

On 11/01/2005 at 04:24 AM, jose.jeria said:

Would it be possible to solve this issue using UTF-8? When using UTF-8,
these characters appear as question marks.

Are you sure that you are using the correct octets for UTF-8? If each
character only takes one octet then you're probably storing the data
as ISO-8859-1 or -15, e.g.,

ASCII form  Character  ISO-8859-1 octet (hex)
a"          ä          E4
e"          ë          EB
i"          ï          EF
o"          ö          F6
u"          ü          FC
A"          Ä          C4
E"          Ë          CB
I"          Ï          CF
O"          Ö          D6
U"          Ü          DC
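
A quick way to check which octets are actually being stored is to encode a sample string both ways. The Python sketch below uses the "ägy" input from earlier in the thread; the last step shows one way the question marks could arise, namely ISO-8859-1 octets being read as UTF-8 (an assumption about the setup, not a confirmed diagnosis).

```python
text = "ägy"

latin1_octets = text.encode("iso-8859-1")  # "ä" is the single octet E4
utf8_octets = text.encode("utf-8")         # "ä" is the two octets C3 A4

print(latin1_octets.hex(" "))
print(utf8_octets.hex(" "))

# Reading ISO-8859-1 octets as if they were UTF-8 yields a replacement character.
misread = latin1_octets.decode("utf-8", errors="replace")
print(misread)
```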

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
<http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the right
to publicly post or ridicule any abusive E-mail. Reply to domain
Patriot dot net user shmuel+news to contact me. Do not reply to
(e-mail address removed)
