conversion of string to all lower case

Richard Herring · Oct 28, 2004

Ioannis Vranos said:
Nope.

Yes, actually. The German sz ligature (ß) is code point 223 in ISO8859-1,
better known as the Latin-1 character set, which was, for example, the
standard character set and encoding for HTML up to 3.2.

TC++PL says it well:

"A char variable is of the natural size to hold a character on a given
machine (typically a byte)".

And how many C++ implementations do you know of where char is less than
8 bits? ISO8859-1 has only 256 code points and can happily be
accommodated in 8-bit chars.

ISO/IEC 14882 says it better:
"Objects declared as characters (char) shall be large enough to store
any member of the implementation's basic character set."

.... and it's quite possible that that basic character set *is*
ISO8859-1.

"A type wchar_t is provided to hold characters of a larger character
set such as Unicode.

Yes. So what? ISO8859-1 is not Unicode, and wchar_t is not necessary to
hold it.

It is a distinct type. The size of wchar_t is implementation-defined
and large enough to hold the largest character set supported by the
implementation's locale (see 21.7, C.3.3)."

To give an example, in Windows GUI applications, char is guaranteed to
work only for English characters; for any other language you should use
wchar_t.

Even if that were true (it isn't - there's an API for messing with
"code pages"), what does the Windows GUI have to do with standard C++?

Richard Herring · Oct 28, 2004

Ioannis Vranos said:
This discussion can't reach a reasonable conclusion. Just give more
thought to the subject.

There is a perfectly reasonable conclusion, which had already been
stated before you tried to contradict it. That is, that there is an
insoluble problem with applying character-by-character toupper() to some
alphabets, which is present whether you use char or wchar_t.
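
To illustrate the point, here is a minimal sketch, assuming ISO8859-1 encoded bytes in a narrow string and the default "C" locale. The helper name upper_per_char is made up for this example. Because toupper() maps one character to exactly one character, the result always has the same length as the input, so a sharp s (byte 0xDF, i.e. 223) can never turn into "SS". In the "C" locale that byte is simply returned unchanged.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases a narrow string one char at a time.
std::string upper_per_char(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        // std::toupper expects a value representable as unsigned char
        // (or EOF), so convert before calling it.
        const unsigned char uc = static_cast<unsigned char>(ch);
        out.push_back(static_cast<char>(std::toupper(uc)));
    }
    // One char in, one char out: the string cannot grow, so a
    // one-to-many mapping such as sharp s -> "SS" is out of reach.
    assert(out.size() == in.size());
    return out;
}
```

The same limitation applies to a wchar_t version built on towupper(), since it also returns a single character per call.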

Ioannis Vranos · Oct 28, 2004

Richard said:
There is a perfectly reasonable conclusion, which had already been
stated before you tried to contradict it. That is, that there is an
insoluble problem with applying character-by-character toupper() to some
alphabets, which is present whether you use char or wchar_t.

OK, I can accept this. In any case, toupper() and tolower() of <cctype>,
and towupper() and towlower() of <cwctype>, are all guaranteed to work
for languages with a one-to-one correspondence between lower-case and
upper-case characters that fit in the supported character set, and it is
up to the programmer to make this decision.

Even if Greek is provided in the extended ASCII character set, and char
is unsigned on my platform, when I use Greek or some language other than
English, I use wchar_t, which is Unicode on my system and fits it 100%
(wchar_t is large enough to hold the largest character set supported by
the implementation's locales).

So for languages with one to one correspondence of lower-case to
upper-case characters, these facilities are guaranteed to work.
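
For the one-to-one case, a minimal sketch of lower-casing a whole string might look like the following. It assumes a narrow std::string and whatever locale is currently set (the default "C" locale only maps A-Z). The name to_lower is chosen for this example and is not a standard function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lower-cased copy of s, converting
// one character at a time according to the current C locale.
std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       // Taking the argument as unsigned char avoids passing
                       // a negative value to std::tolower when char is signed.
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

A wide-character variant would follow the same pattern with std::wstring and towlower() from <cwctype>.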

In any case, I am 100% certain the OP was talking about English anyway.
