conversion of string to all lower case

R

Richard Herring

Ioannis Vranos said:

Yes, actually. German sz ligature is code point 223 in ISO8859-1,
better known as the Latin-1 character set, and was for example the
standard character set and encoding for HTML up to 3.2.
TC++PL says it well:

"A char variable is of the natural size to hold a character on a given
machine (typically a byte)".

And how many C++ implementations do you know of where char is less than
8 bits? ISO8859-1 has only 256 code points and can happily be
accommodated in 8-bit chars.

ISO/IEC 14882 says it better:
"Objects declared as characters (char) shall be large enough to store
any member of the implementation's basic character set."

.... and it's quite possible that that basic character set *is*
ISO8859-1.
"A type wchar_t is provided to hold characters of a larger character
set such as Unicode.

Yes. So what? ISO8859-1 is not Unicode, and wchar_t is not necessary to
hold it.
It is a distinct type. The size of wchar_t is implementation-defined
and large enough to hold the largest character set supported by the
implementation’s locale (see 21.7, C.3.3)."
To give an example, in Windows GUI applications, char is guaranteed to
work only for English characters, for any other language you should use
wchar_t.

Even if that were true (it isn't - there's an API for messing with
"code pages"), what does the Windows GUI have to do with standard C++?
 
R

Richard Herring

Ioannis Vranos said:
This discussion can't reach a reasonable conclusion. Just give more
thought on the subject.

There is a perfectly reasonable conclusion, which had already been
stated before you tried to contradict it. That is, that there is an
insoluble problem with applying character-by-character toupper() to some
alphabets, which is present whether you use char or wchar_t.
 
I

Ioannis Vranos

Richard said:
There is a perfectly reasonable conclusion, which had already been
stated before you tried to contradict it. That is, that there is an
insoluble problem with applying character-by-character toupper() to some
alphabets, which is present whether you use char or wchar_t.



OK I can accept this. In any case toupper(), tolower() of <cctype>, and
towupper(), towlower() of <cwctype>, are all guaranteed to work for
languages with one to one, lower-case to upper-case correspondence, that
fit in the supported characters, and is up to the programmer to take
this decision.


Even if Greek is provided in the extended ASCII character set, and char
implementation is unsigned in my platform, when I use Greek or some
language other than English, I am using wchar_t which is Unicode in my
system and fits it 100% (wchar_t is the largest character set supported
by any platform).


So for languages with one to one correspondence of lower-case to
upper-case characters, these facilities are guaranteed to work.


In any case, I am 100% certain the OP was talking about English anyway.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,183
Messages
2,570,968
Members
47,518
Latest member
TobiasAxf

Latest Threads

Top