Any foreseeable disasters?

J

Jack Klein

Yes. wchar_t is a built-in type, so the above is asking for trouble. Not to
mention that it is not needed in the first place, since on most systems
wchar_t is sufficient to store Unicode characters. After all, wide
character sets are what it was created for.

Unicode was originally a 16-bit encoding, and quite a few
implementations provide a 16-bit wchar_t. This is most likely the
reason that Java's type 'char' was defined as 16 bits. But Unicode
has grown to more than 64K defined values, and can no longer fit into
individual 16-bit types without state dependent encoding.
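
For illustration, a minimal sketch (assuming only <iostream> and <limits>, and taking U+10FFFF as the largest Unicode code point) of how to check whether a given implementation's wchar_t is wide enough:

#include <iostream>
#include <limits>

int main()
{
    // U+10FFFF is the largest Unicode code point.
    const unsigned long max_code_point = 0x10FFFFUL;

    // numeric_limits<wchar_t>::max() is always positive, so the
    // conversion to unsigned long is value-preserving here.
    const unsigned long wchar_max =
        static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());

    std::cout << "wchar_t max: " << wchar_max << '\n';
    std::cout << (wchar_max >= max_code_point
                      ? "wide enough for all of Unicode\n"
                      : "too narrow, e.g. a 16-bit wchar_t\n");
}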

Is there some reason why you suddenly feel the need to add so much
superfluous white space between the end of your text and your
signature line? Why don't you just learn to use a proper signature
delimiter, as specified by the appropriate RFCs? It is not hard at
all, I have been doing it for many years.

A proper signature line consists of the four character sequence:

'-', '-', ' ', '\n'
 
J

JKop

David Hilsee posted:
But why? Do you just like being different from every other C++
programmer on the planet?


No, I like to be reminded that it's signed. Maybe my brain's wired a bit
weird, but when I don't see "signed", I tend to think that it's unsigned. I
would have assumed that that was the default, but of course it's not.


-JKop
 
J

JKop

Jack Klein posted:
No, the minimum range of wchar_t must be the same as the minimum range of
char. And that must be either -127 to 127, or 0 to 255. There is no
integer type in C++ which may have a range of only 0 to 127.

If you're intelligent enough to post that, you should be able to draw the
conclusion from it that I did.

It's implementation-defined whether or not a char is unsigned or signed.
Looking at the differences:

signed char : -127 to 127

unsigned char : 0 to 255


They overlap at 0 to 127. Concordantly:


signed main()
{
    char c = -5;   //Implementation-defined

    c = 130;       //Implementation-defined

    c = 0;         //No problem

    c = 127;       //No problem

    c = 128;       //Implementation-defined

    c = -1;        //Implementation-defined
}


Therefore, the minimum range for char is 0 to 127. As wchar_t may be based
upon *any* of the integral types, it may be based on char, and as such its
minimum range is 0 to 127.


-JKop
 
D

David Hilsee

JKop said:
David Hilsee posted:



No, I like to be reminded that it's signed. Maybe my brain's wired a bit
weird, but when I don't see "signed", I tend to think that it's unsigned. I
would have assumed that that was the default, but of course it's not.

Your brain will probably get re-wired over time. In your example code in
another thread, you wrote "int" instead of "signed", so I bet the
assimilati... er, re-wiring has already begun. :)
 
J

JKop

David Hilsee posted:
Your brain will probably get re-wired over time. In your example code
in another thread, you wrote "int" instead of "signed", so I bet the
assimilati... er, re-wiring has already begun. :)


It just seems to me that positive numbers are much more the norm. Negative
numbers are "more special". Think about it, even in school, I didn't learn
about negative numbers until I was about 10 or 11. So from that, positive
numbers come first, then negative numbers. I would've made int unsigned, and
if you wanted a signed integer, then: signed int.

Anyway, looks like resistance is futile! :-D


-JKop
 
I

Ioannis Vranos

JKop said:
If you're intelligent enough to post that, you should be able to draw the
conclusion from it that I did.


Actually, the range of char is either that of signed char or that of
unsigned char. And there is numeric_limits<char> to find out which one the
current implementation uses.

So Jack is right.




It's implementation-defined whether or not a char is unsigned or signed.
Looking at the differences:

signed char : -127 to 127

unsigned char : 0 to 255


They overlap at 0 to 127. Concordantly:


Consequently, forget the rest and use either signed char or unsigned char
explicitly if you want to be range-specific, or numeric_limits to make
run-time decisions.
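
For example, a minimal sketch (assuming only <iostream> and <limits>) that asks the current implementation which flavour of char it uses, and what its actual range is:

#include <iostream>
#include <limits>

int main()
{
    // Reports whether plain char behaves like signed char or unsigned char
    // on this particular implementation, and what its actual range is.
    std::cout << "char is "
              << (std::numeric_limits<char>::is_signed ? "signed" : "unsigned")
              << '\n'
              << "min: " << static_cast<int>(std::numeric_limits<char>::min()) << '\n'
              << "max: " << static_cast<int>(std::numeric_limits<char>::max()) << '\n';
}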


Also, wchar_t is not based on any integral type; it is a built-in type of
its own. So in theory its value range can differ from that of the rest of
the types.

Also "The size of wchar_t is implementation-defined and *large enough*
to hold the *largest character set* supported by the implementation’s
locale" as mentioned in TC++PL, guarantees that you will never have
problems storing Unicode or any other wide character set supported by a
system.


The rest is nonsense.






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
I

Ioannis Vranos

Jack said:
Unicode was originally a 16-bit encoding, and quite a few
implementations provide a 16-bit wchar_t. This is most likely the
reason that Java's type 'char' was defined as 16 bits. But Unicode
has grown to more than 64K defined values, and can no longer fit into
individual 16-bit types without state dependent encoding.


However, in TC++PL it is mentioned:

"The size of wchar_t is implementation-defined and *large enough* to
hold the *largest character set* supported by the implementation’s locale".

Isn't it valid?


Is there some reason why you suddenly feel the need to add so much
superfluous white space between the end of your text and your
signature line?


Yes, to occupy a few more bytes in my messages and make you run out of
memory. :)


Why don't you just learn to use a proper signature
delimiter, as specified by the appropriate RFCs? It is not hard at
all, I have been doing it for many years.

A proper signature line consists of the four character sequence:

'-', '-', ' ', '\n'


What if I make it "--\t\n"?






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
J

JKop

Actually, the range of char is either that of signed char or that of
unsigned char. And there is numeric_limits<char> to find out which one the
current implementation uses.

So Jack is right.

I disagree.

The minimum range of a char is 0 to 127. By this I mean the following:

A) If you find a C++ compiler that cannot store the values from 0 to 127 in
a char, then you haven't got a C++ compiler.

B) If you find a C++ compiler that can store -3 in a char, then that's very
good, but the Standard provides no such assurance. If you find a C++
compiler that can store 130 in a char, then that's very good, but the
Standard provides no such assurance.

0 to 127 are the only values you can reliably store in a char when you're
writing portable code. As such 0 to 127 is the minimum range for a char.


Page 82 of the Standard:

3.9.1 Fundamental types

5 Type wchar_t is a distinct type whose values can represent distinct codes
for all members of the largest extended character set specified among the
supported locales (22.1.1). Type wchar_t shall have the same size,
signedness, and alignment requirements (3.9) as one of the other integral
types, called its underlying type.


My rationale:

A) char is an integral type

As such, wchar_t can possibly have the same size, signedness and alignment
requirements as char.

As such, the minimum range for wchar_t is 0 to 127.

As such, you cannot reliably store -1 in a wchar_t in portable code, nor can
you store 130 in a wchar_t in portable code.

As regards "the supported locales", the Standard gives no guarantee that
Unicode exists as a supported locale. As such, you cannot reliably use a
wchar_t to store a Unicode character when writing portable code.


-JKop
 
R

Ron Natalie

Ioannis Vranos said:
Also, wchar_t is not based on any integral type; it is a built-in type of
its own. So in theory its value range can differ from that of the rest of
the types.

Incorrect. While wchar_t is a distinct type (i.e., not a typedef, so that
it can participate in overloading distinctly from the integer types), it has
the same representation as some integral type (called its underlying type).

See 3.9.1/5
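
For instance, a small sketch (hypothetical functions, purely to illustrate overload resolution) of how that distinctness shows up:

#include <iostream>

// Three overloads: wchar_t selects its own, even though it has the same
// representation as one of the other integral types.
void f(wchar_t)        { std::cout << "f(wchar_t)\n"; }
void f(unsigned short) { std::cout << "f(unsigned short)\n"; }
void f(int)            { std::cout << "f(int)\n"; }

int main()
{
    f(L'a');                           // calls f(wchar_t)
    f(static_cast<unsigned short>(0)); // calls f(unsigned short)
    f(0);                              // calls f(int)
}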
 
I

Ioannis Vranos

JKop said:
As regards "the supported locales", the Standard gives no guarantee that
Unicode exists as a supported local. As such, you cannot reliably use a
wchar_t to store a Unicode character when writing portable code.


I can't understand what you mean in the above. How can you use Unicode on a
system that does not support it?






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
T

Tobias Güntner

JKop said:
0 to 127 are the only values you can reliably store in a char when you're
writing portable code. As such 0 to 127 is the minimum range for a char.

Well...
CMIIW, but isn't it required that sizeof(char) <= sizeof(short) <=
sizeof(int) <= sizeof(long)?

This implies that the minimum range for _every_ integral numeric type is
0..127 (*). According to this, we don't really need distinct data types
at all: No integral type can be expected to store values outside the
range 0..127, so if we want to write /really/ portable code, we can
never use values outside this range anyway. It would even be impossible
to have a string that is longer than 127 characters... I think you'll
agree that programming under such restrictions is rather unpleasant
(especially if most numbers are >127).

All data types reflect what is supported (maybe "natural" is a better
word) on a machine. If you need to handle 32-bit numbers on an 8-bit
processor and the compiler does not support 32-bit numbers or some
workarounds for that processor, you're out of luck.

IMHO there are no machine-independent data types; just pick the data
type that is appropriate, e.g. char for strings, wchar_t for Unicode
strings. Or simply make a typedef unsigned long my_wchar_t; if you
really need more than what your current platform offers.
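
A rough sketch of that idea (the names are made up, and note that a typedef does not give you a distinct type for overloading purposes) might be:

#include <vector>

// A home-grown wide character type, at least 32 bits wide because the
// standard guarantees unsigned long can hold 0 to at least 4294967295,
// regardless of how wide the platform's wchar_t happens to be.
typedef unsigned long my_wchar_t;
typedef std::vector<my_wchar_t> my_wstring;

int main()
{
    my_wstring s;
    s.push_back(0x10FFFFUL);   // any Unicode code point fits
    return 0;
}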

What I'm trying to say: No matter what code you write, you always have
to know the platform that your code is supposed to run on. There is no
point in writing code that might compile or even run on a pocket
calculator if your program will later run only on high-end PCs.


(*) i.e. the overlapping range for signed/unsigned char. I don't know if
it's even smaller than that. What about a 4-bit processor?
 
J

JKop

Tobias Güntner posted:
Well...
CMIIW, but isn't it required that sizeof(char) <= sizeof (short) <=
sizeof(int) <= sizeof(long)?

This implies that the minimum range for _every_ integral numeric type is
0..127. According to this, we don't really need distinct data types at all.


Char: Minimum 8-Bit
Short: Minimum 16-Bit
Int: Minimum 16-Bit
Long: Minimum 32-Bit

The Standard says some bullshit like "the same minimums as Standard C;
refer to chapter BLAH of the C Standard". Standard C specifies the above
limits.


Bullshit I know, C++ and C are two separate languages.
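
For what it's worth, a quick way to see what a particular implementation actually provides (a sketch assuming only <climits> and <iostream>) is:

#include <climits>
#include <iostream>

int main()
{
    // These macros come from the C standard's <limits.h>, which C++
    // inherits as <climits>; the guaranteed minimum magnitudes are
    // 127/255 for char, 32767 for short and int, 2147483647 for long.
    std::cout << "CHAR_BIT : " << CHAR_BIT << '\n'
              << "CHAR_MIN : " << CHAR_MIN << '\n'
              << "CHAR_MAX : " << CHAR_MAX << '\n'
              << "SHRT_MAX : " << SHRT_MAX << '\n'
              << "INT_MAX  : " << INT_MAX  << '\n'
              << "LONG_MAX : " << LONG_MAX << '\n';
}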

-JKop
 
I

Ioannis Vranos

JKop said:
Char: Minimum 8-Bit
Short: Minimum 16-Bit
Int: Minimum 16-Bit
Long: Minimum 32-Bit


Where does the standard mention this?



The Standard says some bullshit like "the same minimums as Standard C;
refer to chapter BLAH of the C Standard". Standard C specifies the above
limits.


C90 or C99? Because C++ retains C90 as a subset, except for the parts
where things are defined otherwise.






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
O

Old Wolf

Ioannis Vranos said:
Where does the standard mention this?

It says INT_MIN <= -32767 and INT_MAX >= 32767, i.e. there are at
least 65535 distinct values for int, therefore at least 16
bits of storage are required. Similar reasoning applies to
the other types.
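
As a sketch of that arithmetic (65535 distinct values need 16 bits):

#include <iostream>

int main()
{
    // int must cover at least -32767..32767, i.e. 65535 distinct values;
    // count the bits needed to distinguish that many values.
    unsigned long values = 65535UL;
    int bits = 0;
    while (values > 0)
    {
        values >>= 1;
        ++bits;
    }
    std::cout << "at least " << bits << " bits\n";   // prints 16
}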
 
