Any foreseeable disasters?

JKop

Let's say you want to store a character of the Unicode
character system. You want a 32-Bit unsigned integer for
this, but wchar_t isn't guaranteed to be 32-Bit.

Are there any foreseeable disasters in putting this at the
beginning of your translation unit:

#define wchar_t unsigned long

The only one I can think of is function overloading:

void Blah(wchar_t) {}
void Blah(unsigned long) {}
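
A minimal sketch of the overloading point, assuming a C++ compiler in which wchar_t is a distinct built-in type. The alias name ucs4_char and the CheckOverloads helper are hypothetical, introduced only for illustration. With the macro in place, both Blah definitions would spell the same signature and the second would be a redefinition.

```cpp
#include <cassert>
#include <string>

// wchar_t is a distinct type, so these two overloads can coexist.
// After `#define wchar_t unsigned long` both would read
// Blah(unsigned long), and the second definition would not compile.
std::string Blah(wchar_t)       { return "wchar_t"; }
std::string Blah(unsigned long) { return "unsigned long"; }

// Hypothetical alias: a separate name leaves wchar_t itself untouched.
typedef unsigned long ucs4_char;

// Small self-check showing which overload each argument type selects.
void CheckOverloads() {
    ucs4_char c = 0x20AC;
    assert(Blah(L'a') == "wchar_t");
    assert(Blah(c) == "unsigned long");
}
```

A typedef gives the wider type its own name without redefining anything the compiler or the standard headers rely on.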

-JKop
 
Mike Wahler

JKop said:
Let's say you want to store a character of the Unicode
character system. You want a 32-Bit unsigned integer for
this,

Unicode uses sixteen bits. So since a byte must be at least
eight bits wide, any two byte sequence is large enough (and
might be larger) to represent any Unicode character.
but wchar_t isn't guaranteed to be 32-Bit.
Right.


Are there any foreseeable disasters in putting this at the
beginning of your translation unit:

#define wchar_t unsigned long

Yes. You're not allowed to #define a keyword.

-Mike
 
Gianni Mariani

JKop said:
Let's say you want to store a character of the Unicode
character system. You want a 32-Bit unsigned integer for
this, but wchar_t isn't guaranteed to be 32-Bit.

Are there any foreseeable disasters in putting this at the
beginning of your translation unit:

#define wchar_t unsigned long

The only one I can think of is function overloading:

void Blah(wchar_t) {}
void Blah(unsigned long) {}

-JKop

Unicode has the following encodings

utf-7 - multibyte but no bytes have values > 127
utf-8 - multibyte (1-4 bytes per char; the original design allowed up to 6)
utf-16 - 16 bit "code" - multi-value code points - see "surrogate pairs"
- utf-16 covers 2^20 + 2^16 code points
ucs-4 - 32 bit codes
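
The "surrogate pairs" entry above can be sketched as follows. This is an illustration only, assuming C++11 (char16_t, char32_t); the function name encode_utf16 is made up for the example.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as one or two UTF-16 code units.
std::u16string encode_utf16(char32_t cp) {
    // Valid input: at most U+10FFFF and not itself a surrogate value.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));  // fits in a single unit
    } else {
        const char32_t v = cp - 0x10000;           // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

The 2^16 code points below U+10000 take one unit each, and the 2^20 above it take two, which is where the 2^20 + 2^16 figure comes from.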

On most platforms where sizeof(wchar_t)==4, the wchar_t encoding is
ucs-4 while cases where sizeof(wchar_t)==2, the encoding is utf-16.

It's just so much easier to deal with utf-8 for other reasons as well.

Endianness of utf-16 and ucs-4 means that the encoding is stateful, which
makes for all kinds of issues when reading and writing to files.
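
To illustrate the byte-order issue, here is a rough sketch that serialises a single 16-bit code unit both ways. The helper names are hypothetical; the point is only that the same code unit has two on-disk forms, so a reader has to know or detect which one was written.

```cpp
#include <cassert>
#include <string>

// Write one 16-bit code unit as two bytes, most significant byte first.
std::string to_bytes_be(char16_t u) {
    std::string out;
    out.push_back(static_cast<char>(u >> 8));
    out.push_back(static_cast<char>(u & 0xFF));
    return out;
}

// Same code unit, least significant byte first.
std::string to_bytes_le(char16_t u) {
    std::string out;
    out.push_back(static_cast<char>(u & 0xFF));
    out.push_back(static_cast<char>(u >> 8));
    return out;
}

// Read two big-endian bytes back into a code unit.
char16_t from_bytes_be(const std::string& b) {
    assert(b.size() == 2);
    return static_cast<char16_t>(
        (static_cast<unsigned char>(b[0]) << 8) | static_cast<unsigned char>(b[1]));
}
```

Reading little-endian bytes with the big-endian reader silently yields a different code unit, which is the kind of problem a byte order mark (U+FEFF) is meant to expose.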

Consider using utf-8. It might mean that you don't need to do anything!
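
For comparison, a minimal sketch of UTF-8 encoding, assuming C++11 and the modern four-byte limit (code points up to U+10FFFF). The name encode_utf8 is invented for this example. The output is a plain sequence of bytes, so there is no byte-order question.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as 1 to 4 UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

ASCII text comes out unchanged, which is why existing char-based code often keeps working without modification.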

G
 
Serge Paccalin

On Saturday, 7 August 2004 at 22:25:55, Mike Wahler wrote in
comp.lang.c++:
Unicode uses sixteen bits. So since a byte must be at least
eight bits wide, any two byte sequence is large enough (and
might be larger) to represent any Unicode character.

Wrong. There are about 95,000 characters in Unicode today. How do you
fit all of them in 16 bits?

Yes. You're not allowed to #define a keyword.

Wrong again. Never seen the following?

#define for if (0) {} else for

--
___________ 2004-08-07 23:32:29
_/ _ \_`_`_`_) Serge PACCALIN -- sp ad mailclub.net
\ \_L_) Men must therefore begin by not being
-'(__) fanatics in order to deserve
_/___(_) tolerance. -- Voltaire, 1763
 
Paul Mensonides

Wrong again. Never seen the following?

#define for if (0) {} else for

You can legally define a keyword, but you can't legally define a keyword *and*
include any standard headers (17.4.3.1.1/2 - "A translation unit that includes a
header shall not contain any macros that define names declared or defined in
that header. Nor shall such a translation unit define macros for names
lexically identical to keywords.").
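
For readers who have not met that macro: as far as I know it was a common workaround for older compilers (Visual C++ 6 being the usual example) that leaked a for-loop variable into the enclosing scope. The sketch below writes the expansion out by hand rather than defining the macro, so it stays within the rule quoted above while including a header. The function name sum_twice is hypothetical.

```cpp
#include <cassert>

// Each loop is what `for (...)` turns into after
// `#define for if (0) {} else for`. Placing the loop in the else branch
// confines `i` to that statement even on a compiler with the old scoping.
int sum_twice(int n) {
    assert(n >= 0);
    int total = 0;
    if (0) {} else for (int i = 0; i < n; ++i) total += i;
    if (0) {} else for (int i = 0; i < n; ++i) total += i;  // a fresh `i`, no clash
    return total;
}
```

On a conforming compiler the plain for loops behave the same way, so the macro buys nothing there.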

Regards,
Paul Mensonides
 
JKop

Mike Wahler posted:
for this,

Unicode uses sixteen bits. So since a byte must be at least
eight bits wide, any two byte sequence is large enough (and
might be larger) to represent any Unicode character.


Minimum bitness of wchar_t = 8 bits.

Minimum range of wchar_t = 0 to 127.


-JKop
 
Ioannis Vranos

JKop said:
Let's say you want to store a character of the Unicode
character system. You want a 32-Bit unsigned integer for
this, but wchar_t isn't guaranteed to be 32-Bit.

Are there any foreseeable disasters in putting this at the
beginning of your translation unit:

#define wchar_t unsigned long



Yes. wchar_t is a built-in type, so the above is asking for trouble. Not to
mention that it is not needed in the first place, since on most systems
wchar_t is large enough to store Unicode characters. After all, it was
created for wide character sets.
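
If a guaranteed 32-bit-capable type is still wanted, one alternative to the macro is a dedicated typedef. This is only a sketch, assuming a compiler that provides <cstdint> (C++11 or later); the names CodePoint, is_scalar_value and make_code_point are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical alias: an unsigned type with at least 32 bits,
// independent of whatever width wchar_t has on this platform.
typedef std::uint_least32_t CodePoint;

static_assert(sizeof(CodePoint) * CHAR_BIT >= 32, "CodePoint must hold 32 bits");

// True for values that are Unicode scalar values:
// at most U+10FFFF and outside the surrogate range.
bool is_scalar_value(CodePoint cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Checked conversion from a raw integer.
CodePoint make_code_point(unsigned long v) {
    assert(v <= 0x10FFFF);
    CodePoint cp = static_cast<CodePoint>(v);
    assert(is_scalar_value(cp));
    return cp;
}
```

Nothing here touches wchar_t, so standard headers and overloads on wchar_t keep working as before.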






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
Ioannis Vranos

JKop said:
Minimum bitness of wchar_t = 8 bits.

Minimum range of wchar_t = 0 to 127.


Actually it is sizeof(char) <= sizeof(wchar_t) <= sizeof(long)



From TC++PL 3 on page 72:


"A type wchar_ t is provided to hold characters of a larger character
set such as Unicode. It is a distinct type. The size of wchar_ t is
implementation-defined and large enough to hold the largest character
set supported by the implementation’s locale (see §21.7, §C.3.3). The
strange name is a leftover from C. In C, wchar_ t is a typedef (§4.9.7)
rather than a built-in type. The suffix _t was added to distinguish
standard typedefs."






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
Ioannis Vranos

JKop said:
Minimum bitness of wchar_t = 8 bits.

Minimum range of wchar_t = 0 to 127.



Fixed some spaces and added some asterisks:




Actually it is sizeof(char) <= sizeof(wchar_t) <= sizeof(long)



From TC++PL 3 on page 72:


"A type wchar_t is provided to hold characters of a larger character set
such as Unicode. It is a distinct type. The size of wchar_t is
implementation-defined and *large enough* to hold the *largest character
set* supported by the implementation’s locale (see §21.7, §C.3.3). The
strange name is a leftover from C. In C, wchar_t is a typedef (§4.9.7)
rather than a built-in type. The suffix _t was added to distinguish
standard typedefs."
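
The size relation can be checked on a given implementation rather than assumed. The sketch below uses std::numeric_limits; the helper names are hypothetical, and the upper bound against sizeof(long) is tested at run time because it is taken from the book's statement, not something I would hard-code as a compile-time guarantee.

```cpp
#include <cassert>
#include <limits>

static_assert(sizeof(char) <= sizeof(wchar_t), "wchar_t is never narrower than char");

// Does sizeof(char) <= sizeof(wchar_t) <= sizeof(long) hold here?
bool size_relation_holds() {
    return sizeof(char) <= sizeof(wchar_t) && sizeof(wchar_t) <= sizeof(long);
}

// Can this implementation's wchar_t represent the given code point value?
bool wchar_can_hold(unsigned long cp) {
    return cp <= static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
}

// Checked narrowing to wchar_t.
wchar_t to_wchar(unsigned long cp) {
    assert(wchar_can_hold(cp));
    return static_cast<wchar_t>(cp);
}
```

Where wchar_t is 16 bits wide, wchar_can_hold(0x10FFFF) returns false, which is the case the original question was worried about.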






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
JKop

The smallest integral type with the smallest range is:

char

8-bit
0 to 127

The type wchar_t maps to one of the other integral types. It can map to
char. As such:

wchar_t
8-Bit
0 to 127


signed main()
{
    wchar_t c = 127;

    ++c;

    //The above statement is implementation-defined.
    //It may cause undefined behaviour.
}


-JKop
 
Ioannis Vranos

JKop said:
The smallest integral type with the smallest range is:

char

8-bit
0 to 127


Yes but please pay attention to this particular sentence:

"The size of wchar_t is implementation-defined and *large enough* to
hold the *largest character set* supported by the implementation’s locale".


The type wchar_t maps to one of the other integral types. It can map to
char. As such:

wchar_t
8-Bit
0 to 127


signed main()



Strictly speaking I am not sure that the above can be considered
well-defined and portable, since main() is a special function.






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
Ron Natalie

Wrong again. Never seen the following?

#define for if (0) {} else for
More correctly stated, it is undefined behavior to redefine a keyword
if you use any of the standard headers.
 
JKop

Ioannis Vranos posted:
Strictly speaking I am not sure that the above can be considered
well-defined and portable, since main() is a special function.

int main() {}

signed int main() {}

signed main() {}

int main(void) {}

signed int main(void) {}

signed main(void) {}


The above 6 are identical.


-JKop
 
Ioannis Vranos

JKop said:
int main() {}

signed int main() {}

signed main() {}

int main(void) {}

signed int main(void) {}

signed main(void) {}


The above 6 are identical.


Actually, int, signed int and signed are the same type. However, since
main() is *not* a normal function and the standard mentions only the int
form, I do not know for sure that they are the same. They may be; perhaps
someone else can shed more light on this.






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
Ron Natalie

Ioannis Vranos said:
main() is *not* a normal function and in the standard only the int one
is mentioned, I do not know for sure that they are the same. However
they may be, perhaps someone else can shed more light on this.

The standard says that main must return int (all of the alternatives listed
meet this requirement) and otherwise as long as the two specific signatures
provided in the standard are supported, it is implementation defined.

That is, the standard says what the return type must be, NOT what the
declaration must look like.
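
The "same return type" point can be confirmed with the compiler itself. A small sketch, assuming C++11 for <type_traits> and static_assert; the typedef and function names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Function types spelled the different ways discussed above.
typedef int        MainInt();
typedef signed int MainSignedInt();
typedef signed     MainSigned();
typedef int        MainIntVoid(void);

static_assert(std::is_same<int, signed>::value, "signed is int");
static_assert(std::is_same<int, signed int>::value, "signed int is int");

// True when all four spellings name one and the same function type.
bool main_forms_identical() {
    return std::is_same<MainInt, MainSignedInt>::value
        && std::is_same<MainInt, MainSigned>::value
        && std::is_same<MainInt, MainIntVoid>::value;
}

// Run-time confirmation of the same fact.
void check_main_forms() {
    assert(main_forms_identical());
}
```

Since the types are identical, the choice between the spellings is purely one of readability.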
 
Ioannis Vranos

Ron said:
The standard says that main must return int (all of the alternatives listed
meet this requirement) and otherwise as long as the two specific signatures
provided in the standard are supported, it is implementation defined.

That is, the standard says what the return type must be, NOT what the
declaration must look like.



Ok, then signed main() is valid. :)






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
David Hilsee

JKop said:
Ioannis Vranos posted:


int main() {}

signed int main() {}

signed main() {}

int main(void) {}

signed int main(void) {}

signed main(void) {}


The above 6 are identical.

Well, except when considering the number of keystrokes required to type each
of them and the probability that using each of them will cause confusion.
They all may do the same thing, but int main() or int main(void) look like
the best choices when you take that information into consideration.
 
JKop

Well, except when considering the number of keystrokes
required to type
each of them and the probability that using each of them will cause
confusion. They all may do the same thing, but int main() or int
main(void) look like the best choices when you take that information
into consideration.

I myself prefer:

signed main()
{

}


-JKop
 
Jack Klein

Mike Wahler posted:



Minimum bitness of wchar_t = 8 bits.

Minimum range of wchar_t = 0 to 127.

No, minimum range of wchar_t must be the same as the minimum range of
char. And that must be either -127 to 127, or 0 to 255. There is no
integer type in C++ which may have a range of only 0 to 127.
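
These limits can be queried instead of guessed. A minimal sketch using <climits> and <limits>; the helper names are hypothetical, and the range test simply encodes the two minimum ranges stated above.

```cpp
#include <cassert>
#include <climits>
#include <limits>

static_assert(CHAR_BIT >= 8, "a byte has at least eight bits");

// True if char covers at least 0..255 (unsigned) or -127..127 (signed).
bool char_range_covers_minimum() {
    return (CHAR_MIN == 0 && CHAR_MAX >= 255)
        || (CHAR_MIN <= -127 && CHAR_MAX >= 127);
}

// True if c can be incremented without leaving wchar_t's range.
bool can_increment(wchar_t c) {
    return c < std::numeric_limits<wchar_t>::max();
}

// Increment that refuses to step past the maximum.
wchar_t checked_increment(wchar_t c) {
    assert(can_increment(c));
    return static_cast<wchar_t>(c + 1);
}
```

Checking against numeric_limits before incrementing avoids relying on whatever the implementation does at the top of the range.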
 
