unicode string to C string?

Bint

Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

Thanks
B
 
Alf P. Steinbach

* Bint:
Hello, is Unicode part of C/C++?

No, unfortunately not, if you're talking about /support/ for Unicode,
except that characters can be denoted via their Unicode character codes.

I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

Yes.

You will need to decide on what to do about non-ASCII characters.

However, in practice, if the Unicode data is in the Basic Multilingual
Plane (original 16-bit Unicode), then technically all you need to do is
check that the most significant byte is zero and retain only the least
significant byte, because ASCII is a subset of Unicode. And in practice,
I think you can do that via narrow() in the standard library. Read the
documentation. :)
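As an illustration of that check, here is a minimal sketch (my own, not from the standard library), assuming the data sits in a std::wstring of 16-bit code units and that '?' is an acceptable substitute for non-ASCII characters:

```cpp
#include <string>

// Convert a wide (BMP) string to a plain ASCII string, substituting
// '?' for any code unit outside the 7-bit ASCII range.
std::string to_ascii(const std::wstring& ws)
{
    std::string result;
    result.reserve(ws.size());
    for (wchar_t wc : ws)
    {
        result += (wc < 128) ? static_cast<char>(wc) : '?';
    }
    return result;
}
```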


Cheers, & hth.,

- Alf
 
Alf P. Steinbach

* Alf P. Steinbach:
* Bint:

[snip]

Forgot to mention: if you don't use narrow() but DIY, then you also need
to check that only the lowest 7 bits of the least significant byte are
non-zero, i.e., in practice (I assumed 8-bit bytes above), that the most
significant bit of the least significant byte is zero. With signed char
that is in practice the same as checking that the value is non-negative.

Assuming you really want ASCII.

If Latin-1 is acceptable, you don't need to check the most significant bit.
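A sketch of those two DIY checks, with hypothetical helper names, assuming 8-bit bytes and 16-bit code units:

```cpp
// Check a 16-bit Unicode code unit. For ASCII, both the high byte and
// the top bit of the low byte must be zero; for Latin-1, only the high
// byte must be zero.
bool fits_ascii(unsigned short u)  { return u < 0x80;  }  // 7-bit ASCII
bool fits_latin1(unsigned short u) { return u < 0x100; }  // high byte zero
```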


Cheers, & hth.,

- Alf
 
cr88192

Bint said:
Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

I might also mention that, depending on one's needs, UTF-8 may be worth
looking into.

The reason: for plain ASCII characters, UTF-8 is byte-for-byte identical
to ASCII; it also preserves the full Unicode range by encoding non-ASCII
characters as multiple bytes. So, very often, we can work with UTF-8 data
in much the same way as with plain ASCII.

Also, unlike UTF-16, in most cases we can use ASCII and UTF-8 strings
interchangeably.


The main cost, however, is that in most common compilers char is signed by
default, so the bytes of multi-byte sequences show up as negative char
values, and we have to cast to 'unsigned char' in cases where we actually
care. But this is a pretty minor issue most of the time.
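For example, a sketch of working with UTF-8 bytes directly (hypothetical helper names; the cast to unsigned char sidesteps the signed-char issue just mentioned):

```cpp
#include <cstddef>
#include <string>

// Classify a UTF-8 byte: continuation bytes have the form 10xxxxxx.
// The cast to unsigned char avoids sign-extension surprises.
bool is_utf8_continuation(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Count code points in a UTF-8 string: every byte that is not a
// continuation byte starts a new character.
std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if (!is_utf8_continuation(c)) ++n;
    return n;
}
```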


just my opinion mostly...
 
Rui Maciel

Bint said:
Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

I don't believe it is, at least not explicitly. As far as I know, the
wchar_t type and the wchar.h header (introduced in the 1995 amendment to
C90 and carried into C99) were meant to allow implementations to provide
some sort of wide-character support, but the standard doesn't require
wchar_t to hold Unicode.

Regarding the Unicode-to-ASCII conversion, you should keep in mind that
ASCII is a subset of UTF-8. That means that if you convert information
encoded as UTF-8 to ASCII, you run the danger of losing information:
every non-ASCII character must be dropped or replaced. So, why not
abandon ASCII and instead implement support for UTF-8? Some operating
systems have already done that.
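As a sketch of why the conversion is lossy (hypothetical helper name; each non-ASCII sequence collapses to a single '?'):

```cpp
#include <cstddef>
#include <string>

// Lossy UTF-8 to ASCII: keep 7-bit bytes, replace each multi-byte
// sequence (lead byte plus its continuation bytes) with '?'.
std::string utf8_to_ascii(const std::string& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); )
    {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if (b < 0x80)
        {
            out += s[i];
            ++i;
        }
        else
        {
            out += '?';
            ++i;  // skip the lead byte...
            while (i < s.size() &&
                   (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
                ++i;  // ...and its continuation bytes
        }
    }
    return out;
}
```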


Rui Maciel
 
