unicode string to C string?

Bint

Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

Thanks
B
 
Alf P. Steinbach

* Bint:
Hello, is Unicode part of C/C++?

No, unfortunately not, if you're talking about /support/ for Unicode,
except that characters can be denoted via their Unicode character codes.

I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

Yes.

You will need to decide on what to do about non-ASCII characters.

However, in practice, if the Unicode data is in the Basic Multilingual
Plane (original 16-bit Unicode), then technically all you need to do is
check that the most significant byte is zero and retain only the least
significant byte, because ASCII is a subset of Unicode. And in practice,
I think you can do that via narrow() in the standard library. Read the
documentation. :)
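As an illustration of that check, here is a minimal sketch (my own, not from the standard library), assuming the data sits in a std::wstring of 16-bit code units and that '?' is an acceptable substitute for non-ASCII characters:

```cpp
#include <string>

// Convert a wide (BMP) string to a plain ASCII string, substituting
// '?' for any code unit outside the 7-bit ASCII range.
std::string to_ascii(const std::wstring& ws)
{
    std::string result;
    result.reserve(ws.size());
    for (wchar_t wc : ws)
    {
        result += (wc < 128) ? static_cast<char>(wc) : '?';
    }
    return result;
}
```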


Cheers, & hth.,

- Alf
 
Alf P. Steinbach

* Alf P. Steinbach:
* Bint:

[snip]

Forgot to mention: if you don't use narrow() but DIY, then you also need
to check that only the lowest 7 bits of the least significant byte are
non-zero, i.e., in practice (I assumed 8-bit bytes above), that the most
significant bit of the least significant byte is zero. With signed char
that is in practice the same as checking that the value is non-negative.

Assuming you really want ASCII.

If Latin-1 is acceptable, you don't need to check the most significant bit.
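A sketch of those two DIY checks, with hypothetical helper names, assuming 8-bit bytes and 16-bit code units:

```cpp
// Check a 16-bit Unicode code unit. For ASCII, both the high byte and
// the top bit of the low byte must be zero; for Latin-1, only the high
// byte must be zero.
bool fits_ascii(unsigned short u)  { return u < 0x80;  }  // 7-bit ASCII
bool fits_latin1(unsigned short u) { return u < 0x100; }  // high byte zero
```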


Cheers, & hth.,

- Alf
 
cr88192

Bint said:
Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

I might also mention that, depending on one's needs, UTF-8 may be worth
looking into.

The reason: for plain ASCII characters, UTF-8 is byte-for-byte identical
to ASCII; it also preserves the full Unicode range by encoding non-ASCII
characters as multiple bytes. So, very often, we can work with UTF-8 data
in much the same way as with plain ASCII.

Also, unlike UTF-16, in most cases we can use ASCII and UTF-8 strings
interchangeably.


The main cost, however, is that in most common compilers char is signed by
default, so the bytes of multi-byte sequences show up as negative char
values, and we have to cast to 'unsigned char' in cases where we actually
care. But this is a pretty minor issue most of the time.
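For example, a sketch of working with UTF-8 bytes directly (hypothetical helper names; the cast to unsigned char sidesteps the signed-char issue just mentioned):

```cpp
#include <cstddef>
#include <string>

// Classify a UTF-8 byte: continuation bytes have the form 10xxxxxx.
// The cast to unsigned char avoids sign-extension surprises.
bool is_utf8_continuation(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Count code points in a UTF-8 string: every byte that is not a
// continuation byte starts a new character.
std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if (!is_utf8_continuation(c)) ++n;
    return n;
}
```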


just my opinion mostly...
 
Rui Maciel

Bint said:
Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

I don't believe it is, at least not explicitly. As far as I know, the
wchar_t type and the wchar.h header (introduced in the 1995 amendment to
C90 and carried into C99) were meant to allow implementations to provide
some sort of wide-character support, but the standard doesn't require
wchar_t to hold Unicode.

Regarding the Unicode-to-ASCII conversion, you should keep in mind that
ASCII is a subset of UTF-8. That means that if you convert information
encoded as UTF-8 to ASCII, you run the danger of losing information:
every non-ASCII character must be dropped or replaced. So, why not
abandon ASCII and instead implement support for UTF-8? Some operating
systems have already done that.
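As a sketch of why the conversion is lossy (hypothetical helper name; each non-ASCII sequence collapses to a single '?'):

```cpp
#include <cstddef>
#include <string>

// Lossy UTF-8 to ASCII: keep 7-bit bytes, replace each multi-byte
// sequence (lead byte plus its continuation bytes) with '?'.
std::string utf8_to_ascii(const std::string& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); )
    {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if (b < 0x80)
        {
            out += s[i];
            ++i;
        }
        else
        {
            out += '?';
            ++i;  // skip the lead byte...
            while (i < s.size() &&
                   (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
                ++i;  // ...and its continuation bytes
        }
    }
    return out;
}
```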


Rui Maciel
 
