arabic_characters

devdris

Please help me print text in an Arabic font using the lccwin32 C
compiler.

Thanks.

Driss
 
Clever Monkey

devdris said:
> Please help me print text in an Arabic font using the lccwin32 C
> compiler.

Not really enough information here, but if you need this sort of
processing you need to think about wide or multi-byte characters. I
assume that this compiler toolchain supports and documents its
implementation of wide/mb chars.
 
Simon Biber

Clever said:
> Not really enough information here, but if you need this sort of
> processing you need to think about wide or multi-byte characters. I
> assume that this compiler toolchain supports and documents its
> implementation of wide/mb chars.

Not necessarily. Legacy character sets like Windows-1256, ISO 8859-6,
IBM-864 and MacArabic are all single-byte character sets. I would
recommend going with UTF-8 in most cases though, which is a multibyte
encoding. Or you could use UTF-16 or UTF-32, which work with wide
characters.
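
To make the "multibyte" point concrete, here is a minimal sketch of how a single Unicode code point is turned into UTF-8 bytes. The function name utf8_encode is hypothetical (it is not part of the C library or of lccwin32), and the sketch assumes the caller supplies a buffer of at least 4 bytes.

```c
#include <stddef.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out[0..3].
   Returns the number of bytes written (1 to 4), or 0 if cp is not a
   valid Unicode scalar value (a surrogate or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the Arabic letter seen (U+0633) comes out as the two bytes 0xD8 0xB3, while 'A' stays a single byte.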
 
Clever Monkey

Simon said:
> Not necessarily. Legacy character sets like Windows-1256, ISO 8859-6,
> IBM-864 and MacArabic are all single-byte character sets. I would
> recommend going with UTF-8 in most cases though, which is a multibyte
> encoding. Or you could use UTF-16 or UTF-32, which work with wide
> characters.

I was suggesting UTF-8 was the way to go. This means wide chars, correct?
 
Simon Biber

Clever said:
> I was suggesting UTF-8 was the way to go. This means wide chars, correct?

No, it doesn't mean wide chars necessarily.

UTF-8 data is generally stored as strings in C (arrays of char
terminated by a null character).

A UTF-8 data stream may or may not have multi-byte characters. The size
of each character can vary. However, ASCII characters from 0 to 127
always occupy a single byte. Any byte in a UTF-8 data stream that has a
value from 0 to 127 must be a single character, not part of a multi-byte
character. Thus the null character ('\0') can still be used in the
normal way to terminate a string. The strlen() function is useful for
determining the number of bytes that a UTF-8 string takes, but not the
number of characters.
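
A small sketch of the byte-versus-character distinction, using an Arabic word written as explicit UTF-8 bytes. The names salam_utf8 and count_ascii_bytes are hypothetical and only serve the illustration.

```c
#include <stddef.h>
#include <string.h>

/* The Arabic word "salam" (U+0633 U+0644 U+0627 U+0645) as UTF-8 bytes.
   Hex escapes are used so the source file's own encoding does not matter. */
static const char salam_utf8[] = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85";

/* Hypothetical helper: count the bytes in s whose value is 0..127.
   In UTF-8 such bytes are always complete ASCII characters. */
size_t count_ascii_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0x80) == 0)
            n++;
    }
    return n;
}
```

Here strlen(salam_utf8) reports 8 bytes for a word of 4 characters, and count_ascii_bytes(salam_utf8) is 0. Since no byte of a multibyte character falls in the ASCII range, searching for an ASCII character with strchr() cannot produce a false match in the middle of an Arabic letter.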

Functions like isalpha() or tolower() are no longer useful for UTF-8
because they operate on one byte at a time, while a UTF-8 character may
span several bytes. Converting a character from upper to lower case or
vice versa may even change the number of bytes that a particular
character takes up.

Here's how I would go about converting the UTF-8 character "A" to
lowercase, on a system that provides a locale whose multibyte encoding
is UTF-8.

/* The locale name in the line below must correspond
to a valid UTF-8 locale on your implementation */
setlocale(LC_ALL, "en_US.UTF-8");

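/* utf8 array contains the string "A" with enough space to
store any multibyte character plus the null character.
MB_LEN_MAX (from <limits.h>) is used rather than MB_CUR_MAX,
because MB_CUR_MAX is not a constant expression and an array
sized with it could not take an initializer */
char utf8[MB_LEN_MAX + 1] = "A";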

/* the tmp variable will contain the wide character */
wchar_t tmp;

/* The first multibyte character found in utf8 is
converted to a wide character and stored in tmp */
mbtowc(&tmp, utf8, strlen(utf8));

/* tmp is replaced by a lowercase version of itself */
tmp = towlower(tmp);

/* tmp is converted to a multibyte character sequence
and stored in utf8, followed by a null character */
utf8[wctomb(utf8, tmp)] = 0;

I believe there is no issue with utf8 being written to twice in the
statement above, as there is a sequence point just before any library
function returns.
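
The steps above can be gathered into a function that also checks the return values of mbtowc() and wctomb(), both of which return -1 on failure. This is only a sketch: the name lower_first_mb is hypothetical, and it assumes the caller has already selected a suitable locale with setlocale(), since the locale names available differ between implementations.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: lowercase the first multibyte character of the
   null-terminated string in buf, in place, using the current locale.
   bufsize is the total size of buf. Returns the byte length of the
   lowercased character, or -1 if the string is empty, the character
   is invalid, or the result does not fit in buf. */
int lower_first_mb(char *buf, size_t bufsize)
{
    wchar_t wc;
    char out[MB_LEN_MAX];   /* MB_LEN_MAX >= MB_CUR_MAX in every locale */
    int in_len, out_len;
    size_t rest;

    mbtowc(NULL, NULL, 0);                    /* reset any shift state */
    in_len = mbtowc(&wc, buf, strlen(buf));
    if (in_len <= 0)
        return -1;                            /* empty or invalid input */

    wc = (wchar_t)towlower((wint_t)wc);

    out_len = wctomb(out, wc);
    if (out_len < 0)
        return -1;                            /* not representable */

    /* The lowercase form may be shorter or longer than the original,
       so the remainder of the string (with its '\0') has to move. */
    rest = strlen(buf + in_len) + 1;
    if ((size_t)out_len + rest > bufsize)
        return -1;
    memmove(buf + out_len, buf + in_len, rest);
    memcpy(buf, out, (size_t)out_len);
    return out_len;
}
```

With a buffer such as char utf8[MB_LEN_MAX + 1] = "A", a call to lower_first_mb(utf8, sizeof utf8) leaves "a" in the buffer. For plain ASCII input this works even in the default "C" locale; non-ASCII input needs a UTF-8 locale to be in effect.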
 
Dik T. Winter

> A UTF-8 data stream may or may not have multi-byte characters. The size
> of each character can vary. However, ASCII characters from 0 to 127
> always occupy a single byte. Any byte in a UTF-8 data stream that has a
> value from 0 to 127 must be a single character, not part of a multi-byte
> character. Thus the null character ('\0') can still be used in the
> normal way to terminate a string. The strlen() function is useful for
> determining the number of bytes that a UTF-8 string takes, but not the
> number of characters.

size_t utf8strlen(const char *s) {
    size_t l = 0;
    while(*s != 0) {
        /* count every byte that is not a continuation byte (10xxxxxx) */
        if((*s & 0x0c0) != 0x080) l++;
        s++;
    }
    return l;
}

But this code allows for representations that are not formally allowed in
UTF-8.
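
For comparison, a stricter counter might reject those representations instead of counting them. The sketch below is one possible approach under that assumption, not a replacement endorsed anywhere in this thread; the name utf8strlen_strict is hypothetical. It rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper: return the number of code points in s, or
   (size_t)-1 if s is not well-formed UTF-8. */
size_t utf8strlen_strict(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != 0) {
        unsigned long cp;    /* decoded code point */
        unsigned long min;   /* smallest value allowed for this length */
        int extra;           /* continuation bytes still expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;       /* stray continuation or bad lead byte */
        p++;

        while (extra-- > 0) {
            /* A '\0' here fails the test too, so truncation is caught. */
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < min)                      return (size_t)-1; /* overlong */
        if (cp >= 0xD800 && cp <= 0xDFFF)  return (size_t)-1; /* surrogate */
        if (cp > 0x10FFFF)                 return (size_t)-1; /* out of range */
        count++;
    }
    return count;
}
```

The overlong sequence "\xC0\x80", which the simple loop counts as one character, is rejected here, while a valid Arabic string of 8 bytes and 4 letters is still counted as 4.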
 
