arabic_caracters

devdris · Jul 14, 2006

Please help me print text in an Arabic font using the lccwin32 C
compiler.

Thanks.

Driss

Clever Monkey · Jul 14, 2006

devdris said:
Please help me print text in an Arabic font using the lccwin32 C
compiler.

Not really enough information here, but if you need this sort of
processing you need to think about wide or multibyte characters. I
assume that this compiler toolchain supports wide/multibyte characters
and documents its implementation of them.

Simon Biber · Jul 14, 2006

Clever said:
Not really enough information here, but if you need this sort of
processing you need to think about wide or multibyte characters. I
assume that this compiler toolchain supports wide/multibyte characters
and documents its implementation of them.

Not necessarily. Legacy character sets like Windows-1256, ISO 8859-6,
IBM-864 and MacArabic are all single-byte character sets. I would
recommend going with UTF-8 in most cases, though, which is a multibyte
encoding. Or you could use UTF-16 or UTF-32, which work with wide
characters.
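To make the single-byte vs. multibyte distinction concrete, here is a small sketch (mine, not from the thread) that encodes one Arabic letter, ALEF (U+0627), as UTF-8 by hand. In Windows-1256 and ISO 8859-6 that letter is the single byte 0xC7; in UTF-8 it needs two bytes; in UTF-16 it is the single 16-bit unit 0x0627. The function name utf8_encode is hypothetical, and the code only implements the standard UTF-8 bit layout.

```c
#include <stddef.h>

/* Hypothetical helper: write the UTF-8 encoding of code point cp into
   out (room for 4 bytes) and return the number of bytes used, or 0 if
   cp is not a valid Unicode scalar value. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0xxxxxxx: ASCII, one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond U+10FFFF */
}
```

Calling utf8_encode(0x0627, buf) returns 2 and leaves 0xD8 0xA7 in buf, while utf8_encode('A', buf) returns 1, so ASCII text is unchanged.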

Clever Monkey · Jul 14, 2006

Simon said:
Not necessarily. Legacy character sets like Windows-1256, ISO 8859-6,
IBM-864 and MacArabic are all single-byte character sets. I would
recommend going with UTF-8 in most cases, though, which is a multibyte
encoding. Or you could use UTF-16 or UTF-32, which work with wide
characters.

I was suggesting UTF-8 was the way to go. This means wide chars, correct?

Simon Biber · Jul 14, 2006

Clever said:
I was suggesting UTF-8 was the way to go. This means wide chars, correct?

No, it doesn't mean wide chars necessarily.

UTF-8 data is generally stored as strings in C (arrays of char
terminated by a null character).

A UTF-8 data stream may or may not have multi-byte characters. The size
of each character can vary. However, ASCII characters from 0 to 127
always occupy a single byte. Any byte in a UTF-8 data stream that has a
value from 0 to 127 must be a single character, not part of a multi-byte
character. Thus the null character ('\0') can still be used in the
normal way to terminate a string. The strlen() function is useful for
determining the number of bytes that a UTF-8 string takes, but not the
number of characters.
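A quick illustration of both points, assuming the source is compiled as plain bytes (the Arabic word is written with hex escapes so the source file's encoding does not matter): "marhaba" is five Arabic letters, MEEM, REH, HAH, BEH, ALEF, each taking two bytes in UTF-8. Every one of those bytes is 0x80 or above, so none can be mistaken for '\0' or any other ASCII character. The names marhaba and no_ascii_bytes are made up for this sketch.

```c
#include <string.h>

/* "marhaba" (hello) as UTF-8: MEEM D9 85, REH D8 B1, HAH D8 AD,
   BEH D8 A8, ALEF D8 A7 -- five characters, ten bytes. */
static const char marhaba[] = "\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7";

/* Return 1 if every byte of s is in the range 0x80..0xFF, i.e. no byte
   could be confused with an ASCII character such as '\0' or '/'. */
int no_ascii_bytes(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s < 0x80)
            return 0;
    return 1;
}
```

Here strlen(marhaba) is 10 (bytes, not the 5 characters), and no_ascii_bytes(marhaba) returns 1, which is why '\0' termination and searches for ASCII delimiters with strchr() keep working on UTF-8 strings.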

Functions like isalpha() or tolower() are no longer useful for UTF-8,
because they examine one byte at a time while a non-ASCII character
spans several bytes. Converting a character from upper to lower case or
vice versa may even change the number of bytes that the character takes
up.
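One concrete case (a Unicode fact, not something from the thread): KELVIN SIGN, U+212A, has the lowercase mapping U+006B, the ordinary Latin "k". The sketch below only computes encoded lengths with a hypothetical helper utf8_length, showing that the upper-case form needs three bytes in UTF-8 and the lowercase form one, so lowercasing in place would shrink the string. Whether a given towlower() actually applies this mapping depends on the implementation and locale.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes UTF-8 needs for code point cp
   (assumes cp is already a valid Unicode scalar value). */
size_t utf8_length(unsigned long cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* KELVIN SIGN and its Unicode lowercase mapping, LATIN SMALL LETTER K. */
enum { KELVIN_SIGN = 0x212A, SMALL_K = 0x006B };
```

utf8_length(KELVIN_SIGN) is 3 and utf8_length(SMALL_K) is 1. Arabic letters (U+0600 to U+06FF, two bytes each) have no case at all, so this particular problem does not arise for the original question.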

Here's how I would go about converting the UTF-8 character "A" to
lowercase, on a system that provides a locale whose multibyte encoding
is UTF-8.

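#include <limits.h>  /* MB_LEN_MAX */
#include <locale.h>  /* setlocale */
#include <stdio.h>   /* printf, fputs */
#include <stdlib.h>  /* mbtowc, wctomb */
#include <string.h>  /* strlen */
#include <wctype.h>  /* towlower */

int main(void)
{
    /* The locale name in the line below must correspond
       to a valid UTF-8 locale on your implementation */
    if (setlocale(LC_ALL, "en_US.UTF-8") == NULL)
        fputs("UTF-8 locale not available\n", stderr);

    /* utf8 array contains the string "A" with enough space to
       store any multibyte character plus the null character.
       MB_LEN_MAX is a constant expression; MB_CUR_MAX is not,
       and a variable-length array cannot be initialized. */
    char utf8[MB_LEN_MAX + 1] = "A";

    /* the tmp variable will contain the wide character */
    wchar_t tmp;

    /* The first multibyte character found in utf8 is
       converted to a wide character and stored in tmp */
    mbtowc(&tmp, utf8, strlen(utf8));

    /* tmp is replaced by a lowercase version of itself */
    tmp = towlower(tmp);

    /* tmp is converted to a multibyte character sequence
       and stored in utf8, followed by a null character.
       wctomb returns -1 if tmp has no multibyte form; for a
       character mbtowc just produced, that should not happen. */
    utf8[wctomb(utf8, tmp)] = 0;

    printf("%s\n", utf8);
    return 0;
}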

I believe there is no issue with utf8 being written to twice in the
statement utf8[wctomb(utf8, tmp)] = 0;, as there is a sequence point
immediately before any library function returns.
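The fragment above converts only the first character. The same idea extends to a whole string with mbstowcs(), towlower() and wcstombs(). This is a sketch under stated assumptions: the caller has already called setlocale() to select a UTF-8 locale, no string has more than 256 characters (the hypothetical MAX_WCHARS limit), and utf8_tolower is a name made up here.

```c
#include <stdlib.h>  /* mbstowcs, wcstombs, size_t, wchar_t */
#include <wchar.h>
#include <wctype.h>  /* towlower, wint_t */

#define MAX_WCHARS 256  /* assumed upper limit on characters per string */

/* Hypothetical helper: lowercase the multibyte string in, writing the
   null-terminated result to out, which has room for outsize bytes.
   Assumes setlocale() has already selected the desired locale.
   Returns 0 on success, -1 on an invalid sequence or if a buffer is
   too small. */
int utf8_tolower(const char *in, char *out, size_t outsize)
{
    wchar_t wide[MAX_WCHARS];

    /* Whole string to wide characters; (size_t)-1 means invalid input,
       MAX_WCHARS means there was no room for the terminating L'\0'. */
    size_t n = mbstowcs(wide, in, MAX_WCHARS);
    if (n == (size_t)-1 || n == MAX_WCHARS)
        return -1;

    /* Lowercase each wide character; the L'\0' at wide[n] is kept. */
    for (size_t i = 0; i < n; i++)
        wide[i] = (wchar_t)towlower((wint_t)wide[i]);

    /* Back to multibyte; a result equal to outsize means out was
       filled completely and is not null-terminated. */
    size_t m = wcstombs(out, wide, outsize);
    if (m == (size_t)-1 || m == outsize)
        return -1;
    return 0;
}
```

Working on wide characters sidesteps the byte-count problem: the multibyte result is rebuilt from scratch, so it does not matter if a lowercase character needs a different number of bytes than its uppercase form.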

Dik T. Winter · Jul 14, 2006

> A UTF-8 data stream may or may not have multi-byte characters. The size
> of each character can vary. However, ASCII characters from 0 to 127
> always occupy a single byte. Any byte in a UTF-8 data stream that has a
> value from 0 to 127 must be a single character, not part of a multi-byte
> character. Thus the null character ('\0') can still be used in the
> normal way to terminate a string. The strlen() function is useful for
> determining the number of bytes that a UTF-8 string takes, but not the
> number of characters.

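#include <stddef.h>

/* Count UTF-8 characters by counting every byte that is not a
   continuation byte (continuation bytes have the form 10xxxxxx). */
size_t utf8strlen(const char *s) {
    size_t l = 0;
    while (*s != 0) {
        if ((*s & 0x0c0) != 0x080) l++;
        s++;
    }
    return l;
}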

But this code does no validation: it also counts byte sequences that
are not formally allowed in UTF-8.
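For input that must be checked, a stricter counter can follow the well-formed byte-sequence rules of RFC 3629: no overlong forms, no surrogates (U+D800 to U+DFFF), nothing above U+10FFFF, and no stray or missing continuation bytes. This is a sketch of that swapped-in technique, not Dik's code; the name utf8strlen_strict and the (size_t)-1 error value are my own choices.

```c
#include <stddef.h>

/* Hypothetical stricter variant: count the characters in a UTF-8
   string, or return (size_t)-1 if the bytes are not well-formed. */
size_t utf8strlen_strict(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t count = 0;

    while (*s != 0) {
        unsigned char c = *s;
        int extra;                          /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (c < 0x80) {
            extra = 0;                      /* ASCII */
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1;                      /* C0 and C1 would be overlong */
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2;
            if (c == 0xE0) lo = 0xA0;       /* reject overlong 3-byte forms */
            if (c == 0xED) hi = 0x9F;       /* reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3;
            if (c == 0xF0) lo = 0x90;       /* reject overlong 4-byte forms */
            if (c == 0xF4) hi = 0x8F;       /* reject values above U+10FFFF */
        } else {
            return (size_t)-1;              /* 80..C1, F5..FF cannot start a character */
        }

        s++;
        for (int i = 0; i < extra; i++, s++) {
            /* The terminating '\0' also fails this test, so a truncated
               sequence is caught without reading past the string. */
            if (*s < lo || *s > hi)
                return (size_t)-1;
            lo = 0x80;                      /* later bytes: any continuation byte */
            hi = 0xBF;
        }
        count++;
    }
    return count;
}
```

For example, "\xC0\xAF" (an overlong encoding of '/') counts as one character with utf8strlen() above, while utf8strlen_strict() returns (size_t)-1 for it.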
