C
Claude Yih
Hi, everyone. The other day I wrote a piece of code to see how a
UTF-8 text file is stored in memory. Although the code seems to work,
there is one thing about the output format that I cannot understand.
The code looks like this (I omitted the #include and #define lines here):
int main(void)
{
    FILE* fp = NULL;
    char buffer[MAX_SIZE];
    int byteRead = 0;
    char* cp = NULL;

    memset(buffer, '\0', sizeof(buffer));
    if ((fp = fopen(FILE_NAME, "r")) != NULL)
    {
        byteRead = fread(buffer, sizeof(char), MAX_SIZE, fp);
        if (byteRead != 0)
        {
            cp = (char *)buffer;
            while ((*cp) != '\0')
            {
                printf("%#X\t", *cp++);
            }
        }
        else
        {
        }
        fclose(fp);
    }
    else
    {
        perror("open");
    }
    return 0;
}
The UTF-8 text file it reads in contains five Japanese characters
("あいうえお") and four ASCII characters ("abcd"). The output
is as below:
0XFFFFFFEF 0XFFFFFFBB 0XFFFFFFBF 0XFFFFFFE3
0XFFFFFF81 0XFFFFFF82 0XFFFFFFE3 0XFFFFFF81
0XFFFFFF84 0XFFFFFFE3 0XFFFFFF81 0XFFFFFF86
0XFFFFFFE3 0XFFFFFF81 0XFFFFFF88 0XFFFFFFE3
0XFFFFFF81 0XFFFFFF8A 0X61 0X62
0X63 0X64
Well, the byte values are what I expected, but I don't understand where
those "FFFFFF" come from. I thought printf("%#X\t", *cp++) would only
output the contents of one byte, but the output looks like four bytes.
Can anybody tell me how this happens? Thanks!