Unicode (UTF-8)

  • Thread starter Luigi Donatello Asero

Luigi Donatello Asero

Frank Olieu said:
_Jonathan N. Little_ wrote (27-05-2006 02:04):


In XP you need to replace /all/ instances of notepad.exe (including the backed-up
original versions), and at some point, answer 'yes' if XP asks you whether you really
want to keep the 'counterfeit' (AFAIR).
But that's not really 'hacking', is it?

Luigi:

You mean by law? :)


Licence agreement.
 
Luigi Donatello Asero

Jukka K. Korpela said:
I think Luigi Asero has refused to understand the principles of character
encoding. That might explain part of the phenomenon.

Freedom of speech is very important
Undoubtedly. Pretty much anything that can be expressed as written text in
computer-readable form can be encoded in UTF-8. More exactly, all Unicode
text can be encoded in UTF-8.

http://www.unicode.org/faq/font_keyboard.html#7
Unicode has several
A few myriads. The exact number depends on your ontology of symbols. (Does a
symbol exist if it is known from one single written document only? What
about two?)


No, because not all Chinese symbols have (yet) been included in Unicode.

Fine.
Which solution do you choose when you write in Chinese?
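Jukka's point that all Unicode text can be encoded in UTF-8 is easy to check. Below is a minimal Python 3 sketch (the language is an assumption; nothing in the thread prescribes one). It shows that UTF-8 is variable-length, using 1 to 4 bytes per code point, and that encoding and then decoding returns the original text. The sample characters are arbitrary.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses for one character (code point)."""
    return len(ch.encode("utf-8"))


# Arbitrary samples: ASCII "A", Latin "é", the CJK ideograph "中",
# and an emoji outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    # Print only ASCII (code point and hex bytes) so any console can show it.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) -> {ch.encode('utf-8').hex(' ')}")

# Round trip: decoding the UTF-8 bytes gives back exactly the same text.
text = "Unicode \u4e2d\u6587 caf\u00e9"
assert text.encode("utf-8").decode("utf-8") == text
```

Chinese characters in the Basic Multilingual Plane take 3 bytes each in UTF-8, which is the figure behind the size comparisons later in the thread.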
 
Jonathan N. Little

Frank said:
_Jonathan N. Little_ wrote (27-05-2006 02:04):


In XP you need to replace /all/ instances of notepad.exe (including the backed-up
original versions), and at some point, answer 'yes' if XP asks you whether you really
want to keep the 'counterfeit' (AFAIR).
But that's not really 'hacking', is it?

You have to disable SFC (System File Checker) first and replace all the hidden
compressed backup copies; if you don't do it exactly right, or quickly enough, the
old Notepad will be restored... yeah, I'd call it a hack.
 
Frank Olieu

_Luigi Donatello Asero_ wrote (27-05-2006 16:26):
Licence agreement.

I don't think the licence agreement disallows installing third party
applications... yet!
 
Toby Inkster

Alan said:
I just tried saving the BBC News Chinese front page in the two
encodings:
81672 May 27 13:34 bbc-chinese-utf16.html
43759 May 27 13:33 bbc-chinese-utf8.html

In case that might be an unfair choice, I tried a Bank of China site:
141476 May 27 13:43 boc-tw-utf16.html
73144 May 27 13:44 boc-tw-utf8.html

Though both of these are probably a bit more markup-heavy/content-light
than I would care to write. On pages with a higher signal-to-noise ratio,
you'll probably find UTF-16 fares better.
 
Alan J. Flavell

Though both of these are probably a bit more
markup-heavy/content-light than I would care to write.

Very well - you're free to offer your own examples - I can't read
Chinese anyway. But that still doesn't address the other points I
made, about browser and search engine support, etc.

Anyway, here are a couple of W3C formal documents in "Simplified
Chinese" translation, picked more or less at random, again after
character encoding conversion using Mozilla Composer:

204638 May 28 11:58 rdfconcepts-utf16.html
119672 May 28 11:59 rdfconcepts-utf8.html

93112 May 28 12:02 XHTML10-gb2312.html (original)
168120 May 28 12:03 XHTML10-utf16.html
102106 May 28 12:03 XHTML10-utf8.html

Keep in mind that in going from utf8 to utf16, you are typically
saving one in three bytes per character of Chinese payload, but you
are doubling the number of bytes for markup, URLs etc. It's a
delicate tradeoff! I haven't yet seen a real web page where utf16
wins (I don't *think* there's anything fundamentally wrong with the
way I'm doing this), but you're free to produce examples.
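Alan's tradeoff can be reproduced on a small scale. This Python 3 sketch counts bytes for the same strings in UTF-8, UTF-16 and GB2312. The markup and the Chinese sample text are made up for illustration; they are not taken from the pages measured above. Python's plain "utf-16" codec prepends a 2-byte byte-order mark, so "utf-16-le" is used here to count only the characters.

```python
def encoded_sizes(text: str) -> dict:
    """Byte count of `text` in each encoding ('utf-16-le' avoids the BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "gb2312")}


# Hypothetical page fragments, for illustration only.
markup = '<p class="story"><a href="/chinese/news/">'  # pure ASCII
payload = "\u4e2d\u6587\u65b0\u95fb"  # four Chinese characters, all in GB2312

print("markup: ", encoded_sizes(markup))   # UTF-16 doubles the ASCII bytes
print("payload:", encoded_sizes(payload))  # UTF-8 uses 3 bytes each, the others 2


def utf16_is_smaller(ascii_chars: int, cjk_chars: int) -> bool:
    """For ASCII plus BMP Chinese text: UTF-8 needs a + 3c bytes, UTF-16 needs 2a + 2c."""
    return 2 * ascii_chars + 2 * cjk_chars < ascii_chars + 3 * cjk_chars
```

With a ASCII characters and c Chinese characters, 2a + 2c < a + 3c reduces to a < c, so UTF-16 only comes out smaller when the Chinese text outnumbers the markup, URLs and other ASCII. GB2312, like the original XHTML 1.0 translation, keeps ASCII at 1 byte and Chinese at 2, which is consistent with it producing the smallest file in the listing above.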
 
Luigi Donatello Asero

Toby Inkster said:
Though both of these are probably a bit more markup-heavy/content-light
than I would care to write. On pages with a higher signal-to-noise ratio,
you'll probably find UTF-16 fares better.



I have noticed that www.baidu.com uses charset=gb2312 (see
http://www.baidu.com/s?lm=0&si=&rn=10&ie=gb2312&ct=0&wd=Stonetech+Shanghai&pn=10&cl=3 )
and there are both Chinese and English characters on that page.
I have saved several pages as PHP, and PHP does not support Unicode so far.
Besides, Windows XP lets me write Chinese in WordPad, so one solution might
be to buy a new FTP program which works on XP.
Which one would you recommend that can be paid for by invoice in Sweden?
Or does PC-Linq also work as an FTP program?
And by the way, should I give the Chinese files names written in Chinese
characters or in Latin characters?
www.baidu.com seems to use Latin characters, and my domain names do not
contain Chinese characters, but are there any domains with Chinese characters?
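On the file-name question: a non-ASCII file name has to travel inside a URL, and the usual convention (RFC 3986 and RFC 3987) is to percent-encode its UTF-8 bytes. A minimal Python 3 sketch with a hypothetical file name follows. Whether a particular server maps the escaped name back to the file on disk depends on how its file system stores names, which this sketch does not test.

```python
from urllib.parse import quote, unquote

# Hypothetical Chinese file name, for illustration only.
filename = "\u4e2d\u6587.html"  # "中文.html"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character;
# ASCII letters, digits and '.', '-', '_', '~' pass through unchanged, and '/' is kept.
url_path = "/" + quote(filename)
print(url_path)  # /%E4%B8%AD%E6%96%87.html

# unquote() reverses the mapping (decoding the bytes as UTF-8 by default).
assert unquote(url_path) == "/" + filename
```

A name written in Latin characters needs no escaping at all.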
 
Toby Inkster

Luigi said:
buy a new FTP program which works on XP.
Which one would you recommend that can be paid for by invoice in Sweden?

WS_FTP LE or FileZilla.

Both cost 0 Swedish kronor, which you can pay by invoice if you
like.
 
dorayme

Toby Inkster said:
Both cost 0 Swedish kronor, which you can pay by invoice if you
like.

I don't think Luigi would agree to the sense of an empty list or
class, and so he won't believe you can pay no kronor. And since you
can't pay more, he will conclude you cannot obtain it... er...
 
Luigi Donatello Asero

Toby Inkster said:
Yes, but non-ASCII domain names are still fairly poorly supported so far.

To check if your browser supports them, try:

http://www.bücher.ch/

(It should redirect to http://www.buecher.de/)

As far as I understand, I do not own any browser...
I may only use one browser or another to navigate the Internet under
the terms of the respective licence agreement.
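Toby's www.bücher.ch test relies on IDNA: the browser converts each non-ASCII label of the host name to an ASCII "xn--" form (Punycode) before the DNS lookup. Python's standard library ships an "idna" codec implementing the older IDNA 2003 rules, which is enough for a minimal sketch of that conversion.

```python
# Convert an internationalized host name to its ASCII (Punycode) form and back.
host = "www.b\u00fccher.ch"  # "www.bücher.ch"

ascii_host = host.encode("idna")  # only the non-ASCII label gets an "xn--" prefix
print(ascii_host)  # b'www.xn--bcher-kva.ch'

# Decoding restores the Unicode form.
assert ascii_host.decode("idna") == host
```

The third-party `idna` package implements the newer IDNA 2008 rules if that distinction matters.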
 
Jonathan N. Little

Luigi said:
As far as I understand, I do not own any browser...
I may only use one browser or another to navigate the Internet under
the terms of the respective licence agreement.

Whose EULA?!
 
