nkf #guess1 and #guess2 on html files

Une bévue

Maybe I'm not using nkf #guess1 correctly, but it gives me return value 3
(supposed to be UTF-8) for ISO-8859-1 encoded files.

It also gives me 3 for UTF-8 encoded files.

My code is simply:

NKF.guess1(string)

where string = <whole file content>

Also, guess1 sometimes disagrees with guess2.

Where could I find a table mapping encodings to the returned values?
 
YANAGAWA Kazuhisa

In Message-Id: <1hcnady.1rrszh87n73rfN%[email protected]>
> Maybe I'm not using nkf #guess1 correctly, but it gives me return value 3
> (supposed to be UTF-8) for ISO-8859-1 encoded files.
>
> It also gives me 3 for UTF-8 encoded files.

Unfortunately, NKF is a tool meant only for Japanese text, so I don't think
you can use it for general code conversion or encoding guessing.
 
Une bévue

YANAGAWA Kazuhisa said:
> Unfortunately, NKF is a tool meant only for Japanese text, so I don't think
> you can use it for general code conversion or encoding guessing.

OK, fine. I just need a tool to discriminate between ISO-8859-1
and UTF-8 (as a first step) without relying on the meta content-type charset
in the HTML file, which isn't reliable. For example, a Ruby Cocoa site
(<http://www.rubycocoa.com/the-rubification-of-rtw>) declares
ISO-8859-1 encoding in its meta tag, but it is in fact UTF-8 (according to
Firefox, a text editor, and also the HTTP headers).
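
One possible way to make that distinction is sketched below. It is not from NKF and nobody in the thread proposed it. It is only a heuristic: it checks whether the raw bytes form valid UTF-8 and falls back to ISO-8859-1 otherwise. The helper name guess_utf8_or_latin1 is hypothetical, and the sketch assumes Ruby 1.9 or later, where String#force_encoding and String#valid_encoding? exist. Since any byte sequence is valid ISO-8859-1, the fallback is an assumption, not a proof.

```ruby
# Hypothetical helper (not part of NKF): guess between UTF-8 and ISO-8859-1.
# Assumes Ruby 1.9+ for String#force_encoding and String#valid_encoding?.
def guess_utf8_or_latin1(bytes)
  # Work on a copy so the caller's string keeps its original encoding tag.
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)

  # Pure 7-bit ASCII is identical in both encodings, so report it separately.
  return :ascii if candidate.ascii_only?

  # Latin-1 text with accented characters is very unlikely to also be valid
  # UTF-8, so a successful validation is taken to mean UTF-8.
  candidate.valid_encoding? ? :utf8 : :iso_8859_1
end
```

It could be called on the raw file content, for example guess_utf8_or_latin1(File.binread("page.html")), where page.html is a placeholder file name.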