Encoding detection in the html parser from libxml2

icoba · Feb 7, 2006

Hi,

I am parsing HTML documents with the HTML parser from libxml2. If the
encoding is declared in the document it works perfectly, but if it is
not, the output is wrong (probably because I am doing something wrong).

As stated in http://xmlsoft.org/encoding.html, the parser should
detect the encoding. So I tested it by putting a UTF-8 word in a file,
and it does not detect it (it generates a wrong string). Example:
reducción --> reducciÃ³n.

I use the parser only as a SAX parser because I do not need a tree, so
I create the context with htmlCreatePushParserCtxt() and feed the file
to it with htmlParseChunk().

Is it possible that encoding detection does not work with
htmlParseChunk()? If so, which method should I use?

Thanks, Cesar

