HTML::Entities & UTF8

howa

Hello, consider my CGI program:

#=======================
use HTML::Entities;
use utf8;

print STDERR encode_entities( $q->param("q") ), "\n";

#=======================

The incoming page was set to UTF-8 and the parameter consists of Chinese
UTF-8 characters, but the output of the above code gives me:


中國人277


I have already used utf8 and don't know what else I missed.

Any idea?
 
Peter J. Holzer

howa wrote:
> I have already used utf8 and don't know what else I missed.

You missed two things:

1) "use utf8" means: "The source code of this program is in UTF-8". It
has no effect on the behaviour of your code at run-time. Since your
code (at least the obviously incomplete snippet you posted) contains
only ASCII characters, it has no effect at all.

2) The CGI module (I assume that $q is a CGI object) returns parameters
as raw byte strings. You have to decode them yourself:

use Encode;    # provides decode()
print STDERR encode_entities( decode( 'UTF-8', $q->param("q") ) ), "\n";

Alternatively, if you have a rather new version of the CGI module,
you can set the charset to 'utf-8':

$q->charset('utf-8');
print STDERR encode_entities( $q->param("q") ), "\n";

This should work, but it is completely undocumented, works only with
UTF-8, and only if utf-8 is written in lower case. Beware!
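
The difference that decoding makes can be seen without a CGI object at all. The following is only a minimal sketch: $raw is a hypothetical stand-in for $q->param("q"), assumed to hold the raw UTF-8 bytes of the three Chinese characters from the question.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Hypothetical stand-in for $q->param("q"): the raw UTF-8 bytes of
# U+4E2D U+570B U+4EBA, as an undecoded CGI parameter would contain them.
my $raw = "\xE4\xB8\xAD\xE5\x9C\x8B\xE4\xBA\xBA";

# Without decoding, each of the nine bytes is treated as a separate
# Latin-1 character and gets its own entity.
my $from_bytes = encode_entities($raw);

# After decoding, the string holds three characters, and each one is
# turned into a single entity.
my $chars      = decode('UTF-8', $raw);
my $from_chars = encode_entities($chars);

printf "raw length:     %d\n", length $raw;      # 9
printf "decoded length: %d\n", length $chars;    # 3
print  "from bytes: $from_bytes\n";
print  "from chars: $from_chars\n";
```

Note that "use utf8" appears nowhere in this sketch, and adding it would change nothing, because the source contains only ASCII characters.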

hp
 
