Tuxedo
To read a file that may contain UTF-8 characters and print it to a web
browser, my first attempt is:
#!/usr/bin/perl -w
use warnings;
use strict;
use CGI qw(:standard);
print "Content-type: text/plain; charset=UTF-8\n\n";
open my $fh, "<:encoding(UTF-8)", 'UTF-8-demo.txt';
binmode STDOUT, ':utf-8';
while (my $line = <$fh>) {
print $line;
}
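The binmode line above asks for a layer named ':utf-8'. As far as I know
Perl has no layer by that name (the built-in ones are ':utf8' and
':encoding(...)'), so the call should fail, log an "Unknown PerlIO layer"
warning (in the web server's error log, since warnings are on) and return
false, leaving STDOUT with no encoding layer at all. The sketch below checks
this on an in-memory handle using binmode's return value and
PerlIO::get_layers; the helper name try_layer is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply a layer spec to a fresh in-memory handle (so the
# real STDOUT is untouched) and report whether binmode accepted it and which
# layers the handle ends up with.
sub try_layer {
    my ($spec) = @_;
    open my $out, '>', \my $buffer or die "Cannot open in-memory handle: $!";
    my $ok = do {
        no warnings 'layer';    # silence the "unknown layer" warning for the demo
        binmode $out, $spec;
    };
    my @layers = PerlIO::get_layers($out);
    close $out;
    return ($ok ? 1 : 0, join(' ', @layers));
}

for my $spec (':utf-8', ':utf8', ':encoding(UTF-8)') {
    my ($ok, $layers) = try_layer($spec);
    printf "%-18s binmode %s; layers: %s\n",
        $spec, ($ok ? 'succeeded' : 'FAILED'), $layers;
}
```

Checking the return value of binmode (or adding `or die`) would have
surfaced the problem in the original script right away.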
The example file is this one:
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
Of course, different browsers and systems give different results depending
on which characters in the UTF-8 range they support (I guess). Most
characters in UTF-8-demo.txt display correctly when the file is read as
above. However, some characters towards the end of the page do not display
in an up-to-date web browser with this read-and-print procedure: the ones
following the lowercase basic Latin alphabet, i.e. the British pound sign,
the copyright symbol and the remaining 9 characters on that same line.
They do display as they should when the same browser opens UTF-8-demo.txt
directly via the above URL. If, however, I omit the "encoding(UTF-8)" part
of the open mode after my $fh, those particular characters print
correctly.
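A likely explanation, assuming STDOUT really ended up without an encoding
layer: when decoded text is printed to such a handle, perl writes a string
whose characters all fit in Latin-1 (U+0000 to U+00FF) as one byte per
character, but writes a string containing any wider character as UTF-8
(with a "Wide character in print" warning). The part of the line after the
lowercase alphabet holds only Latin-1 characters, so the browser receives
bytes such as A3 for the pound sign, which are not valid UTF-8. Lines with
wider characters (Greek, Cyrillic, symbols) happen to come out as UTF-8.
Without the input layer nothing is decoded, so the original UTF-8 bytes
pass straight through, which is why omitting it "works". The sketch prints
the bytes produced in each case; bytes_written is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the bytes (as hex) that printing $text
# to a handle with the given output layer (or none) would produce.
sub bytes_written {
    my ($text, $layer) = @_;
    open my $out, '>', \my $buf or die "Cannot open in-memory handle: $!";
    binmode $out, $layer if defined $layer;
    {
        no warnings 'utf8';    # hide "Wide character in print" for the demo
        print {$out} $text;
    }
    close $out;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

# Two strings as <$fh> would return them with ":encoding(UTF-8)" on input:
my $latin1_only = decode('UTF-8', "\xC2\xA3\xC2\xA9");      # U+00A3 U+00A9
my $with_wide   = decode('UTF-8', "\xC2\xA3\xE2\x82\xAC");  # U+00A3 U+20AC

for my $layer (undef, ':encoding(UTF-8)') {
    printf "%-16s  latin1-only: %-11s  with wide char: %s\n",
        defined $layer ? $layer : '(no layer)',
        bytes_written($latin1_only, $layer),
        bytes_written($with_wide,   $layer);
}
```

With no output layer, the Latin-1-only string should come out as A3 A9
while the other comes out as C2 A3 E2 82 AC; with ':encoding(UTF-8)' both
come out as valid UTF-8.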
While I guess UTF-8 compatibility is a broad topic in general, what are
the better and worse ways to read and print UTF-8 for maximum success in
typical web browsers?
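One common approach, sketched here under the assumption that the script
only needs to send the file as plain text: decode on input with
':encoding(UTF-8)' and set the matching layer on STDOUT before printing
anything. The helper name print_utf8_file is hypothetical, and CGI.pm is
left out because the original script does not call any of its functions
(if you do use it, its header() function accepts a -charset argument).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy a UTF-8 text file to $out line by line.
# The input is decoded to characters; $out must already carry an
# encoding layer so the characters are encoded back to UTF-8 on output.
# Returns 1 on success; on failure prints a short message and returns 0.
sub print_utf8_file {
    my ($path, $out) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or do {
        print {$out} "Cannot open $path: $!\n";
        return 0;
    };
    while (my $line = <$fh>) {
        # $line holds characters here, so length(), substr() and regexes
        # work per character rather than per byte.
        print {$out} $line;
    }
    close $fh;
    return 1;
}

# Set the output layer before printing anything. The layer name is
# ':encoding(UTF-8)' (or ':utf8'), not ':utf-8'.
binmode STDOUT, ':encoding(UTF-8)' or die "binmode on STDOUT failed: $!";
print "Content-type: text/plain; charset=UTF-8\n\n";
print_utf8_file('UTF-8-demo.txt', \*STDOUT);
```

If the script never needs to treat the text as characters, the other
option is to skip decoding entirely: open the file with '<:raw', leave
STDOUT as bytes (a plain `binmode STDOUT;`), and print the bytes unchanged.
That is effectively what happened when the input layer was omitted.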
Sorry if the question is a bit basic and has been asked many times before,
but any comments and examples are always much appreciated.
Many thanks,
Tuxedo