Rasmus Villemoes
Hi group
I am having trouble with the open pragma. If I understand "perldoc
open" correctly, I should be able to set the default output layer to
:utf8 by saying "use open OUT => ':utf8';". However, it seems that I
still need to explicitly append :utf8 when opening a file for output.
Most likely there is something I have misunderstood...
=== utf8test.pl ===
#!/usr/bin/perl
use strict;
use warnings;
use open OUT => ':utf8';
use encoding 'utf8';
use Devel::Peek;
use HTML::Entities;
my $html = 'Nørre Allé';
#Dump($html);
decode_entities($html);
Dump($html);
open T, '>', 't1.txt';
print T "$html\n";
close T;
open T, '>:utf8', 't2.txt';
print T "$html\n";
close T;
=== end ===
This is the output I get:
$ ./utf8test.pl && file t*.txt && ls -l t*.txt
SV = PV(0x1801660) at 0x180c720
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
PV = 0x301f90 "N\303\270rre All\303\251"\0 [UTF8 "N\x{f8}rre All\x{e9}"]
CUR = 12
LEN = 25
t1.txt: ISO-8859 text
t2.txt: UTF-8 Unicode text
-rw------- 1 burner burner 11 Jan 24 12:27 t1.txt
-rw------- 1 burner burner 13 Jan 24 12:27 t2.txt
It doesn't matter if I remove the "use open". If I remove "use
encoding", the only difference is that $html doesn't have a UTF8 flag
(but t2.txt still is valid utf8). Interestingly, if I Dump $html
before the decode_entities, the second Dump produces a few more lines
("MAGIC" stuff).
I have $LANG = da_DK.iso8859-1 and "This is perl, v5.8.6 built for
darwin-thread-multi-2level".
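As a sanity check on the file sizes above, here is a minimal shell sketch that counts the bytes of the same string in both encodings. The octal escapes are the bytes shown in the Dump output (UTF-8) and the corresponding ISO-8859-1 code points; the variable names are only for illustration. It suggests that the 11-byte t1.txt is a Latin-1 encoding of the string and the 13-byte t2.txt is the UTF-8 one, which agrees with what file reports.

```shell
# Byte counts for "Nørre Allé" plus a newline in each encoding.
# Octal escapes are used so the result does not depend on the terminal locale.

# ISO-8859-1: ø is \370 (0xF8), é is \351 (0xE9), one byte each.
latin1_bytes=$(printf 'N\370rre All\351\n' | wc -c | tr -d ' ')

# UTF-8: ø is \303\270, é is \303\251, two bytes each.
utf8_bytes=$(printf 'N\303\270rre All\303\251\n' | wc -c | tr -d ' ')

echo "latin1: $latin1_bytes bytes"
echo "utf8:   $utf8_bytes bytes"
```

The two-byte difference comes from the two non-ASCII characters, each of which takes one extra byte in UTF-8.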