sln
Below is my sample code. This works, but if I could just get
a byte string from a *possibly* UTF-8 string with anything simpler
than this, I would be a happy camper.
In the real app, I have no control over how the sample is generated.
It's likely read from PerlIO with whatever encoding layers are applied.
I don't want to have to worry about that; I just want to get it back to a
byte string for analysis.
Thanks a lot.
-sln
--------------------------
use strict;
use warnings;

my $sample = "unicode->\x{feff}\x{21000}\x{21000}";

print "\nUTF string, length = ".length($sample).", '$sample' :\n ";
for (map {ord $_} split //, $sample) {
    printf ("%x ",$_);
}
print "\n";

my ($bytes, $offset) = ('',0);
for (map {ord $_} split //, $sample)
{
    my @ar = ();
    while ($_ > 0) {
        push @ar, $_ & 0xff;
        $_ >>= 8;
    }
    for (reverse @ar) {
        vec ($bytes, $offset++, 8) = $_;
    }
}

print "\nByte converted, length = ".length($bytes).", '$bytes' :\n ";
for (map {ord $_} split //, $bytes) {
    printf ("%02x ",$_);
}
print "\n";
__END__
Wide character in print at btest.pl line 6.
UTF string, length = 12, 'unicode->n++=íÇÇ=íÇÇ' :
75 6e 69 63 6f 64 65 2d 3e feff 21000 21000
Byte converted, length = 17, 'unicode->¦ ?? ?? ' :
75 6e 69 63 6f 64 65 2d 3e fe ff 02 10 00 02 10 00
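
For comparison, here is a shorter sketch that gives the same bytes as the loop above for this sample. The helper name code_point_bytes is hypothetical (it is not part of the original script), and the sketch assumes the goal really is the big-endian bytes of each code point rather than UTF-8 encoded octets. One known difference: a NUL character (code point 0) yields a single 00 byte here, whereas the while loop above emits nothing for it.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original script: for each character,
# emit the big-endian bytes of its code point with leading zero bytes dropped.
sub code_point_bytes {
    my ($str) = @_;
    return join '', map {
        my $hex = sprintf '%x', ord;          # code point as hex digits
        $hex = "0$hex" if length($hex) % 2;   # pad to a whole number of bytes
        pack 'H*', $hex;                      # hex digits -> raw bytes
    } split //, $str;
}

my $sample = "unicode->\x{feff}\x{21000}\x{21000}";
my $bytes  = code_point_bytes($sample);

printf "length = %d\n", length $bytes;        # length = 17
print join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";
# 75 6e 69 63 6f 64 65 2d 3e fe ff 02 10 00 02 10 00
```

If the UTF-8 encoded octets are what is wanted instead, Encode::encode('UTF-8', $sample) from the core Encode module returns those; for this sample the non-ASCII part would then come out as ef bb bf f0 a1 80 80 f0 a1 80 80.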