utf8, length and syswrite are killing me

A. Farber

Hello,

I have a Russian card game at
http://apps.facebook.com/video-preferans/
which I've recently moved from using urlencoded data
to XML data in UTF-8. Since then it often hangs
for the users, and I suspect that my subroutine:

sub enqueue {
    my $child    = shift;
    my $data     = shift;
    my $fh       = $child->{FH};
    my $response = $child->{RESPONSE};

    # flash.net.Socket.readUTF() expects 16-bit prefix in network order
    my $prefix = pack 'n', length $data;

    # append to the end of the outgoing queue
    push @{$response}, $prefix . $data;
}

packs the wrong number of bytes for Cyrillic messages.

I'm using perl v5.10.0 on OpenBSD 4.5 and
"perldoc -tf length" suggests using
length(Encoding::encode_utf8(EXPR))

But when I put the line:

use Encode::Encoding;
....
my $prefix = pack 'n', length(Encoding::encode_utf8($data));

then it borks with

Undefined subroutine &Encoding::encode_utf8 called at Child.pm line 229.

Any help please?

I should also mention that when users chat
in Russian, my server just passes their Cyrillic
messages around (with sysread - poll - syswrite).

But for the Cyrillic words in my own program (I "use utf8;")
I have to call utf8::encode($cyrillic_word) before I can
write them out with syswrite, or it dies ("wide char").

I've tried moving utf8::encode($data) into the
enqueue subroutine above, but that doesn't work
(maybe because parts of $data are not utf8??)

Regards
Alex
 
sln

If $data is still a Perl character string,
I would encode() it to UTF-8 octets first and then
push @outarray, pack ('n a*', length($octets), $octets);
You could do it a couple of different ways, but basically
you want the length of the encoded data, not the length
of the Perl string (which is counted in characters, not bytes).

You really don't want to push $prefix . $data if $data is
not yet encoded as UTF-8. If it is already encoded, then
the length is correct, because the string already holds
bytes (octets), not characters.
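
A minimal sketch of how enqueue() could look with that change. It assumes
$data always arrives as a Perl character string (not yet encoded), and the
test message at the bottom is made up for illustration. Note that
encode() and encode_utf8() come from the Encode module itself; loading
Encode::Encoding does not define Encoding::encode_utf8, which would
explain the "Undefined subroutine" error.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sketch only: assumes $data is an unencoded Perl character string
# and $child->{RESPONSE} is the outgoing queue (array reference).
sub enqueue {
    my ($child, $data) = @_;

    # Encode first, so that length() counts bytes, not characters.
    my $octets = encode('UTF-8', $data);

    # flash.net.Socket.readUTF() expects a 16-bit prefix in network order.
    push @{ $child->{RESPONSE} }, pack('n a*', length($octets), $octets);
}

# Hypothetical test message: 6 Cyrillic characters, 2 bytes each in UTF-8.
my $child = { RESPONSE => [] };
enqueue($child, "\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}");

my ($len, $payload) = unpack('n a*', $child->{RESPONSE}[0]);
print "prefix = $len, payload bytes = ", length($payload), "\n";
```

If some callers already pass encoded bytes (for example chat messages read
with sysread and passed through untouched), those should not be encoded a
second time, so the assumption above needs checking per caller.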

You should read the Unicode docs: perluniintro, perlunicode, unicode, etc.
Each has links that take you to the others.

Below are examples of a couple of ways to do it. See what works
for you.

-sln

----------------------
use strict;
use warnings;
use Encode;

binmode (STDOUT, ':encoding(UTF-8)');

##
my $perlstring = "This is a string <\x{2100}>...";
my $utf8octets = encode('UTF-8', $perlstring);
my $packd_string = pack('n', length($utf8octets));
my $unpackd_string = unpack('n', $packd_string);
print "** Perl string : '$perlstring', length = ", length($perlstring),"\n\n";
print "UTF-8 octets: '$utf8octets', length = ", length($utf8octets),"\n\n";
print "Packed length of encoded string is $unpackd_string\n\n";

##
my $len_plus_octets = $packd_string . $utf8octets;
print "Length.UTF-8 octets: '$len_plus_octets'\n\n";

##
my $packd_all = pack ('n a*', length($utf8octets), $utf8octets);
print "Packed all : '$packd_all', length = ",length($packd_all),"\n\n";

##
my ($len,$octets) = unpack ('n a*', $packd_all);
print "Unpacked all : '$octets', length = ",length($octets),"\n";
print " : read packed length = $len\n\n";
my $decoded_string = decode('UTF-8', $octets);
print "** Perl string : '$decoded_string', length = ", length($decoded_string), "\n\n";
if ($decoded_string eq $perlstring) {
    print "** Perl strings are equal.\n";
}
else {
    print "** Perl strings are not equal.\n";
}
__END__
** Perl string : 'This is a string <GäÇ>...', length = 23

UTF-8 octets: 'This is a string <+ó-ä-Ç>...', length = 25

Packed length of encoded string is 25

Length.UTF-8 octets: ' ?This is a string <+ó-ä-Ç>...'

Packed all : ' ?This is a string <+ó-ä-Ç>...', length = 27

Unpacked all : 'This is a string <+ó-ä-Ç>...', length = 25
: read packed length = 25

** Perl string : 'This is a string <GäÇ>...', length = 23

** Perl strings are equal.
 
A. Farber

Thank you! I've ended up with encode($data), and after that
length() gives me the number of bytes for the syswrite (I hope)
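
One quick way to confirm this is to compare length() before and after
encoding for a known string. The sketch below uses a made-up four-letter
Cyrillic test word written with \x{...} escapes, so it does not depend on
the encoding of the source file.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical test word: 4 Cyrillic letters as a Perl character string.
my $word  = "\x{43f}\x{440}\x{435}\x{444}";
my $bytes = encode('UTF-8', $word);

my $chars_len = length $word;    # counted in characters
my $bytes_len = length $bytes;   # counted in bytes, as syswrite will send them

print "characters: $chars_len, bytes: $bytes_len\n";
```

If the two numbers differ for Cyrillic input, the prefix has to be built
from the second one.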
 
