Special encoded character

Yoann Wyffels

Hi,

I capture text from a telnet session (using the Net::Telnet module).
Unfortunately, the captured text contains some strange characters that are
encoded like this: "\195\169.....".
For example: \195\169 = é

I don't know what this encoding is called. Do you have any idea?
And do you know how to convert these sequences into normal characters?

Thx a lot,
Regards,
Yoann.
 
Yoann Wyffels

This script might help... but it might corrupt your terminal.
If it does, remember "reset".

#!/usr/bin/perl -C
# by Tim Toady of perlmonks
# show-unicode.pl

binmode STDOUT, ":utf8";
$pat = "@ARGV";
if (ord $pat > 256) {
    $pat = sprintf("%04x", ord $pat);
}
elsif (ord $pat > 128) { # arg in sneaky UTF-8
    $pat = sprintf("%04x", unpack("U0U",$pat));
}

@names = split /^/, do 'unicore/Name.pl';

for (@names) {
    if (/$pat/io) {
        $hex = hex($_);
        print chr($hex),"\t",$_;
    }
}

__END__



Hi,

I don't really understand the script. Can you explain it?

Thx.
Yoann.
 
Aaron Sherman

I don't really understand the script. Can you explain it?

Well, it's hard to explain. The assumption he was making is that
your problem is related to the remote side sending you Unicode
characters in UTF-8 encoding. That is, characters that do not fit in
the 0..255 range of the bytes you are reading from Net::Telnet, so
each one arrives as more than one byte.

The -C on the "#!/usr/bin/perl -C" line is described in the perlrun
documentation:

"... the standard I/O handles and the default "open()" layer are
UTF-8-fied but only if the locale environment variables indicate a
UTF-8 locale. This behaviour follows the implicit (and problematic)
UTF-8 behaviour of Perl 5.8.0."

This means that when you read in something like 0xd0 followed by 0x94,
you should interpret it not as two characters but as the single
Unicode character U+0414, or "Д" (apologies if that doesn't
display on your terminal; it's the CYRILLIC CAPITAL LETTER DE).
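
To make that concrete, here is a minimal sketch using the core Encode module. The byte string is a stand-in I made up for whatever actually arrives from the telnet session; it is not taken from the original poster's data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for raw input: the two bytes 0xD0 0x94 (assumed, for illustration).
my $bytes = "\xD0\x94";

# Interpret the bytes as UTF-8 and get a Perl character string back.
my $text = decode('UTF-8', $bytes);

# Only ASCII is printed here, so no output layer is needed.
printf "%d bytes in, %d character out, code point U+%04X\n",
    length($bytes), length($text), ord($text);
```

This should report 2 bytes in, 1 character out, and code point U+0414.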

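Next is the line binmode STDOUT, ":utf8". If you're running under
Windows, this might be needed (though I would think that -C would
have taken care of that).

The line $pat = "@ARGV" is the same as "$pat = join ' ', @ARGV"
(array interpolation joins the elements with $", which is a space by
default), so we're combining the command-line arguments into one
string.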
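
A quick way to check what interpolating an array does, as a small sketch. The @args array is a made-up stand-in for @ARGV:

```perl
use strict;
use warnings;

# Stand-in for @ARGV (hypothetical arguments).
my @args = ('latin', 'small', 'letter');

# Interpolating an array in double quotes joins its elements with $",
# the list separator, which is a single space unless it has been changed.
my $interpolated = "@args";
my $joined       = join $", @args;

print "interpolated: '$interpolated'\n";
print "joined:       '$joined'\n";
print $interpolated eq $joined ? "same\n" : "different\n";
```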

Then comes the test "if (ord $pat > 256)". If the first character of
the string from the command line is a Unicode character outside of the
range 0..255, then transform $pat into a string representation of the
hex value of the first character... honestly, I'm starting to lose it
myself here...

And the "elsif (ord $pat > 128)" branch? Huh, what? You've turned on
Unicode support with the -C, so ord(anything) is going to return the
correct ordinal value of the character... This looks like it's assumed
to be the first byte of a multi-byte character, which cannot happen
when you're using UTF-8 strings (you'll see the whole, wide character
as one value, and ord will return that code point).
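
The difference between ord on undecoded bytes and ord on a decoded string can be seen in a short sketch like the one below. It uses the two bytes 195 and 169 from the original question; decoding them with the core Encode module is my assumption about how one would get from bytes to characters, not something the script above does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes from the question: 195 (0xC3) and 169 (0xA9).
my $bytes = "\xC3\xA9";

# Decoded as UTF-8, they form one character.
my $chars = decode('UTF-8', $bytes);

# On the byte string, ord sees only the first byte.
printf "undecoded: length %d, ord %d\n", length($bytes), ord($bytes);

# On the decoded string, ord returns the code point of the whole character.
printf "decoded:   length %d, ord %d (U+%04X)\n",
    length($chars), ord($chars), ord($chars);
```

The first line should show ord 195, the second ord 233 (U+00E9, the letter é).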

Finally, in "@names = split /^/, do 'unicore/Name.pl'", we read in the
contents of a file called "unicore/Name.pl", evaluate it as Perl code
and split the returned string into lines.

Ok, nuff said, that's not going to help you at all since you don't
even have "unicore/Name.pl", whatever that is.

What will help you is this: first, try turning off UTF-8 on the input
channel. Do this by putting "-C0" on the first line of your program,
like so:

#!/usr/bin/perl -C0

Then see if you get better results. If that doesn't help, then go the
other way around:

#!/usr/bin/perl -CSD

If that doesn't work, I'm not sure what's going on, and you will want
to take up the previously mentioned tactic of talking to the sysadmin
of the remote machine and asking what the heck you're getting back.
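
One more thing to try, as a rough sketch only. It assumes the captured text really contains literal backslash-plus-three-digit sequences such as "\195\169" (it could equally be just how a log or dump displays raw bytes), and that those numbers are the bytes of UTF-8 text. The helper name unescape_decimal_utf8 is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn literal "\195\169"-style decimal escapes back
# into bytes, then decode the resulting byte string as UTF-8.
sub unescape_decimal_utf8 {
    my ($escaped) = @_;
    my $bytes = $escaped;
    # Replace each backslash + three digits with the byte of that value;
    # leave sequences above 255 untouched, since they cannot be one byte.
    $bytes =~ s/\\(\d{3})/$1 <= 255 ? chr($1) : "\\$1"/ge;
    return decode('UTF-8', $bytes);
}

# Single quotes keep the backslashes literal, as in the captured text.
my $text = unescape_decimal_utf8('caf\195\169');

# Encode the output as UTF-8 so the terminal receives proper bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

If the bytes are already raw in your string (no literal backslashes), the substitution step is unnecessary and the decode call alone should be enough.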
 
