Special encoded character

Yoann Wyffels · Oct 19, 2004

Hi,

I capture text from a telnet session (with the Net::Telnet module).
Unfortunately, what I capture contains some strange characters,
encoded like this: "\195\169.....".
For example: \195\169 = é

I don't know what this encoding is called. Do you have an idea?
And do you know how to transform them into normal characters?

Thx a lot,
Regards,
Yoann.

Yoann Wyffels · Oct 22, 2004

This script might help... but it might corrupt your terminal.

If it does, remember "reset".

#!/usr/bin/perl -C
# by Tim Toady of perlmonks
# show-unicode.pl: print every entry of Perl's Unicode name table
# whose line (hex code point + name) matches the command-line argument

binmode STDOUT, ":utf8";
$pat = "@ARGV";
if (ord $pat > 255) {           # first char is a wide character
    $pat = sprintf("%04x", ord $pat);
}
elsif (ord $pat > 127) {        # arg in sneaky UTF-8
    $pat = sprintf("%04x", unpack("U0U", $pat));
}

@names = split /^/, do 'unicore/Name.pl';

for (@names) {
    if (/$pat/io) {
        $hex = hex($_);
        print chr($hex), "\t", $_;
    }
}

__END__

Hi,

I don't really understand the script, can you explain?

Thx.
Yoann.

Aaron Sherman · Oct 23, 2004

I don't really understand the script, can you explain?

Well, it's hard to explain. The assumption that he was making is that
your problem is related to the remote side sending you Unicode
characters in UTF-8 encoding. That is, every character outside plain
ASCII (0..127), é included, arrives as a sequence of two or more bytes
in the 0..255 range that you are reading from Net::Telnet.

The -C switch on the #! line is described in the perlrun documentation:

"... the standard I/O handles and the default "open()" layer are
UTF-8-fied but only if the locale environment variables indicate a
UTF-8 locale. This behaviour follows the implicit (and problematic)
UTF-8 behaviour of Perl 5.8.0."

This means that when you read in something like 0xd0 followed by 0x94,
you should interpret that not as two characters but as the
Unicode character U+0414, "Д" (apologies if that doesn't
display on your terminal; it's the CYRILLIC CAPITAL LETTER DE).
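The arithmetic behind "two bytes, one character" can be sketched in Perl. This is a minimal hand-rolled decoder for two-byte UTF-8 sequences only; the helper name decode_two_byte is hypothetical and not part of any module (in real code, the core Encode module does this job). It also shows that the "\195\169" from the original question, assuming those are decimal byte values, decodes to U+00E9, "é".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decode ONE two-byte UTF-8 sequence by hand.
# Layout: 110xxxxx 10yyyyyy  ->  code point xxxxxyyyyyy
sub decode_two_byte {
    my ($lead, $cont) = @_;
    die "not a two-byte lead byte\n" unless ($lead & 0xE0) == 0xC0;
    die "not a continuation byte\n"  unless ($cont & 0xC0) == 0x80;
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

my $de     = decode_two_byte(0xD0, 0x94);  # the bytes from the example above
my $eacute = decode_two_byte(195, 169);    # "\195\169", read as decimal bytes

printf "U+%04X U+%04X\n", $de, $eacute;    # U+0414 U+00E9
```

Sequences of three or four bytes follow the same idea with different masks, which is why a real decoder is preferable to this sketch.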

The binmode STDOUT, ":utf8" line: if you're running under Windows, this
might be needed (though I would think that -C would have taken care of that).

The line $pat = "@ARGV" is the same as "$pat = join ' ', @ARGV" (an
interpolated array is joined with $", a single space by default), so
we're combining the command-line arguments into one string.
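A quick way to check what "@ARGV" produces is to interpolate any array. In this sketch @args is a hypothetical stand-in for the command-line arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for @ARGV.
my @args = ('cyrillic', 'de');

# Interpolating an array joins its elements with $" (default: one space).
my $interpolated = "@args";
my $joined       = join ' ', @args;

print "$interpolated\n";                        # cyrillic de
print "same\n" if $interpolated eq $joined;     # same
```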

The first if test: if the first character of the string from the
command line is a Unicode character outside the range 0..255, then
transform $pat into a string holding the hex value of that first
character... honestly, I'm starting to lose it myself here...
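The first branch can be tried in isolation. This sketch assumes $pat already holds a wide character (here U+0414); the result is the zero-padded hex code that the script then searches for in the names list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input: CYRILLIC CAPITAL LETTER DE as a single character.
my $pat = "\x{414}";

# ord() looks only at the first character; above 255 means "wide".
if (ord($pat) > 255) {
    $pat = sprintf("%04x", ord $pat);   # zero-padded lowercase hex
}

print "$pat\n";   # 0414
```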

The elsif branch (the "arg in sneaky UTF-8" case): huh wa? You've turned
on Unicode support with the -C, so ord(anything) is going to return the
correct ordinal value of the character... This branch looks like it
assumes ord sees the first byte of a multi-byte character, which cannot
happen when you're using UTF-8 strings (you'll see the whole, wide
character as one value, and ord will return that code point).
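The difference the elsif branch is guarding against shows up when you compare ord() on undecoded bytes with ord() on a decoded string. This sketch uses the core Encode module; the byte string is a hypothetical stand-in for an argument that Perl has not decoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical undecoded input: the UTF-8 bytes for U+0414.
my $bytes = "\xd0\x94";
my $chars = decode('UTF-8', $bytes);

# On raw bytes, ord() sees only the lead byte of the sequence;
# on the decoded string it sees the whole character.
my $byte_ord = ord $bytes;
my $char_ord = ord $chars;

printf "raw bytes: %d chars, ord 0x%02X\n", length($bytes), $byte_ord;  # raw bytes: 2 chars, ord 0xD0
printf "decoded:   %d char,  ord 0x%04X\n", length($chars), $char_ord;  # decoded:   1 char,  ord 0x0414
```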

The @names line runs the file "unicore/Name.pl" with do, which
evaluates it as Perl code and returns the value of its last statement
(a big string), and then splits that returned string on line breaks.
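The split /^/ idiom can be seen on a small string. The table below is a hypothetical stand-in for what do 'unicore/Name.pl' returned under Perl 5.8 (one "HEXCODE<TAB>NAME" entry per line); the real file's layout is an assumption here and has changed in later Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the string returned by the names file.
my $table = "0041\tLATIN CAPITAL LETTER A\n"
          . "00E9\tLATIN SMALL LETTER E WITH ACUTE\n"
          . "0414\tCYRILLIC CAPITAL LETTER DE\n";

# split /^/ is treated as split /^/m: cut at the start of every line,
# each element keeping its trailing newline.
my @names = split /^/, $table;

# Same kind of search as the script, but extracting the hex code first
# (hex() on the whole line would warn about the tab under "use warnings").
my @matches;
for (@names) {
    next unless /e with acute/i;
    my ($code) = /^([0-9A-F]+)/;
    push @matches, sprintf("U+%04X", hex $code);
}

print scalar(@names), " lines; matches: @matches\n";   # 3 lines; matches: U+00E9
```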

Ok, nuff said, that's not going to help you at all since you don't
even have "unicore/Name.pl", whatever that is.

What will help you is this: first, try turning off Perl's UTF-8
handling of I/O (including the input channel). Do this by putting
"-C0" on the #! line of your program like so:

#!/usr/bin/perl -C0

Then see if you get better results. If that doesn't help, then go the
other way around:

#!/usr/bin/perl -CSD

If that doesn't work, I'm not sure what's going on, and you will want
to take up the previously mentioned tactic of talking to the sysadmin
of the remote machine and asking what the heck you're getting back.
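As an alternative to flipping -C switches, you can decode explicitly in the program once you know the remote side is sending UTF-8. A minimal sketch with the core Encode module, assuming the captured telnet output is already in a plain byte string ($captured is hypothetical; it stands in for whatever your Net::Telnet calls return):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Write characters to the terminal as UTF-8 (avoids "Wide character" warnings).
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical captured output: "caf" followed by the UTF-8 bytes 195, 169.
my $captured = "caf\xc3\xa9";

# Bytes in, characters out.
my $text = decode('UTF-8', $captured);

print "$text\n";
printf "%d bytes, %d characters\n", length($captured), length($text);  # 5 bytes, 4 characters
```

Decoding where the bytes come in and encoding at the output layer keeps everything in between as Perl character strings, so ord(), length() and regexes work per character rather than per byte.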

Similar Threads

Displaying 'umlaut' character	15	Sep 22, 2010
convert an encoded string	2	Apr 10, 2006
decoding keyboard input when using curses	6	May 30, 2009
CGI.pm and special characters in hidden inputs	15	Dec 29, 2004
XML::Parser and character references	0	Sep 16, 2005
problem with ssh -> authentication failed	1	Nov 10, 2008
AJAX vs form submission (character encoding)	2	Jan 26, 2012
Questions about character entities in XML and PCI security compliance	7	Aug 7, 2008
