How to avoid \x{...} when converting unicode to latin1?

Jochen Lehmeier

Hello,

here is a test script that outputs a unicode string which cannot be
represented in latin1 to a latin1-encoded file:

my $unicode="hello \x{010d} world";
binmode STDOUT,":encoding(latin1)";
print $unicode,"\n";

The output of Perl 5.8.8 is:

"\x{010d}" does not map to iso-8859-1 at test.pl line 3.
hello \x{010d} world

So far so good. The output is perfectly fine, as expected.

Is there a way to achieve the output "hello ? world" instead? Having
non-representable characters replaced by the \x{} notation does not help
much, in my case. Non-technical users will not understand this, but they
will understand a "?" (or even the latin1 character \xbf, the inverted question mark).

Important: I would like to achieve this on the I/O level, either while
opening the file handle, or even with a global setting that catches all
handles opened later. It would be trivial to remove the \x{} using regular
expressions, but that would mean I'd have to make lots of changes to my
scripts.

Thanks in advance!
 
Bo Lindbergh

Read the documentation for PerlIO::encoding and note the part about
$PerlIO::encoding::fallback. You clear the PERLQQ bit like this:

use PerlIO::encoding;

# Clear PERLQQ so unmappable characters are not written as \x{...}.
$PerlIO::encoding::fallback &= ~Encode::PERLQQ();
binmode(STDOUT, ":encoding(iso-latin-1)");
print "Hello, World!\x{2122}\n";

If you want to silence the warning, clear the WARN_ON_ERR bit as well:

$PerlIO::encoding::fallback &=
    ~(Encode::PERLQQ() | Encode::WARN_ON_ERR());

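To see the effect without touching STDOUT, the same idea can be tried on an in-memory handle. This is only a minimal sketch: the helper name `latin1_bytes_via_layer` is made up for illustration, and it assumes (as far as I can tell from the behaviour described here) that the layer picks up `$PerlIO::encoding::fallback` at the moment it is pushed, so `local` is enough to keep the change scoped.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding;

# Hypothetical helper (not from the thread): encode $text to latin1 through
# an :encoding layer on an in-memory handle and return the resulting bytes.
sub latin1_bytes_via_layer {
    my ($text) = @_;

    # Clear PERLQQ and WARN_ON_ERR for this scope only; the previous
    # value is restored when the sub returns.
    local $PerlIO::encoding::fallback =
        $PerlIO::encoding::fallback
        & ~(Encode::PERLQQ() | Encode::WARN_ON_ERR());

    my $buf = '';
    open my $fh, '>:encoding(iso-8859-1)', \$buf
        or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $buf;
}

print latin1_bytes_via_layer("hello \x{010d} world"), "\n";
```

With both bits cleared, the unmappable character should come out as a plain "?" and no warning should be printed. Handles opened outside the helper keep whatever fallback value was in effect when they were opened.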

/Bo Lindbergh
 
Jochen Lehmeier

> Read the documentation for PerlIO::encoding and note the part about
> $PerlIO::encoding::fallback.

I did notice that part, but the documentation is quite terse around that
topic; as far as I can tell it is not possible to arrive at your solution
from there. The PERLQQ and other constants seem to be documented in Encode
(albeit as "Encode::FB_PERLQQ", not "Encode::PERLQQ()", and in the
Malformed Data chapter - my problem is not related to malformed data at
all) while $PerlIO::encoding::fallback is mentioned in the very short
PerlIO::encoding. Neither place mentions the other, as far as I can tell.

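For readers who hit the same gap between the two manual pages, the relationship between the constants can be inspected directly. This is a small sketch, not something taken from either manual page: it only prints the bit values, on the understanding that `FB_PERLQQ` is `PERLQQ` combined with `LEAVE_SRC`, and that the layer's default fallback has the `PERLQQ` bit set.

```perl
use strict;
use warnings;
use Encode ();
use PerlIO::encoding;

# Print the relevant bit values so the relationship is visible.
printf "PERLQQ      = 0x%04x\n", Encode::PERLQQ();
printf "LEAVE_SRC   = 0x%04x\n", Encode::LEAVE_SRC();
printf "FB_PERLQQ   = 0x%04x\n", Encode::FB_PERLQQ();
printf "WARN_ON_ERR = 0x%04x\n", Encode::WARN_ON_ERR();
printf "fallback    = 0x%04x\n", $PerlIO::encoding::fallback;

# True if the layer default currently asks for \x{...} escapes.
my $uses_perlqq = ($PerlIO::encoding::fallback & Encode::PERLQQ()) ? 1 : 0;
print "fallback has PERLQQ set: $uses_perlqq\n";
```
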
If whoever maintains that documentation reads this - maybe you could add
Bo's example in the appropriate place. Not complaining, just pointing out
a source for trouble.

> You clear the PERLQQ bit like this:
> $PerlIO::encoding::fallback &= ~ Encode::PERLQQ();

Awesome - problem solved.

> If you want to silence the warning, clear the WARN_ON_ERR bit as well:
>
> $PerlIO::encoding::fallback &=
> ~ (Encode::PERLQQ() | Encode::WARN_ON_ERR());

Aye, useful.

Do you happen to know whether it is possible to change that replacement
character from "?" to another one, as well?

Thank you!
 
Bo Lindbergh

Jochen Lehmeier said:
> Do you happen to know whether it is possible to change that replacement
> character from "?" to another one, as well?

Not easily. Setting $PerlIO::encoding::fallback to a coderef ought to work,
but it makes PerlIO::encoding do very strange things. You might have to
define your own encoding (see Encode::Encoding).

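If giving up the I/O-level requirement is acceptable for some output paths, a different technique is available: Encode::encode accepts a code reference as its CHECK argument (documented for Encode 2.12 and later), and the string it returns is substituted for each unmappable character. The sketch below uses that instead of a PerlIO layer; the helper name `to_latin1_with_substitute` is hypothetical, and the caller has to print the resulting bytes to a handle without an :encoding layer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode $text to latin1 bytes, replacing every
# character that has no latin1 mapping with the byte string $substitute.
sub to_latin1_with_substitute {
    my ($text, $substitute) = @_;

    # The callback receives the code point of the unmappable character;
    # it is ignored here because the replacement is always the same.
    return encode('iso-8859-1', $text, sub { $substitute });
}

binmode STDOUT;    # write the encoded bytes as they are
print to_latin1_with_substitute("hello \x{010d} world", "\xBF"), "\n";
```

Here "\xBF" is the latin1 inverted question mark mentioned at the start of the thread. This does not change what handles with an :encoding layer do, so it only helps where the script controls the encoding step itself.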

/Bo Lindbergh
 
