Ben,
Why? Is your source in latin2?
I'm sorry. The 3rd line is:
use encoding "latin1";
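Ben's question matters because `use encoding "latin1"` declares the script's text as Latin-1 and, by default, also puts a Latin-1 layer on STDIN and STDOUT. That fights with any `:utf8` layers. Here is a minimal sketch using the core Encode module. It assumes the dictionary file is actually saved as UTF-8, which the thread does not state, and shows what happens when UTF-8 bytes are read as Latin-1:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "á" (U+00E1) as it is stored in a UTF-8 file: the two bytes 0xC3 0xA1.
my $bytes = encode('UTF-8', "\x{e1}");

# Decoded as UTF-8: one character, U+00E1, as intended.
my $as_utf8 = decode('UTF-8', $bytes);

# Decoded as Latin-1: each byte becomes its own character (U+00C3 U+00A1).
my $as_latin1 = decode('latin1', $bytes);

printf "UTF-8: %d char(s), Latin-1: %d char(s)\n",
    length $as_utf8, length $as_latin1;
```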
Err... what does your script do, and in what ways is it not working?
I started with GAWK and used a2p to convert the script to Perl. I
suspect the @Fld line is what keeps it from handling Unicode. I have
searched the Perl docs for this problem but haven't found an answer.
What do you think?
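Splitting on spaces is probably not what breaks Unicode here. split behaves the same on bytes and on characters. What changes is whether the line was decoded before it was split. Below is a minimal sketch using the core Encode module. The \lx line and its headword are hypothetical, made up for illustration, not taken from the actual dictionary:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical SFM line stored as UTF-8 bytes: the marker \lx, a space, and a
# made-up headword with U+00E1 (2 bytes in UTF-8) and U+2019 (3 bytes).
my $bytes = encode('UTF-8', "\\lx b\x{e1}\x{2019}");

# Splitting undecoded bytes "works", but each field is a run of bytes.
my @raw = split ' ', $bytes;

# Splitting the decoded string gives fields made of whole characters.
my $text = decode('UTF-8', $bytes);
my @fld  = split ' ', $text;    # awk-style split on whitespace, like split(" ", ...)

printf "headword: %d bytes undecoded, %d characters decoded\n",
    length $raw[1], length $fld[1];
```

Both splits produce two fields. Only the decoded one gives a headword whose length, regex character classes, and substr offsets count characters rather than bytes.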
# Perl - a2p - Combines many changes to the Zapotec-Spanish dictionary.
# Scott Starker
use open qw(:std :encoding(UTF-8)); # UTF-8 on STDIN/STDOUT and on files that <> opens
# use encoding "latin1";  # disabled: it conflicts with the UTF-8 layers set above
# ${^WIDE_SYSTEM_CALLS} = 1;
$[ = 1; # set array base to 1 (a2p output; deprecated, and an error since Perl 5.30)
$, = " "; # set output field separator
$\ = "\n"; # set output record separator
$AlreadyGN = 0;
$notes = 0;
$gnsgnFirstLine = 0;
$anyline = 0;
$position = 0;
$lxline = '';
$mldef = '';
$seline = '';
$line = '';
$beg = '';
$end = '';
# This program takes out the "lx"'s that are alone on the line ("\k").
while (<>) {
chomp; # strip record separator
@Fld = split("\x{0020}", $_, 9999); # " "
print "\x{002a}";
# if ($Fld[1] eq " \\ l x") {
# if ($Fld[1] eq "\x{005c}\x{006c}\x{0078}") { # "\\lx"
if ($Fld[1] eq "\x{005c}\x{005c}\x{006c}\x{0078}") { # two backslashes, then "lx"
print "\x{002a}\x{002a}";
$s = "\x{002d}", s/$s/\^\x{007e}/g; # "-"
# Make "tone" un-bolded
$Fld[2] = "\x{007c}\x{0062}" . $Fld[2]; # "|b"
s/\x{005b}/\x{007c}\x{0072}\x{005b}/g; # If "[" or "," exist
s/\x{005d}/\x{005d}\x{007c}\x{0062}/g;
s/\x{005d}\x{007c}\x{0062}\x{00b8}\x{0020}/\x{005d}\x{00b8}\x{0020}\x{007c}\x{0062}/g;
$Fld[$#Fld] = $Fld[$#Fld] . "\x{007c}\x{0072}";
$position = index($Fld[$#Fld], "\x{005d}");
$lxline = $_;
..
..
..
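One likely culprit, assuming the script is run as `perl script.pl dictionary.txt`: a binmode call on STDIN only changes STDIN. `<>` opens each file named in @ARGV itself, so those lines arrive as undecoded bytes. A common fix is the core `open` pragma, which sets default layers for the standard handles and for files opened within its lexical scope, including the ones `<>` opens. The minimal sketch below illustrates this. The temporary file and its two lines are hypothetical stand-ins for the dictionary:

```perl
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));  # default layers for STDIN/STDOUT/STDERR and for opens in this scope
use File::Temp qw(tempfile);

# Hypothetical stand-in for the dictionary: a two-line SFM file written as UTF-8.
my (undef, $path) = tempfile(UNLINK => 1);
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} "\\lx n\x{e1}a\n\\ge water\n";
close $out or die "Cannot close $path: $!";

my @headwords;
{
    local @ARGV = ($path);           # normally supplied on the command line
    while (my $line = <>) {          # <> opens $path itself, using the pragma's UTF-8 layer
        chomp $line;
        my @fld = split ' ', $line;  # 0-based indexing: $fld[0] is the marker
        # In a double-quoted Perl string, "\\lx" is the three characters \lx.
        push @headwords, $fld[1] if $fld[0] eq "\\lx";
    }
}

print "$_\n" for @headwords;
```

With the pragma in place, both `perl script.pl dictionary.txt` and `perl script.pl < dictionary.txt` get decoded input. With only a binmode on STDIN, the first form would not.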
Scott