kettle
Hi,
I was wondering how I ought to be handling character range
translations in Python.
What I want to do is translate fullwidth numbers and Roman alphabet
characters into their halfwidth ASCII equivalents.
In Perl I can do this pretty easily with tr:
tr/\x{ff00}-\x{ff5e}/\x{0020}-\x{007e}/;
and I think the string.translate method is what I need to use to
achieve the equivalent in Python. Unfortunately, the maketrans function
doesn't seem to accept character ranges, and I'm also having trouble
with its interpretation of length. What I came up with was to first
fudge the ranges:
my_test_string = u"ＡＢＣＤＥＦＧ"
f_range = "".join([unichr(x) for x in range(ord(u"\uff00"),ord(u"\uff5e"))])
t_range = "".join([unichr(x) for x in range(ord(u"\u0020"),ord(u"\u007e"))])
then use these as input to maketrans:
my_trans_string = my_test_string.translate(string.maketrans(f_range,t_range))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-93: ordinal not in range(128)
but it raises an encoding error... and if I encode the ranges in
UTF-8 before passing them on, I get a length error, because maketrans
counts bytes rather than characters and UTF-8 is variable-width...
my_trans_string = my_test_string.translate(string.maketrans(f_range.encode("utf8"),t_range.encode("utf8")))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: maketrans arguments must have same length
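For reference, here is a minimal sketch of one way this could be approached without maketrans. It relies on the translate method of unicode strings accepting a dictionary keyed by code point, and on U+FF01 through U+FF5E sitting exactly 0xFEE0 above U+0021 through U+007E. The names OFFSET, fullwidth_to_ascii and to_halfwidth are made up for illustration, and mapping U+3000 (ideographic space) to an ASCII space is an assumption about the desired result, since U+FF00 is not an assigned character. The first few lines also show why the UTF-8 encoded ranges end up with different lengths.

```python
# Why the encoded ranges differ in length: a fullwidth character takes
# three bytes in UTF-8, while its ASCII counterpart takes one.
fullwidth_a = u"\uff21"  # FULLWIDTH LATIN CAPITAL LETTER A
assert len(fullwidth_a.encode("utf-8")) == 3
assert len(u"A".encode("utf-8")) == 1

# Fullwidth forms U+FF01..U+FF5E are offset from ASCII U+0021..U+007E
# by a constant amount.
OFFSET = 0xFEE0

# Translation table keyed by code point (ordinal -> ordinal).
# range() excludes its end, so 0xFF5F is needed to include U+FF5E.
fullwidth_to_ascii = dict(
    (cp, cp - OFFSET) for cp in range(0xFF01, 0xFF5F)
)

# Assumption: the ideographic space should become an ASCII space.
fullwidth_to_ascii[0x3000] = 0x0020


def to_halfwidth(text):
    """Replace fullwidth ASCII variants in a unicode string (hypothetical helper)."""
    return text.translate(fullwidth_to_ascii)


sample = u"\uff21\uff22\uff23\uff11\uff12\uff13"  # fullwidth "ABC123"
print(to_halfwidth(sample))
```

Characters that are not in the table pass through unchanged. unicodedata.normalize("NFKC", text) also folds fullwidth forms to ASCII, but it rewrites other compatibility characters as well, so it may change more than intended.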