Mike Brown
I thought I was being pretty clever with my first attempt at using generators,
but I seem to be missing some crucial concept: even though this seems to
work as intended, the text of the exception message does not bubble up with
any of the ValueErrors when one of them is raised.
# This helps iterate over a unicode string. When python is built with
# 16-bit chars (as is the default on Windows), it returns surrogate
# pairs together (unlike 'for c in s'), and detects illegal surrogate
# pairs. Byte strings are unaffected.
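def chars(s):
    surrogate = None  # pending high surrogate, if any
    for c in s:
        cp = ord(c)
        if surrogate is not None:
            # a high surrogate must be followed by a low one (U+DC00..U+DFFF)
            if cp > 56319 and cp < 57344:
                pair = surrogate + c
                surrogate = None
                yield pair
            else:
                raise ValueError("Bad surrogate pair in %s" % s)
        else:
            # any surrogate code unit (U+D800..U+DFFF)
            if cp > 55295 and cp < 57344:
                if cp < 56320:
                    # high surrogate (U+D800..U+DBFF): hold it for the next char
                    surrogate = c
                else:
                    # low surrogate with no high surrogate before it
                    raise ValueError("Bad surrogate pair in %s" % s)
            else:
                surrogate = None
                yield c
    # a high surrogate was left dangling at the end of the string
    if surrogate is not None:
        raise ValueError("Bad surrogate pair at end of %s" % s)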
# as expected, returns u'example \xe9...\u2022...\U00010000...\U0010fffd'
''.join([c for c in chars(u'example \xe9...\u2022...\ud800\udc00...\U0010fffd')])
# now test the 3 exception conditions. Each produces a ValueError
''.join([c for c in chars(u'2nd half bad: \ud800bogus')])
''.join([c for c in chars(u'no 1st half: \udc00')])
''.join([c for c in chars(u'no 2nd half: \ud800')])
All 3 of the exception tests result in a bare ValueError; there's no
"Bad surrogate pair in" message shown. Why is that? What am I doing wrong?
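
One way to narrow this down might be to catch the ValueError explicitly and look at its args, which would show whether the message is attached to the exception at all or only goes missing when the traceback is printed. This is a minimal sketch, assuming a Python version that accepts the `except ... as` syntax; `first_error_args` and `demo` are hypothetical names, and `demo` is only a stand-in for chars():

```python
def first_error_args(gen_func, *args):
    """Exhaust gen_func(*args) and return the args tuple of the first
    ValueError it raises, or None if it finishes without raising."""
    try:
        for _ in gen_func(*args):
            pass
    except ValueError as e:
        return e.args
    return None


def demo(s):
    # Hypothetical stand-in for chars(): yields every item of s, then
    # raises a ValueError that carries a message.
    for c in s:
        yield c
    raise ValueError("stand-in message")


# repr() is used so the result can be displayed even if the message
# contains characters the console cannot encode.
print(repr(first_error_args(demo, u'abc')))
```

Passing chars and one of the three failing test strings in place of `demo` and `u'abc'` should show what, if anything, the exception actually carries.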