On Wed, 03 Apr 2013 09:43:06 -0400, Roy Smith wrote:
[...]
This has to inspect the entire string, no?
Correct. A more efficient implementation would be:
def char_size(s):
    for n in map(ord, s):
        if n > 0xFFFF: return 4
        if n > 0xFF: return 2
    return 1
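Note that, as quoted, char_size returns 2 at the first character above 0xFF. So a string such as "\u0100\U0001D11E" (a Latin Extended-A letter followed by an astral musical symbol) is reported as 2 even though it needs 4. Here is a minimal sketch of a variant that still stops early on an astral character but otherwise keeps scanning. It is an illustration, not the code posted in the thread:

```python
# Sketch of a corrected variant (not the code posted in the thread).
def char_size(s):
    """Return 1, 2 or 4: the widest per-character size s needs, using the
    same thresholds as the quoted version (0xFF and 0xFFFF)."""
    size = 1
    for n in map(ord, s):
        if n > 0xFFFF:
            # Astral character: nothing can need more, so stop early.
            return 4
        if n > 0xFF:
            # Remember it, but keep scanning: a later character may be astral.
            size = 2
    return size


print(char_size("abc"))               # 1
print(char_size("\u0100\U0001D11E"))  # 4 (the quoted version returns 2)
```

The early return is only safe for the largest size. A smaller size can only be settled once the whole string has been seen.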
I posted (essentially) this a few days ago:
if all(ord(c) <= 0xffff for c in s):
    return "it's all bmp"
else:
    return "it's got astral crap in it"
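The quoted fragment only works inside a function. Below is a minimal runnable sketch that wraps the same test in a helper. The name is_bmp is an assumption and does not come from the thread:

```python
def is_bmp(s):
    # Hypothetical helper name. True if every code point is in the
    # Basic Multilingual Plane (<= U+FFFF). The generator expression is
    # lazy, so all() can stop at the first astral character instead of
    # building a full list first.
    return all(ord(c) <= 0xFFFF for c in s)


print(is_bmp("hello, world"))        # True
print(is_bmp("G clef: \U0001D11E"))  # False
```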
It's not "astral crap". People use it, and they'll use it more in the
future. Just because you don't use it doesn't give you leave to make
disparaging remarks about it.
Honestly, it's really painful to see how history repeats itself:
"Bah humbug, why do we need to support the SMP astral crap? The Unicode
BMP is more than enough for everybody."
"Bah humbug, why do we need to support Unicode crap? Latin1 is more than
enough for everybody."
"Bah humbug, why do we need to support Latin1 crap? ASCII is more than
enough for everybody."
"Bah humbug, why do we need to support ASCII crap? Uppercase A-Z is more
than enough for everybody."
Seriously. Go back far enough, to the telegraph days, and you'll find
people arguing that there was no need for upper and lower case letters.
I'm reasonably sure all() is smart enough to stop at the first False
value.
Yes, all() and any() are guaranteed to short-circuit. They stop as
soon as they see a false or a true value, respectively.
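One quick way to see the short-circuiting for yourself is to give all() and any() a generator that records which items it has produced. The helper checked is hypothetical, written only for this illustration:

```python
seen = []


def checked(values):
    """Hypothetical demo helper: yield each value, recording it in `seen`
    so we can tell how far the consumer actually read."""
    for v in values:
        seen.append(v)
        yield v


print(all(checked([1, 2, 0, 3, 4])), seen)  # False [1, 2, 0]

seen.clear()
print(any(checked([0, 0, 5, 0, 7])), seen)  # True [0, 0, 5]
```

The items after the deciding value (3, 4 and 0, 7) are never requested from the generator.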