byte count unicode string

willie · Sep 20, 2006

# What's the correct way to get the
# byte count of a unicode (UTF-8) string?
# I couldn't find a builtin method
# and the following is memory inefficient.

ustr = "example\xC2\x9D".decode('UTF-8')

num_chars = len(ustr) # 8

buf = ustr.encode('UTF-8')

num_bytes = len(buf) # 9

# Thanks.

John Machin · Sep 20, 2006

willie said:
# What's the correct way to get the
# byte count of a unicode (UTF-8) string?
# I couldn't find a builtin method
# and the following is memory inefficient.

ustr = "example\xC2\x9D".decode('UTF-8')

num_chars = len(ustr) # 8

buf = ustr.encode('UTF-8')

num_bytes = len(buf) # 9

num_bytes = len("example\xC2\x9D")

This produces 9; isn't that what you want?
If not, please explain, with examples, what you mean by "the
byte count of a unicode (UTF-8) string".

HTH,
John
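
John's point can be checked directly: if the data is already a UTF-8 encoded byte string, len() on it is the byte count, and only the decoded text gives the character count. A minimal sketch follows, written in Python 3 syntax. That is an assumption, since the thread's code is Python 2, where a plain "..." literal is already a byte string. The names raw and text are illustrative only.

```python
# Python 3 sketch: encoded bytes vs decoded text (names are illustrative).
raw = b"example\xc2\x9d"      # UTF-8 encoded bytes, e.g. as read from a file
text = raw.decode("utf-8")    # decoded text: 7 ASCII characters plus U+009D

num_bytes = len(raw)          # 9: len() on bytes counts bytes
num_chars = len(text)         # 8: len() on str counts code points
```

So no re-encoding is needed as long as the original byte string is still around.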

Marc 'BlackJack' Rintsch · Sep 20, 2006

willie said:
# What's the correct way to get the
# byte count of a unicode (UTF-8) string?
# I couldn't find a builtin method
# and the following is memory inefficient.

ustr = "example\xC2\x9D".decode('UTF-8')

num_chars = len(ustr) # 8

buf = ustr.encode('UTF-8')

num_bytes = len(buf) # 9

That is the correct way.

Ciao,
Marc 'BlackJack' Rintsch
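
If only the unicode string is available and the temporary encoded buffer really is a concern, the byte count can also be derived from the code points without encoding. This is a sketch under assumptions, not something proposed in the thread: it uses Python 3 syntax, utf8_byte_count is a hypothetical helper name, and it counts a lone surrogate as 3 bytes even though a strict UTF-8 encoder would reject it. A pure-Python loop like this will typically be slower than len(ustr.encode('UTF-8')), so it trades speed for memory.

```python
def utf8_byte_count(text):
    """Return how many bytes `text` would occupy in UTF-8, without encoding it.

    Hypothetical helper: sums the UTF-8 width of each code point.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:         # U+0000..U+007F  -> 1 byte
            total += 1
        elif cp < 0x800:      # U+0080..U+07FF  -> 2 bytes
            total += 2
        elif cp < 0x10000:    # U+0800..U+FFFF  -> 3 bytes
            total += 3
        else:                 # U+10000 and up  -> 4 bytes
            total += 4
    return total


ustr = b"example\xc2\x9d".decode("utf-8")
num_chars = len(ustr)               # 8
num_bytes = utf8_byte_count(ustr)   # 9, same as len(ustr.encode("utf-8"))
```

For strings of ordinary size, encoding and taking len() of the result, as in the original post, remains the simplest option.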
