byte count unicode string

willie

# What's the correct way to get the
# byte count of a unicode (UTF-8) string?
# I couldn't find a builtin method
# and the following is memory inefficient.

ustr = "example\xC2\x9D".decode('UTF-8')

num_chars = len(ustr) # 8

buf = ustr.encode('UTF-8')

num_bytes = len(buf) # 9


# Thanks.
 
John Machin

willie said:
# What's the correct way to get the
# byte count of a unicode (UTF-8) string?
# I couldn't find a builtin method
# and the following is memory inefficient.

ustr = "example\xC2\x9D".decode('UTF-8')

num_chars = len(ustr) # 8

buf = ustr.encode('UTF-8')

num_bytes = len(buf) # 9

num_bytes = len("example\xC2\x9D")

This produces 9; isn't that what you want?
If not, please explain, with examples, what you mean by "the
byte count of a unicode (UTF-8) string".
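To make the distinction in the question above concrete, here is a minimal sketch. It assumes Python 3, where byte strings and text strings are separate types (the code in the thread is Python 2, where a plain string literal is already a byte string). The names `raw` and `text` are only illustrative.

```python
# Python 3 sketch: bytes and str are separate types, so the two lengths differ.
raw = b"example\xC2\x9D"      # UTF-8 encoded data, as a byte string
text = raw.decode("utf-8")    # decoded text; \xC2\x9D becomes the single code point U+009D

num_bytes = len(raw)          # length in bytes
num_chars = len(text)         # length in code points

print(num_bytes, num_chars)   # prints: 9 8
```

If the data is already held as encoded bytes, `len()` on it is the byte count and no extra copy is made. The encode step only matters when all you have is the decoded text.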

HTH,
John
 
Marc 'BlackJack' Rintsch

# What's the correct way to get the
# byte count of a unicode (UTF-8) string?
# I couldn't find a builtin method
# and the following is memory inefficient.

ustr = "example\xC2\x9D".decode('UTF-8')

num_chars = len(ustr) # 8

buf = ustr.encode('UTF-8')

num_bytes = len(buf) # 9

That is the correct way.
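If the temporary encoded buffer really is a concern, one alternative is to add up the UTF-8 width of each code point without building the encoded string. This is only a sketch under some assumptions: it targets Python 3 (where iterating a `str` yields whole code points), the function name `utf8_byte_count` is made up for illustration, and it is not a builtin. It will usually be slower than `len(s.encode('utf-8'))`, so it trades time for memory.

```python
def utf8_byte_count(text):
    """Return how many bytes `text` would occupy in UTF-8, without encoding it.

    Hypothetical helper, not part of the standard library. Lone surrogates
    (U+D800..U+DFFF) are counted as 3 bytes here, whereas str.encode('utf-8')
    would raise UnicodeEncodeError for them by default.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:         # U+0000..U+007F: 1 byte
            total += 1
        elif cp < 0x800:      # U+0080..U+07FF: 2 bytes
            total += 2
        elif cp < 0x10000:    # U+0800..U+FFFF: 3 bytes
            total += 3
        else:                 # U+10000..U+10FFFF: 4 bytes
            total += 4
    return total


ustr = "example\u009d"
print(utf8_byte_count(ustr))  # prints: 9
```

For ordinary strings the result matches `len(ustr.encode('utf-8'))`, so the encode-and-measure approach remains the simplest choice unless the strings are very large.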

Ciao,
Marc 'BlackJack' Rintsch
 
