Unicode to HTML entities


Clodoaldo

I was looking for a function to transform a Unicode string into
HTML entities: not only the usual HTML escaping, but every
non-ASCII character as well.

As I didn't find one, I wrote my own:

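# -*- coding: utf-8 -*-
from htmlentitydefs import codepoint2name

def unicode2htmlentities(u):
    """Replace each non-ASCII character in u with an HTML entity.

    ASCII characters (including &, < and >) are left as they are.
    """
    htmlentities = []
    for c in u:
        if ord(c) < 128:
            htmlentities.append(c)
        elif ord(c) in codepoint2name:
            # named entity, e.g. u'\xe3' -> '&atilde;'
            htmlentities.append('&%s;' % codepoint2name[ord(c)])
        else:
            # no named entity exists: use a numeric reference instead
            htmlentities.append('&#%d;' % ord(c))
    return ''.join(htmlentities)

print unicode2htmlentities(u'São Paulo')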

Is there a function like that in one of Python's built-in modules? If not,
is there a better way to do it?

Regards, Clodoaldo Pinto Neto
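For readers on Python 3, where `htmlentitydefs` was renamed `html.entities`, the function above might be ported roughly as follows. This is a sketch, not the author's code: it assumes a numeric character reference (`&#NNN;`) is an acceptable fallback for characters that have no named entity (a plain dictionary lookup would raise `KeyError` for those). Like the original, it passes ASCII characters such as `&` and `<` through untouched.

```python
from html.entities import codepoint2name  # Python 3 name of htmlentitydefs


def unicode2htmlentities(u: str) -> str:
    """Replace each non-ASCII character in u with an HTML entity.

    ASCII characters (including &, < and >) are passed through unchanged.
    """
    parts = []
    for c in u:
        cp = ord(c)
        if cp < 128:
            parts.append(c)
        elif cp in codepoint2name:
            # named entity, e.g. 227 -> "&atilde;"
            parts.append("&%s;" % codepoint2name[cp])
        else:
            # assumption: numeric reference for characters without a name
            parts.append("&#%d;" % cp)
    return "".join(parts)


print(unicode2htmlentities("São Paulo"))  # S&atilde;o Paulo
```

Named entities keep the markup readable; the numeric fallback means characters such as CJK ideographs, which have no HTML name, still come out as valid references.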
 

Duncan Booth

Clodoaldo wrote:
> That was a fast answer. I would never find that myself.

You might actually want:

'São Paulo &amp; Espírito Santo'

as you have to be sure to escape any ampersands in your Unicode
string before doing the encode.
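A Python 3 sketch of that advice, assuming "the encode" means encoding with the built-in `'xmlcharrefreplace'` error handler, which turns every non-ASCII character into a numeric reference. `html.escape` is used here to handle the ampersands (and `<`, `>` and quotes); the function name `to_ascii_html` is hypothetical.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape HTML special characters, then encode non-ASCII as numeric references."""
    # Step 1: &, <, >, " and ' become entities, e.g. "&" -> "&amp;".
    escaped = html.escape(text)
    # Step 2: every non-ASCII character becomes "&#NNN;".
    # Doing this after escaping keeps the new "&#" sequences intact.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("São Paulo & Espírito Santo"))
# S&#227;o Paulo &amp; Esp&#237;rito Santo
```

The order matters: encoding first and escaping second would turn `&#227;` into `&amp;#227;`, which a browser displays literally instead of as "ã".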
 

Tommy Nordgren

In many cases, the need to use HTML/XHTML entities can be avoided by
generating UTF-8-encoded pages.
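A minimal Python 3 sketch of that approach: leave non-ASCII text as it is, declare the charset in the page, and emit the bytes as UTF-8. The page template and the `utf8_page` helper are hypothetical; when the page is served over HTTP, the `Content-Type` header (for example `text/html; charset=utf-8`) should agree with the `<meta>` tag.

```python
import html


def utf8_page(title: str, body: str) -> bytes:
    """Build a minimal HTML page (hypothetical template) encoded as UTF-8."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>%s</title></head>\n"
        "<body><p>%s</p></body></html>\n"
    ) % (html.escape(title), html.escape(body))
    # Only markup characters are escaped; "ã" and "í" stay literal
    # and become multi-byte UTF-8 sequences in the output.
    return page.encode("utf-8")


data = utf8_page("Cidades", "São Paulo & Espírito Santo")
print(data.decode("utf-8"))
```

Ampersands and angle brackets still need escaping; what goes away is the need to turn every accented letter into an entity.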
 

Clodoaldo


Tommy Nordgren wrote:
> In many cases, the need to use HTML/XHTML entities can be avoided by
> generating UTF-8-encoded pages.

Sure. All my pages are UTF-8 encoded. The case I'm dealing with is an
email link whose subject has non-ASCII characters, as in:

<a href=mailto:[email protected]?subject=Dúvidas>Mail to</a>

Somehow, when the user clicks the link, the subject arrives in their email
client with the non-ASCII characters garbled.

And before someone points out that I should not expose email addresses:
the address is only linked with the owner's consent, and the source
is obfuscated to make it harder for robots to harvest it.

Regards, Clodoaldo
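One approach not raised in the thread: character references inside an `href` are decoded by the browser back into the raw characters, so HTML entities may not change what the mail client receives. RFC 6068, which defines the `mailto:` URI scheme, instead expects non-ASCII characters in fields such as `subject` to be encoded as UTF-8 and then percent-encoded. A Python 3 sketch using `urllib.parse.quote` (UTF-8 by default); the `mailto_link` helper and the address `user@example.com` are hypothetical placeholders, and how well a particular email client handles the result is not guaranteed.

```python
from urllib.parse import quote


def mailto_link(address: str, subject: str, text: str) -> str:
    """Return an <a> element whose mailto: subject is UTF-8 percent-encoded."""
    # quote() encodes the string as UTF-8 and percent-encodes every byte
    # outside the unreserved set; safe="" also encodes "/" in the subject.
    href = "mailto:%s?subject=%s" % (address, quote(subject, safe=""))
    return '<a href="%s">%s</a>' % (href, text)


print(mailto_link("user@example.com", "Dúvidas", "Mail to"))
# <a href="mailto:user@example.com?subject=D%C3%BAvidas">Mail to</a>
```

Quoting the `href` value also avoids problems if the subject ever contains spaces or `>`.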
 

Clodoaldo

Duncan Booth wrote:
> You might actually want:
>
> 'São Paulo &amp; Espírito Santo'
>
> as you have to be sure to escape any ampersands in your Unicode
> string before doing the encode.

I will do it. Thanks.

Regards, Clodoaldo.
 
