RE + UTF-8

cepl

I am working on an extension of the genericwiki.py plugin for PyBlosxom
and have run into problems with UTF-8 and regular expressions. Given the
following wiki line, the URL is cut off too early:

[http://en.wikipedia.org/wiki/Petr_Chelcický Petr Chelcický's]
work(s) into English.

and creates

[<a
href="http://en.wikipedia.org/wiki/Petr_Chel">http://en.wikipedia.org/wiki/Petr_Chel</a>cický
Petr Chelcický's]

The RE genericwiki uses for parsing this:

# WikiName pattern used in your wiki
wikinamepattern = r'\b(([A-Z]\w+){2,})\b' # original
mailurlpattern = r'mailto\:[\"\-\_\.\w]+\@[\-\_\.\w]+\w'
newsurlpattern = r'news\:(?:\w+\.){1,}\w+'
fileurlpattern = r'(?:http|https|file|ftp):[/-_.\w-]+[\/\w][?&+=%\w/-_.#]*'

[...]

# Turn '[xxx:address label]' into labeled link
body = re.sub(r'\[(' +
fileurlpattern + '|' +
mailurlpattern + '|' +
newsurlpattern + ')\s+(.+?)\]',
r'<a href="\1">\2</a>', body,re.U)

I have tried to test RE and UTF-8 in Python generally and the results
are even more confusing (done with locale cs_CZ.UTF-8 in konsole):
>>> locale.getpreferredencoding()
'UTF-8'
>>> print re.sub("(\w*)","X","[Chelcický]",re.L)
X[X?Xý]
>>> print re.sub("(\w*)","X","[Chelcický]",re.UNICODE)
X[X?X?X]X

I would expect both print commands to give just a plain X, but
apparently Python doesn't understand that. What's the problem?

Thanks for any reply,

Matej
 

Michael Ströder

> I have tried to test RE and UTF-8 in Python generally and the results
> are even more confusing (done with locale cs_CZ.UTF-8 in konsole):
> locale.getpreferredencoding()
> 'UTF-8'
> print re.sub("(\w*)","X","[Chelcický]",re.L)

You first have to turn the byte strings into Unicode strings. On your
console it should be:

unicode('[Chelcický]','utf-8')
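
A minimal sketch of the difference, written for Python 3 (an assumption;
the thread itself uses Python 2, where unicode(s, 'utf-8') plays the role
of bytes.decode below). It uses \w+ rather than the \w* from the question
so that empty matches do not clutter the result. Note also that the fourth
positional parameter of re.sub is count, so a flag passed in that position
is not treated as a flag; where re.sub accepts a flags argument, pass it
by keyword.

```python
import re

# The UTF-8 bytes a terminal or web form would hand to the program.
raw = "[Chelcický]".encode("utf-8")

# On bytes, \w covers only ASCII letters, digits and underscore,
# so the two bytes that encode "ý" are left unmatched.
ascii_only = re.sub(rb"\w+", b"X", raw)
print(ascii_only)       # b'[X\xc3\xbd]'

# Decode first, then match on the resulting text.
text = raw.decode("utf-8")

# On str, \w is Unicode-aware; the flag is passed by keyword.
unicode_aware = re.sub(r"\w+", "X", text, flags=re.UNICODE)
print(unicode_aware)    # [X]
```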

Note that in web applications you also have to declare the charset in
the HTTP headers and in <form accept-charset=...>.
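
As a rough illustration of that point, the sketch below builds a
Content-Type header that names the charset and encodes the page body to
match. The helper name build_response, the form markup and the choice of
utf-8 are assumptions made for the example, not part of PyBlosxom.

```python
def build_response(html_text, charset="utf-8"):
    """Return (header_line, body_bytes) for a page sent in `charset`.

    Hypothetical helper: it only shows that the declared charset and
    the actual encoding of the body have to agree.
    """
    header = "Content-Type: text/html; charset=%s" % charset
    return header, html_text.encode(charset)


# A form that asks the browser to submit its data in the same charset.
form = '<form method="post" accept-charset="utf-8">'

header, payload = build_response("<p>Petr Chelcický</p>")
print(header)
```

On the receiving side, the submitted bytes would then be decoded with the
same charset before any regular expression is applied to them.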

Ciao, Michael.
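
Applied to the labeled-link substitution from the question, the same idea
can be sketched as follows (Python 3 assumed, with the three URL patterns
copied from the question and the wiki line already decoded to text). The
name labeled_link is introduced here only for illustration.

```python
import re

# Patterns copied from the question.
mailurlpattern = r'mailto\:[\"\-\_\.\w]+\@[\-\_\.\w]+\w'
newsurlpattern = r'news\:(?:\w+\.){1,}\w+'
fileurlpattern = r'(?:http|https|file|ftp):[/-_.\w-]+[\/\w][?&+=%\w/-_.#]*'

# Compile once with the flag, so it cannot end up in the count argument.
labeled_link = re.compile(
    r'\[(' + fileurlpattern + '|' +
    mailurlpattern + '|' +
    newsurlpattern + r')\s+(.+?)\]',
    re.UNICODE)

# The wiki line as text (str), not as UTF-8 bytes.
body = "[http://en.wikipedia.org/wiki/Petr_Chelcický Petr Chelcický's]"

# Turn '[xxx:address label]' into a labeled link.
html = labeled_link.sub(r'<a href="\1">\2</a>', body)
print(html)
```

Because \w now matches the accented letter, the whole address ends up in
the href instead of being cut off at the first non-ASCII character.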
 
