T
Timothy Wu
I'm parsing Firefox bookmarks and writing the same bookmark to another
file. To make sure I read and write utf-8 correctly I make open files
like this for read and write:
codecs.open(file, "r", "utf-8")
For regular expression I parse like this:
m = re.search("<TITLE>(.*?)</TITLE>", line, re.I)
How do I tell the regular expression to parse in utf-8? From the docs it
seems like I can do re.compile("<TITLE>(.*?)</TITLE>", 'U') for unicode.
But does it need to be specified to be utf-8 instead of some other
unicode standards? Or does that matter at all?
And, I'm not calling compile() directly at all. I'm simply calling
re.search(). How would I specify unicode? Is it simply re.flags = 'U'
before any call search?
Timothy
file. To make sure I read and write utf-8 correctly I make open files
like this for read and write:
codecs.open(file, "r", "utf-8")
For regular expression I parse like this:
m = re.search("<TITLE>(.*?)</TITLE>", line, re.I)
How do I tell the regular expression to parse in utf-8? From the docs it
seems like I can do re.compile("<TITLE>(.*?)</TITLE>", 'U') for unicode.
But does it need to be specified to be utf-8 instead of some other
unicode standards? Or does that matter at all?
And, I'm not calling compile() directly at all. I'm simply calling
re.search(). How would I specify unicode? Is it simply re.flags = 'U'
before any call search?
Timothy