Parsing

Thread starter Timothy Wu
Start date Apr 11, 2004

Timothy Wu

Apr 11, 2004

I'm parsing Firefox bookmarks and writing the same bookmarks to another
file. To make sure I read and write UTF-8 correctly, I open the files
like this for both reading and writing:

codecs.open(file, "r", "utf-8")

For the regular expression, I parse like this:

m = re.search("<TITLE>(.*?)</TITLE>", line, re.I)

How do I tell the regular expression to parse in UTF-8? From the docs it
seems like I can do re.compile("<TITLE>(.*?)</TITLE>", 'U') for Unicode.
But does it need to be specified as UTF-8 rather than some other
Unicode encoding? Or does that not matter at all?

Also, I'm not calling compile() directly at all; I'm simply calling
re.search(). How would I specify Unicode there? Is it simply
re.flags = 'U' before any call to search()?

Timothy
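
A minimal sketch of one way to approach this, not a definitive answer. The codec passed to codecs.open() decodes the UTF-8 bytes into Unicode strings as each line is read, so the regular expression only ever sees decoded text and has no encoding of its own to configure. Flags are integer constants on the re module (re.I, re.U) rather than strings, they are combined with |, and re.search() accepts them as its third argument. The sketch assumes Python 3, where re.U is already the default for str patterns (in Python 2 it changed how \w, \b and similar classes match). The names extract_title and sample are hypothetical, used only for illustration.

```python
import re

# Compiling is optional; the same flags work with re.compile() and re.search().
TITLE_RE = re.compile(r"<TITLE>(.*?)</TITLE>", re.I | re.U)

def extract_title(line):
    """Return the text inside <TITLE>...</TITLE>, or None if there is no match."""
    # Flags go in the third argument, combined with bitwise OR.
    m = re.search(r"<TITLE>(.*?)</TITLE>", line, re.I | re.U)
    return m.group(1) if m else None

# Stand-in for a line already decoded from UTF-8 by the file object
# (hypothetical sample data, not taken from a real bookmarks file).
sample = "<title>Caf\u00e9 \u4e66\u7b7e</title>"
title = extract_title(sample)
```

Because the match is done on decoded text, the same pattern works whatever encoding the file used on disk, as long as the file was opened with the right codec.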
