How to parse a name out of a web page?


Haibao Tang

with high accuracy...

My temporary plan is to first recognize runs of two or three consecutive
capitalized words, but surely we need to do more than that?
Does anyone have suggestions?
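
A minimal sketch of that heuristic, assuming names are plain ASCII words made of one uppercase letter followed by lowercase letters. The pattern and the candidate_names helper are only illustrative, and the sample text is made up:

```python
import re

# Assumption: a "name" is two or three consecutive words, each starting with
# an uppercase ASCII letter followed by one or more lowercase letters.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,2}\b")

def candidate_names(text):
    """Return every run of two or three capitalized words found in text."""
    return NAME_RE.findall(text)

sample = "contact Haibao Tang or Rune Strand in New York"
print(candidate_names(sample))
# ['Haibao Tang', 'Rune Strand', 'New York']
```

Note that "New York" is matched as well, which is why capitalization alone is unlikely to give high accuracy.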

Thanks in advance.
 

Rune Strand

Haibao said:
with high accuracy...

My temporary plan is to first recognize runs of two or three consecutive
capitalized words, but surely we need to do more than that?
Does anyone have suggestions?

Thanks in advance.

It's not easy to say without seeing the HTML. If the structure
allows it, a couple of str.split() calls are probably the easiest way,
but you can always use BeautifulSoup.

http://www.crummy.com/software/BeautifulSoup/
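
As a rough illustration of the str.split() approach, here is a sketch that assumes the name sits between two known markers. The HTML snippet, the author class, and the name_between helper are all hypothetical; a real page will have its own structure:

```python
# Assumption: the page wraps the name in a known marker, here a hypothetical
# <span class="author"> tag. Real pages will differ.
html = '<div><span class="author">Haibao Tang</span> wrote:</div>'

def name_between(page, start, end):
    """Return the text between the first start marker and the next end marker."""
    after_start = page.split(start, 1)
    if len(after_start) < 2:
        return None  # start marker not found
    # If the end marker is missing, everything after start is returned.
    return after_start[1].split(end, 1)[0].strip()

print(name_between(html, '<span class="author">', '</span>'))
# Haibao Tang
```

This only works while the markup stays exactly as assumed, so a real parser such as BeautifulSoup is the safer choice for anything less regular.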
 

Andrew Gwozdziewycz

Haibao said:
with high accuracy...

My temporary plan is to first recognize runs of two or three consecutive
capitalized words, but surely we need to do more than that?
Does anyone have suggestions?

Thanks in advance.

Surely this is a task for NLTK (http://nltk.sourceforge.net/), especially
if you want high accuracy.
 
