Fetching data from an HTML file

Sangeet

Hi,

I need to fetch data from the snippet below and have been trying to match the digits in it to specific groups, but I can't figure out how to strip the tags! :(

<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>

Actually, I'm working with Robot Framework and haven't been able to figure out how to read data from HTML tables. Reading from the page source is the best (read: rudimentary) way I could come up with. Any suggestions are welcome!

Thanks,
Sangeet
 
Prasad, Ramit

In addition to Simon's response, you may want to look at Beautiful Soup,
which I hear is good at dealing with malformed HTML.
http://www.crummy.com/software/BeautifulSoup/



Ramit


 
Daniel Fetchinson

Try beautiful soup: http://www.crummy.com/software/BeautifulSoup/
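
For what it's worth, here is a minimal sketch of how that might look. It assumes the current bs4 package (which postdates the link above) and Python's built-in html.parser backend; the variable names are only illustrative. It pulls out each cell's class and text from the row in the question:

```python
from bs4 import BeautifulSoup

# The row from the question, including the stray </table> at the end.
html = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# bs4 returns "class" as a list of names, so join it back into one string;
# cells without a class attribute fall back to an empty list.
cells = [
    (" ".join(td.get("class", [])), td.get_text(strip=True))
    for td in soup.find_all("td")
]
print(cells)
```

The class name ("green", "red", or empty) is probably the easiest handle for deciding which group each number belongs to.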
 
Jon Clements

I would personally use lxml - a quick example:

# -*- coding: utf-8 -*-
import lxml.html

text = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

# Parse the fragment, then print (class, text) for each cell in every row
table = lxml.html.fromstring(text)
for tr in table.xpath('//tr'):
    print([(el.get('class', ''), el.text_content()) for el in tr.iterfind('td')])

[('', 'Sum'), ('', ''), ('green', '245'), ('red', '11'), ('', '0'), ('', '256'), ('', '1.496 [min]')]

It does a reasonable job, but if it doesn't work quite right, then there's a .fromstring(parser=...) option, and you should be able to pass in ElementSoup and try your luck from there.

hth,

Jon.
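
If installing a third-party package is not an option, roughly the same (class, text) pairs can probably be collected with the standard library alone. The following is only a sketch under that assumption, written for Python 3; CellCollector is a hypothetical helper name, not part of any library mentioned in this thread:

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect a (class, text) pair for every <td> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished (class, text) pairs
        self._class = None   # class of the <td> being read; None outside a cell
        self._parts = []     # text fragments seen inside the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) tuples
            self._class = dict(attrs).get("class") or ""
            self._parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>, including nested tags like <b>
        if self._class is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._class is not None:
            self.cells.append((self._class, "".join(self._parts).strip()))
            self._class = None


html = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' class="green">245</td><td align='center' class="red">11</td><td align='center'>0</td><td align='center' >256</td><td align='center' >1.496 [min]</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(html)
collector.close()
print(collector.cells)
```

This is more code than the lxml version and less forgiving of badly broken markup, so it is best seen as a fallback for simple rows like the one in the question.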
 
