Parsing an HTML table with XML

Rick Walsh

I have an HTML table in the following format:

<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td></tr>
</table>

With an XSLT stylesheet, I can use for-each to grab the values in
each row.

However, I don't want to grab the very first row, because it isn't
data!

How do I iterate through each <tr> and ignore the first <tr>?
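
One common way to skip the header row is to select every <tr> except the first. In XSLT 1.0 this is usually written as a positional predicate, for example select="tr[position() > 1]" with the <table> as the context node. The sketch below shows the same selection in Python with the standard library's xml.etree.ElementTree, assuming the table is well-formed XML. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td></tr>
</table>"""

table = ET.fromstring(TABLE)

# findall returns the <tr> children in document order;
# slicing off index 0 drops the header row.
data_rows = table.findall("tr")[1:]

# Collect the text of each cell, row by row.
values = [[td.text for td in tr.findall("td")] for tr in data_rows]
print(values)
```

This relies purely on position, so it only works while the header is always the first row.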
 
Johannes Koch

Rick said:
I have an HTML table in the following format:

<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td></tr>
</table>

With an XSLT stylesheet, I can use for-each to grab the values in
each row.

However, I don't want to grab the very first row, because it isn't
data!

Another possibility would be to change the input to use the (X)HTML
thead and tbody elements, and then select only tbody/tr.
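
As a rough illustration of the thead/tbody approach, here is a sketch in Python using xml.etree.ElementTree. The reworked table markup is a hypothetical version of the original input, not something posted in the thread. The same path, tbody/tr, could be used as a select expression in XSLT with the <table> as the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical reworked input: header row in <thead>, data rows in <tbody>.
TABLE = """<table>
<thead><tr><td>Header 1</td><td>Header 2</td></tr></thead>
<tbody>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td></tr>
</tbody>
</table>"""

table = ET.fromstring(TABLE)

# The path tbody/tr never matches the header row, so no position test is needed.
body_rows = table.findall("tbody/tr")

# Iterating a <tr> element yields its child cells.
values = [[cell.text for cell in tr] for tr in body_rows]
print(values)
```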
 
Andy Dingley

Rick said:
I have an HTML table in the following format:

<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>
However, I don't want to grab the very first row, because it isn't
data!

Then code the header cells with <th>, not <td>.
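
If the header cells are marked up as <th>, the data rows can be picked out by content rather than by position. The sketch below assumes a hypothetical version of the table with <th> in the first row and uses the predicate tr[td], meaning "rows that have at least one td child". The same expression is valid XPath for an XSLT select.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the header row uses <th>, the data rows use <td>.
TABLE = """<table>
<tr><th>Header 1</th><th>Header 2</th></tr>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td></tr>
</table>"""

table = ET.fromstring(TABLE)

# tr[td] matches only rows that contain a <td> child, so the <th> row is skipped.
data_rows = table.findall("tr[td]")
values = [[td.text for td in tr.findall("td")] for tr in data_rows]
print(values)
```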

If this table isn't under your control, then be careful about parsing it
with an XML parser: HTML isn't XML (and XHTML on the web usually isn't
either). It's not a good assumption to make if you're trying to build
robust code. Something as simple as an embedded &nbsp; might break it.
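
The &nbsp; problem is easy to reproduce. The sketch below uses Python's xml.etree.ElementTree as a stand-in for "an XML parser"; the fragment is made up for the demonstration. The entity is defined by HTML, not by XML, so with no DTD declaring it the parse fails, while the numeric reference &#160; is accepted.

```python
import xml.etree.ElementTree as ET

# &nbsp; is an HTML entity; XML predefines only lt, gt, amp, apos and quot.
fragment = "<table><tr><td>1&nbsp;kg</td></tr></table>"

try:
    ET.fromstring(fragment)
    parsed = True
except ET.ParseError as err:
    parsed = False
    print("XML parse failed:", err)

# A numeric character reference needs no declaration, so this parses fine.
cell = ET.fromstring("<td>1&#160;kg</td>")
print(repr(cell.text))
```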
 
Philippe Poulard

Andy Dingley said:
Then code it with <th>, not <td>

If this table isn't under your control, then be careful about parsing it
with an XML parser: HTML isn't XML (and XHTML on the web usually isn't
either). It's not a good assumption to make if you're trying to build
robust code. Something as simple as an embedded &nbsp; might break it.

For this purpose, use an HTML parser. I personally use NekoHTML, which I
have included in the RefleX toolkit. With RefleX, parsing an HTML file
is as simple as parsing an XML file:
http://reflex.gforge.inria.fr/tips.html#N80178E
(section: HTML to XML)

Example:
<!-- parse a non-well-formed HTML file to XML -->
<xcl:parse-html name="htmlFile" source="file:///path/to/file.html"/>
<!-- apply a stylesheet to it -->
<xcl:transform output="file:///path/to/new-file.html" source="{ $htmlFile }"
    stylesheet="file:///path/to/stylesheet.xsl"/>

Of course, you could use XPath to select the part of the parsed HTML to
transform, say the <body> element. Something like this:
<xcl:transform output="file:///path/to/new-file.html" source="{ $htmlFile/html/body }"
    stylesheet="file:///path/to/stylesheet.xsl"/>
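
For readers without RefleX or NekoHTML, the same idea (use a tolerant HTML parser rather than an XML parser) can be sketched with Python's standard html.parser module. This is only an analogue, not the RefleX API; the TableRows class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of <td> cells, grouped by <tr>."""

    def __init__(self):
        # convert_charrefs defaults to True, so &nbsp; arrives as U+00A0.
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


# Input an XML parser would reject: an HTML-only entity and an unclosed <br>.
html = "<table><tr><th>Item</th></tr><tr><td>1&nbsp;kg<br></td><td>2</td></tr></table>"

parser = TableRows()
parser.feed(html)
parser.close()

# Rows made only of <th> cells collect nothing, so drop the empty lists.
data_rows = [row for row in parser.rows if row]
print(data_rows)
```

Unlike NekoHTML, this handler does not build a tree that a stylesheet can be applied to; it only shows that an HTML parser gets through input that stops an XML parser.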
