googleboy
Hi.
I am trying to collapse an html table into a single line. Basically,
anytime I see ">" & "<" with nothing but whitespace between them, I'd
like to remove all the whitespace, including newlines. I've read the
how-to and I have tried a bunch of things, but nothing seems to work
for me:
--
table = open(r'D:\path\to\tabletest.txt', 'rb')
strTable = table.read()
#Below find the different sort of things I have tried, one at a time:
strTable = strTable.replace(">\s<", "><")  # I got this from the module docs
strTable = strTable.replace(">.<", "><")
strTable = ">\s+<".join(strTable)
strTable = ">\s<".join(strTable)
print strTable
--
The table in question looks like this:
<table width="80%" border="0">
<tr>
<td> </td>
<td colspan="2">Introduction</td>
<td><div align="right">3</div></td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td><i>ONE</i></td>
<td colspan="2">Childraising for Parrots</td>
<td><div align="right">11</div></td>
</tr>
</table>
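A note on the attempts above: str.replace() only matches literal text, so "\s" and "." are never treated as whitespace or wildcard patterns there, and join() inserts a separator between characters rather than searching for anything. Patterns like these go through the re module instead. What follows is only a minimal sketch, assuming the file contents are already in a string; the name collapse_tags is made up for illustration.

```python
import re

def collapse_tags(html):
    # Remove any run of whitespace (spaces, tabs, newlines) that sits
    # directly between a closing ">" and the next opening "<".
    return re.sub(r">\s+<", "><", html)

# A few rows in the same shape as the table above.
sample = '<tr>\n<td> </td>\n<td colspan="2">Introduction</td>\n</tr>'
print(collapse_tags(sample))
```

Note that this also empties cells that hold nothing but whitespace, such as the `<td> </td>` cells, since they fit the same ">" whitespace "<" pattern.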
For extra kudos (and I confess I have been so stuck on the above
problem that I haven't put much thought into how to do this one), I'd
like to be able to measure the number of characters between the <p> &
</p> tags, and then insert a newline character at the end of the next
word after an arbitrary number of characters. I am reading a bunch of
paragraphs formatted for a webpage into a script, but they're all
on one big long line and I would like to split them for readability.
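One possible shape for this second part, as a rough sketch only: pull out the text between each <p> and </p> with the re module, measure it with len(), and wrap it with the standard textwrap module. Note that textwrap.fill() breaks lines before the width is exceeded rather than at the end of the next word after it, so it is a close substitute and not exactly the rule described. The function name wrap_paragraphs and the width values are assumptions for illustration.

```python
import re
import textwrap

def wrap_paragraphs(html, width=70):
    # Find the text between each <p> ... </p> pair; re.DOTALL lets "."
    # span newlines in case a paragraph is already partly broken up.
    paragraphs = re.findall(r"<p>(.*?)</p>", html, re.DOTALL)
    wrapped = []
    for text in paragraphs:
        # len(text) is the character count between the tags.
        # textwrap.fill breaks at whitespace so no line exceeds width.
        wrapped.append((len(text), textwrap.fill(text, width=width)))
    return wrapped

page = "<p>one two three four five six</p><p>short</p>"
for length, text in wrap_paragraphs(page, width=10):
    print(length)
    print(text)
```

This assumes simple, un-nested <p> tags with no attributes; anything messier is probably better handled by a real HTML parser.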
TIA
Googleboy