On Monday, 27 August 2012 at 12:59:02 UTC+2, mikcec82 wrote:
Hello,
I have an HTML file on my PC and I want to read it to extract some text.
Can you tell me which libraries I should use and how to do it?
Thank you so much.
Michele
Thank you all.
Hi Chris, thank you for your hint. I'll try to do as you said, and to be clear:
I have to work on an HTML file. It is not a website file, nor does it come from the internet.
It is a file created by local software (where "local" means "on my PC").
On this file, I need to perform these operations:
1) Open the file
2) Check for occurrences of the following strings:
2a) XXXX, in this case I have this code:
<tr style="font-size: 10" align="left">
<th>
</th><th>
DTC CODE Read:
</th>
<td>
<samp>
</samp>
XXXX
</td>
</tr>
2b) NOT PASSED, in this case I have this code:
<tr style="color: red" align="left">
<th>
</th><th>
CODE CHECK
</th>
<th>
: NOT PASSED
</th>
</tr>
Note: color in "<tr style="color: red" align="left">" can be "red" or "orange"
2c) OK or PASSED
3) Then, I need to fill in an Excel file following these rules:
3a) If 2a or 2b occurs in the HTML file, I'll write NOK in the Excel file
3b) If 2c occurs in the HTML file, I'll write OK in the Excel file
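Rules 3a and 3b can be sketched roughly as below. This is only an illustration under stated assumptions: the names status_for and write_report are hypothetical, the code targets Python 3, and the result is written as a CSV file (which Excel can open) rather than a native Excel workbook, since that would need a third-party library. Leaving the cell blank when no marker is found is also an assumption, as the post does not cover that case.

```python
import csv


def status_for(found_xxxx, found_not_passed, found_ok):
    """Apply rules 3a/3b: any failure marker wins over an OK marker."""
    if found_xxxx or found_not_passed:
        return "NOK"
    if found_ok:
        return "OK"
    return ""  # no marker found: left blank (assumption, not in the post)


def write_report(rows, path):
    """Write (html file name, status) pairs to a CSV file Excel can open."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "result"])  # header row (hypothetical names)
        writer.writerows(rows)
```

For instance, write_report([("2012_05_16_1___p0201_13.html", status_for(True, False, False))], "report.csv") would produce a two-line file with a header row and one NOK row.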
Note:
1) In this example, in case 2b, I have "CODE CHECK" in the code, but I could also have "TEXT CHECK" or "CHAR CHECK".
2) The search for occurrences can be done either by tag ("<tr style="color: red" align="left">") or by text (NOT PASSED, PASSED). I would prefer to use the first method.
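The tag-based search could look something like the sketch below, which uses the standard library's html.parser module (Python 3). The class name FailedRowFinder and its attributes are hypothetical, and the style matching is deliberately naive: it assumes the style attribute contains "color: red" or "color: orange" as in the sample rows above, so a style such as "background-color: red" would also match.

```python
from html.parser import HTMLParser


class FailedRowFinder(HTMLParser):
    """Collect the text of every <tr> whose style sets color to red or orange."""

    def __init__(self):
        super().__init__()
        self.failed_rows = []        # text of each flagged row, in document order
        self._in_failed_row = False  # True while inside a flagged <tr>
        self._parts = []             # text fragments of the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Normalise 'color: red' to 'color:red' before comparing
            style = (dict(attrs).get("style") or "").replace(" ", "").lower()
            self._in_failed_row = "color:red" in style or "color:orange" in style
            self._parts = []

    def handle_data(self, data):
        text = data.strip()
        if self._in_failed_row and text:
            self._parts.append(text)

    def handle_endtag(self, tag):
        if tag == "tr" and self._in_failed_row:
            self.failed_rows.append(" ".join(self._parts))
            self._in_failed_row = False
```

After calling feed() with the file contents and then close(), a non-empty failed_rows list would mean NOK. Because the row text is collected as well, "CODE CHECK", "TEXT CHECK" and "CHAR CHECK" rows are all caught without listing them explicitly.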
==================================================
In my script I have used the second way of searching, i.e.:
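**
# Raw string, so the backslashes in the Windows path are taken literally
fileorig = r"C:\Users\Mike\Desktop\2012_05_16_1___p0201_13.html"
f = open(fileorig, 'r')
nomefile = f.read()
f.close()
# nomefile is one string, so this loop runs once per character
for x in nomefile:
    if 'XXXX' in nomefile:
        print('NOK')
    else:
        print('OK')
**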
But this works on characters and not on strings (i.e., the loop goes character by character, NOT string by string).
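One possible way around the character-by-character loop is to drop the loop entirely, since the "in" operator already searches the whole text for a substring. The sketch below assumes Python 3; the function name check_html is hypothetical, and testing for "NOT PASSED" alongside "XXXX" is taken from cases 2a and 2b above.

```python
def check_html(path):
    """Return 'NOK' if a failure marker appears anywhere in the file, else 'OK'."""
    # The with-block closes the file even if reading fails
    with open(path, "r") as f:
        text = f.read()
    # A single substring test over the whole text; no loop is needed
    if "XXXX" in text or "NOT PASSED" in text:
        return "NOK"
    return "OK"
```

Calling check_html(fileorig) would then return a single result for the file instead of printing one result per character.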
===============================================
I hope I was clear.
Thanks for your help
Michele