Regular expression issue

dmbkiwi · Jul 19, 2006

I'm trying to parse a line of HTML as follows:

<td style="width:20%" align="left">101.120 KPA (-)</td>
<td style="width:35%" align="left">Snow on Ground)0 </td>

However, sometimes it looks like this:

<td style="width:20%" align="left">N/A</td>
<td style="width:35%" align="left">Snow on Ground)0 </td>

I want to get either the numerical value 101.120 (which could be a
different number depending on the data that's been fed into the page)
or, in the second case, 'N/A'.

The regexp I'm using is:

.*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround

Can someone help me debug this? It's not picking up the number. I'm
also not sure I've got the syntax for '|' right, and I can't find a
detailed tutorial on how to use it.

Any help would be appreciated.

Thanks

Matt
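To see why the pattern in the question misses the number, here is a small Python sketch. It assumes the page has a cell containing "Pressure" somewhere before the reading (represented by a hypothetical `<td>Pressure</td>` line) and that the search uses `re.DOTALL`. The key points: the `|` inside `(?P<baro>...)` is scoped to that group, but the last `|` is not inside any parentheses, so it splits the whole pattern in two; and `\d` never matches the decimal point.

```python
import re

# Sample input from the question. The first line is a hypothetical stand-in
# for whatever earlier part of the page contains the word "Pressure".
html = ('<td>Pressure</td>\n'
        '<td style="width:20%" align="left">101.120 KPA (-)</td>\n'
        '<td style="width:35%" align="left">Snow on Ground)0 </td>')

# The pattern from the question. The '|' inside (?P<baro>...) only chooses
# between \d+? and N/A, but the final '|' is outside all parentheses, so it
# means: (everything to its left) OR (\sKPA.*?Snow\son\sGround) on its own.
original = r'.*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround'

m = re.search(original, html, re.DOTALL)

# The left alternative never matches "101.120": \d matches digits only, so
# \d+? stops at the '.', and "</td>" does not follow the digits.
# The right alternative does match, but it contains no 'baro' group.
print(m.group(0).startswith(' KPA'))  # True
print(m.group('baro'))                # None

# \d does not include the decimal point:
print(re.match(r'\d+', '101.120').group())  # 101
```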

Marc 'BlackJack' Rintsch · Jul 19, 2006

What about something like this:

align="left">(?:(?P<baro>[\d.]+)\sKPA|(?P<na>N/A)).*?Ground\)

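You need the flag re.DOTALL when compiling the regular expression, so
that '.' can also match the newline between the two cells. re.MULTILINE
only changes how '^' and '$' behave, so it isn't needed here.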

You'll have to check the 'baro' and 'na' groups to decide if it matched a
numerical value or 'N/A'.

Ciao,
Marc 'BlackJack' Rintsch
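A minimal Python sketch of this suggestion follows. It assumes the alternation is wrapped in a non-capturing group `(?:...)` so that `|` only chooses between the number and `N/A`, and it compiles the pattern with `re.DOTALL`; `PATTERN` and `parse_pressure` are hypothetical names used for illustration.

```python
import re

# Alternation grouped with (?:...), so '|' only chooses between the number
# and 'N/A'; the named groups tell the two cases apart.
PATTERN = re.compile(
    r'align="left">(?:(?P<baro>[\d.]+)\sKPA|(?P<na>N/A)).*?Ground\)',
    re.DOTALL,  # lets '.' match the newline between the two cells
)


def parse_pressure(html):
    """Return the reading as text (e.g. '101.120'), 'N/A', or None."""
    m = PATTERN.search(html)
    if m is None:
        return None
    # Exactly one of the two groups took part in the match; the other is None.
    return m.group('baro') if m.group('baro') is not None else m.group('na')


sample = ('<td style="width:20%" align="left">101.120 KPA (-)</td>\n'
          '<td style="width:35%" align="left">Snow on Ground)0 </td>')
print(parse_pressure(sample))  # 101.120
```

Checking which named group is not None is the "check the 'baro' and 'na' groups" step described in the reply.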

Sibylle Koczian · Jul 24, 2006

Wouldn't it be simpler to use HTMLParser or something similar first to
separate text and HTML tags and get the content of each cell separately?
Then you only have to find the 'right' cell, possibly quite simply by
its position in the HTML table, and check whether it contains 'N/A' or
something numeric (that check wouldn't need a regular expression if it's
really that simple).

I have no Python here, so I can't try it out to be more specific, but
look up HTMLParser in the library reference.
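The HTMLParser approach might look like the following minimal sketch. It uses the standard library's `html.parser` module (the Python 3 name of the `HTMLParser` module) and assumes the reading sits in the first `<td>` of the snippet; on a real page the right cell index would have to be found. `CellCollector` and `reading` are hypothetical names.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_td = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_td:
            self.cells[-1] += data


def reading(cell_text):
    """Return 'N/A', the leading number as text, or None (no regex needed)."""
    text = cell_text.strip()
    if text == 'N/A':
        return text
    first = text.split()[0] if text else ''
    try:
        float(first)  # numeric check without a regular expression
    except ValueError:
        return None
    return first


sample = ('<td style="width:20%" align="left">101.120 KPA (-)</td>\n'
          '<td style="width:35%" align="left">Snow on Ground)0 </td>')
parser = CellCollector()
parser.feed(sample)
parser.close()
print(parser.cells)              # ['101.120 KPA (-)', 'Snow on Ground)0 ']
print(reading(parser.cells[0]))  # 101.120
```

Separating the parsing from the check keeps each part simple: the parser deals with tags and attributes, and the cell check is a plain string comparison plus `float()`.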
