M
matteosartori
Hi all,
I've spent all morning trying to work this one out:
I've got the following string:
<td>04/01/2006</td><td>Wednesday</td><td> </td><td>09:14</td><td>12:44</td><td>12:50</td><td>17:58</td><td> </td><td> </td><td> </td><td> </td><td>08:14</td>
from which I'm attempting to extract the date, and the five times from
into a list. Only the very last time is guaranteed to be there so it
should also work for a line like:
<td>03/01/2006</td><td>Tuesday</td><td>Annual_Holiday</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td>08:00</td>
My Python regular expression to match that is currently:
digs = re.compile(
r'<td>(\d{2}\/\d{2}\/\d{4})</td>.*?(?:<td>(\d+\:\d+)</td>).*$' )
which first extracts the date into group 1
then matches the tags between the date and the first instance of a time
into group 2
then matches the first instance of a time into group 3
but then group 4 grabs all the remaining string.
I've tried changing the time pattern into
(?:<td>(\d+\:\d+)</td>)+
but that doesn't seem to mean "grab one or more cases of the previous
regexp."
Any Python regexp gurus with a hint would be greatly appreciated.
M@
I've spent all morning trying to work this one out:
I've got the following string:
<td>04/01/2006</td><td>Wednesday</td><td> </td><td>09:14</td><td>12:44</td><td>12:50</td><td>17:58</td><td> </td><td> </td><td> </td><td> </td><td>08:14</td>
from which I'm attempting to extract the date, and the five times from
into a list. Only the very last time is guaranteed to be there so it
should also work for a line like:
<td>03/01/2006</td><td>Tuesday</td><td>Annual_Holiday</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td>08:00</td>
My Python regular expression to match that is currently:
digs = re.compile(
r'<td>(\d{2}\/\d{2}\/\d{4})</td>.*?(?:<td>(\d+\:\d+)</td>).*$' )
which first extracts the date into group 1
then matches the tags between the date and the first instance of a time
into group 2
then matches the first instance of a time into group 3
but then group 4 grabs all the remaining string.
I've tried changing the time pattern into
(?:<td>(\d+\:\d+)</td>)+
but that doesn't seem to mean "grab one or more cases of the previous
regexp."
Any Python regexp gurus with a hint would be greatly appreciated.
M@