regex help

David · Jul 8, 2009

Hi

I have a few regexes I need to write, but I'm struggling to come up with a
nice way of doing them. More than anything I'm here to learn some
tricks and some neat code rather than just get an answer, although
that's obviously where I would like to end up.

Problem 1 -

(25.47%) 

I want to extract 25.47 from here - so far I've tried -

xPer = re.search('(.*?)%', content)

and

xPer = re.search('\((\d*)%\) ', content)

Neither of these seems to do what I want. Am I not doing this
correctly? (Obviously!)

Problem 2 -

<td> </td>

<td width="1%" class=key>Open:
</td>
<td width="1%" class=val>5.50
</td>
<td> </td>
<td width="1%" class=key>Mkt Cap:
</td>
<td width="1%" class=val>6.92M
</td>
<td> </td>
<td width="1%" class=key>P/E:
</td>
<td width="1%" class=val>21.99
</td>

I want to extract the Open, Mkt Cap and P/E values, but apart from
writing loads of individual REs, which I think would look messy, I can't
think of a better and neater looking way. Any ideas?

Cheers

David

Chris Rebert · Jul 8, 2009

Use an actual HTML parser? Like BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/), for instance.

I will never understand why so many people try to parse/scrape
HTML/XML with regexes...

Cheers,
Chris

Tim Harig · Jul 8, 2009

You are downloading market data? Yahoo offers its stats in CSV format that
is easier to parse without a dedicated parser.

Chris Rebert said:
Use an actual HTML parser? Like BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/), for instance.

I agree with your sentiment exactly. If the regex he is trying to write is
difficult enough that he has to ask, then yes, he should be using a
parser.

Chris Rebert said:
I will never understand why so many people try to parse/scrape
HTML/XML with regexes...

Why? Because sometimes it is good enough to get the job done easily.
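
For a fragment as regular as the one David posted, a single pattern applied with re.findall can pull out every key/value pair at once, which avoids one RE per field. The following is only a sketch: the variable names are made up for illustration, and it assumes every "key" cell is immediately followed by its "val" cell, exactly as in the sample. It will break if the real page nests tags or reorders attributes.

```python
import re

# Sample fragment copied from David's post.
html = """<td> </td>

<td width="1%" class=key>Open:
</td>
<td width="1%" class=val>5.50
</td>
<td> </td>
<td width="1%" class=key>Mkt Cap:
</td>
<td width="1%" class=val>6.92M
</td>
<td> </td>
<td width="1%" class=key>P/E:
</td>
<td width="1%" class=val>21.99
</td>
"""

# One pattern, two groups: the text of a "key" cell and the text of the
# "val" cell that follows it. re.DOTALL lets "." cross line breaks.
pair_re = re.compile(
    r'class=key>\s*(.*?)\s*</td>\s*'
    r'<td[^>]*class=val>\s*(.*?)\s*</td>',
    re.DOTALL,
)

# findall returns a list of (key, value) tuples; drop the trailing colon.
pairs = {key.rstrip(':'): value for key, value in pair_re.findall(html)}

for key, value in pairs.items():
    print(key, "-->", value)
```

The lazy groups (.*?) stop at the first closing tag, so each match stays inside its own pair of cells.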

Rhodri James · Jul 9, 2009

David said:
xPer = re.search('(.*?)%', content)

Supposing that str(xID.group(1)) == "678774", let's see how that string
concatenation turns out:

(.*?)%

The obvious problems here are the spurious double quotes, the spurious
(but harmless) escaping of a double quote, and the lack of (escaped)
backslash and (escaped) open parenthesis. The latter you can always
strip off later, but the first sink the match rather thoroughly.

and

xPer = re.search('\((\d*)%\) ', content)

With only two single quotes present, the biggest problem should be obvious.

Unfortunately if you just fix the obvious in either of the two regular
expressions, you're setting yourself up for a fall later on. As The Fine
Manual says right at the top of the page on the re module
(http://docs.python.org/library/re.html), you want to be using raw string
literals when you're dealing with regular expressions, because you want
the backslashes getting through without being interpreted specially by
Python's own parser. As it happens you get away with it in this case,
since neither '\d' nor '\(' have a special meaning to Python, so aren't
changed, and '\"' is interpreted as '"', which happens to be the right
thing anyway.
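
As a rough illustration of the raw-string advice applied to Problem 1 as David posted it: '(.*?)%' treats the parentheses as a group rather than as literal characters, and '\((\d*)%\) ' fails because \d* cannot match the decimal point. The sketch below escapes the parentheses and allows for the dot. The sample text in content is an assumption, since the thread only shows the "(25.47%)" part.

```python
import re

# Hypothetical surrounding text; only "(25.47%)" comes from the thread.
content = "Change: +1.12 (25.47%) today"

# Raw string literal, so the backslashes reach the re module untouched.
#   \( and \)    match the literal parentheses
#   (\d+\.\d+)   captures digits, a literal dot, then more digits
match = re.search(r'\((\d+\.\d+)%\)', content)

if match:
    x_per = float(match.group(1))
    print(x_per)
else:
    print("no percentage found")
```

If the figure can also appear without a decimal part, (\d+(?:\.\d+)?) would be a more forgiving group.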

David said:
I want to extract the Open, Mkt Cap and P/E values, but apart from
writing loads of individual REs, which I think would look messy, I can't
think of a better and neater looking way. Any ideas?

What you're trying to do is inherently messy. You might want to use
something like BeautifulSoup to hide the mess, but never having had
cause to use it myself I couldn't say for sure.

Peter Otten · Jul 9, 2009

David said:
I want to extract the Open, Mkt Cap and P/E values, but apart from
writing loads of individual REs, which I think would look messy, I can't
think of a better and neater looking way. Any ideas?

>>> from BeautifulSoup import BeautifulSoup
>>> bs = BeautifulSoup("""<td> </td>
...
... <td width="1%" class=key>Open:
... </td>
... <td width="1%" class=val>5.50
... </td>
... <td> </td>
... <td width="1%" class=key>Mkt Cap:
... </td>
... <td width="1%" class=val>6.92M
... </td>
... <td> </td>
... <td width="1%" class=key>P/E:
... </td>
... <td width="1%" class=val>21.99
... </td>""")
>>> for key in bs.findAll(attrs={"class": "key"}):
...     value = key.findNext(attrs={"class": "val"})
...     print key.string.strip(), "-->", value.string.strip()
...
Open: --> 5.50
Mkt Cap: --> 6.92M
P/E: --> 21.99
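
If installing BeautifulSoup is not an option, the same key/value pairing can be sketched with the standard library's html.parser module in Python 3. This is not the code from the session above, only a rough equivalent; the class name KeyValParser and its attributes are invented for this example, and it assumes each "key" cell is followed by its "val" cell as in David's sample.

```python
from html.parser import HTMLParser

html = """<td> </td>

<td width="1%" class=key>Open:
</td>
<td width="1%" class=val>5.50
</td>
<td> </td>
<td width="1%" class=key>Mkt Cap:
</td>
<td width="1%" class=val>6.92M
</td>
<td> </td>
<td width="1%" class=key>P/E:
</td>
<td width="1%" class=val>21.99
</td>
"""


class KeyValParser(HTMLParser):
    """Collect text from class=key cells and pair it with the next class=val cell."""

    def __init__(self):
        super().__init__()
        self.current = None      # class attribute of the <td> being read, if any
        self.buffer = []         # text pieces seen inside the current cell
        self.pending_key = None  # last key text still waiting for its value
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current = dict(attrs).get("class")
            self.buffer = []

    def handle_data(self, data):
        if self.current in ("key", "val"):
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "td":
            return
        text = "".join(self.buffer).strip()
        if self.current == "key":
            self.pending_key = text
        elif self.current == "val" and self.pending_key is not None:
            self.pairs[self.pending_key] = text
            self.pending_key = None
        self.current = None


parser = KeyValParser()
parser.feed(html)
parser.close()

for key, value in parser.pairs.items():
    print(key, "-->", value)
```

The parser copes with the unquoted class=key attributes in the sample, which is one of the details that tends to trip up hand-written regexes.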
