Where could the problem be?

Lad

I use the following
###############
import re
Results=[]
data1='<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'
ID = re.compile(r'^.*=(\d+)&.*$',re.MULTILINE)
Results=re.findall(ID,data1)
print Results
#############
to extract from data1 all numbers such as 15015,15016,15017

But the program extracts only the last number 15017.
Why?
Thank you for help
La.
 
Peter Otten

Lad said:
I use the following
###############
import re
Results=[]
data1='<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'
ID = re.compile(r'^.*=(\d+)&.*$',re.MULTILINE)
Results=re.findall(ID,data1)
print Results
#############
to extract from data1 all numbers such as 15015,15016,15017

But the program extracts only the last number 15017.
Why?
Thank you for help
La.

After changing

data = '...
'

to

data = '''...
'''

I get all three numbers. There is probably another significant difference
between the posted code and the code you are actually running.

Peter
 
Lad

Peter,
I tried exactly this
########
import re
Results=[]
data1='<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'
ID = re.compile(r'^.*=(\d+)&.*$',re.MULTILINE)
Results=re.findall(ID,data1)
print "Results are= ",Results
#########
and received
Results are= ['15017']

Not all numbers

What exactly did you get?
Thanks.
L.
 
Peter Otten

Lad said:
Peter,
I tried exactly this
########
import re
Results=[]
data1='<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'
ID = re.compile(r'^.*=(\d+)&.*$',re.MULTILINE)
Results=re.findall(ID,data1)
print "Results are= ",Results
#########
and received
Results are= ['15017']

Not all numbers

What exactly did you get?

With /exactly/ this, I get:

$ cat lad1.py
import re
Results=[]
data1='<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'
ID = re.compile(r'^.*=(\d+)&.*$',re.MULTILINE)
Results=re.findall(ID,data1)
print "Results are= ",Results
$ python lad1.py
File "lad1.py", line 3
data1='<a href="detailaspxmember=15015&mode=advert" </a><a
^
SyntaxError: EOL while scanning single-quoted string

When I modify it to compile, I get /exactly/ this:

$ cat lad2.py
import re
Results=[]
data1='''<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'''
ID = re.compile(r'^.*=(\d+)&.*$',re.MULTILINE)
Results=re.findall(ID,data1)
print "Results are= ",Results
$ python lad2.py
Results are= ['15015', '15016', '15017']

Peter
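The difference between the two runs can be reproduced in isolation. The following is only a sketch: it reuses the pattern from the thread, and the names one_line and multi_line are made up for illustration. With re.MULTILINE, ^ and $ match at every line boundary, so the pattern can match at most once per line, and the greedy leading .* pushes the match towards the last candidate number on that line.

```python
import re

# Pattern from the thread: anchored to line start/end, greedy .* on both sides.
ID = re.compile(r'^.*=(\d+)&.*$', re.MULTILINE)

# Data laid out as in lad2.py, with real line breaks inside the string.
multi_line = '''<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'''

# Hypothetical variant: the same text with the line breaks removed.
one_line = multi_line.replace('\n', ' ')

multi_line_result = ID.findall(multi_line)
one_line_result = ID.findall(one_line)

print(multi_line_result)  # one match per line
print(one_line_result)    # a single match; greedy .* skips the earlier numbers
```

If the data that was actually run had no line breaks, this would account for getting only 15017.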
 
Lad

Thank you, Peter, for the help.
The reason it did not work was that findall, with this pattern, needs
line breaks (CRLF) between the lines.
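For data where line breaks cannot be relied on, one option is to drop the ^...$ anchors and the greedy .* and search only for the digits between '=' and '&'. This is a minimal sketch, not something proposed in the thread; the name numbers is made up for illustration, and it assumes every wanted number sits directly between an '=' and an '&'.

```python
import re

# Same links as in the thread, here deliberately on one logical line.
data1 = ('<a href="detailaspxmember=15015&mode=advert" </a>'
         '<a href="detailaspxmember=15016&mode=advert" </a>'
         '<a href="detailaspxmember=15017&mode=advert" </a>')

# No anchors and no .*, so findall scans left to right and returns every
# run of digits that is preceded by '=' and followed by '&'.
numbers = re.findall(r'=(\d+)&', data1)

print(numbers)
```

Because nothing in this pattern depends on line boundaries, the result should be the same whether or not the input contains newlines.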
 
Paul McGuire

Try this, it's a bit more readable than your re.

from pyparsing import Word,nums,Literal,replaceWith

data1='''<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'''

# a number is a word composed of nums, that is, the digits 0-9
# your search string is looking for a number between an '=' and '&'
EQUALS = Literal("=")
AMPER = Literal("&")
number = Word(nums)
hrefNumber = EQUALS + number + AMPER

# scanString is a generator, that returns matching tokens, start,
# and end location for each occurrence in the input string - we
# just care about the second token of each match
print [ tokens[1] for tokens,s,e in hrefNumber.scanString(data1) ]

# just for grins, here is how to convert the numbers to the
# string "###"
number.setParseAction( replaceWith("###") )
print number.transformString(data1)


Prints:

['15015', '15016', '15017']
<a href="detailaspxmember=###&mode=advert" </a><a
href="detailaspxmember=###&mode=advert" </a><a
href="detailaspxmember=###&mode=advert" </a>

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
 
