Please help with regular expression finding multiple floats

Jeremy

I have text that looks like the following (but all in one string with
'\n' separating the lines):

1.0000E-08 1.58024E-06 0.0048
1.0000E-07 2.98403E-05 0.0018
1.0000E-06 8.85470E-06 0.0026
1.0000E-05 6.08120E-06 0.0032
1.0000E-03 1.61817E-05 0.0022
1.0000E+00 8.34460E-05 0.0014
2.0000E+00 2.31616E-05 0.0017
5.0000E+00 2.42717E-05 0.0017
total 1.93417E-04 0.0012

I want to capture the two or three floating point numbers in each line
and store them in a tuple. I want to find all such tuples such that I
have
[('1.0000E-08', '1.58024E-06', '0.0048'),
('1.0000E-07', '2.98403E-05', '0.0018'),
('1.0000E-06', '8.85470E-06', '0.0026'),
('1.0000E-05', '6.08120E-06', '0.0032'),
('1.0000E-03', '1.61817E-05', '0.0022'),
('1.0000E+00', '8.34460E-05', '0.0014'),
('2.0000E+00', '2.31616E-05', '0.0017'),
('5.0000E+00', '2.42717E-05', '0.0017'),
('1.93417E-04', '0.0012')]

as a result. I have the regular expression pattern

fp1 = r'([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s+'

which can find a floating point number followed by some space. I can
find three floats with

found = re.findall('%s%s%s' % (fp1, fp1, fp1), text)

My question is, how can I use regular expressions to find two OR three
or even an arbitrary number of floats without repeating %s? Is this
possible?

Thanks,
Jeremy
 
Cousin Stanley

I have text that looks like the following
(but all in one string with '\n' separating the lines):
....

I want to capture the two or three floating point numbers in each line
and store them in a tuple.
....
I have the regular expression pattern
....

Jeremy ....

For a non-regular-expression solution
you might consider something similar to
the following ....

s = '''\
1.0000E-08 1.58024E-06 0.0048
1.0000E-07 2.98403E-05 0.0018
1.0000E-06 8.85470E-06 0.0026
1.0000E-05 6.08120E-06 0.0032
1.0000E-03 1.61817E-05 0.0022
1.0000E+00 8.34460E-05 0.0014
2.0000E+00 2.31616E-05 0.0017
5.0000E+00 2.42717E-05 0.0017
total 1.93417E-04 0.0012'''

l1 = s.split( '\n' )

l2 = [ ]

for this_row in l1[ : -1 ] :
    temp = this_row.strip().split()
    l2.append( [ float( x ) for x in temp ] )

# The last row starts with the label 'total', so skip its first token.
last = l1[ -1 ].strip().split()[ 1 : ]

l2.append( [ float( x ) for x in last ] )

print( '' )
for this_row in l2 :
    if len( this_row ) > 2 :
        x , y , z = this_row
        print( ' %5.4e %5.4e %5.4e ' % ( x , y , z ) )
    else :
        x , y = this_row
        print( ' %5.4e %5.4e ' % ( x , y ) )
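If the goal is the list of string tuples from the original question
rather than printed floats, the same split() idea can be made
label-agnostic by keeping only the tokens that float() accepts. A
minimal sketch, written for Python 3; the helper names is_float and
float_tuples are made up for illustration and do not come from the
posts above:

```python
def is_float(token):
    """Return True if float() can parse the token."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def float_tuples(text):
    """Return one tuple of numeric tokens (as strings) per line that has any."""
    rows = []
    for line in text.splitlines():
        numbers = tuple(tok for tok in line.split() if is_float(tok))
        if numbers:  # skip blank lines and lines without any numbers
            rows.append(numbers)
    return rows


sample = """\
1.0000E-08 1.58024E-06 0.0048
5.0000E+00 2.42717E-05 0.0017
total 1.93417E-04 0.0012"""

result = float_tuples(sample)
for row in result:
    print(row)
```

One caveat with this approach: float() also accepts tokens such as
'nan' and 'inf', so a text label spelled that way would be kept as if
it were a number.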
 
Edward Dolan

My question is, how can I use regular expressions to find two OR three
or even an arbitrary number of floats without repeating %s?  Is this
possible?

Thanks,
Jeremy

Any time you have tabular data such as your example, split() is
generally the first choice. But since you asked, and I like fscking
with regular expressions...

import re

# I modified your data set just a bit to show that it will
# match zero or more space separated real numbers.

data = """
1.0000E-08

1.0000E-08 1.58024E-06 0.0048 1.0000E-08 1.58024E-06 0.0048
1.0000E-07 2.98403E-05 0.0018
foo bar baaz
1.0000E-06 8.85470E-06 0.0026
1.0000E-05 6.08120E-06 0.0032
1.0000E-03 1.61817E-05 0.0022
1.0000E+00 8.34460E-05 0.0014
2.0000E+00 2.31616E-05 0.0017
5.0000E+00 2.42717E-05 0.0017
total 1.93417E-04 0.0012
"""

ntuple = re.compile(r"""
    # match beginning of line (re.M in the docs)
    ^
    # chew up anything before the first real (non-greedy -> ?)
    .*?
    # named match (turn the match into a named atom while allowing
    # irrelevant (groups))
    (?P<ntuple>
      # match one real
      [-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?
      # followed by zero or more space separated reals
      ([ \t]+[-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?)*)
    # match end of line (re.M in the docs)
    $
    """, re.X | re.M)  # re.X to allow comments and arbitrary whitespace

print([tuple(mo.group('ntuple').split())
       for mo in re.finditer(ntuple, data)])
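
Why capture the whole run of numbers in one named group and then
split() it? When a capturing group is repeated, Python's re module
keeps only the text of the last repetition, so a pattern of the form
(float\s*){2,3} cannot hand back each number separately. A small
Python 3 sketch of that behaviour, using a simplified float pattern
(FLOAT is shorthand introduced here, not the pattern above):

```python
import re

# Simplified float pattern for illustration only: digits, a dot, digits,
# and an optional exponent.
FLOAT = r"[-+]?\d+\.\d+(?:[eE][-+]?\d+)?"

line = "1.0000E-08 1.58024E-06 0.0048"

# A repeated capturing group remembers only its final repetition.
repeated = re.match(r"(?:(%s)\s*){2,3}" % FLOAT, line)
last_only = repeated.group(1)

# Capturing the whole run once and splitting it afterwards keeps every number.
whole_run = re.match(r"((?:%s\s*){2,3})" % FLOAT, line)
all_numbers = tuple(whole_run.group(1).split())

print(last_only)
print(all_numbers)
```

Here last_only holds just the third number, while all_numbers holds
all three, which is the same capture-then-split trick the named group
relies on.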

Now compare the previous post using split with this one. Even with the
comments in the re, it's still a bit difficult to read. Regular
expressions are brittle. My code works fine for the data above but if
you change the structure the re will probably fail. At that point, you
have to fiddle with the re to get it back on course.

Don't get me wrong, regular expressions are hella fun to play with.
You have to ask yourself, "Do I really _need_ to use a regular
expression here?"
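
A middle ground, if a regular expression is still wanted for
recognising the numbers: keep one small float pattern and apply
re.findall() to each line, so the number of floats per line never
appears in the pattern. A hedged Python 3 sketch; FLOAT_RE and
floats_per_line are illustrative names, and the pattern is a variant
of the fp1 pattern from the question without the trailing \s+:

```python
import re

# One float: optional sign, digits with an optional fractional part (or a
# leading-dot fraction), and an optional exponent. No capturing groups, so
# findall() returns whole matches.
FLOAT_RE = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def floats_per_line(text):
    """Return a tuple of float-like strings for every line that contains any."""
    rows = []
    for line in text.splitlines():
        found = tuple(FLOAT_RE.findall(line))
        if found:  # skip blank lines and lines without numbers
            rows.append(found)
    return rows


sample = """\
1.0000E-08 1.58024E-06 0.0048
foo bar baaz

total 1.93417E-04 0.0012
"""

rows = floats_per_line(sample)
for row in rows:
    print(row)
```

The trade-off is that this pattern also picks up digits embedded in
other tokens (for example the 2 in a label like foo2), so it suits
data where the non-numeric fields contain no digits.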
 
Jeremy

In this simplified example I don't really need regular expressions.
However I will need regular expressions for more complex problems and
I'm trying to become more proficient at using regular expressions. I
tried to simplify this so as not to bother the mailing list too much.

Thanks for the great suggestion. It looks like it will work fine, but
I can't get it to work. I downloaded the simple script you put on
http://codepad.org/Z7eWBusl but it only prints an empty list. Am I
missing something?

Thanks,
Jeremy
 
Edward Dolan

No, you're not missing a thing. I am ;) Something was happening with
the triple-quoted strings when I pasted them. Here is, hopefully, the
correct code.
http://codepad.org/OIazr9lA
The output is shown on that page as well.

Sorry for the line noise folks. One of these days I'm going to learn
gnus.
 
Jeremy

Yep now that works. Thanks for the help.
Jeremy
 
