Please help with regular expression finding multiple floats

Jeremy · Oct 22, 2009

I have text that looks like the following (but all in one string with
'\n' separating the lines):

1.0000E-08 1.58024E-06 0.0048
1.0000E-07 2.98403E-05 0.0018
1.0000E-06 8.85470E-06 0.0026
1.0000E-05 6.08120E-06 0.0032
1.0000E-03 1.61817E-05 0.0022
1.0000E+00 8.34460E-05 0.0014
2.0000E+00 2.31616E-05 0.0017
5.0000E+00 2.42717E-05 0.0017
total 1.93417E-04 0.0012

I want to capture the two or three floating point numbers in each line
and store them in a tuple. I want to find all such tuples such that I
have
[('1.0000E-08', '1.58024E-06', '0.0048'),
('1.0000E-07', '2.98403E-05', '0.0018'),
('1.0000E-06', '8.85470E-06', '0.0026'),
('1.0000E-05', '6.08120E-06', '0.0032'),
('1.0000E-03', '1.61817E-05', '0.0022'),
('1.0000E+00', '8.34460E-05', '0.0014'),
('2.0000E+00', '2.31616E-05', '0.0017'),
('5.0000E+00', '2.42717E-05', '0.0017'),
('1.93417E-04', '0.0012')]

as a result. I have the regular expression pattern

fp1 = r'([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s+'

which can find a floating point number followed by some space. I can
find three floats with

found = re.findall('%s%s%s' % (fp1, fp1, fp1), text)

My question is, how can I use regular expressions to find two OR three
or even an arbitrary number of floats without repeating %s? Is this
possible?

Thanks,
Jeremy
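
One way to approach the question above, offered as a minimal sketch rather than a definitive answer: apply a single float pattern to each line with re.findall, so the number of floats per line no longer matters. FLOAT is the pattern from the post without its capturing parentheses and trailing \s+; the helper name floats_per_line and the shortened sample text are assumptions made for illustration. The sketch is written for Python 3.

```python
import re

# One float, as in the question, minus the outer group and the trailing \s+.
FLOAT = r'[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?'

# Shortened sample: two three-column rows and the two-column "total" row.
text = """1.0000E-08 1.58024E-06 0.0048
1.0000E+00 8.34460E-05 0.0014
total 1.93417E-04 0.0012"""


def floats_per_line(text):
    """Return one tuple of float strings for each line that contains any."""
    rows = []
    for line in text.splitlines():
        found = re.findall(FLOAT, line)  # all floats on this line, in order
        if found:                        # skip lines without numbers
            rows.append(tuple(found))
    return rows


print(floats_per_line(text))
```

Note that this pattern also matches digits embedded in other tokens (for example the "123" in "abc123"), so it suits input where the only digits are the numbers of interest.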

Cousin Stanley · Oct 23, 2009

I have text that looks like the following
(but all in one string with '\n' separating the lines):
....

I want to capture the two or three floating point numbers in each line
and store them in a tuple.
....
I have the regular expression pattern
....

Jeremy ....

For a non-regular-expression solution
you might consider something similar to
the following ....

s = '''\
1.0000E-08 1.58024E-06 0.0048
1.0000E-07 2.98403E-05 0.0018
1.0000E-06 8.85470E-06 0.0026
1.0000E-05 6.08120E-06 0.0032
1.0000E-03 1.61817E-05 0.0022
1.0000E+00 8.34460E-05 0.0014
2.0000E+00 2.31616E-05 0.0017
5.0000E+00 2.42717E-05 0.0017
total 1.93417E-04 0.0012'''

l1 = s.split( '\n' )

l2 = [ ]

# every row except the last holds three floats
for this_row in l1[ : -1 ] :
    temp = this_row.strip().split()
    l2.append( [ float( x ) for x in temp ] )

# the last row starts with the label 'total', so skip its first field
last = l1[ -1 ].strip().split()[ 1 : ]

l2.append( [ float( x ) for x in last ] )

print( '' )
for this_row in l2 :
    if len( this_row ) > 2 :
        x , y , z = this_row
        print( ' %5.4e %5.4e %5.4e ' % ( x , y , z ) )
    else :
        x , y = this_row
        print( ' %5.4e %5.4e ' % ( x , y ) )
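
The same split-based idea can be sketched without special-casing the last row, by trying float() on every token and skipping the ones that fail. This is only an illustration under stated assumptions: the function name parse_rows and the two-line sample are made up for the example, and it is written for Python 3.

```python
def parse_rows(text):
    """Split each line on whitespace and keep the tokens float() accepts."""
    rows = []
    for line in text.splitlines():
        values = []
        for token in line.split():
            try:
                values.append(float(token))
            except ValueError:
                pass  # not a number, e.g. the label "total"
        if values:  # ignore lines that hold no numbers at all
            rows.append(tuple(values))
    return rows


sample = "1.0000E-08 1.58024E-06 0.0048\ntotal 1.93417E-04 0.0012"
print(parse_rows(sample))
```

One caveat: float() also accepts tokens such as "nan" and "inf", so labels spelled that way would be kept as numbers.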

Edward Dolan · Oct 23, 2009

My question is, how can I use regular expressions to find two OR three
or even an arbitrary number of floats without repeating %s? Is this
possible?

Thanks,
Jeremy

Any time you have tabular data such as your example, split() is
generally the first choice. But since you asked, and I like fscking
with regular expressions...

import re

# I modified your data set just a bit to show that it will
# match zero or more space separated real numbers.

data = """
1.0000E-08

1.0000E-08 1.58024E-06 0.0048 1.0000E-08 1.58024E-06 0.0048
1.0000E-07 2.98403E-05 0.0018
foo bar baaz
1.0000E-06 8.85470E-06 0.0026
1.0000E-05 6.08120E-06 0.0032
1.0000E-03 1.61817E-05 0.0022
1.0000E+00 8.34460E-05 0.0014
2.0000E+00 2.31616E-05 0.0017
5.0000E+00 2.42717E-05 0.0017
total 1.93417E-04 0.0012
"""

ntuple = re.compile(r"""
    # match beginning of line (re.M in the docs)
    ^
    # chew up anything before the first real (non-greedy -> ?)
    .*?
    # named match (turn the match into a named atom while allowing
    # irrelevant (groups))
    (?P<ntuple>
      # match one real
      [-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?
      # followed by zero or more space separated reals
      ([ \t]+[-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?)*)
    # match end of line (re.M in the docs)
    $
    """, re.X | re.M)  # re.X to allow comments and arbitrary whitespace

print([tuple(mo.group('ntuple').split())
       for mo in re.finditer(ntuple, data)])

Now compare the previous post using split with this one. Even with the
comments in the re, it's still a bit difficult to read. Regular
expressions are brittle. My code works fine for the data above but if
you change the structure the re will probably fail. At that point, you
have to fiddle with the re to get it back on course.

Don't get me wrong, regular expressions are hella fun to play with.
You have to ask yourself, "Do I really _need_ to use a regular
expression here?"
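
A short sketch of why the pattern above captures the whole run of numbers in one named group and then calls split(), instead of putting a capturing group around each number: in Python's re module a capturing group under a quantifier keeps only its last repetition. FLOAT, the group name run, and the sample line are assumptions made for this illustration; the sketch is written for Python 3.

```python
import re

# One real number with a mandatory decimal point, non-capturing throughout.
FLOAT = r'[-+]?(?:\d*\.\d+|\d+\.\d*)(?:[eE][-+]?\d+)?'
line = "1.0000E-08 1.58024E-06 0.0048"

# A capturing group inside a repetition: only the final repetition survives.
repeated = re.match(r'(?:(%s)[ \t]*)+$' % FLOAT, line)
last_only = repeated.groups()

# Capture the whole run once, then split it into its fields.
whole = re.match(r'(?P<run>%s(?:[ \t]+%s)*)$' % (FLOAT, FLOAT), line)
all_values = tuple(whole.group('run').split())

print(last_only)   # a 1-tuple holding just the last number
print(all_values)  # every number on the line
```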

Edward Dolan · Oct 23, 2009

I can see why this line could wrap in my previous post

1.0000E-08 1.58024E-06 0.0048 1.0000E-08 1.58024E-06
0.0048

But this one?

1.0000E-07 2.98403E-05
0.0018

anyway, here is the code -> http://codepad.org/Z7eWBusl

Jeremy · Oct 23, 2009

Don't get me wrong, regular expressions are hella fun to play with.
You have to ask yourself, "Do I really _need_ to use a regular
expression here?"

In this simplified example I don't really need regular expressions.
However I will need regular expressions for more complex problems and
I'm trying to become more proficient at using regular expressions. I
tried to simplify this so as not to bother the mailing list too much.

Thanks for the great suggestion. It looks like it will work fine, but
I can't get it to work. I downloaded the simple script you put on
http://codepad.org/Z7eWBusl but it only prints an empty list. Am I
missing something?

Thanks,
Jeremy

Edward Dolan · Oct 24, 2009

No, you're not missing a thing. I am.

Something was happening with the triple-quoted strings when I pasted
them. Here is, hopefully, the correct code:
http://codepad.org/OIazr9lA
The output is shown on that page as well.

Sorry for the line noise folks. One of these days I'm going to learn
gnus.

Jeremy · Oct 26, 2009

Yep now that works. Thanks for the help.
Jeremy
