How do I parse this ? regexp ?

serpent17

Hello all,

I have this line of numbers:


04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]


repeated several times in a text file, and I would like each element to
be part of a vector. How do I do this? I am not very capable with
regexps, as you can see.


Thanks in advance,


Jake.
 
Jorge Godoy

You don't need a regexp to do that.

Use the split string method. It will split on spaces by default. If you want
to keep the values inside "[]" together, remove the spaces before splitting or
split on the "[" char first and then split the first item using spaces as a
separator.
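
A minimal sketch of the second suggestion, assuming each record sits on a single line and always has a leading group of scalar fields followed by three bracketed triplets. The names `line`, `head_fields`, and `triplets` are illustrative and do not come from the thread:

```python
line = ("04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, "
        "3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, "
        "-0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]")

# Split once on the first "[": the head holds the date, time and scalar value,
# the tail holds the bracketed triplets.
head, tail = line.split("[", 1)

# The head is separated by commas and spaces; turn commas into spaces and split.
head_fields = head.replace(",", " ").split()

# Each remaining chunk ends with "]" (and possibly ", "); strip those, then
# split on commas and convert to floats (float() tolerates leading spaces).
triplets = [[float(v) for v in chunk.strip(" ,]").split(",")]
            for chunk in tail.split("[")]

print(head_fields)
print(triplets)
```

The head fields stay as strings here because the time field mixes a clock time with a trailing fraction; only the triplets are converted to floats.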


Be seeing you,
 
serpent17

Hello,

I am not understanding your answer, but I probably asked the wrong
question :)

I want to remove the commas and the square brackets ([ and ]) and
rewrite this whole line (and all the ones following it) in a text file
where only a space would be the delimiter. How do I do this?

I have tried this:

import re

f = open(name3, 'r')
r = r"\d+\.\d*"
for line in f:
    cols = line.split()
    data1 = re.findall(r, line)

and then I don't know what to do with either cols or data1.
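
One possible way to get the space-delimited form asked for here, sketched with plain string methods rather than a regexp. The function name `to_space_delimited` and the variable `sample` are hypothetical, and the sketch assumes every record fits on one line:

```python
def to_space_delimited(line):
    """Replace commas and square brackets with spaces, then collapse whitespace."""
    for ch in "[],":
        line = line.replace(ch, " ")
    # split() with no argument drops runs of whitespace, so joining the
    # pieces with a single space leaves exactly one delimiter between fields.
    return " ".join(line.split())


sample = ("04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, "
          "3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, "
          "-0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]")

cleaned = to_space_delimited(sample)
print(cleaned)
```

To convert a whole file, the same function could be applied to each line read from the input and the result written to a second file. Note that the pattern `\d+\.\d*` in the attempt above has no place for a minus sign and requires a decimal point, so it would drop the signs and skip the date field.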

Jake.
 
Jeremy Bowers

I think, based on the responses you've gotten so far, that perhaps you
aren't being clear enough.

Some starter questions:

* Is that all on one line in your file?
* Are there ever variable numbers of the [] fields?
* What do you mean by "vectors"?

If the line format is stable (no variation in numbers), and especially if
that is all one line, given that you are not familiar with regexp I
wouldn't muck about with it. (For me, I'd still say it's borderline if I
would go with that.) Instead, follow along in the following and it'll
probably help, though as I don't precisely know what you're asking I can't
give a complete solution:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]"
>>> x.split(',', 2)
['04242005 18:20:42-0.000002', ' 271.1748608', ' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]']
>>> splitted = x.split(',', 2)
>>> splitted[2]
' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]'
>>> import re
>>> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")
>>> if safetyChecker.match(splitted[2]):
...     eval(splitted[2], {}, {})
...
([-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375])
>>> splitted[0].split()
['04242005', '18:20:42-0.000002']
>>> splitted[0].split()[1].split('-')
['18:20:42', '0.000002']


I'd like to STRONGLY EMPHASIZE that "eval" is very dangerous if you can't
trust the source: *any* Python code in the string will be run. That is
why I am extra paranoid and double-check that the expression contains
only the characters listed in that simple regex. (Anyone who can
construct a malicious string out of those characters will get my sincere
admiration.) You may do as you please, of course, but I believe it is not
helpful to suggest security holes on comp.lang.python :) Still, the
coincidence that this part of your data, which is also the most
challenging to parse, exactly matches Python syntax is too good to pass
up.
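
For readers on a later Python than the 2.3.5 session shown here, a different technique avoids eval altogether: `ast.literal_eval` (in the standard library from Python 2.6 onward) accepts only literals such as numbers, lists and tuples. The sketch below is an alternative to the regex-guarded eval, not the method used in this post, and the variable names are illustrative:

```python
import ast

tail = (" [-4.119873046875, 3.4332275390625, 105.062255859375], "
        "[0.093780517578125, 0.041015625, -0.960662841796875], "
        "[0.01556396484375, 0.01220703125, 0.01068115234375]")

# Comma-separated lists form a tuple literal, so this yields a tuple of lists.
# strip() removes the leading space left over from the split on ",".
triplets = ast.literal_eval(tail.strip())
print(triplets)

# Anything that is not a plain literal is refused instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```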

This should give you some good ideas; if you post more detailed questions
we can probably be of more help.
 
Paul McGuire

Jake -

If regexp's give you pause, here is a pyparsing version that, while
verbose, is fairly straightforward. I made some guesses at what some
of the data fields might be, but that doesn't matter much.

Note the use of setResultsName() to give different parse fragments
names so that they are directly addressable in the results, instead of
having to count out "the 0th group is the date, the 1st group is the
time...". There is also a commented-out parse action that automatically
converts strings to floats during parsing.

Download pyparsing at http://pyparsing.sourceforge.net.

Good luck,
-- Paul


data = """04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]"""

from pyparsing import *

COMMA = Literal(",").suppress()
LBRACK = Literal("[").suppress()
RBRACK = Literal("]").suppress()

# define a two-digit integer, we'll need a lot of them
int2 = Word(nums,exact=2)
month = int2
day = int2
yr = Combine("20" + int2)
date = Combine(month + day + yr)

hr = int2
min = int2
sec = int2
tz = oneOf("+ -") + Word(nums) + "." + Word(nums)
time = Combine( hr + ":" + min + ":" + sec + tz )

realNum = Combine( Optional("-") + Word(nums) + "." + Word(nums) )
# uncomment the next line and reals will be converted from strings to
# floats during parsing
#realNum.setParseAction( lambda s,l,t: float(t[0]) )

triplet = Group( LBRACK + realNum + COMMA + realNum + COMMA + realNum +
RBRACK )
entry = Group( date.setResultsName("date") +
time.setResultsName("time") + COMMA +
realNum.setResultsName("temp") + COMMA +
Group( triplet + COMMA + triplet + COMMA + triplet
).setResultsName("coords") )

dataFormat = OneOrMore(entry)
results = dataFormat.parseString(data)

for d in results:
    print(d.date)
    print(d.time)
    print(d.temp)
    print(d.coords[0].asList())
    print(d.coords[1].asList())
    print(d.coords[2].asList())

returns:

04242005
18:20:42-0.000002
271.1748608
['-4.119873046875', '3.4332275390625', '105.062255859375']
['0.093780517578125', '0.041015625', '-0.960662841796875']
['0.01556396484375', '0.01220703125', '0.01068115234375']
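
If typed values are wanted rather than strings, the printed fields could be converted afterwards with the standard library alone. This is only a sketch under the same guesses made above: that the date is month/day/year and that the text after the seconds is a signed fraction. The variable names are hypothetical:

```python
from datetime import datetime

date_str = "04242005"
time_str = "18:20:42-0.000002"
coords = ['-4.119873046875', '3.4332275390625', '105.062255859375']

# Split the clock time from the trailing fraction at the "-".
# Assumption: the "-" is a sign; it might equally be a plain separator.
clock, sign, fraction = time_str.partition("-")

stamp = datetime.strptime(date_str + " " + clock, "%m%d%Y %H:%M:%S")
offset = float(sign + fraction)
vector = [float(c) for c in coords]

print(stamp)
print(offset)
print(vector)
```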
 
Peter Hansen

Simon said:
safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")

...doesn't the dot (.) in your character class mean that you are allowing
EVERYTHING (except newline)?

The re docs clearly say this is not the case:

'''
[]
Used to indicate a set of characters. Characters can be listed
individually, or a range of characters can be indicated by giving two
characters and separating them by a "-". Special characters are not
active inside sets.
'''

Note the last sentence in the above quotation...
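
A quick check of that sentence against the pattern from the thread; the two test strings are made up for illustration:

```python
import re

safety_checker = re.compile(r"^[-\[\]0-9,. ]*$")

# Inside the set, "." stands only for a literal dot, so digits, dots,
# commas, spaces, hyphens and square brackets are the only characters allowed.
numbers_ok = bool(safety_checker.match("[0.5, -1.25]"))

# Letters are not in the set, so a string containing them is rejected.
letters_ok = bool(safety_checker.match("[0.5, abc]"))

print(numbers_ok)   # True
print(letters_ok)   # False
```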

-Peter
 
Jeremy Bowers

Aren't regexes /fun/?

Also from that passage, Simon, note the "-" right at the front of
[-\[\]0-9,. ]: listed first, it is a literal hyphen rather than a range
separator. That is another one that has tripped me up more than once.
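
A small illustration of the hyphen's two roles inside a set, using a made-up input string:

```python
import re

# "-" placed first in the set is a literal hyphen.
leading = re.findall(r"[-0-9]", "3-7")

# "-" between two characters marks a range, so the hyphen itself is not matched.
ranged = re.findall(r"[0-9]", "3-7")

print(leading)   # ['3', '-', '7']
print(ranged)    # ['3', '7']
```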

Wheeee!

"Some people, when confronted with a problem, think ``I know, I'll use
regular expressions.'' Now they have two problems." - jwz
http://www.jwz.org/hacks/marginal.html
 
