How do I parse this ? regexp ?

serpent17 · Apr 27, 2005

Hello all,

I have this line of numbers:

04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]

repeated several times in a text file and I would like each element to
be part of a vector. how do I do this ? I am not very capable in using
regexp as you can see.

Thanks in advance,

Jake.

Jorge Godoy · Apr 27, 2005

serpent17 said:
Hello all,

I have this line of numbers:

04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]

repeated several times in a text file and I would like each element to
be part of a vector. how do I do this ? I am not very capable in using
regexp as you can see.

You don't need a regexp to do that.

Use the split string method. It will split on spaces by default. If you want
to keep the values inside "[]" together, remove the spaces before splitting or
split on the "[" char first and then split the first item using spaces as a
separator.
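
A minimal sketch of the "split on the "[" char first" idea, assuming Python 3 (for the starred unpacking) and assuming each record sits on one line. The names head, groups, scalars and triplets are only illustrative:

```python
line = ("04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, "
        "3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, "
        "-0.960662841796875], [0.01556396484375, 0.01220703125, "
        "0.01068115234375]")

# Split on "[" first: the first piece holds the date, time and scalar,
# and each remaining piece holds the contents of one bracketed group.
head, *groups = line.split("[")

# The first piece splits on whitespace; strip the comma left on each token.
scalars = [tok.strip(",") for tok in head.split()]

# Each group looks like "a, b, c], " so keep the part before "]" and
# split it on commas; float() tolerates the surrounding spaces.
triplets = [[float(v) for v in g.split("]")[0].split(",")] for g in groups]

print(scalars)
print(triplets)
```

The time field is left as a single string here, since the thread never says how it should be broken up.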

Be seeing you,

serpent17 · Apr 27, 2005

Hello,

I am not understanding your answer, but I probably asked the wrong
question.

I want to remove the commas and the square brackets [ and ] and
rewrite this whole line (and all the ones following it in the text file)
so that only spaces act as delimiters. How do I do this?

I have tried this:

import re

f = open(name3, 'r')
r = r"\d+\.\d*"
for line in f:
    cols = line.split()
    data1 = re.findall(r, line)

and then I don't know what to do with either cols or data1.
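
For reference, one possible way to get the space-delimited form described above without any regexp. This is only a sketch: it assumes each record is on a single line, and flatten_line, rewrite_file and the two path arguments are hypothetical names, not anything from the thread:

```python
def flatten_line(line):
    """Return the line with commas and square brackets removed and
    the remaining fields separated by single spaces."""
    # Turn each unwanted character into a space, then let split()
    # collapse the resulting runs of whitespace.
    for ch in ",[]":
        line = line.replace(ch, " ")
    return " ".join(line.split())


def rewrite_file(src_path, dst_path):
    """Copy src_path to dst_path, flattening every line (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(flatten_line(line) + "\n")


sample = ("04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, "
          "3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, "
          "-0.960662841796875], [0.01556396484375, 0.01220703125, "
          "0.01068115234375]")
print(flatten_line(sample))
```

After that, flatten_line(line).split() gives one list of strings per record, and the time field stays in one piece.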

Jake.

Jeremy Bowers · Apr 27, 2005

Hello all,

I have this line of numbers:

04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]

repeated several times in a text file and I would like each element to be
part of a vector. how do I do this ? I am not very capable in using regexp
as you can see.

I think, based on the responses you've gotten so far, that perhaps you
aren't being clear enough.

Some starter questions:

* Is that all on one line in your file?
* Are there ever variable numbers of the [] fields?
* What do you mean by "vectors"?

If the line format is stable (no variation in numbers), and especially if
that is all one line, then given that you are not familiar with regexps I
wouldn't muck about with them. (Even for me, it's borderline whether I
would use one here.) Instead, follow along with the session below; it will
probably help, though as I don't know precisely what you're asking I can't
give a complete solution:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> x = "04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]"
>>> x.split(',', 2)
['04242005 18:20:42-0.000002', ' 271.1748608', ' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]']
>>> splitted = x.split(',', 2)
>>> splitted[2]
' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]'
>>> import re
>>> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")
>>> if safetyChecker.match(splitted[2]):
...     eval(splitted[2], {}, {})
...
([-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375])
>>> splitted[0].split()
['04242005', '18:20:42-0.000002']
>>> splitted[0].split()[1].split('-')
['18:20:42', '0.000002']

I'd like to STRONGLY EMPHASIZE that "eval" is very dangerous if you
can't trust the source; *any* Python code in the string will be run. That
is why I am extra paranoid and double-check that the expression contains
only the characters listed in that simple regex. (Anyone who can construct
a malicious string out of those characters will get my sincere
admiration.) You may do as you please, of course, but I believe it is not
helpful to suggest security holes on comp.lang.python.

The coincidence of that part of your data, which is also the most
challenging to parse, exactly matching Python syntax is too much to pass
up.
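
On Python 2.6 and later (newer than the 2.3.5 session above), the standard library's ast.literal_eval is a possible alternative to the regex-guarded eval: it only accepts Python literals such as numbers, lists and tuples. A minimal sketch, where tail stands in for splitted[2] from the session:

```python
import ast

# Stand-in for splitted[2] from the session above.
tail = (" [-4.119873046875, 3.4332275390625, 105.062255859375], "
        "[0.093780517578125, 0.041015625, -0.960662841796875], "
        "[0.01556396484375, 0.01220703125, 0.01068115234375]")

# literal_eval parses literals only; nothing in the string is executed.
vectors = ast.literal_eval(tail.strip())
print(vectors)

# Anything that is not a plain literal is rejected with ValueError.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The result is a tuple of three lists of floats, the same shape that eval returned in the session.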

This should give you some good ideas; if you post more detailed questions
we can probably be of more help.

Paul McGuire · Apr 28, 2005

Jake -

If regexp's give you pause, here is a pyparsing version that, while
verbose, is fairly straightforward. I made some guesses at what some
of the data fields might be, but that doesn't matter much.

Note the use of setResultsName() to give different parse fragments
names so that they are directly addressable in the results, instead of
having to count out "the 0'th group is the date, the 1'st group is the
time...". Also, there is a commented-out conversion action, to
automatically convert strings to floats during parsing.

Download pyparsing at http://pyparsing.sourceforge.net.

Good luck,
-- Paul

data = """04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]"""

from pyparsing import *

COMMA = Literal(",").suppress()
LBRACK = Literal("[").suppress()
RBRACK = Literal("]").suppress()

# define a two-digit integer, we'll need a lot of them
int2 = Word(nums,exact=2)
month = int2
day = int2
yr = Combine("20" + int2)
date = Combine(month + day + yr)

hr = int2
min = int2
sec = int2
tz = oneOf("+ -") + Word(nums) + "." + Word(nums)
time = Combine( hr + ":" + min + ":" + sec + tz )

realNum = Combine( Optional("-") + Word(nums) + "." + Word(nums) )
# uncomment the next line and reals will be converted from strings to
# floats during parsing
#realNum.setParseAction( lambda s,l,t: float(t[0]) )

triplet = Group( LBRACK + realNum + COMMA + realNum + COMMA + realNum +
RBRACK )
entry = Group( date.setResultsName("date") +
time.setResultsName("time") + COMMA +
realNum.setResultsName("temp") + COMMA +
Group( triplet + COMMA + triplet + COMMA + triplet
).setResultsName("coords") )

dataFormat = OneOrMore(entry)
results = dataFormat.parseString(data)

for d in results:
    print(d.date)
    print(d.time)
    print(d.temp)
    print(d.coords[0].asList())
    print(d.coords[1].asList())
    print(d.coords[2].asList())

returns:

04242005
18:20:42-0.000002
271.1748608
['-4.119873046875', '3.4332275390625', '105.062255859375']
['0.093780517578125', '0.041015625', '-0.960662841796875']
['0.01556396484375', '0.01220703125', '0.01068115234375']

Simon Dahlbacka · Apr 28, 2005

safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")

...doesn't the dot (.) in your character class mean that you are allowing
EVERYTHING (except newline?)

(you would probably want \. instead)

/Simon

Peter Hansen · Apr 29, 2005

Simon said:
safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")

..doesn't the dot (.) in your character class mean that you are allowing
EVERYTHING (except newline?)

The re docs clearly say this is not the case:

'''
[]
Used to indicate a set of characters. Characters can be listed
individually, or a range of characters can be indicated by giving two
characters and separating them by a "-". Special characters are not
active inside sets.
'''

Note the last sentence in the above quotation...

-Peter

Jeremy Bowers · Apr 29, 2005

The re docs clearly say this is not the case:

'''
[]
Used to indicate a set of characters. Characters can be listed
individually, or a range of characters can be indicated by giving two
characters and separating them by a "-". Special characters are not active
inside sets.
'''

Note the last sentence in the above quotation...

-Peter

Aren't regexes /fun/?

Also from that passage, Simon, note the "-" right in front of
[-\[\]0-9,. ], another one that's tripped me up more than once.
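
A quick check of both points, as a sketch using the same pattern from earlier in the thread (checker and the test strings are just illustrative):

```python
import re

checker = re.compile(r"^[-\[\]0-9,. ]*$")

# Inside [...] the "." is a literal dot, so letters are still rejected.
dot_ok = bool(checker.match("[0.5, -1.25]"))
letters_ok = bool(checker.match("[0x5, abc]"))

# A "-" placed first in the set is a literal minus sign, not a range.
minus_ok = bool(checker.match("-"))

print(dot_ok, letters_ok, minus_ok)  # True False True
```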

Wheeee!

"Some people, when confronted with a problem, think ``I know, I'll use
regular expressions.'' Now they have two problems." - jwz
http://www.jwz.org/hacks/marginal.html
