Python Regex Question

MalteseUnderdog

Hi there, I just started Python (but this question isn't that trivial,
since I couldn't find it on Google :) )

I have the following text file entries (simplified)

start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop
.....

I need a regex expression which returns the start to the x=ANIMAL for
only the x=Dog fragments so all my entries should be start ...
(something here) ... x=Dog . So I am really interested in fragments 1
and 3 only.

My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
would return results

start
x=Dog # (good)

and

start
x=Cat
stop
start
x=Dog # bad since I only want start ... x=Dog portion

Can you help me ?

Thanks
JP, Malta.
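
The over-match described above can be reproduced with a short script. This is only a sketch in Python 3 syntax: the name `text` is hypothetical, and the `re.DOTALL | re.MULTILINE` flags are an assumption, since the post does not say which flags were used with `^start.*?x=Dog`.

```python
import re

# Sample data copied from the question (the variable name is hypothetical).
text = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop
"""

# Assumed flags: DOTALL lets '.' cross newlines, MULTILINE lets '^'
# match at the start of every line.
lazy = re.compile(r'^start.*?x=Dog', re.DOTALL | re.MULTILINE)
matches = lazy.findall(text)

for m in matches:
    print(repr(m))
```

The lazy quantifier only keeps each match as short as possible from a fixed starting point. Once the search starts at the "start" of fragment 2, the nearest x=Dog is in fragment 3, so the second match swallows the Cat fragment as well.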
 
Tim Chase

MalteseUnderdog said:
I need a regex expression which returns the start to the x=ANIMAL for
only the x=Dog fragments so all my entries should be start ...
(something here) ... x=Dog . So I am really interested in fragments 1
and 3 only.

My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
would return results

start
x=Dog # (good)

and

start
x=Cat
stop
start
x=Dog # bad since I only want start ... x=Dog portion

Looks like the following does the trick:

>>> s = """start      #frag 1 start
... x=Dog # frag 1 end
... stop
... start    # frag 2 start
... x=Cat # frag 2 end
... stop
... start     #frag 3 start
... x=Dog #frag 3 end
... stop"""
>>> import re
>>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
>>> for i, result in enumerate(r.findall(s)):
...     print i, repr(result)
...
0 'start      #frag 1 start\nx=Dog # frag 1 end\nstop'
1 'start     #frag 3 start\nx=Dog #frag 3 end\nstop'

-tkc
 
Arnaud Delobelle

Tim Chase said:
Looks like the following does the trick:

 >>> s = """start      #frag 1 start
... x=Dog # frag 1 end
... stop
... start    # frag 2 start
... x=Cat # frag 2 end
... stop
... start     #frag 3 start
... x=Dog #frag 3 end
... stop"""
 >>> import re
 >>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
 >>> for i, result in enumerate(r.findall(s)):
...     print i, repr(result)
...
0 'start      #frag 1 start\nx=Dog # frag 1 end\nstop'
1 'start     #frag 3 start\nx=Dog #frag 3 end\nstop'

-tkc

This will only work if 'x=Dog' directly follows 'start' (which happens
in the given example). If that's not necessarily the case, I would do
it in two steps (in fact I wouldn't use regexps probably but...):
...     m = re.search('^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
...     if m: print repr(m.group())
...
'start #frag 1 start \nx=Dog'
'start #frag 3 start \nx=Dog'
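
The two-step idea could be sketched as follows, in Python 3 syntax. The way the text is cut into chunks here (splitting on lines that begin with "stop") is an assumption, because the chunking step did not survive in the post above. The names `text`, `chunks`, and `results` are hypothetical, and the extra `colour=brown` line is made up to show a case where x=Dog does not directly follow start.

```python
import re

# Sample data based on the question; 'colour=brown' is an invented extra
# line so that x=Dog is not directly after start in fragment 3.
text = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
colour=brown
x=Dog #frag 3 end
stop
"""

# Step 1 (assumed): cut the text into one chunk per fragment by splitting
# on every line that begins with 'stop'.
chunks = re.split(r'^stop.*\n?', text, flags=re.MULTILINE)

# Step 2: search each chunk on its own, using the pattern from the post.
# Because a chunk holds a single fragment, the match cannot run into a
# neighbouring fragment.
results = []
for chunk in chunks:
    m = re.search(r'^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
    if m:
        results.append(m.group())

for r in results:
    print(repr(r))
```

Splitting first is what makes the simple pattern safe: the greedy `.*` can only range over one fragment at a time.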
 
Terry Reedy

MalteseUnderdog said:
Hi there, I just started Python (but this question isn't that trivial,
since I couldn't find it on Google :) )

I have the following text file entries (simplified)

start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop
....

I need a regex expression which returns the start to the x=ANIMAL for
only the x=Dog fragments so all my entries should be start ...
(something here) ... x=Dog . So I am really interested in fragments 1
and 3 only.

As I understand the above....
I would first write a generator that separates the file into fragments
and yields them one at a time. Perhaps something like

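def fragments(ifile):
    frag = []
    for line in ifile:
        frag.append(line)  # collect the lines of the current fragment
        if line.startswith('stop'):  # 'stop' ends a fragment in the sample data
            yield ''.join(frag)  # one string, so a substring test works on it
            frag = []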

Then I would iterate through fragments, testing for the ones I want:

for frag in fragments(somefile):
    if 'x=Dog' in frag:
        <do whatever>

Terry Jan Reedy
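
A self-contained version of the generator approach might look like the sketch below, in Python 3 syntax. It assumes that a line consisting of "stop" closes each fragment, as in the sample data. The names `sample`, `dog_frags`, and `trimmed` are hypothetical, and `io.StringIO` stands in for the real input file.

```python
import io


def fragments(lines):
    """Yield each fragment as one string; a 'stop' line ends a fragment."""
    frag = []
    for line in lines:
        frag.append(line)
        if line.strip() == 'stop':  # assumed end-of-fragment marker
            yield ''.join(frag)
            frag = []


# io.StringIO stands in for the open text file from the question.
sample = io.StringIO(
    "start #frag 1 start\n"
    "x=Dog # frag 1 end\n"
    "stop\n"
    "start # frag 2 start\n"
    "x=Cat # frag 2 end\n"
    "stop\n"
    "start #frag 3 start\n"
    "x=Dog #frag 3 end\n"
    "stop\n"
)

# Keep only the fragments that mention x=Dog.
dog_frags = [frag for frag in fragments(sample) if 'x=Dog' in frag]

# Cut each kept fragment just after 'x=Dog', which is the portion the
# question asks for.
trimmed = [frag[:frag.index('x=Dog') + len('x=Dog')] for frag in dog_frags]

for t in trimmed:
    print(repr(t))
```

Note that the plain substring test would also accept a line such as x=Dogfish, or an x=Dog that appears only inside a comment; a stricter check would be needed if the real data can contain those.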
 
