Pattern Matching

Greg Lindstrom · Jul 19, 2004

Hello-

I'm running Python 2.2.3 on Windows XP "Professional" and am reading a file
with one very long line of text (the line consists of multiple records with no
cr/lf). What I would like to do is scan for the occurrence of a specific
pattern of characters which I expect to repeat many times in the file.
Suppose I want to search for "Start: mm/dd/yyyy" and capture the mm/dd/yyyy
data for processing each time I find it. This is the type of problem I used
to solve with <duck>Perl<\duck> in a former lifetime using regular
expressions. The following does not work, but is the flavor of what I want
to do:

long_line_of_text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~Start 5/1/2004 morestuff.~'
while re.match('Start:\ (\D?/\D?/\D+)', long_line_of_text):
    # process the date string here which I hoped to catch in the parentheses above.

I'd like this to keep matching and processing the string as long as it keeps
matching the pattern, bopping down the string as it goes.
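
For reference, here is a minimal sketch of the loop described above. It assumes that `\d` (a digit) was intended where the post writes `\D` (a non-digit), and that the year always has four digits. re.match() only tests at the start of the string, and the string never changes inside the loop, so the loop as posted would either never run or never end. Calling search() on a compiled pattern with a start position steps through the string instead. The names `date_pattern`, `dates` and `pos` are illustrative only.

```python
import re

# Sample data copied from the post; the third record has no colon after "Start".
long_line_of_text = ('Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.'
                     '~Start 5/1/2004 morestuff.~')

# Assumed intent: one or two digits, a slash, one or two digits, a slash, four digits.
date_pattern = re.compile(r'Start: (\d{1,2}/\d{1,2}/\d{4})')

dates = []
pos = 0
while True:
    # search() scans forward from pos; match() would only test at pos itself.
    found = date_pattern.search(long_line_of_text, pos)
    if found is None:
        break
    dates.append(found.group(1))  # the text captured by the parentheses
    pos = found.end()             # resume just past this match

print(dates)
```

With the sample data this collects 1/1/2004 and 2/3/2004. The third record is skipped because it has no colon after "Start".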

Another way to handle this is to replace all of the tildes with linefeeds
(tildes are the end of segment marker), or split the records on the tilde
and go from there. I'd just like to know how I could do it with the regular
expressions.
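
The split-on-tilde alternative mentioned above might look roughly like the sketch below. It assumes every record starts with "Start: " followed by the date, and that `\d` (digit) is the intended character class. The names `records` and `date_pattern` are made up for illustration.

```python
import re

# Sample data copied from the post; the third record has no colon after "Start".
long_line_of_text = ('Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.'
                     '~Start 5/1/2004 morestuff.~')

date_pattern = re.compile(r'Start: (\d{1,2}/\d{1,2}/\d{4})')

# Every record ends with a tilde, so the last piece from split() is empty; drop it.
records = [piece for piece in long_line_of_text.split('~') if piece]

dates = []
for record in records:
    # match() is enough here because each record is expected to begin with "Start: ".
    found = date_pattern.match(record)
    if found:
        dates.append(found.group(1))

print(len(records))
print(dates)
```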

Thanks for your help,
--greg

Greg Lindstrom (501) 975-4859
NovaSys Health

"We are the music makers, and we are the dreamers of dreams" W.W.

Christopher T King · Jul 19, 2004

Greg Lindstrom said:
The following does not work, but is the flavor of what I want to do:

long_line_of_text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~Start 5/1/2004 morestuff.~'
while re.match('Start:\ (\D?/\D?/\D+)', long_line_of_text):
    # process the date string here which I hoped to catch in the parentheses above.

I'd like this to keep matching and processing the string as long as it keeps
matching the pattern, bopping down the string as it goes.

That line tastes distinctly Perlish.

What you want to write in Python is:

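import re
# \d matches a digit; the \D in the quoted pattern matches a non-digit
for match in re.finditer(r'Start:\ (\d{1,2}/\d{1,2}/\d+)', long_line_of_text):
    print(match.group(1))  # do something with match.group(1)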

re.finditer() returns an iterator that loops over all occurrences of the
pattern in the string, returning a match object for each one.
match.group() returns the entire text of the match, and match.group(n)
returns the text captured by group n.
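
To make the difference between match.group() and match.group(1) concrete, here is a small sketch. It assumes a digit-based pattern with a four-digit year, and the sample text is a shortened version of the string from the original post. The list names `whole` and `dates` are illustrative.

```python
import re

text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~'
pattern = r'Start: (\d{1,2}/\d{1,2}/\d{4})'

whole = []  # everything the pattern matched, including "Start: "
dates = []  # only the part inside the parentheses

for match in re.finditer(pattern, text):
    whole.append(match.group())    # same as match.group(0)
    dates.append(match.group(1))   # first (and only) capture group

print(whole)
print(dates)
```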

I'm curious, though, why do you escape the space? My guess is it's
something from Perl that I don't remember.
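
On the escaped space: as far as I can tell it is harmless but unnecessary in a normal Python pattern. The one place it matters is verbose mode (the re.VERBOSE flag), where unescaped whitespace in the pattern is ignored. Perl's /x modifier behaves similarly, which may be where the habit comes from, though that is only a guess. A quick sketch, with variable names chosen for illustration:

```python
import re

text = 'Start: 1/1/2004'

# Normal mode: a plain space and an escaped space both match one space.
plain = re.search(r'Start: (\d+)', text)
escaped = re.search(r'Start:\ (\d+)', text)

# Verbose mode: the unescaped space is dropped from the pattern, so the
# pattern becomes 'Start:(\d+)' and no longer matches 'Start: 1...'.
verbose_plain = re.search(r'Start: (\d+)', text, re.VERBOSE)
verbose_escaped = re.search(r'Start:\ (\d+)', text, re.VERBOSE)

print(plain.group(1), escaped.group(1))
print(verbose_plain)
print(verbose_escaped.group(1))
```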

Kristofer Pettijohn · Jul 19, 2004

Greg Lindstrom said:
long_line_of_text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~Start 5/1/2004 morestuff.~'
while re.match('Start:\ (\D?/\D?/\D+)', long_line_of_text):
    # process the date string here which I hoped to catch in the parentheses above.

I'd like this to keep matching and processing the string as long as it keeps
matching the pattern, bopping down the string as it goes.

p = re.compile(your_pattern_from_above)
matches = p.findall(long_line_of_text)

matches will be a list of the strings captured by the parentheses, one per match.
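
A filled-in version of the above, as a sketch only: the pattern assumes digits (`\d`) and a four-digit year rather than the `\D` in the quoted code. The second pattern, with three groups, is an extra illustration of how findall() changes its return value when there is more than one group.

```python
import re

# Sample data copied from the original post.
long_line_of_text = ('Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.'
                     '~Start 5/1/2004 morestuff.~')

# One group: findall() returns a list of strings.
p = re.compile(r'Start: (\d{1,2}/\d{1,2}/\d{4})')
matches = p.findall(long_line_of_text)
print(matches)

# Several groups: findall() returns a list of tuples, one tuple per match.
p_parts = re.compile(r'Start: (\d{1,2})/(\d{1,2})/(\d{4})')
parts = p_parts.findall(long_line_of_text)
print(parts)
```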

Eddie Corns · Jul 20, 2004

In addition to previous answers, a useful resource might be:
http://gnosis.cx/TPiP/
