Jocknerd
I'm a Python newbie and I'm having trouble with Regular Expressions when
reading in a text file. Here is a sample layout of the input file:
09/04/2004 Virginia 44 Temple 14
09/04/2004 LSU 22 Oregon State 21
09/09/2004 Troy State 24 Missouri 14
As you can see, the text file contains a list of games. Each game has a
date, a winning team, the winning team's score, the losing team, and the
losing team's score. If I set up my program to import the data in a
fixed-length format, it's no problem. But some of my text files have different
layouts. For instance, some have only one space between a team name and
its score.
Here's how I read in the file using fixed length fields:
import sys

filename = sys.argv[1]
file = open(filename, 'r')
schedule = []  # make a list called schedule
while True:
    line = file.readline()
    if not line:
        break
    game = {}  # make a dictionary called game
    game['date'] = line[0:10]  # fixed length field
    game['team1'] = line[12:40].strip()
    game['score1'] = line[40:42]
    game['team2'] = line[44:72].strip()
    game['score2'] = line[72:74]
    schedule.append(game)
file.close()
Note: I'm stripping whitespace from the team names because I don't want
the team name to actually be a fixed length.
How would I set this up to read in the data using Regular expressions?
I've tried this:
while True:
    line = file.readline()
    if not line:
        break
    game = {}
    datePattern = re.compile(r'^(\d{2})\D+(\d{2})\D+(\d{4})')
Here's where I get stuck. What do I do from here? I just don't know how
to import the text and assign it to the proper fields using the re module.
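For reference, here is a rough sketch of one way the whole line might be matched in a single pattern and assigned to the same dictionary keys. It is only an illustration under some assumptions: the date always uses slashes, fields are separated by one or more whitespace characters, and no team name contains a standalone number. The names GAME_PATTERN, parse_game and read_schedule are made up for this example.

```python
import re

# Assumed layout: date, team, score, team, score, separated by whitespace.
# Team names may contain spaces, so they are matched lazily (.+?) up to the
# whitespace that precedes the next run of digits.
GAME_PATTERN = re.compile(
    r'^(?P<date>\d{2}/\d{2}/\d{4})\s+'
    r'(?P<team1>.+?)\s+(?P<score1>\d+)\s+'
    r'(?P<team2>.+?)\s+(?P<score2>\d+)\s*$'
)


def parse_game(line):
    """Return a dict for one game line, or None if the line does not match."""
    match = GAME_PATTERN.match(line)
    if match is None:
        return None
    # groupdict() maps each named group to the text it matched.
    return match.groupdict()


def read_schedule(lines):
    """Build a list of game dicts, skipping lines that do not match."""
    schedule = []
    for line in lines:
        game = parse_game(line)
        if game is not None:
            schedule.append(game)
    return schedule


# The three sample lines from the post, used here instead of a real file.
sample = [
    "09/04/2004 Virginia 44 Temple 14\n",
    "09/04/2004 LSU 22 Oregon State 21\n",
    "09/09/2004 Troy State 24 Missouri 14\n",
]
schedule = read_schedule(sample)
```

With a real file, read_schedule(open(filename)) should work the same way, since a file object yields its lines one at a time. The scores stay as strings here, as in the fixed-length version; int() could convert them if numbers are wanted.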