Ernesto said:
I'm still fairly new to python, so I need some guidance here...
I have a text file with lots of data. I only need some of the data. I
want to put the useful data into an [array of] struct-like
mechanism(s). The text file looks something like this:
[BUNCH OF NOT-USEFUL DATA....]
Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999
[MORE USELESS DATA....]
Name........
I would like to have an array of "structs." Each struct has
struct Person{
    string Name;
    int Age;
    int Birthday;
    int SS;
}
I want to go through the file, filling up my list of structs.
My problems are:
1. How to search for the keywords "Name:", "Age:", etc. in the file...
2. How to implement some organized "list of lists" for the data
structure.
Any help is much appreciated.
Ernesto -
Since you are searching for keywords and matching fields, and trying to
populate data structures as you go, this sounds like a good fit for
pyparsing. Pyparsing has built-in features for scanning through text and
extracting data, and for naming the extracted fields so you can access them
later.
Download pyparsing at
http://pyparsing.sourceforge.net.
-- Paul
------------------------------------------------
from pyparsing import *
inputData = """[BUNCH OF NOT-USEFUL DATA....]
Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999
[MORE USELESS DATA....]
Name: Fred
Age: 101 Birthday: 061065 SocialSecurity: 587903882000
[MORE USELESS DATA....]
Name: Barney
Age: 99 Birthday: 061265 SocialSecurity: 698014993111
[MORE USELESS DATA....]
"""
dob = Word(nums,exact=6)
# this matches your sample data, but I think SSN's are only 9 digits long
socsecnum = Word(nums,exact=12)
# define the personalData pattern - use results names to associate
# field names with matched tokens, can then access data as if they were
# attributes on an object
personalData = ( "Name:" + empty + restOfLine.setResultsName("Name") +
                 "Age:" + Word(nums).setResultsName("Age") +
                 "Birthday:" + dob.setResultsName("Birthday") +
                 "SocialSecurity:" + socsecnum.setResultsName("SS") )
# use personalData.scanString to scan through the input, returning the
# matching tokens, and their respective start/end locations in the string
for person,s,e in personalData.scanString(inputData):
    print "Name:", person.Name
    print "Age:", person.Age
    print "DOB:", person.Birthday
    print "SSN:", person.SS
    print
# or use a list comp to scan the whole file, and return your Person data,
# giving you your requested array of "structs" - not really structs, but
# ParseResults objects
persons = [person for person,s,e in personalData.scanString(inputData)]
# or convert to Python dicts, which some people prefer to pyparsing's
# ParseResults
persons = [dict(p) for p,s,e in personalData.scanString(inputData)]
print persons[0]
print
# or create an array of Person objects, as suggested in previous postings
class Person(object):
    def __init__(self,parseResults):
        # copy the named results (Name, Age, Birthday, SS) onto the instance
        self.__dict__.update(dict(parseResults))
    def __str__(self):
        return "Person(%s, %s, %s, %s)" % (
            self.Name,self.Age,self.Birthday,self.SS)
persons = [Person(p) for p,s,e in personalData.scanString(inputData)]
for p in persons:
    print p.Name,"->",p
--------------------------------------
prints out:
Name: David
Age: 108
DOB: 061095
SSN: 476892771999
Name: Fred
Age: 101
DOB: 061065
SSN: 587903882000
Name: Barney
Age: 99
DOB: 061265
SSN: 698014993111
{'SS': '476892771999', 'Age': '108', 'Birthday': '061095', 'Name': 'David'}
David -> Person(David, 108, 061095, 476892771999)
Fred -> Person(Fred, 101, 061065, 587903882000)
Barney -> Person(Barney, 99, 061265, 698014993111)
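For comparison, here is a minimal sketch of the same extraction using only the standard library (the re and dataclasses modules), written for Python 3. This is not the pyparsing approach shown above, just a plain regular expression standing in for it. The names Person, PERSON_RE and parse_people are made up for illustration, and the pattern assumes the layout is always exactly as in the sample data (a Name line followed directly by the Age/Birthday/SocialSecurity line). Age is converted to an int as the original struct asked for, but Birthday and SS are kept as strings here, since int("061095") would drop the leading zero.

```python
import re
from dataclasses import dataclass


# Hypothetical record type; the fields mirror the struct in the question.
@dataclass
class Person:
    name: str
    age: int
    birthday: str  # kept as text so the leading zero in "061095" survives
    ss: str


# One pattern covering the two useful lines of each record.
# Assumes the "Age:" line immediately follows the "Name:" line.
PERSON_RE = re.compile(
    r"Name:\s*(?P<name>.+?)\s*\n"
    r"Age:\s*(?P<age>\d+)\s+"
    r"Birthday:\s*(?P<birthday>\d{6})\s+"
    r"SocialSecurity:\s*(?P<ss>\d+)"
)


def parse_people(text):
    """Return a list of Person objects, one per record found in text."""
    return [
        Person(m["name"], int(m["age"]), m["birthday"], m["ss"])
        for m in PERSON_RE.finditer(text)
    ]


sample = """[BUNCH OF NOT-USEFUL DATA....]
Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999
[MORE USELESS DATA....]
Name: Fred
Age: 101 Birthday: 061065 SocialSecurity: 587903882000
[MORE USELESS DATA....]
"""

people = parse_people(sample)
for p in people:
    print(p.name, "->", p)
```

The regex is fine while the format stays this rigid. If fields can be reordered, optional, or split across lines, the pyparsing grammar will probably be easier to extend than one growing pattern.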