Is it possible to combine handle_data and regular expressions?

ProvoWallis

Hi,

I've used regular expressions to solve similar problems in the past,
but I have seen so many comments recommending HTMLParser and sgmllib
that I decided to try a different approach this time and use
HTMLParser.

I want to search my SGML file for various strings of text and find out
which section each one is in. What I have here does this to a certain
extent, but I was wondering whether I could make handle_data and
regular expressions work together to get better results.
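One way to combine the two, sketched here for Python 3 (where the module is named html.parser): compile the pattern once and run finditer over each chunk of text that handle_data receives, recording which section is open at that moment. SectionRegexFinder is a hypothetical name, and the sketch assumes the sec-main/sec-sub1/sec-sub2 tags with a 'no' attribute used in the code below; it only records the number of the innermost section. A match interrupted by an inline tag arrives in two separate chunks and will be missed.

```python
import re
from html.parser import HTMLParser  # Python 3 name of the HTMLParser module


class SectionRegexFinder(HTMLParser):
    """Record (section number, matched text) pairs (hypothetical class)."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.section = None  # 'no' of the innermost section tag seen so far
        self.matches = []    # list of (section, matched text) tuples

    def handle_starttag(self, tag, attrs):
        # Assumption: section tags carry their number in a 'no' attribute.
        if tag in ("sec-main", "sec-sub1", "sec-sub2"):
            self.section = dict(attrs).get("no")

    def handle_data(self, data):
        # finditer yields every non-overlapping match in this chunk of text.
        for m in self.pattern.finditer(data):
            self.matches.append((self.section, m.group()))


if __name__ == "__main__":
    p = SectionRegexFinder(r"\babove\b")
    p.feed('<sec-main no="174"><sec-sub1 no="1">see above and Above'
           '</sec-sub1></sec-main>')
    p.close()
    print(p.matches)
```

The pattern is case-sensitive, so "Above" is not reported; passing re.IGNORECASE to re.compile would change that.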

For instance, when I search for "above", I just get something like
this: '174.114[1]':'above', but this isn't very useful because I want
to know the context of "above" (i.e., the information on either side
of it) and maybe even use a regular expression to filter the search a
little more.
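To see the text on either side of each hit, the match object's start() and end() positions can be used to slice a window out of the surrounding text. This is a minimal Python sketch; the function name matches_with_context and the default window width of 30 characters are assumptions, not anything from the code below.

```python
import re


def matches_with_context(text, pattern, width=30):
    """Return (before, match, after) tuples for each regex match in text.

    `width` (an arbitrary default) is how many characters of context to
    keep on each side of the match.
    """
    results = []
    for m in re.finditer(pattern, text):
        start, end = m.start(), m.end()
        before = text[max(0, start - width):start]  # clamp at start of text
        after = text[end:end + width]               # slicing clamps at the end
        results.append((before, m.group(), after))
    return results


if __name__ == "__main__":
    print(matches_with_context("as noted above, the rule applies",
                               r"\babove\b", width=6))
```

Filtering can then go into the pattern itself: for example, the negative lookahead in r"\babove\b(?! all)" skips "above all" but still reports other occurrences of "above".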

Any ideas?

As always, I'd appreciate feedback on my efforts.

Thanks,

Greg

###

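from HTMLParser import HTMLParser
import os

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the file: ")
print

given, ext = os.path.splitext(fname)

inputFile = open(os.path.join(root, fname), 'r')
data = inputFile.read()
inputFile.close()

class PartFinder(HTMLParser):
    """Map section labels such as '174.114[1]' to the text chunks that
    contain the search string."""

    def __init__(self, search_text):
        HTMLParser.__init__(self)
        self._search = search_text
        self._main = None     # 'no' of the current sec-main
        self._subone = None   # 'no' of the current sec-sub1
        self._full = None     # full label of the innermost open section
        # Create the dict per instance; a class attribute would be
        # shared by every PartFinder object.
        self._secDict = {}

    def found(self):
        return self._secDict

    def handle_starttag(self, tag, attrs):
        no = dict(attrs).get('no')
        if tag == "sec-main":
            self._main = no
            self._subone = None
            self._full = self._main
        elif tag == "sec-sub1":
            self._subone = no
            self._full = '%s[%s]' % (self._main, self._subone)
        elif tag == "sec-sub2":
            self._full = '%s[%s][%s]' % (self._main, self._subone, no)

    def handle_data(self, data):
        # Ignore text that appears before the first section tag.
        if self._full is None:
            return
        if self._search in data:
            # Key on the full label and keep every matching chunk,
            # not just the first one in each sec-main.
            self._secDict.setdefault(self._full, []).append(data)


if __name__ == "__main__":
    parser = PartFinder("Pt")
    parser.feed(data)
    parser.close()
    x = parser.found()

    output_part = given + '.parts'
    outputFile = open(os.path.join(root, output_part), 'w')
    outputFile.write(str(x))
    outputFile.close()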
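Because handle_data can receive a section's text in several pieces, another option is to collect all of each section's text first and run the regular expression afterwards, keeping some characters of context around each hit. This is a Python 3 sketch; SectionText, search_sections and the width parameter are hypothetical names. Like the code above, it ignores end tags, so text that follows a closing sub-section is still credited to that sub-section.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class SectionText(HTMLParser):
    """Collect all text per section label, e.g. '174.114[1]' (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.main = None
        self.sub1 = None
        self.label = None                # label of the innermost open section
        self.text = defaultdict(list)    # label -> list of text chunks

    def handle_starttag(self, tag, attrs):
        no = dict(attrs).get("no")
        if tag == "sec-main":
            self.main, self.sub1 = no, None
            self.label = no
        elif tag == "sec-sub1":
            self.sub1 = no
            self.label = "%s[%s]" % (self.main, no)
        elif tag == "sec-sub2":
            self.label = "%s[%s][%s]" % (self.main, self.sub1, no)

    def handle_data(self, data):
        if self.label is not None:       # skip text before the first section
            self.text[self.label].append(data)


def search_sections(markup, pattern, width=30):
    """Return {label: [context snippets]} for sections matching pattern."""
    parser = SectionText()
    parser.feed(markup)
    parser.close()
    regex = re.compile(pattern)
    found = {}
    for label, chunks in parser.text.items():
        body = "".join(chunks)           # whole section text, chunks rejoined
        hits = [body[max(0, m.start() - width):m.end() + width]
                for m in regex.finditer(body)]
        if hits:
            found[label] = hits
    return found


if __name__ == "__main__":
    doc = ('<sec-main no="174.114"><sec-sub1 no="1">'
           'The rule stated above applies.</sec-sub1></sec-main>')
    print(search_sections(doc, r"\babove\b", width=7))
```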