pyscottishguy
Hi,
Using pyparsing, I'm trying to parse a string like this:
:Start: first SECOND THIRD :SECOND: second1 | second2 :THIRD: third1 | FOURTH :FOURTH: fourth1 | fourth2
I want the parser to do the following:
1) Get the text for the :Start: label, e.g. 'first SECOND THIRD'
2) Do nothing with the lower-case words, e.g. 'first'
3) For each upper-case word, find the corresponding entries and replace the word with them (the '|' separates records), e.g. for 'SECOND', replace the word with ("second1", "second2")
4) Do this recursively, because each entry found in step 3 can itself contain upper-case words
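To make the four steps concrete, here is a minimal sketch in plain Python (no pyparsing) of the expansion I am after. It assumes the input has already been split into a table of records. The names `rules` and `expand` are made up for illustration, and it assumes the labels never refer back to themselves (there is no cycle check).

```python
# Hypothetical hand-built table standing in for the parsed input:
# each label maps to a list of records, and each record is a list of words.
rules = {
    "Start": [["first", "SECOND", "THIRD"]],
    "SECOND": [["second1"], ["second2"]],
    "THIRD": [["third1"], ["FOURTH"]],
    "FOURTH": [["fourth1"], ["fourth2"]],
}


def expand(records, rules):
    """Replace every upper-case word with its own (expanded) records."""
    expanded = []
    for record in records:
        words = []
        for word in record:
            if word.isupper():
                # Upper-case word: look up its records and expand them in turn.
                words.append(expand(rules[word], rules))
            else:
                # Lower-case word: keep it unchanged.
                words.append(word)
        expanded.append(words)
    return expanded


# :Start: has a single record, so take the first expanded element.
result = expand(rules["Start"], rules)[0]
print(result)
```

The nested lists mirror the structure of the labels: 'SECOND' becomes a list of two records, and 'THIRD' becomes a list whose second record holds the expansion of 'FOURTH'.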
I can do this, but not within pyparsing: I had to write a recursive
function to do it. I would like to do it within pyparsing, however.
I'm pretty sure I have to use the Forward() class along with a few
setResultsName() calls, but after reading the documentation and many
examples, and trying for hours, I'm still totally lost. Please help!
Here is the program I have so far:
#!/usr/bin/python
from pyparsing import (Word, Optional, OneOrMore, Group, alphas,
                       alphanums, Suppress, Dict)
import string


def allIn(items, members):
    """Tests that all elements of items are in members"""
    for item in items:
        if item not in members:
            return False
    return True


def allUpper(word):
    """Tests that all characters in word are uppercase letters"""
    return allIn(word, string.ascii_uppercase)


def getItems(myArray, myDict):
    """Recursively get the items for each CAPITAL word"""
    myElements = []
    for element in myArray:
        myWords = []
        for word in element:
            if allUpper(word):
                # CAPITAL word: look up its records and expand them in turn
                items = getItems(myDict[word], myDict)
                myWords.append(items)
            else:
                myWords.append(word)
        myElements.append(myWords)
    return myElements


testData = """
:Start: first SECOND THIRD fourth FIFTH
:SECOND: second1_1 second1_2 | second2 | second3
:THIRD: third1 third2 | SIXTH
:FIFTH: fifth1 | SEVENTH
:SIXTH: sixth1_1 sixth1_2 | sixth2
:SEVENTH: EIGHTH | seventh1
:EIGHTH: eighth1 | eighth2
"""

# A label is a name between colons, e.g. :SECOND:
label = Suppress(":") + Word(alphas + "_") + Suppress(":")
# One record: a run of words, optionally terminated by '|'
words = Group(OneOrMore(Word(alphanums + "_"))) + Suppress(Optional("|"))
data = ~label + OneOrMore(words)
line = Group(label + data)
doc = Dict(OneOrMore(line))
res = doc.parseString(testData)

# This prints out what pyparsing gives us
for entry in res:
    print(entry)
print("")
print("")

startString = res["Start"]
items = getItems([startString], res)[0]

# This prints out what we want
for item in items:
    print(item)