Dave said:
So I'm trying to write a CSS preprocessor.
I want to add the ability to append a selector onto other selectors.
So, given the following code:
=========================================
#selector {
    { property: value; property: value; }
    .other_selector { property: value; property: value; }
    #selector_2 {
        .more_selector { property: value; }
    }
}
=========================================
I want to return the following:
=========================================
#selector { property: value; property: value; }
#selector .other_selector { property: value; property: value; }
#selector #selector_2 .more_selector { property: value; }
=========================================
Dave -
Since other posters have suggested parsing, here is a pyparsing stab at your
problem. Pyparsing allows you to construct your grammar using readable
construct names, and can generate structured parse results. Pyparsing also
has built-in support for skipping over comments.
This paper describes a prior use of pyparsing to parse CSS style sheets:
http://dyomedea.com/papers/2004-extreme/paper.pdf. Google for "pyparsing
CSS" for some other possible references.
This was really more complex than I expected. The grammar was not
difficult, but the recursive routine was trickier than I thought it would
be. Hope this helps.
Download pyparsing at
http://pyparsing.sourceforge.net.
-- Paul
=========================
data = """
#selector {
    { property: value; /* a nasty comment */
      property: value; }
    .other_selector { property: value; property: value; }
    #selector_2 {
        /* another nasty comment */
        .more_selector { property: value; /* still another nasty
            comment */ }
    }
}
"""
from pyparsing import (Literal, Word, Combine, Group, alphas, nums, alphanums,
                       Forward, ZeroOrMore, cStyleComment)
# define some basic symbols - suppress grouping and delimiting punctuation
# and let grouping do the rest
lbrace = Literal("{").suppress()
rbrace = Literal("}").suppress()
colon = Literal(":").suppress()
semi = Literal(";").suppress()
pound = Literal("#")
dot = Literal(".")
# define identifiers, property pattern, valid property values, and
# property list
ident = Word(alphas,alphanums+"_")
pound_ident = Combine(pound + ident)
dot_ident = Combine(dot + ident)
prop_value = Word(nums) | Word(alphanums) # expand this as needed
property_def = Group( ident + colon + prop_value + semi )
prop_list = Group( lbrace + ZeroOrMore( property_def ) +
                   rbrace ).setResultsName("propList")
# define selector - must use Forward since selector is recursive
selector = Forward()
selector_contents = ( prop_list |
                      Group( dot_ident.setResultsName("name") + prop_list ) |
                      selector )
selector << Group( pound_ident.setResultsName("name") +
                   lbrace +
                   Group( ZeroOrMore( selector_contents )
                          ).setResultsName("contents") +
                   rbrace )
# C-style comments should be ignored
selector.ignore(cStyleComment)
# parse the data - this only works if data *only* contains a single selector
results = selector.parseString(data)
# use pprint to display list - you can navigate the results to construct
# the various selectors
import pprint
pprint.pprint( results[0].asList() )
print()
# if scanning through text containing other text than just selectors,
# use scanString, which returns a generator, yielding a tuple
# for each occurrence found
#
# for results,start,end in selector.scanString(cssSourceText):
#     pprint.pprint(results.asList())

# a recursive function to print out the names and property lists
def printSelector(res, namePath=()):
    if res.name != "":
        # named selector: extend the path of enclosing selector names
        subpath = list(namePath) + [res.name]
        if res.contents != "":
            # "#" selector: recurse into each nested item
            for c in res.contents:
                printSelector(c, subpath)
        elif res.propList != "":
            # "." selector: print its property list under the full path
            print(" ".join(subpath), "{",
                  " ".join([ "%s : %s;" % tuple(p) for p in res.propList ]), "}")
        else:
            print(" ".join(subpath), "{",
                  " ".join([ "%s : %s;" % tuple(r) for r in res ]), "}")
    else:
        # unnamed property list: belongs directly to the enclosing selector
        print(" ".join(namePath), "{",
              " ".join([ "%s : %s;" % tuple(r) for r in res ]), "}")

printSelector( results[0] )
=========================
This prints:
['#selector',
[[['property', 'value'], ['property', 'value']],
['.other_selector', [['property', 'value'], ['property', 'value']]],
['#selector_2', [['.more_selector', [['property', 'value']]]]]]]
#selector { property : value; property : value; }
#selector .other_selector { property : value; property : value; }
#selector #selector_2 .more_selector { property : value; }
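If you would rather collect the flattened rules than print them as you go, the same walk can be done over the plain nested list that asList() returns. What follows is only a rough sketch: it assumes the list has exactly the shape shown in the pprint output above, and the helper names (is_prop_list, flatten, format_rule) are made up for this example and are not part of pyparsing. It needs only the standard library.

```python
# Nested structure copied from the pprint output above.
parsed = ['#selector',
          [[['property', 'value'], ['property', 'value']],
           ['.other_selector', [['property', 'value'], ['property', 'value']]],
           ['#selector_2', [['.more_selector', [['property', 'value']]]]]]]


def is_prop_list(node):
    """True if node is a list of [name, value] string pairs.

    Assumption: an empty list is treated as an empty property list.
    """
    return all(isinstance(p, list) and len(p) == 2
               and all(isinstance(s, str) for s in p)
               for p in node)


def flatten(node, path=()):
    """Yield (selector, properties) for every property list under node."""
    if node and isinstance(node[0], str):
        # Named node of the form [name, body]: extend the selector path.
        name, body = node
        path = path + (name,)
        if is_prop_list(body):
            # "." selector: the body is the property list itself.
            yield " ".join(path), body
        else:
            # "#" selector: the body is a list of nested items.
            for child in body:
                yield from flatten(child, path)
    else:
        # Unnamed property list: belongs to the enclosing selector.
        yield " ".join(path), node


def format_rule(selector, props):
    """Render one flattened rule as a single line of CSS."""
    body = " ".join("%s: %s;" % (k, v) for k, v in props)
    return "%s { %s }" % (selector, body)


rules = [format_rule(sel, props) for sel, props in flatten(parsed)]
print("\n".join(rules))
```

For the sample data this gives the three rules from the original question, with each selector prefixed by its enclosing selectors. Keeping the walk (flatten) separate from the formatting (format_rule) should make it easier to change the output style later.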