Aaron Brady
Hi,
Every so often the group gets a request for parsing an expression. I
think this would be significantly easier to do if regular expressions
could modify a stack. However, since at that point you might as well
just write Python, maybe there is a compromise.
Could the Secret Labs' regular expression engine be modified to
operate on lists, for example, or a mutable non-string type?
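As a rough sketch of the compromise as it stands today (the token
format and all the names below are made up for illustration), the
regular expression only tokenizes, and plain Python does the pushing
onto the stack:

import re

# The pattern only scans; nothing in the re module touches the stack.
TOKEN = re.compile(r'\s*(?:(\d+)|(.))')

def tokenize(text):
    stack = []
    for number, operator in TOKEN.findall(text):
        if number:
            stack.append(('NUM', int(number)))
        elif operator.strip():
            stack.append(('OP', operator))
    return stack

print(tokenize('2 * (3 + 4)'))
# [('NUM', 2), ('OP', '*'), ('OP', '('), ('NUM', 3), ('OP', '+'),
#  ('NUM', 4), ('OP', ')')]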
Details (juicy and otherwise):
One of the alternatives is to reconstruct a new string on every match,
removing the matched expression and replacing it with a tag. (This, by
the way, takes at least one out-of-band character.) Each rebuild
constructs a string from at least three parts, maybe five: the lead,
the opening marker, the inside of the match, the closing marker, and
the tail. With ropes each rebuild is still constant time, but with
plain strings the total is O( string length * number of matches ).
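To make the running time concrete, here is a minimal sketch of that
rebuild-per-match approach; the tag format, the side table, and the
helper names are only illustrative:

import re

INNER = re.compile(r'\(([^()]*)\)')    # innermost (...) only

def reduce_expr(text):
    results = []                       # side table indexed by tag number
    while True:
        m = INNER.search(text)
        if m is None:
            break
        value = eval_flat(m.group(1), results)
        tag = '\x00%d\x00' % len(results)   # \x00 is the out-of-band marker
        results.append(value)
        # Rebuild: lead + tag + tail.  This copy is what makes the whole
        # procedure O( string length * number of matches ).
        text = text[:m.start()] + tag + text[m.end():]
    return eval_flat(text, results)

def eval_flat(flat, results):
    # Resolve tags back to their values, then evaluate a parenthesis-free
    # sum of products.
    flat = re.sub(r'\x00(\d+)\x00',
                  lambda m: str(results[int(m.group(1))]), flat)
    total = 0
    for term in flat.split('+'):
        prod = 1
        for factor in term.split('*'):
            prod *= int(factor)
        total += prod
    return total

print(reduce_expr('2*(3+4)+(1+(2*5))'))   # -> 25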
Another alternative is to add a new unicode object API,
PyUnicode_FROM_DATA, which would build a string object from an
existing buffer without copying it. I expect this would receive -1
from many people, not least because it breaks immutability of strings.
ctypes character arrays, arrays, and buffer objects are additional
possibilities.
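For what it's worth, the bytes side of the current engine already
accepts buffer objects such as bytearray, so in-place patching between
matches can at least be sketched today (the blanking below is only an
illustration, not a parser):

import re

data = bytearray(b'x = (1 + 2) * (3 + 4)')
inner = re.compile(br'\([^()]*\)')   # bytes pattern, so buffer subjects work

while True:
    m = inner.search(data)
    if m is None:
        break
    # Overwrite the matched span in place; the buffer is never copied or
    # reallocated, the matched bytes are just blanked with the tag byte.
    data[m.start():m.end()] = b'\x00' * (m.end() - m.start())

print(bytes(data))   # the parenthesized spans are now tag bytes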