Aaron Brady
Hi,
Every so often the group gets a request for parsing an expression. I
think this would be significantly easier to do if regular expressions
could modify a stack. However, since at that point you might as well
just write Python, maybe there is a compromise.
Could the Secret Labs' regular expression engine be modified to
operate on lists, for example, or a mutable non-string type?
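As a rough sketch of the compromise as it stands today (the token
format and all the names below are made up for illustration), the
regular expression only tokenizes, and plain Python does the pushing
onto the stack:

import re

# The pattern only scans; nothing in the re module touches the stack.
TOKEN = re.compile(r'\s*(?:(\d+)|(.))')

def tokenize(text):
    stack = []
    for number, operator in TOKEN.findall(text):
        if number:
            stack.append(('NUM', int(number)))
        elif operator.strip():
            stack.append(('OP', operator))
    return stack

print(tokenize('2 * (3 + 4)'))
# [('NUM', 2), ('OP', '*'), ('OP', '('), ('NUM', 3), ('OP', '+'),
#  ('NUM', 4), ('OP', ')')]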
Details (juicy and otherwise):
One of the alternatives is to reconstruct a new string on every match,
removing the matched expression and replacing it with a tag. (This, by
the way, takes at least one out-of-band character.) Each rebuild
constructs a string from at least three parts, maybe five: the lead,
the opening marker, the inside of the match, the closing marker, and
the tail. With ropes each rebuild is still constant time, but with
plain strings the total is O( string length * number of matches ).
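To make the running time concrete, here is a minimal sketch of that
rebuild-per-match approach; the tag format, the side table, and the
helper names are only illustrative:

import re

INNER = re.compile(r'\(([^()]*)\)')    # innermost (...) only

def reduce_expr(text):
    results = []                       # side table indexed by tag number
    while True:
        m = INNER.search(text)
        if m is None:
            break
        value = eval_flat(m.group(1), results)
        tag = '\x00%d\x00' % len(results)   # \x00 is the out-of-band marker
        results.append(value)
        # Rebuild: lead + tag + tail.  This copy is what makes the whole
        # procedure O( string length * number of matches ).
        text = text[:m.start()] + tag + text[m.end():]
    return eval_flat(text, results)

def eval_flat(flat, results):
    # Resolve tags back to their values, then evaluate a parenthesis-free
    # sum of products.
    flat = re.sub(r'\x00(\d+)\x00',
                  lambda m: str(results[int(m.group(1))]), flat)
    total = 0
    for term in flat.split('+'):
        prod = 1
        for factor in term.split('*'):
            prod *= int(factor)
        total += prod
    return total

print(reduce_expr('2*(3+4)+(1+(2*5))'))   # -> 25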
Another alternative is to add a new unicode object API,
PyUnicode_FROM_DATA, which would build a string object from an
existing buffer without copying it. I expect this would receive -1
from many people, not least because it breaks immutability of strings.
ctypes character arrays, arrays, and buffer objects are additional
possibilities.
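For what it's worth, the bytes side of the current engine already
accepts buffer objects such as bytearray, so in-place patching between
matches can at least be sketched today (the blanking below is only an
illustration, not a parser):

import re

data = bytearray(b'x = (1 + 2) * (3 + 4)')
inner = re.compile(br'\([^()]*\)')   # bytes pattern, so buffer subjects work

while True:
    m = inner.search(data)
    if m is None:
        break
    # Overwrite the matched span in place; the buffer is never copied or
    # reallocated, the matched bytes are just blanked with the tag byte.
    data[m.start():m.end()] = b'\x00' * (m.end() - m.start())

print(bytes(data))   # the parenthesized spans are now tag bytes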