candide
The Regular Expression HOWTO
(http://docs.python.org/howto/regex.html#more-metacharacters) says
the following:
# ------------------------------
zero-width assertions should never be repeated, because if they match
once at a given location, they can obviously be matched an infinite
number of times.
# ------------------------------
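To make the HOWTO's reasoning concrete, here is a small sketch (Python 3, standard re module; the sample string "ab cd" is an arbitrary choice for illustration). \b matches the empty string at a word boundary, so once it succeeds at a position, repeating it at that same position succeeds again and changes nothing:

```python
import re

text = "ab cd"

# \b is zero-width: it matches the empty string at a word boundary
# and consumes no characters.
print(re.search(r"\b", text).span())  # (0, 0)

# Start positions where a single \b matches...
one = [match.start() for match in re.finditer(r"\b", text)]
# ...and where three consecutive \b match. Once \b succeeds at a
# position, the next \b is tested at that same position and succeeds
# too, so the repetition adds nothing.
three = [match.start() for match in re.finditer(r"\b\b\b", text)]

print(one)    # [0, 2, 3, 5]
print(three)  # [0, 2, 3, 5]
```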
Why is the wording "should never"? Repeating a zero-width assertion is
not forbidden, for instance:
Nevertheless, the following fails to compile:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.7/re.py", line 245, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat
So "\\b\\b" and "\\b{2}" aren't equivalent?
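A quick way to check the difference is sketched below for Python 3's re module; the helper name compiles is hypothetical, chosen only for illustration. Writing the assertion twice by hand is accepted, while attaching a quantifier to it is rejected with re.error. This is consistent with the Python 2.7 "nothing to repeat" traceback; as far as I can tell, CPython's parser refuses to put a quantifier on an anchor-type assertion such as \b.

```python
import re


def compiles(pattern):
    """Return True if re accepts pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(r"\b\b"))   # True: two literal assertions in a row
print(compiles(r"\b{2}"))  # False: quantifier applied to \b
```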
Surprisingly, the re module doesn't optimize away repeated boundary
assertions when compiling; for instance:
# ------------------------------
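import re
import time

# Building the 300000-assertion pattern string is cheap...
a = time.process_time()
len(r"\b\b\b" * 100000 + r"\w+")
b = time.process_time()
print("CPU time : %.2f s" % (b - a))

# ...but compiling it is slow: apparently each \b is processed
# separately rather than collapsed into a single assertion.
a = time.process_time()
re.compile(r"\b\b\b" * 100000 + r"\w+")
b = time.process_time()
print("CPU time : %.2f s" % (b - a))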
# ------------------------------
outputs:
# ------------------------------
CPU time : 0.00 s
CPU time : 1.33 s
# ------------------------------
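To see how compile cost grows with the number of repeated assertions, here is a minimal sketch for Python 3; compile_cpu_time is a hypothetical helper, and timings depend on the machine and Python version, so none are quoted here. It also checks that a pattern with 1000 leading \b finds the same words as one with a single \b: the extra assertions are redundant for matching, even though the timing above suggests the compiler still processes every one of them.

```python
import re
import time


def compile_cpu_time(n):
    """CPU seconds spent compiling n repeated word-boundary assertions + \\w+."""
    pattern = r"\b" * n + r"\w+"
    re.purge()  # clear re's internal cache so the pattern is really compiled
    start = time.process_time()
    re.compile(pattern)
    return time.process_time() - start


for n in (1000, 10000, 100000):
    print("%7d assertions: %.3f s" % (n, compile_cpu_time(n)))

# Redundant assertions do not change what is matched.
text = "ab cd ef"
many = re.compile(r"\b" * 1000 + r"\w+")
one = re.compile(r"\b\w+")
print(many.findall(text) == one.findall(text))  # True
```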
Your comments are welcome!