Ilpo Nyyssönen
Roy Smith said: Are you speculating that it might be a problem, or saying
that you have seen it be a problem in a real-life program?
Well, it depends, but I might say yes. I have a calendar app with a
command-line user interface. Usage looks like "view, add, view, edit,
view, ...", and each of those is a separate command invocation. In that
case a second of startup time can be a long time. And I did use the
profiler, and it showed the sre compiling to be the slowest thing.
Nowadays I use libxml2-python as the XML parser, so the problem is
not so acute anymore. (It is just harder to get running with a Python
compiled from source outside the RPM system, and it is not so easy to
use via a DOM interface.)
I just generated a bunch of moderately simple regexes from a dictionary
wordlist. Looks something like:
[...]
So, my guess is that unless you're compiling hundreds of regexes each time
you start up, the one-time compilation costs are probably not significant.
Well, as I said, it did come out as the worst item in the profiler when
using PyXML/xmlproc.
That's exactly what I would have done if I really needed to improve startup
speed. In fact, I did something like that many moons ago, in a previous
life. See R. Smith, "A finite state machine algorithm for finding
restriction sites and other pattern matching applications", CABIOS, Vol 4,
no. 4, 1988. In that case, I had about 1200 patterns I was searching for
(and doing it on hardware running about 1% of the speed of my current
laptop).
The problem is that it is not so easy to get ALL of the regexps dumped
that way.
BTW, why did you have to dig out the compiled data before pickling it?
Could you not have just pickled whatever re.compile() returned?
Because pickling a compiled pattern just dumps the original regexp string
and then recompiles it when loading.
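A quick sketch showing that behaviour with the standard library: compiled patterns are picklable, but the pickle contains only the source pattern string and flags, and unpickling runs the compiler again, so none of the compilation work is actually saved.

```python
import pickle
import re

pat = re.compile(r"(foo|bar)+\d{2,4}")
data = pickle.dumps(pat)

# The original regexp text is what gets serialized...
assert b"(foo|bar)" in data

# ...and unpickling recompiles it from scratch.
restored = pickle.loads(data)
assert restored.pattern == pat.pattern
assert restored.match("foobar123")
```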