Compiling regex inside function?


Anthra Norell

Hi all,

I have a regex that has no use outside of a particular function. From
an encapsulation point of view it should be scoped as restrictively as
possible. Defining it inside the function certainly works, but if
re.compile () is run every time the function is called, it isn't such a
good idea after all. E.g.

import re

def entries(l):
    r = re.compile('([0-9]+) entr(y|ies)')
    match = r.search(l)
    if match:
        return match.group(1)

So the question is: does "r" get regex-compiled once at py-compile time
or repeatedly at entries() run time?


Frederic
 

Diez B. Roggisch

This can't be answered as a simple yes/no question.

Nothing is compiled at py-compile time: the re.compile() statement is executed
each time the function is called. But the pattern object isn't re-created
every time, because the re module keeps an internal cache of compiled
patterns. So unless you exceed that cache's limits and your pattern gets
evicted from it, you are essentially paying the overhead of one function call
and a dict lookup. Certainly worth it.
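A quick way to see the cache at work. This is a minimal sketch; that re.compile() hands back the identical cached object is a CPython implementation detail, not a documented guarantee:

```python
import re

def entries(l):
    # This statement runs on every call; re.compile() checks the re
    # module's internal cache before compiling anything new.
    r = re.compile('([0-9]+) entr(y|ies)')
    match = r.search(l)
    if match:
        return match.group(1)
    return None

first = re.compile('([0-9]+) entr(y|ies)')
second = re.compile('([0-9]+) entr(y|ies)')
# CPython implementation detail: the cached pattern object is returned again.
print(first is second)             # True on CPython
print(entries('3 entries found'))  # 3
print(entries('nothing here'))     # None
```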

As an additional note: r"" has *nothing* to do with this. Those are just
so-called raw string literals, which have different escaping behavior and
thus make it easier to write regexes.
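To illustrate that the raw prefix only affects how the literal is written, here is a small sketch. The `\d` variant of the pattern is illustrative and not taken from the original post:

```python
import re

# The r prefix only changes how the source literal is turned into a str.
raw = r'(\d+) entr(y|ies)'
escaped = '(\\d+) entr(y|ies)'

print(raw == escaped)          # True: both literals spell the same string
print(len(r'\n'), len('\n'))   # 2 1

# Equal strings give equivalent compiled patterns.
print(re.compile(raw).search('42 entries').group(1))      # 42
print(re.compile(escaped).search('42 entries').group(1))  # 42
```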

Diez
 

alex23

The docs say:
The compiled versions of the most recent patterns passed to re.match(),
re.search() or re.compile() are cached, so programs that use only a few
regular expressions at a time needn’t worry about compiling regular
expressions.

(But they don't say how few is 'only a few'...)
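One way to watch what happens when a pattern falls out of the cache is the documented re.purge() function, which clears it. A minimal sketch, assuming CPython; the variable names are illustrative:

```python
import re

pattern = '([0-9]+) entr(y|ies)'

before = re.compile(pattern)
re.purge()  # documented in the re module: clears the regular expression cache
after = re.compile(pattern)

# With the cache emptied, re.compile() must build a fresh pattern object,
# which is what happens to a pattern that has been evicted from a full cache.
print(before is after)                  # False
print(before.pattern == after.pattern)  # True
```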

If you're concerned about it, you could always set the compiled
pattern as a default value in the function's argspec, as that _is_
only evaluated once, when the def statement runs:

import re

def entries(line, regex=re.compile('([0-9]+) entr(y|ies)')):
    match = regex.search(line)
    ...
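A fuller sketch of the default-argument approach, assuming the rest of the body matches the original entries() function:

```python
import re

def entries(line, regex=re.compile('([0-9]+) entr(y|ies)')):
    # The default expression was evaluated once, when "def" executed;
    # every call that omits "regex" reuses that same pattern object.
    match = regex.search(line)
    if match:
        return match.group(1)
    return None

print(entries('7 entries'))              # 7
print(entries('1 entry'))                # 1
# The single pattern object lives in the function's defaults tuple.
print(entries.__defaults__[0].pattern)   # ([0-9]+) entr(y|ies)
```

The trade-off is that regex becomes part of the function's public signature, so a caller can pass a different pattern as a second argument.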
 

Anthra Norell

Excellent idea! Thank you all for the tips.

Frederic
 
