Ideas to optimize this getitem/eval call?

M

mario

Hi,

below is the essence of a an expression evaluator, by means of a
getitem lookup. The expression codes are compiled and cached -- the
lookup is actually recursive, and the first time around it will always
fail.

import sys
class GetItemEval(object):

def __init__(self):
self.globals = globals() # some dict (always the same)
self.locals = {} # some other dict (may be different between
evaluations)
self.codes = {} # compiled code expressions cache

def __getitem__(self, expr):
try:
return eval(self.codes[expr], self.globals, self.locals)
except:
# KeyError, NameError, AttributeError, SyntaxError,
ValueError,
# TypeError, IOError
#
# Special case if a KeyError is coming from the self.codes
[name]
# lookup (traceback should consist of a single frame
only):
if sys.exc_info()[2].tb_next is None:
if sys.exc_info()[0] is KeyError:
self.codes[expr] = compile(expr, '<string>',
'eval')
return self[expr]
# otherwise handle eval error in some way...

This class could be used in a way as follows:

# define some expressions
def f(s): return "["+s+"]"
exprs = ["1+2+3", "2*3*5", "f(__name__)"]
# instantiate one
gie = GetItemEval()
# use it to lookup/eval each expression
for x in exprs:
print x, "=", gie[x]

And, fwiw, some sample timeit code:

import timeit
print timeit.Timer("for x in exprs: gie[x]",
"from __main__ import gie, exprs").timeit(500000)

I had done various poking to discover if it could be made to go
faster, and in the end I settled on the version above.

mario

Incidentally this constitutes the lion's share of the evaluation
runtime of evoque templating... http://evoque.gizmojo.org/
 
V

vk

What do you mean by 'fail'?
you have;

:: self.codes = {}

so

:: try:
:: return eval(self.codes[expr], self.globals, self.locals)

will always return an exception the first time (if this is what you're
referring to).
 
S

Steven D'Aprano

Hi,

below is the essence of a an expression evaluator, by means of a getitem
lookup. The expression codes are compiled and cached -- the lookup is
actually recursive, and the first time around it will always fail.

import sys
class GetItemEval(object):

def __init__(self):
self.globals = globals() # some dict (always the same)
self.locals = {} # some other dict (may be different between
evaluations)
self.codes = {} # compiled code expressions cache

def __getitem__(self, expr):
try:
return eval(self.codes[expr], self.globals, self.locals)

I was about to make a comment about this being a security hole, but I see
from here

http://evoque.gizmojo.org/usage/restricted/

that you are aware of at least some of the issues.

I must say though, your choice of builtins to prohibit seems rather
arbitrary. What is dangerous about (e.g.) id() and isinstance()?


except:
# KeyError, NameError, AttributeError, SyntaxError,
# ValueError, TypeError, IOError

If you want to capture seven exceptions, then capture seven exceptions,
not any exception.

You should write:

except (KeyError, NameError, ..., IOError):

instead of a bare except clause. That will capture exceptions that
represent bugs in your code as well as exceptions that should propbably
be allowed to propagate, such as KeyboardInterupt and SystemExit.

# Special case if a KeyError is coming from the self.codes[name]
# lookup (traceback should consist of a single frame only):
if sys.exc_info()[2].tb_next is None:
if sys.exc_info()[0] is KeyError:
self.codes[expr] = compile(expr, '<string>', 'eval')
return self[expr]
# otherwise handle eval error in some way...

That seems awfully complicated for no good reason. If you want to handle
KeyError differently, then capture KeyError:

try:
...
except KeyError:
handle_keyerror()
except: (NameError, ..., IOError):
handle_everythingelse()



It also seems significant slower:
.... D = {}
.... try:
.... D['x']
.... except KeyError:
.... pass
........ D = {}
.... try:
.... D['x']
.... except:
.... if sys.exc_info()[2].tb_next is None:
.... if sys.exc_info()[0] is KeyError:
.... pass
....6.2694659233093262
 
M

mario

I was about to make a comment about this being a security hole,

Strange that you say this, as you are also implying that *all* the
widely-used templating systems for python are security holes... Well,
you would be right to say that of course ;-) Infact, evoque is really
one of the few (or even the only one?) that was conceived from the
start to support restricted evaluation.
but I see from here

http://evoque.gizmojo.org/usage/restricted/

that you are aware of at least some of the issues.

I must say though, your choice of builtins to prohibit seems rather
arbitrary. What is dangerous about (e.g.) id() and isinstance()?

Preventive, probably. I also feel that temlates should have any
business with info such as the memory addressed returnred by id(). For
isinstance, becuase it is somewhat related to __subclasses__ that is
known to be insecure.

Incidentally, I updated the page you point to clarify what is going
on. I also added a link to Brett Cannon's inspiration paper on the
topic of securing the python interpreter...
If you want to capture seven exceptions, then capture seven exceptions,
not any exception.

Absolutely not. I want to catch ALL evaluation exceptions... it would
actually *be* a secuity hole to allow any exception to bubble. hey
will however be handled appropriately as per the application policy/
config/deployment.
You should write:

    except (KeyError, NameError, ..., IOError):

instead of a bare except clause. That will capture exceptions that
represent bugs in your code as well as exceptions that should propbably
be allowed to propagate, such as KeyboardInterupt and SystemExit.

Again, no. Template presentational logic has no business issuing
SystemExits or so. And, of course, there are no bugs in my code ;-)
        # Special case if a KeyError is coming from the self.codes[name]
        # lookup (traceback should consist of a single frame only):
        if sys.exc_info()[2].tb_next is None:
            if sys.exc_info()[0] is KeyError:
                self.codes[expr] = compile(expr, '<string>', 'eval')
                    return self[expr]
That seems awfully complicated for no good reason.

Yes, you are probably right. I wanted to be precise in that if the
KeyError originates strictly from the codes lookup and not from the
actual eval of the expr itself -- in which case the expr should be
compiled and added to codes (yes, this is the "first-time failure" I
referred to in the first message).

I tested the performance of your 2 variations in context, and there
seems to be no noticeable performance gain (something like less than
1% gain). But note the two variations as you code them do not quite do
exactly the same test.

I have adjusted to use this code now:

def __getitem__(self, expr):
try:
return eval(self.codes[expr], self.globals, self.locals)
except:
# We want to catch **all** evaluation errors!
# KeyError, NameError, AttributeError, SyntaxError, V
# alueError, TypeError, IOError, ...
#
# Special case:
# if KeyError is coming from self.codes[expr] lookup,
# then we add the compiledentry and try again:
if not name in self.codes:
self.codes[expr] = compile(name, '<string>', 'eval')
return self[expr]
# handle any other error...

This retains the correctness of the check, and has the same marginal
perf improvement (that is basically negligible, but at least it is not
slower) and has teh advantage that the code is clearer.

Thanks for the remarks!

mario
 
M

mario

correction: the code posted in previous message should have been:


def __getitem__(self, expr):
try:
return eval(self.codes[expr], self.globals, self.locals)
except:
# We want to catch **all** evaluation errors!
# KeyError, NameError, AttributeError, SyntaxError, V
# alueError, TypeError, IOError, ...
#
# Special case:
# if KeyError is coming from self.codes[expr] lookup,
# then we add the compiledentry and try again:
if not expr in self.codes:
self.codes[expr] = compile(expr, '<string>', 'eval')
return self[expr]
# handle any other error...
 
S

Steven D'Aprano

On Jan 3, 7:16 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote: [...]
I must say though, your choice of builtins to prohibit seems rather
arbitrary. What is dangerous about (e.g.) id() and isinstance()?

Preventive, probably. I also feel that temlates should have any business
with info such as the memory addressed returnred by id(). For
isinstance, becuase it is somewhat related to __subclasses__ that is
known to be insecure.

It's that "probably" that worries me.

In my earlier post, I was going to say "your choice of builtins to
prohibit seems rather Cargo Cult-ish", but I decided that might be a bit
rude, and I decided to give you the benefit of the doubt. But your answer
here doesn't really give me any more confidence.

What security issues do you think you are preventing by prohibiting id()?
It is true that the id returned is a memory address in CPython, but
that's a mere implementation detail, and may not be true in other
implementations. Even if it is a memory address, so what? You can't do
anything with it except treat it as an integer. I don't see how id()
decreases security one iota.

In the absence of even a theoretical threat, banning a function that
returns an int because the int is derived from a memory address (but
can't be used as one!) makes you appear to be engaging in Cargo Cult
programming: following the ritual, without any understanding of what you
are doing.

http://catb.org/~esr/jargon/html/C/cargo-cult-programming.html

Sorry to be so harsh, especially over such a tiny little thing as the
id() function, but that's the impression it gives me, and I'm probably
not the only one. It is true that a false positive (needlessly banning a
harmless function) has less serious consequences than a false negative
(failing to ban a harmful function), but it does reduce confidence that
you know what you're doing.

Likewise for isinstance() and issubclass(). They may be related to
__subclasses__, but they don't give the caller access to __subclassess__.
So what is the actual threat you are defending against?

I'm not saying that there must be an actual in-the-wild demonstrated
vulnerability before you prohibit a built-in, but there should be at
least a plausible attack vector.


Incidentally, I updated the page you point to clarify what is going on.

Speaking of that page, you have an obvious typo ("teh" instead of "the")
that should be fixed:

"In addition to teh above, all subclasses of BaseException..."

http://evoque.gizmojo.org/usage/restricted/



I also added a link to Brett Cannon's inspiration paper on the topic of
securing the python interpreter...


Absolutely not. I want to catch ALL evaluation exceptions...

I see. It wasn't clear that this was deliberate, I read it as if there
was a fixed, finite list you wanted to catch, and nothing else. You
should make sure this is clearly documented in your code.
 
M

mario g

Thats is definitively not the case. There are at least 2 quite old
template systems on top of a quite good restricted environment.

Interesting, which ones?

I should have probably emphasized *conceived* above... that feature
was there in the initial design and not added afterwards.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,189
Members
46,735
Latest member
HikmatRamazanov

Latest Threads

Top