'_[1]' in .co_names using builtin compile() in Python 2.6

magnus.lycka

When I run e.g. compile('sin(5) * cos(6)', '<string>', 'eval').co_names, I get ('sin', 'cos'), which is just what I expected.

But when I have a list comprehension in the expression, I get a little surprise:
>>> compile('[x*x for x in y]', '<string>', 'eval').co_names
('_[1]', 'y', 'x')

This happens in Python 2.6.6 on Red Hat Linux, but not when I run Python 2.7.3 in Windows. Unfortunately I'm stuck with 2.6.

* Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?

* Is there perhaps a better way to achieve what I'm trying to do?

What I'm really after is to check that Python expressions embedded in text files:
- are well behaved (no syntax errors etc.)
- don't accidentally access anything they shouldn't
- are served the values they need on execution

So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:

# Find names not starting with ".", i.e a & b in "a.c + b"
abbr_expr = re.sub(r"\.\w+", "", expr)
names = compile(abbr_expr, '<string>', 'eval').co_names
# Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
names = [name for name in names if re.match(r'\w+$', name)]

for name in names:
    if name not in allowed_names:
        raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
 
Ned Batchelder

When I run e.g. compile('sin(5) * cos(6)', '<string>', 'eval').co_names, I get ('sin', 'cos'), which is just what I expected.

But when I have a list comprehension in the expression, I get a little surprise:
>>> compile('[x*x for x in y]', '<string>', 'eval').co_names
('_[1]', 'y', 'x')

This happens in Python 2.6.6 on Red Hat Linux, but not when I run Python 2.7.3 in Windows. Unfortunately I'm stuck with 2.6.

* Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?

That name is the name of the list being built by the comprehension,
which I found out by disassembling the code object to see the bytecodes:
>>> co = compile("[x*x for x in y]", "<s>", "eval")
>>> co.co_names
('_[1]', 'y', 'x')
>>> import dis
>>> dis.dis(co)
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (y)
             10 GET_ITER
        >>   11 FOR_ITER                17 (to 31)
             14 STORE_NAME               2 (x)
             17 LOAD_NAME                0 (_[1])
             20 LOAD_NAME                2 (x)
             23 LOAD_NAME                2 (x)
             26 BINARY_MULTIPLY
             27 LIST_APPEND
             28 JUMP_ABSOLUTE           11
        >>   31 DELETE_NAME              0 (_[1])
             34 RETURN_VALUE

The same list comprehension in 2.7 uses an unnamed list on the stack:

  1           0 BUILD_LIST               0
              3 LOAD_NAME                0 (y)
              6 GET_ITER
        >>    7 FOR_ITER                16 (to 26)
             10 STORE_NAME               1 (x)
             13 LOAD_NAME                1 (x)
             16 LOAD_NAME                1 (x)
             19 BINARY_MULTIPLY
             20 LIST_APPEND              2
             23 JUMP_ABSOLUTE            7
        >>   26 RETURN_VALUE

I don't know whether such facts are documented. They are deep
implementation details, and change from version to version, as you've seen.

* Is there perhaps a better way to achieve what I'm trying to do?

What I'm really after, is to check that python expressions embedded in text files are:
- well behaved (no syntax errors etc)
- don't accidentally access anything it shouldn't
- I serve them with the values they need on execution

I hope you aren't trying to prevent malice this way: you cannot examine
a piece of Python code to prove that it's safe to execute. For an
extreme example, see: Eval Really Is Dangerous:
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

In your environment it looks like you have a whitelist of identifiers,
so you're probably ok.
So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:

# Find names not starting with ".", i.e a & b in "a.c + b"
abbr_expr = re.sub(r"\.\w+", "", expr)
names = compile(abbr_expr, '<string>', 'eval').co_names
# Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
names = [name for name in names if re.match(r'\w+$', name)]

for name in names:
    if name not in allowed_names:
        raise NameError('Name: %s not permitted in expression: %s' % (name, expr))

I don't know of a better way to determine the real names in the
expression. I doubt Python will insert a valid name into the namespace,
since it doesn't want to step on real user names. The simplest way to
do that is to autogenerate invalid names, like "_[1]" (I wonder why it
isn't "_[0]"?)

--Ned.
 
Ian Kelly

So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:

# Find names not starting with ".", i.e a & b in "a.c + b"
abbr_expr = re.sub(r"\.\w+", "", expr)
names = compile(abbr_expr, '<string>', 'eval').co_names
# Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
names = [name for name in names if re.match(r'\w+$', name)]

for name in names:
    if name not in allowed_names:
        raise NameError('Name: %s not permitted in expression: %s' % (name, expr))

I don't know of a better way to determine the real names in the
expression. I doubt Python will insert a valid name into the namespace,
since it doesn't want to step on real user names. The simplest way to do
that is to autogenerate invalid names, like "_[1]" (I wonder why it isn't
"_[0]"?)

One possible alternative is to use the ast module to examine the parse tree
of the expression instead of the generated code object. Hard to say whether
that would be "better".
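
For illustration, here is a minimal sketch of what the ast approach might look like. The helper names top_level_names and check_names are made up for this example, and the handling of comprehension variables is only an approximation: any name that is also assigned inside the expression is treated as local to it.

```python
import ast


def top_level_names(expr):
    """Return the bare names an expression reads (attribute names are skipped)."""
    tree = ast.parse(expr, mode="eval")  # raises SyntaxError on bad input
    loaded, stored = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                stored.add(node.id)
    # Names bound inside the expression (e.g. comprehension targets) do not
    # have to be supplied by the caller, so they are dropped. Approximation.
    return loaded - stored


def check_names(expr, allowed_names):
    """Raise NameError if the expression reads a name that is not allowed."""
    for name in sorted(top_level_names(expr)):
        if name not in allowed_names:
            raise NameError('Name: %s not permitted in expression: %s' % (name, expr))
```

Because "a.b" parses as an attribute access on the name a, no regex is needed to strip ".b" first, and no synthetic names such as '_[1]' show up, since those only exist in the compiled bytecode.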
 
Chris Kaynor

* Is there perhaps a better way to achieve what I'm trying to do?

I hope you aren't trying to prevent malice this way: you cannot examine a
piece of Python code to prove that it's safe to execute. For an extreme
example, see: Eval Really Is Dangerous:
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

In your environment it looks like you have a whitelist of identifiers, so
you're probably ok.


I just tested the crash example from that link in Python 2.7.5 win64, and
the co_names from the compiled code is empty. Therefore, a simple whitelist
would not catch that problematic code (and likely any other global access
done correctly). Even a simple test of making sure that at least one (or
any number of) valid identifiers exist would be insufficient, as you can
merely tack on a ",a" to add "a" to the co_names, and likewise for any other
variable.

Basically, even with a pure whitelist, there is likely no possible way to
make eval/exec safe, unless you also eliminate the ability to make literals.
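
One way to see why the top-level co_names can come back empty: names used inside a lambda live in a nested code object stored in co_consts, not in the outer one. The sketch below is only an illustration (all_co_names is a hypothetical helper, and the expression is a harmless stand-in for the crash example); collecting the nested names does not make eval safe.

```python
import types


def all_co_names(code):
    """Collect co_names from a code object and every code object nested in it."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= all_co_names(const)
    return names


# The lambda body is compiled into its own code object.
code = compile("(lambda: open)()", "<string>", "eval")
outer_names = code.co_names         # empty at the top level
nested_names = all_co_names(code)   # includes 'open' from the lambda body
```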

Chris
 
Ned Batchelder

On Wed, Nov 27, 2013 at 12:09 PM, Ned Batchelder wrote:

* Is there perhaps a better way to achieve what I'm trying to do?

What I'm really after, is to check that python expressions
embedded in text files are:
- well behaved (no syntax errors etc)
- don't accidentally access anything it shouldn't
- I serve them with the values they need on execution


I hope you aren't trying to prevent malice this way: you cannot
examine a piece of Python code to prove that it's safe to execute.
For an extreme example, see: Eval Really Is Dangerous:
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

In your environment it looks like you have a whitelist of
identifiers, so you're probably ok.


I just tested the crash example from that link in Python 2.7.5 win64 and
the co_names from the compiled code is empty. Therefore, a simple
whitelist would not catch that problematic code (and likely any other
global access done correctly). Even a simple test of making sure that at
least one (or any number of) valid identifier exists would be
insufficient, as you can merely tack on a ",a" to add "a" to the
co_names, and thus for any other variable.

Ah, right you are! I neglected to go back and examine the dangerous
code. So eval really is dangerous!

--Ned.
 
Steven D'Aprano

What I'm really after, is to check that python expressions embedded in
text files are:
- well behaved (no syntax errors etc)
- don't accidentally access anything it shouldn't
- I serve them with the values they need on execution

If you are trying to get safe execution of untrusted code in Python, you
should read this recent thread from the Python core developers:

https://mail.python.org/pipermail/python-dev/2013-November/130132.html


Probably the only way to securely sandbox untrusted Python code is to use
operating system level security (such as a chroot jail) or an
implementation such as PyPy which has been designed from the beginning to
be sandboxed -- and even that may simply mean that nobody has broken out
of PyPy's sandbox *yet*.

Looking back at your example:

compile('sin(5) * cos(6)', '<string>', 'eval').co_names

I'm not sure I understand why you inspect the co_names. What does that
give you? You can tell that there are no syntax errors just by compiling
it: if there are any, compile() will raise SyntaxError.
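
As a small sketch of that check (is_valid_expression is a name invented for this example):

```python
def is_valid_expression(expr):
    """Return True if expr compiles as a single Python expression."""
    try:
        compile(expr, "<string>", "eval")
    except SyntaxError:
        return False
    return True
```

Note that in 'eval' mode a statement such as "x = 1" is also rejected with SyntaxError, since only expressions are accepted.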

I would pre-process the string before compiling and disallow *anything*
containing "eval", "exec", or underscore. I'd also apply a limit to the
total length of the string. That doesn't necessarily rule out a hostile
user running arbitrary code, but it does make it harder.
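
A rough sketch of such a pre-filter could look like this. The function name prefilter and the limit of 200 characters are arbitrary choices for illustration, not a recommendation:

```python
MAX_EXPR_LENGTH = 200  # arbitrary limit, chosen only for this example
FORBIDDEN_SUBSTRINGS = ("eval", "exec", "_")


def prefilter(expr):
    """Reject overly long expressions and ones containing forbidden substrings."""
    if len(expr) > MAX_EXPR_LENGTH:
        raise ValueError("expression too long: %d characters" % len(expr))
    for fragment in FORBIDDEN_SUBSTRINGS:
        if fragment in expr:
            raise ValueError("forbidden substring %r in expression" % fragment)
    return expr
```

Banning every underscore also rejects harmless names such as my_value, so the names offered to expression authors would have to avoid underscores too.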

Also, when you execute the compiled code, don't do this:

eval(code) # No!

Instead, provide an explicit globals and locals namespace:

safe_ish_namespace = {'__builtins__': None}
eval(code, safe_ish_namespace)


Again, this increases the barrier to somebody hacking out of your sandbox
without ruling it out altogether.
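
To make the expression from earlier usable in such a namespace, the permitted names have to be supplied explicitly. A minimal sketch, assuming sin and cos from the math module are the values you want to expose:

```python
import math

code = compile("sin(5) * cos(6)", "<string>", "eval")

# Only the names listed here are visible to the expression; builtins are disabled.
safe_ish_namespace = {"__builtins__": None, "sin": math.sin, "cos": math.cos}
result = eval(code, safe_ish_namespace)
```

As above, this raises the barrier but is not a sandbox.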

Good luck!
 
magnus.lycka

I hope you aren't trying to prevent malice this way: you cannot examine
a piece of Python code to prove that it's safe to execute.

No worry. Whoever has access to modifying those configuration files
can cause a mess in all sorts of other ways, such as writing and running
arbitrary programs.

I just want to give reasonably rapid feedback when people make mistakes.

As with all python code, it's very important to test properly, but the
top level names are often defined elsewhere in the configuration, so I
want to catch those errors ASAP.
 
