eval() == evil? --- How to use it safely?

  • Thread starter Bruno Desthuilliers

Bruno Desthuilliers

Fett wrote:
> I am creating a program that requires some data that must be kept up
> to date. What I plan is to put this data up on a web-site then have
> the program periodically pull the data off the web-site.
>
> My problem is that when I pull the data (currently stored as a
> dictionary on the site) off the site, it is a string. I can use eval()

Short answer: use json as the format for data transfer.
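A minimal sketch of that suggestion, assuming the site serves the data as a JSON object (the sample payload and the helper names `parse_legend` and `_check_nested_str_dict` are hypothetical). The standard-library `json` module only parses the text and never executes it; a small recursive check then enforces the "nested dictionary of strings" shape from the original question:

```python
import json


def _check_nested_str_dict(obj, path="root"):
    """Raise ValueError unless obj is a dict whose keys are str and whose
    values are str or (recursively) dicts of the same shape."""
    if not isinstance(obj, dict):
        raise ValueError(f"{path}: expected an object, got {type(obj).__name__}")
    for key, value in obj.items():
        if not isinstance(key, str):  # JSON object keys are always str; kept for other callers
            raise ValueError(f"{path}: non-string key {key!r}")
        if isinstance(value, dict):
            _check_nested_str_dict(value, f"{path}.{key}")
        elif not isinstance(value, str):
            raise ValueError(f"{path}.{key}: expected a string, got {type(value).__name__}")


def parse_legend(text):
    """Parse untrusted JSON text; nothing in it is ever executed."""
    data = json.loads(text)  # malformed input raises json.JSONDecodeError (a ValueError)
    _check_nested_str_dict(data)
    return data


legend = parse_legend('{"DB": "database", "units": {"MB": "megabyte"}}')
print(legend["units"]["MB"])  # megabyte
```

Both a syntax error and a well-formed but wrongly shaped payload end up as a `ValueError`, so the caller has a single exception type to handle.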
 
Fett

I am creating a program that requires some data that must be kept up
to date. What I plan is to put this data up on a web-site then have
the program periodically pull the data off the web-site.

My problem is that when I pull the data (currently stored as a
dictionary on the site) off the site, it is a string. I can use eval()
to make that string into a dictionary, and everything is great.
However, this means that I am using eval() on some string on a
web-site, which seems pretty unsafe.

I read that by using eval(code,{"__builtins__":None},{}) I can prevent
them from using pretty much anything, and my nested dictionary of
strings is still allowable. What I want to know is:

What are the dangers of eval?
- I originally was using exec() but switched to eval() because I
didn't want some hacker to be able to delete/steal files off my
clients' computers. I assume this is not an issue with eval(), since
eval won't execute commands.
- What exactly can someone do by modifying my code string in a command
like: thing = eval(code, {"__builtins__": None}, {}), anything other than
assign their own values to the object thing?
 
Guilherme Polo

By "disabling" __builtins__ you indeed cut some obvious tricks, but
someone still could send you a string like "10 ** 10 ** 10".
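To make the point concrete without actually tying up a machine, this sketch swaps in a much smaller exponent (`2 ** 64` is a stand-in for `10 ** 10 ** 10`; do not run the real one). Disabling `__builtins__` does nothing to limit arithmetic, so the restricted `eval` happily computes whatever the string asks for, while `json.loads` rejects the same text as malformed because a data format has no operators at all:

```python
import json

untrusted = "2 ** 64"  # stand-in for "10 ** 10 ** 10", which would exhaust CPU and memory

# The restricted eval from the original post still performs the computation.
result = eval(untrusted, {"__builtins__": None}, {})
print(result)  # 18446744073709551616

# A data-only parser refuses the text outright instead of evaluating it.
try:
    json.loads(untrusted)
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```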
 
James Mills

Hi,

If you cannot use a simple data structure/format
like JSON, or CSV, or similar, _don't_
use eval or exec, but use the pickle
libraries instead. This is much safer.

cheers
James
 
castironpi

May I suggest PyYAML?
 
Matimus

> By "disabling" __builtins__ you indeed cut some obvious tricks, but
> someone still could send you a string like "10 ** 10 ** 10".

Or, they could pass in something like this:

(t for t in 42 .__class__.__base__.__subclasses__() if t.__name__ ==
'LibraryLoader').next()((t for t in
42 .__class__.__base__.__subclasses__() if t.__name__ ==
'CDLL').next()).msvcrt.system("SOMETHING MALICIOUS")

Which can be used to execute pretty much anything on a Windows system
using a "safe" eval. This same exploit exists in some form on *nix.
The above assumes that ctypes has been loaded, but it can also be
modified to call code in other modules that have been loaded.
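A harmless way to see the route described above, assuming Python 3 (the exploit itself is Python 2 syntax, with `.next()`): even with `__builtins__` set to `None`, plain attribute access on a literal reaches `object`, and from there every class currently loaded in the interpreter. This snippet only lists those classes and calls nothing dangerous:

```python
# Attribute lookups need no builtins, so this runs inside the "safe" eval.
payload = "().__class__.__base__.__subclasses__()"
classes = eval(payload, {"__builtins__": None}, {})

print(type(classes).__name__)  # list
print(int in classes)          # True: direct subclasses of object are all reachable
```

From a list like this, an attacker only needs to find a class that wraps something powerful (in the example above, ctypes' `LibraryLoader` and `CDLL`).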

Matt
 
Steven D'Aprano

> I read that by using eval(code,{"__builtins__":None},{}) I can prevent
> them from using pretty much anything,

No, it can protect you from some obvious dangers, but not all obvious
dangers, and possibly not the unobvious ones.

> and my nested dictionary of
> strings is still allowable. What I want to know is:
>
> What are the dangers of eval?

You're executing code on your server that was written by arbitrary and
untrusted people over the Internet.

> - I originally was using exec() but switched to eval() because I didn't
> want some hacker to be able to delete/steal files off my clients'
> computers. I assume this is not an issue with eval(), since eval won't
> execute commands.

Bare eval() certainly can:

eval('__import__("os").system("ls *")') # or worse...

eval() with the extra arguments given makes that sort of thing harder,
but does it make it impossible? Are you willing to bet your server on it?
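The difference the extra arguments make can be checked directly. In this sketch (Python 3; the payload calls the harmless `os.getcwd` as a stand-in for `system`), bare `eval` finds `__import__` among the builtins and runs the call, while the restricted form fails on the name lookup. Blocking this one spelling says nothing about other routes in, such as attribute traversal:

```python
import os

payload = '__import__("os").getcwd()'  # harmless stand-in for os.system(...)

# Bare eval: __import__ is found in the builtins, so the call goes through.
print(eval(payload) == os.getcwd())  # True

# Restricted eval: looking up __import__ fails and an exception is raised.
# Catch broadly here; which exception type surfaces is an implementation detail.
try:
    eval(payload, {"__builtins__": None}, {})
except Exception as exc:
    print("blocked:", type(exc).__name__)
```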

> - What exactly can someone do by modifying my code string in a command
> like: thing = eval(code, {"__builtins__": None}, {}), anything other than
> assign their own values to the object thing?

They can cause an exception:

code = '0.0/0.0'
thing = eval(code, {"__builtins__": None}, {})

They can cause a denial of service attack:

code = '10**10**10'

They can feed you bad data:

code = "{ 'akey': 'Something You Don\\'t Expect' }"

You have to deal with bad data no matter what you do, but why make it
easy for them to cause exceptions?
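One practical reading of that question: with `eval`, each kind of odd input can surface as a different exception type, whereas a data parser reports syntax errors through one documented exception. A small sketch, assuming a hypothetical helper `load_or_default` that falls back to a last-known-good value:

```python
import json


def load_or_default(text, default):
    """Parse untrusted JSON, falling back to a known-good value on bad input.

    json.loads reports syntax errors as json.JSONDecodeError, a subclass of
    ValueError, so one except clause covers them.
    """
    try:
        data = json.loads(text)
    except ValueError:
        return default
    # Well-formed but wrongly shaped data (e.g. a list) is also rejected.
    return data if isinstance(data, dict) else default


fallback = {"DB": "database"}
print(load_or_default("0.0/0.0", fallback))              # {'DB': 'database'}
print(load_or_default('{"DB": "data base"}', fallback))  # {'DB': 'data base'}

# The restricted eval from the thread raises an unrelated exception type instead.
try:
    eval("0.0/0.0", {"__builtins__": None}, {})
except ZeroDivisionError:
    print("eval raised ZeroDivisionError")
```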

BTW, in case you think that you only have to deal with malicious attacks,
you also have to deal with accidents caused by incompetent users.
 
Paul Rubin

Fett said:
> However, this means that I am using eval() on some string on a
> web-site, which seems pretty unsafe.

Don't even think of doing that.

> I read that by using eval(code,{"__builtins__":None},{})

It is not reliable enough. Don't use eval for this AT ALL.

> - I originally was using exec() but switched to eval()

For this purpose there is no difference between exec and eval.

Use something like simplejson or cjson instead.
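simplejson was later adopted into the standard library as the `json` module (Python 2.6 onward), so on a current Python no third-party package is required. A common pattern, sketched here, prefers simplejson when it happens to be installed and otherwise falls back to the standard module; both provide `loads` for parsing a string:

```python
try:
    import simplejson as json  # optional third-party package, if installed
except ImportError:
    import json  # standard library (Python 2.6+), derived from simplejson

data = json.loads('{"DB": "database"}')
print(data["DB"])  # database
```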
 
Paul Rubin

James Mills said:
> If you cannot use a simple data structure/format
> like JSON, or CSV, or similar, _don't_
> use eval or exec, but use the pickle
> libraries instead. This is much safer.

Pickle uses eval and should also be considered unsafe, as its
documentation describes.
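The underlying problem is that unpickling can call arbitrary callables named in the byte stream. A harmless sketch: the hypothetical class `Surprise` uses `__reduce__` to tell pickle to rebuild it by calling `len("abc")`, and `pickle.loads` duly makes that call. A hostile stream could name a far more dangerous function just as easily:

```python
import pickle


class Surprise:
    """Hypothetical class: __reduce__ says 'recreate me by calling len("abc")'."""

    def __reduce__(self):
        return (len, ("abc",))


blob = pickle.dumps(Surprise())
# Loading the bytes does not build a Surprise at all: it calls len("abc").
print(pickle.loads(blob))  # 3
```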
 
Fett

So long story short: if I am expecting a dictionary of strings, I
should make a parser that only accepts a dictionary of strings then.
There is no safe way to use an existing construct.
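For a flat acronym legend, such a parser can be very small. A minimal sketch, assuming a hypothetical line-based format of `KEY = expansion` (blank lines and `#` comments ignored; the helper name `parse_legend_lines` is also hypothetical). Anything that does not match is rejected rather than interpreted:

```python
import re

# Hypothetical format: one "ACRONYM = expansion" per line.
_LINE = re.compile(r"^([A-Za-z0-9_]+)\s*=\s*(.+)$")


def parse_legend_lines(text):
    """Return a dict of str -> str, or raise ValueError on any unexpected line."""
    legend = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: not of the form KEY = value: {raw!r}")
        key, value = match.groups()
        if key in legend:
            raise ValueError(f"line {lineno}: duplicate key {key!r}")
        legend[key] = value.strip()
    return legend


sample = """
# acronym legend
DB  = database
RAM = random-access memory
"""
print(parse_legend_lines(sample))  # {'DB': 'database', 'RAM': 'random-access memory'}
```

Values are only ever stored as strings, so even a line like `X = __import__("os")` is kept as text and never run.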

That is what I was afraid of. I know I will have to deal with the
possibility of bad data, but considering my use (an acronym legend for
a database), and the fact that the site I plan to use should be
secure, these issues should be minimal. The users should be able to
spot any obvious false data, and restoring it should be simple.

Many thanks to all of you for your alarmist remarks. I certainly don't
want to, in any way, put my clients' computers at risk by providing
unsafe code.
 
Fett

On a related note, what if I encrypted and signed the data, then only
ran eval() on the string after it was decrypted and the signature
verified?

It has occurred to me that posting this data on a site might not be
the best idea unless I can be sure that it is not read by anyone that
it shouldn't be. So I figure encryption is needed, and as long as I
can sign it as well, then only people with my private signing key
could pass bad data, much less harmful strings.
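The signing half of that idea can be sketched with the standard library, with two loud assumptions. First, it uses a shared-secret HMAC (`hmac` with `hashlib.sha256`) in place of the public-key signature described above, because the standard library has no public-key signing; a shared secret must be present on every client, so anyone who extracts it can forge data. Second, the verified payload still goes to a data parser rather than to `eval`. `SECRET_KEY`, `sign` and `verify_and_parse` are hypothetical names:

```python
import hashlib
import hmac
import json

SECRET_KEY = b"hypothetical-shared-secret"  # assumption: distributed out of band


def sign(payload: bytes) -> str:
    """Publisher side: hex HMAC-SHA256 tag for the payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def verify_and_parse(payload: bytes, tag: str) -> dict:
    """Client side: check the tag, then parse as data (never eval)."""
    expected = sign(payload)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("signature check failed")
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


body = b'{"DB": "database"}'
tag = sign(body)
print(verify_and_parse(body, tag))  # {'DB': 'database'}
```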
 
Bruno Desthuilliers

Fett wrote:
> So long story short: if I am expecting a dictionary of strings, I
> should make a parser that only accepts a dictionary of strings then.

Or use an existing parser for an existing and documented format, as many
posters (including myself) already suggested.

> There is no safe way to use an existing construct.

Nothing coming from the outside world is safe.

> That is what I was afraid of. I know I will have to deal with the
> possibility of bad data, but considering my use (an acronym legend for
> a database), and the fact that the site I plan to use should be
> secure, these issues should be minimal.

If you feel like opening the door to any script-kiddie, then please
proceed. It's *your* computer, anyway...

Else, use a known format with a known working parser (xml, json, yaml,
csv, etc...), and possibly https if your data are to be protected.
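As one instance of "a known format with a known working parser", here is a minimal sketch using the standard-library `csv` module, assuming the legend is published as two-column CSV (acronym, expansion); the column layout and the helper name `parse_legend_csv` are hypothetical. Fetching the file over https would go through something like `urllib.request.urlopen`, omitted here so the example runs offline:

```python
import csv
import io


def parse_legend_csv(text):
    """Read two-column CSV (acronym, expansion) into a dict of str -> str."""
    legend = {}
    for rowno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) != 2:
            raise ValueError(f"row {rowno}: expected 2 columns, got {len(row)}")
        acronym, expansion = (field.strip() for field in row)
        legend[acronym] = expansion
    return legend


data = 'DB,database\nRAM,"random-access memory"\n'
print(parse_legend_csv(data))  # {'DB': 'database', 'RAM': 'random-access memory'}
```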
 
Lie

> On a related note, what if I encrypted and signed the data, then only
> ran eval() on the string after it was decrypted and the signature
> verified?
>
> It has occurred to me that posting this data on a site might not be
> the best idea unless I can be sure that it is not read by anyone that
> it shouldn't be. So I figure encryption is needed, and as long as I
> can sign it as well, then only people with my private signing key
> could pass bad data, much less harmful strings.

Your way of thinking is similar to Microsoft's. Encrypting and signing
is a kludge; a real fix should fix the underlying cause. Anyway, using
data parsers isn't that much harder than using eval/exec.
 
Fett

> Your way of thinking is similar to Microsoft's. Encrypting and signing
> is a kludge; a real fix should fix the underlying cause. Anyway, using
> data parsers isn't that much harder than using eval/exec.

While I agree that in this situation I should do both, what would you
propose for cases where the data being sent is supposed to be
executable code?

I happen to know that for enterprise disk drives (like the ones Google
uses to store everything) the firmware is protected by exactly what I
describe. Since the firmware has to be able to run, the kind of fix
you propose is not possible. I would assume that if this kind of data
transfer were deemed poor, Google and others would be demanding
something better (can you imagine if Google's database stopped working
because someone overwrote the firmware on their hard drives?).

Again, I suppose that in this case writing a parser is the better option
(parsing a dict of strings by hand is faster than reading the
documentation for someone else's parser anyway), but doing both is by
far the best option.

Again, thank you all for your help.
 
castironpi

As a fan of biological structures, I tend to favor the 'many-small'
strategy: expose your servers, but only a fraction of them to any given
source. If one of them crashes, blacklist its recent sources.
Distribute and decentralize ("redundantfy"). Compare, I guess, a jet
plane with 1,000 engines, of which a few can fail without a problem.
Resources can be expendable in small proportions.

More generally, think of a minimalist operating system that can
tolerate malicious code execution and just crashes and reboots a lot.
If 'foreign code' execution is fundamental to the project, you might
even look at custom hardware. Otherwise, if it's a lower priority,
just run a custom Python install and delete modules like os.py,
os.path.py, and maybe even sys.py. Either remove their corresponding
libraries, or create a wrapper that gets Admin approval for calls like
'subprocess.exec' and 'os.path.remove'.

Note that Windows now asks for User approval before a new program it
doesn't recognize gets internet access.
 
mario

If you'd like to look at a specific attempt to make eval() safe(r),
take a look at how the **eval-based** Evoque Templating engine does
it; a short overview is here:
http://evoque.gizmojo.org/usage/restricted/

While it does not provide protection against DoS-type attacks, it
should be safe against code that tries to pirate tangible resources
off your system, such as files and disk. Actually, reports of any
problems anyone may find would be greatly appreciated...
 
rustom

> May I suggest PyYAML?

I second that.

YAML is very Pythonic (being indentation-based), and PyYAML is sweet.

Just make sure you use safe_load, not load, and you will get only the
default construction for standard Python objects (lists, dictionaries
and 'atomic' things), so no arbitrary code can be executed.

Someone else suggested JSON, which is about the same as YAML if there
are no objects. And by using safe_load you are not using objects.
 
