advice needed for lazy evaluation mechanism


markolopa

Hi,

Could you please give me some advice on the piece of code I am
writing?

My system has several possible outputs, some of them are not always
needed. I started to get confused with the code flow conditions needed
to avoid doing unnecessary work. So I am trying to restructure it
using lazy evaluation.

In the new mechanism I am coding I have a repository with two types of
objects: infos and routines. In the beginning I have a list of
routines. Each routine tells which infos it can compute. The execution
is triggered when the value of an info is requested. In the example
below I have 3 routines

Routine "ReadData" computes info "gender" and info "birth_year"
Routine "YearToAge" computes info "age" (using info "birth_year")
Routine "ComputeMHF" computes info "max_heart_frequency" (using info
"gender" and info "age")

           /--> gender ----------------------------\
ReadData --|                                       | --> ComputeMHF --> max_heart_frequency
           \--> birth_year --> YearToAge --> age --/

So for instance if all I need is info "age", only the routines
"ReadData" and "YearToAge" are computed.

The code below implements the example. There are 3 files:
- test.py: the test case for the example
- routines.py: the routines (classes) of the example
- repository.py: the lazy evaluation mechanism (independent of the
example)

My questions are:
- Is there a more standard (pythonic) way to do what I am trying to
do? Are there libraries, design patterns, functional programming
structures to use to achieve what I am looking for (i.e. am I trying
to reinvent the wheel)?
- Is the coding style good?
- Can I avoid the eval command in Repository.add_routine? What I want
there is to be able to have a generic code for the repository which
does not depend on the files containing the routines I want it to
hold.

Note: The routines do not need to declare the info they depend on.
They request the info in the computation phase.

test.py
===
import unittest
from repository import Repository

ROUTINE_LIST = """
ReadData routines
YearToAge routines
ComputeMHF routines
"""

class Test(unittest.TestCase):

    def test_age(self):
        repo = Repository(ROUTINE_LIST)
        self.assertEqual(repo['age'], 30)

    def test_max_heart_frequency(self):
        repo = Repository(ROUTINE_LIST)
        self.assertEqual(repo['max_heart_frequency'], 181)
===

routines.py
===
from repository import AbstractRoutine

class ReadData(AbstractRoutine):
    def __init__(self):
        super(ReadData, self).__init__(self.__class__.__name__,
                                       ['birth_year', 'gender'])
    def compute(self, repo):
        repo['birth_year'] = 1979
        repo['gender'] = 'F'

class YearToAge(AbstractRoutine):
    def __init__(self):
        super(YearToAge, self).__init__(self.__class__.__name__,
                                        ['age'])
    def compute(self, repo):
        repo['age'] = 2009 - repo['birth_year']

class ComputeMHF(AbstractRoutine):
    def __init__(self):
        super(ComputeMHF, self).__init__(self.__class__.__name__,
                                         ['max_heart_frequency'])
    def compute(self, repo):
        gender = repo['gender']
        age = repo['age']
        mhf = 211 - age if gender == 'F' else 205 - age
        repo['max_heart_frequency'] = mhf
===

repository.py
===
from StringIO import StringIO

class AbstractRoutine(object):

    def __init__(self, name, infos_provided):
        self.name = name
        self.infos_provided = infos_provided
        self.computed = False

    def compute(self, repo):
        # Subclasses receive the repository and store their infos in it.
        raise NotImplementedError

class Info(object):
    def __init__(self, name, routine):
        self.name = name
        self.routine = routine
        self.computed = False
        self.value = None

class Repository(object):

    def __init__(self, routine_definition_lines):
        self._infos = {}
        self.add_routines(routine_definition_lines)

    def add_routines(self, definition_lines):
        for line in StringIO(definition_lines):
            line = line.strip()
            if line == '':
                continue
            name, file_name = line.split()
            self.add_routine(name, file_name)

    def add_routine(self, class_name, file_name):
        routine = None # only to cheat pylint
        cmd = "from %s import %s\nroutine = %s()" % (file_name,
                                                     class_name,
                                                     class_name)
        exec(cmd) # XXX: ugly
        if not isinstance(routine, AbstractRoutine):
            raise ValueError('Class %s is not AbstractRoutine'
                             % class_name)
        for info_name in routine.infos_provided:
            info = Info(info_name, routine)
            self._infos[info_name] = info

    def __setitem__(self, key, value):
        if key not in self._infos:
            raise ValueError('info %s not defined in repository' %
                             key)
        info = self._infos[key]
        if info.computed:
            raise ValueError('info %s has already been computed' %
                             key)
        info.value = value
        info.computed = True

    def __getitem__(self, key):
        if key not in self._infos:
            raise ValueError('info %s not defined in repository' %
                             key)
        info = self._infos[key]
        if not info.computed:
            print('Calling routine %s to compute info %s'
                  % (info.routine.name, info.name))
            info.routine.compute(self)
        if not info.computed:
            raise ValueError('routine %s did not compute info %s'
                             % (info.routine.name, key))
        return info.value
===

Thanks a lot!
Marko
 

MRAB

markolopa said:
Hi again,

I put a copy of the message and the tarball of the code here (because
of the problem of line breaks):

http://python-advocacy.wikidot.com/comp-lang-python-question

Here's a slightly different approach:

repository.py
=============
class Repository(object):
    def __init__(self):
        self._providers = {}
        self._values = {}

    def add_provider(self, func, keys):
        for k in keys:
            self._providers[k] = func

    def __getitem__(self, key):
        if key not in self._values:
            self._providers[key](self)
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

def register(*modules):
    "Register the provider modules and return a repository."
    repo = Repository()
    for mod in modules:
        # Scan the module for providers.
        # A provider is a function which lists what it provides in its __doc__ string.
        for name, value in mod.__dict__.items():
            if callable(value) and value.__doc__:
                repo.add_provider(value, value.__doc__.split())
    return repo


routines.py
===========
# The providers are functions which list what they provide in their __doc__ strings.

def ReadData(repo):
    'birth_year gender'
    repo['birth_year'] = 1979
    repo['gender'] = 'F'

def YearToAge(repo):
    'age'
    repo['age'] = 2009 - repo['birth_year']

def ComputeMHF(repo):
    'max_heart_frequency'
    gender = repo['gender']
    age = repo['age']
    mhf = 211 - age if gender == 'F' else 205 - age
    repo['max_heart_frequency'] = mhf


test.py
=======
import unittest
import repository
import routines

class Test(unittest.TestCase):
    def test_age(self):
        repo = repository.register(routines)
        self.assertEqual(repo['age'], 30)

    def test_max_heart_frequency(self):
        repo = repository.register(routines)
        self.assertEqual(repo['max_heart_frequency'], 181)

unittest.main()
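
To check that only the needed providers actually run, here is a minimal self-contained sketch. It repeats the Repository class so that it runs on its own, uses lower-case provider names registered by hand instead of through `register`, and adds a `calls` list purely for tracing; those names are assumptions made for illustration, not part of the code above.

```python
class Repository(object):
    """Same idea as the Repository above, repeated so this sketch is standalone."""
    def __init__(self):
        self._providers = {}
        self._values = {}

    def add_provider(self, func, keys):
        for k in keys:
            self._providers[k] = func

    def __getitem__(self, key):
        # Run the provider only the first time a value is requested.
        if key not in self._values:
            self._providers[key](self)
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value


calls = []  # hypothetical trace of which providers ran, in order

def read_data(repo):
    calls.append('read_data')
    repo['birth_year'] = 1979
    repo['gender'] = 'F'

def year_to_age(repo):
    calls.append('year_to_age')
    repo['age'] = 2009 - repo['birth_year']

def compute_mhf(repo):
    calls.append('compute_mhf')
    base = 211 if repo['gender'] == 'F' else 205
    repo['max_heart_frequency'] = base - repo['age']

repo = Repository()
repo.add_provider(read_data, ['birth_year', 'gender'])
repo.add_provider(year_to_age, ['age'])
repo.add_provider(compute_mhf, ['max_heart_frequency'])

age = repo['age']
print(age, calls)  # 30 ['year_to_age', 'read_data']
```

Requesting `age` never touches `compute_mhf`, and a later request for `max_heart_frequency` reuses the cached `gender` and `age` instead of re-running their providers.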
 

markolopa

Here's a slightly different approach:

A clean and elegant solution, very impressive! Also a collection of
nice Python features I had never used.

Thanks a lot!
Marko
 

Steven D'Aprano

Hi,

Could you please give me some advice on the piece of code I am writing?

My system has several possible outputs, some of them are not always
needed. I started to get confused with the code flow conditions needed
to avoid doing unnecessary work.

How many dozens of man-hours (yours, and the people who have to maintain
the software after you have moved on) of confusion are you going to spend
to avoid how many microseconds of execution time? So what if your system
takes 35ms instead of 18ms to calculate the result?

As Donald Knuth famously wrote (the line is often credited to Tony
Hoare): "We should forget about small efficiencies, say about 97% of the
time: premature optimization is the root of all evil."

Of course, all of this assumes that the routines you are trying to avoid
calling don't require hours of running time each time you call them...

So I am trying to restructure it using lazy evaluation.

Oh great, avoiding confusion with something even more confusing.

- Is there a more standard (pythonic) way to do what I am trying to do?

Yes. Avoid it. Do the simplest thing that works until you *know* --
because you have profiled it -- that it is too slow. Until then, all that
complication you've built, all that over-engineered jungle of classes and
abstract classes, is unnecessary. I find it beyond all credibility that
your data is so complicated that you need a plug-in system just to manage
the methods you need to calculate your data.

Just create some properties, like this one:

class Example(object):
    def __init__(self, birth_year):
        self.birth_year = birth_year
    @property
    def age(self):
        return 2009 - self.birth_year # FIXME -- it won't be 2009 forever

And there you have a lazily-evaluated age property. Not complicated
enough? The 3ms it takes to calculate the age is too long? Cache it!

class Example(object):
    def __init__(self, birth_year):
        self.birth_year = birth_year
        self._age = None
    @property
    def age(self):
        a = self._age
        if a is None:
            a = 2009 - self.birth_year
            self._age = a
        return a

Now all you need is to invalidate the cache if the birth_year changes. So
you make birth_year a property too:

class Example(object):
    def __init__(self, birth_year):
        self.birth_year = birth_year
    @property
    def birth_year(self):
        return self._birth_year
    # Requires Python 2.6. Also untested.
    @birth_year.setter
    def birth_year(self, year):
        self._birth_year = year
        self._age = None  # invalidate the cached age
    @property
    def age(self):
        a = self._age
        if a is None:
            a = 2009 - self.birth_year
            self._age = a
        return a
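
On newer Pythons the same cache-and-invalidate pattern can be written more compactly. The following is only a sketch, assuming Python 3.8 or later, where `functools.cached_property` is available; it is not something the 2009-era code could have used. `cached_property` stores the computed value in the instance `__dict__`, so removing that entry invalidates the cache.

```python
from functools import cached_property  # assumption: Python 3.8+


class Example:
    def __init__(self, birth_year):
        self.birth_year = birth_year

    @property
    def birth_year(self):
        return self._birth_year

    @birth_year.setter
    def birth_year(self, year):
        self._birth_year = year
        # Drop the cached value (if any) so age is recomputed on next access.
        self.__dict__.pop('age', None)

    @cached_property
    def age(self):
        return 2009 - self.birth_year  # 2009 kept from the example above


e = Example(1979)
first = e.age        # computed once and cached
e.birth_year = 1989  # setter invalidates the cache
second = e.age       # recomputed from the new birth year
print(first, second)  # 30 20
```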


Are there libraries, design patterns, functional programming structures
to use to achieve what I am looking for (i.e. am I trying to reinvent
the wheel)?

The most important buzzwords you want are YAGNI and Premature
Generalisation, and perhaps a touch of Architecture Astronaut:

http://www.joelonsoftware.com/items/2008/05/01.html

- Is the coding style good?
- Can I avoid the eval command in Repository.add_routine? What I want
there is to be able to have a generic code for the repository which does
not depend on the files containing the routines I want it to hold.

You mean the exec?

    cmd = "from %s import %s\nroutine = %s()" % (file_name, class_name,
                                                 class_name)
    exec(cmd) # XXX: ugly


Yes. Untested:

    module = __import__(file_name)
    class_object = getattr(module, class_name)
    routine = class_object()

Of course you can turn that into a one-liner:

    routine = getattr(__import__(file_name), class_name)()
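
One caveat: for a dotted name such as `'package.module'`, `__import__` returns the top-level package rather than the submodule. A sketch that avoids this uses `importlib.import_module` (available from Python 2.7 and 3.1). The helper name `load_class` is hypothetical, and standard-library classes stand in for the routine classes so that the example runs on its own.

```python
import importlib


def load_class(module_name, class_name):
    """Import module_name and return the attribute called class_name."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Plain module name: behaves like the __import__ version above.
OrderedDict = load_class('collections', 'OrderedDict')
d = OrderedDict(a=1)

# Dotted module name: import_module returns the submodule itself,
# whereas __import__('xml.dom.minidom') would return the 'xml' package.
Document = load_class('xml.dom.minidom', 'Document')
top_level_name = __import__('xml.dom.minidom').__name__
print(Document.__module__, top_level_name)  # xml.dom.minidom xml
```

In `Repository.add_routine` this would amount to `routine = load_class(file_name, class_name)()`, followed by the existing `isinstance` check.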
 

Dieter Maurer

Steven D'Aprano said:
...

Oh great, avoiding confusion with something even more confusing.

Lazy evaluation may be confusing if it is misused.
But, it may be very clear and powerful if used appropriately.

Lazy evaluation essentially means:
you describe beforehand how a computation should be
performed but do this computation only when its result is immediately
required.
Of course, this makes it more difficult to predict when the computation
actually happens -- something potentially very confusing when the computation
has side effects. If the computation is side effect free, potentially
great gains can be achieved without loss of clarity.

Python supports lazy evaluation, e.g. through its generators (and
generator expressions). Its "itertools" module provides examples
of how to use iterators (and, by inclusion, generators) efficiently
without sacrificing clarity.
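
A small sketch of that style, with names chosen only for illustration: the pipeline below describes an infinite computation up front, and nothing is evaluated until `islice` asks for values.

```python
from itertools import count, islice


def squares():
    """Yield 1, 4, 9, ... forever; each value is computed only on request."""
    for n in count(1):
        yield n * n


# A generator expression: nothing runs yet, this only describes the filter.
even_squares = (s for s in squares() if s % 2 == 0)

# islice pulls just enough values from the pipeline to produce three results.
first_three = list(islice(even_squares, 3))
print(first_three)  # [4, 16, 36]
```

Because every step is free of side effects, the order and timing of evaluation do not affect the result, which is the condition described above for lazy evaluation to stay clear.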
 

Steven D'Aprano

Lazy evaluation may be confusing if it is misused. But, it may be very
clear and powerful if used appropriately.

I was referring to the specific example given, not the general concept of
lazy evaluation.

I went on to give another example of simple, straightforward lazy
evaluation: using properties for computed attributes.
 
