Nested Mapping

R

Raymond Hettinger

I'm considering a nested mapping class for the collections module and
would like to solicit feedback from people here on comp.lang.python:

http://code.activestate.com/recipes/577434-nested-contexts-a-chain-of-mapping-objects/

The class is an attempt to generalize from existing use cases (i.e.
the _multimap class in string.py). I also took guidance from the
symbol table analysis tools in ANTLR (see p.173 of Language
Implementation Patterns by Terence Parr). Nested mappings make it
easy to build chains and trees of scopes (like vocabularies in Forth
or like inheritance hierarchies in Python).

The basic idea is to perform look-ups into a series a dictionaries,
much like Python does with locals, nested scopes, global (module
level) scope and builtins.

Starting with a root context which has a full dictionary API, multiple
child contexts can be created. New values are stored in the child
contexts, potentially masking those in a parent context (much like
shadowing a Python builltin). Lookups start with the child node and
work upward through the chain of parents.

There is also a variant where __setitem__ and __delitem__ can mutate
values in parent contexts (much like the nonlocal keyword allows in
Python).

The API seeks to fully emulate regular dictionaries. The only extra
capabilities are:

d = c.new_child() # create a subcontext
e = d.new_child() # create a subsubcontext
e.parent # find the next enclosing context
e.root # find the outermost enclosing context
f = c.new_child() # create a subcontext of c, independent of d
& e

I would appreciate any feedback on the API and on how well it fits
with various use cases that you've found in the wild.


Raymond
 
P

Paul Rubin

Raymond Hettinger said:
I would appreciate any feedback on the API and on how well it fits
with various use cases that you've found in the wild.

What I really want is a Haskell-like persistent (i.e. purely functional)
dictionary implemented as an AVL tree or something like that. It means
the "child" can just be a forked version of the parent, that shares most
of the same nodes. I think that handles most of the obvious uses of the
nested mapping proposal.

Hedgehog Lisp has AVL-based persistent dictionaries implemented in C,
that I've been thinking of trying to port to the Python C API sometime.
 
R

Raymond Hettinger

What I really want is a Haskell-like persistent (i.e. purely functional)
dictionary implemented as an AVL tree or something like that.

Hopefully, that discussion can be in a separate thread.
This is really about keeping all the nice O(1) characteristics
of dicts and keeping their straight-forward API while adding
the ability to support nested lookups.

An AVL tree has different performance characteristics (like O(log n)
search and has a substantially different API (adding ordering
operations)
and works on different data types (relying on total ordering rather
than hashing an equality). Also, having a persistence is a different
requirement.

Sorry for the brush-off, but it would be ashamed if this thread got
immediately hijacked, and I were to lose possible feedback on the
API for nested mappings.


Raymond
 
P

Paul Rubin

Raymond Hettinger said:
Hopefully, that discussion can be in a separate thread.
This is really about keeping all the nice O(1) characteristics
of dicts and keeping their straight-forward API while adding
the ability to support nested lookups.

Are you saying the O(1) characteristics are retained even when the
contexts are N levels deep?
Sorry for the brush-off, but it would be ashamed if this thread got
immediately hijacked, and I were to lose possible feedback on the
API for nested mappings.

The API you suggested looks reasonable although you should also say how
to delete a context, how to find the inner contexts of a context, etc.

One question: what should

c["foo"] = 7
d = c.new_child()
del d["foo"]

do?
 
R

Raymond Hettinger

The API you suggested looks reasonable although you should also say how
to delete a context, how to find the inner contexts of a context, etc.

The c.parent.parent.parent chain finds successive enclosing contexts:

while c.parent is not None:
print c
c = c.parent

One question: what should

   c["foo"] = 7
   d = c.new_child()
   del d["foo"]

do?  

By default, it raises a KeyError because 'foo' is not in the current
context.

But if enable_nonlocal is set to True, it removes 'foo' from the
parent context, c.

Depends on whether you want parent contexts to be mutable or not.
With Python's nonlocal keyword, we can set values in an enclosing
scope. This tool lets you do that also.


Raymond
 
P

Paul Rubin

Raymond Hettinger said:
The c.parent.parent.parent chain finds successive enclosing contexts:

I was asking about finding the child contexts, not the parents. This is
analogous to how you can find the keys in a dict with dict.keys().
One question: what should

   c["foo"] = 7
   d = c.new_child()
   del d["foo"]

do?  

By default, it raises a KeyError because 'foo' is not in the current
context. But if enable_nonlocal is set to True, it removes 'foo' from
the parent context, c.

I would not have guessed either of those behaviors. What happens on

c["foo"] = 7
d = c.new_child()
d["foo"] = 8
del d["foo"]
print c["foo"], d["foo"]
 
P

Peter Otten

Raymond said:
I'm considering a nested mapping class for the collections module and
would like to solicit feedback from people here on comp.lang.python:

http://code.activestate.com/recipes/577434-nested-contexts-a-chain-of- mapping-objects/

The class is an attempt to generalize from existing use cases (i.e.
the _multimap class in string.py). I also took guidance from the
symbol table analysis tools in ANTLR (see p.173 of Language
Implementation Patterns by Terence Parr). Nested mappings make it
easy to build chains and trees of scopes (like vocabularies in Forth
or like inheritance hierarchies in Python).

The basic idea is to perform look-ups into a series a dictionaries,
much like Python does with locals, nested scopes, global (module
level) scope and builtins.

Starting with a root context which has a full dictionary API, multiple
child contexts can be created. New values are stored in the child
contexts, potentially masking those in a parent context (much like
shadowing a Python builltin). Lookups start with the child node and
work upward through the chain of parents.

There is also a variant where __setitem__ and __delitem__ can mutate
values in parent contexts (much like the nonlocal keyword allows in
Python).

The API seeks to fully emulate regular dictionaries. The only extra
capabilities are:

d = c.new_child() # create a subcontext
e = d.new_child() # create a subsubcontext
e.parent # find the next enclosing context
e.root # find the outermost enclosing context
f = c.new_child() # create a subcontext of c, independent of d
& e

I would appreciate any feedback on the API and on how well it fits
with various use cases that you've found in the wild.

I just read the recipe, and it looks good to me except for the len() and
iter() implementations:
a = Context()
a["x"] = 1
b = a.new_child()
b["x"] = 2
len(b) 2
b.keys()
['x', 'x']

I would have expected equal keys to occur/count just once.

Peter
 
R

Robert Kern

I would appreciate any feedback on the API and on how well it fits
with various use cases that you've found in the wild.

We've done something similar in the past:


https://svn.enthought.com/svn/enthought/CodeTools/trunk/enthought/contexts/multi_context.py

I'm not sure we'd have much use for a batteries-included collections.Context,
per se, since we needed the other features of our Context implementation like
notification and restriction. In particular, we were using our MultiContext with
subcontext mapping objects that could restrict the types of values that were
contained in them. We wanted to execute code in a MultiContext and be able to
separate functions and classes that were defined or imported from other
variables for display.

As for the API, I would probably want to be able to create a Context simply
given a list of existing maps rather than needing to build a chain of them using
..new_child(). Most of the use cases I can imagine are ones where I already have
populated dictionaries, and I just want to slap this behavior on top of them.
Other than that, it looks good.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
R

Raymond Hettinger

I just read the recipe, and it looks good to me except for the len() and
iter() implementations:
a = Context()
a["x"] = 1
b = a.new_child()
b["x"] = 2
len(b) 2
b.keys()

['x', 'x']

I would have expected equal keys to occur/count just once.

Thanks for reviewing the recipe.

Since only the first (unshadowed) 'x' is accessible, I can
see why it would make sense to count it only once.

Raymond
 
R

Raymond Hettinger


Right. This looks substantially similar.

The shortened url is: http://tinyurl.com/3xh377k
I'm not sure we'd have much use for a batteries-included collections.Context,
per se, since we needed the other features of our Context implementation like
notification and restriction. In particular, we were using our MultiContext with
subcontext mapping objects that could restrict the types of values that were
contained in them. We wanted to execute code in a MultiContext and be able to
separate functions and classes that were defined or imported from other
variables for display.

That's an interesting additional layer of sophistication.

One part that make be applicable to the current recipe
is individual marking individual context as "allow"
to enable __setitem__ for that level. My approach
was to mark the child context as being able to mutate
any of its parents.
As for the API, I would probably want to be able to create a Context simply
given a list of existing maps rather than needing to build a chain of them using
.new_child(). Most of the use cases I can imagine are ones where I already have
populated dictionaries, and I just want to slap this behavior on top of them.

That makes sense. I should modify the constructor to allow a
succession of existing dicts.
Other than that, it looks good.

Thanks for reviewing the recipe.


Raymond
 
R

Raymond Hettinger

I was asking about finding the child contexts, not the parents.  This is
analogous to how you can find the keys in a dict with dict.keys().

Children point to parents
but the parents don't know about the children.


One question: what should
   c["foo"] = 7
   d = c.new_child()
   del d["foo"]
do?  
By default, it raises a KeyError because 'foo' is not in the current
context.  But if enable_nonlocal is set to True, it removes 'foo' from
the parent context, c.

I would not have guessed either of those behaviors.  What happens on

  c["foo"] = 7
  d = c.new_child()
  d["foo"] = 8
  del d["foo"]
  print c["foo"], d["foo"]

I'm curious what you would expect (and find useful)?


Raymond
 
E

Ethan Furman

Raymond said:
The API you suggested looks reasonable although you should also say how
to delete a context, how to find the inner contexts of a context, etc.

The c.parent.parent.parent chain finds successive enclosing contexts:

while c.parent is not None:
print c
c = c.parent

One question: what should

c["foo"] = 7
d = c.new_child()
del d["foo"]

do?

By default, it raises a KeyError because 'foo' is not in the current
context.

But if enable_nonlocal is set to True, it removes 'foo' from the
parent context, c.

Depends on whether you want parent contexts to be mutable or not.
With Python's nonlocal keyword, we can set values in an enclosing
scope. This tool lets you do that also.

Another possibility would be to set d["foo"] to some sentinel value,
effectively blocking access to the parent's value. Of course, the user
could also do that, it need not be built-in to the nested-mapping.

~Ethan~
 
S

Steven D'Aprano

I'm considering a nested mapping class for the collections module and
would like to solicit feedback from people here on comp.lang.python:

http://code.activestate.com/recipes/577434-nested-contexts-a-chain-
of-mapping-objects/

Very nice!


The emulation of Python's nested scopes isn't perfect:
c = Context()
c['spam'] = 'global'
c = c.new_child()
c['spam'] = 'local'
c['spam'] 'local'
del c['spam']
c['spam']
'global'

So far so good -- that is exactly what I would expect. But:
.... spam = "local"
.... print spam
.... del spam
.... print spam
....local
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in f
UnboundLocalError: local variable 'spam' referenced before assignment


I'd call that a gotcha in CPython rather than a problem with the Context
class. But it might be worth a note in the documentation somewhere.


[...]
The API seeks to fully emulate regular dictionaries.

It doesn't appear to do so as yet. E.g. update is missing, although you
refer to it in the comments.

I don't think that either iteration or len are right. I would expect that
these should only count unique keys, not repeated keys. Given:

c = Context()
c['spam'] = 'global'
c = c.new_child()
c['spam'] = 'local'

I would expect that iteration should yield 'spam' once rather than twice,
and c.items() should yield ('spam', 'local') rather than either:

('spam', 'local'), ('spam', 'global')
('spam', 'local'), ('spam', 'local')

For the rare(?) cases where users do want to see non-unique keys, it's
trivial enough to get, given that Context.maps is public. Simple enough
to do in place:

for key in chain.from_iterable(c.maps):
process(key)


With the current API, getting unique keys is less simple:

seen = set()
for key in c:
if key in seen: continue
seen.add(seen)
process(key)

I think the API should be based on unique keys, and leave getting non-
unique keys to the caller.

I would appreciate any feedback on the API and on how well it fits with
various use cases that you've found in the wild.

I can see myself using it in the future.

I'd like to see the constructor use the same signature as regular dicts.
Instead of:

config = Context()
config.update({'spam':1, 'cheese':0}, parrot='sleeping')

I should be able to pass a dict or iterable of (key,item) pairs, plus
keyword arguments, to initialise the newly created scope:

config = Context({'spam':1, 'cheese':0}, parrot='sleeping')

And similarly for new_child().

I think this would be more useful to me than the current arguments taken
by the constructor. I don't see myself needing enable_nonlocal often
enough to want a shortcut for


and as for parent, there doesn't seem to me to be any point to having it
as an argument to the constructor. Instead of:

you can say:

Neither is particularly simpler than the other.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,190
Members
46,736
Latest member
zacharyharris

Latest Threads

Top