Handling 2.7 and 3.0 Versions of Dict

Travis Parks

I am writing a simple algorithms library that I want to work for both
Python 2.7 and 3.x. I am writing some functions like distinct, which
work with dictionaries under the hood. The problem I ran into is that
I am calling itervalues or values depending on which version of the
language I am working in. Here is the code I wrote to overcome it:

import sys

def getDictValuesFoo():
    if sys.version_info < (3,):
        return dict.itervalues
    else:
        return dict.values

getValues = getDictValuesFoo()

def distinct(iterable, keySelector = (lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    return getValues(lookup)

I was surprised to learn that getValues CANNOT be called as if it were
a member of dict. I figured it was more efficient to determine what
getValues was once rather than every time it was needed.
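
To make the distinction concrete, here is a minimal sketch of the two call styles. It assumes nothing beyond the standard library; the sample dict and the names values and attribute_call_worked are made up for illustration. A module-level name such as getValues is just a reference to the dict method, so the dict has to be passed in explicitly; it does not become an attribute of dict instances.

```python
import sys

# Pick the accessor once, at import time (same idea as getDictValuesFoo).
if sys.version_info < (3,):
    getValues = dict.itervalues
else:
    getValues = dict.values

lookup = {1: 'a', 2: 'b'}  # hypothetical sample data

# Works: the dict is passed explicitly as the first argument.
values = sorted(getValues(lookup))

# Fails: dict instances have no attribute called getValues.
try:
    lookup.getValues()
    attribute_call_worked = True
except AttributeError:
    attribute_call_worked = False
```

So getValues(lookup) is the form that works on both versions, while lookup.getValues() raises AttributeError.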

First, how can I make the method getValues "private" _and_ so it only
gets evaluated once? Secondly, will the body of the distinct method be
evaluated immediately? How can I delay building the dict until the
first value is requested?

I noticed that hashing is a lot different in Python than it is in .NET
languages. .NET supports custom "equality comparers" that can override
a type's Equals and GetHashCode functions. This is nice when you can't
change the class you are hashing. That is why I am using a key
selector in my code, here. Is there a better way of overriding the
default hashing of a type without actually modifying its definition? I
figured requesting a key was the easiest way.
 
Terry Reedy

I am writing a simple algorithms library that I want to work for both
Python 2.7 and 3.x. I am writing some functions like distinct, which
work with dictionaries under the hood. The problem I ran into is that
I am calling itervalues or values depending on which version of the
language I am working in. Here is the code I wrote to overcome it:

import sys

def getDictValuesFoo():
    if sys.version_info < (3,):
        return dict.itervalues
    else:
        return dict.values

One alternative is to use itervalues and have 2to3 translate for you.
 
Martin v. Loewis

On 31.08.2011 03:43, Travis Parks wrote:
I am writing a simple algorithms library that I want to work for both
Python 2.7 and 3.x. I am writing some functions like distinct, which
work with dictionaries under the hood. The problem I ran into is that
I am calling itervalues or values depending on which version of the
language I am working in. Here is the code I wrote to overcome it:

import sys

def getDictValuesFoo():
    if sys.version_info < (3,):
        return dict.itervalues
    else:
        return dict.values

getValues = getDictValuesFoo()

def distinct(iterable, keySelector = (lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    return getValues(lookup)

I was surprised to learn that getValues CANNOT be called as if it were
a member of dict. I figured it was more efficient to determine what
getValues was once rather than every time it was needed.

First, how can I make the method getValues "private" _and_ so it only
gets evaluated once?

I'm not sure what "private" means here. Selecting the logic only once
goes like this:

if sys.version_info < (3,):
    def getDictValues(dict):
        return dict.itervalues()
else:
    def getDictValues(dict):
        return dict.values()

Secondly, will the body of the distinct method be
evaluated immediately?

Yes. As written, the whole body runs as soon as distinct is called.

How can I delay building the dict until the first value is requested?

Make it a generator:

def distinct(iterable, keySelector = (lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    for v in getValues(lookup):
        yield v

This delays *building* the dictionary until the *first* value is
requested. I.e. it completes building the dictionary before the first
value is returned.
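
A small sketch of that timing, in case it helps. The source generator and the consumed list are hypothetical helpers added only to record when input items are read, and lookup.values() stands in for getValues so the snippet runs on its own.

```python
def distinct(iterable, keySelector=(lambda x: x)):
    lookup = {}
    for item in iterable:
        key = keySelector(item)
        if key not in lookup:
            lookup[key] = item
    for v in lookup.values():  # stand-in for getValues(lookup)
        yield v

consumed = []  # records every item pulled from the input

def source():
    # Hypothetical input that logs each item as it is read.
    for n in [1, 2, 2, 3]:
        consumed.append(n)
        yield n

result = distinct(source())
consumed_before = list(consumed)  # nothing has been read yet

first = next(result)              # forces the whole dict to be built
consumed_after = list(consumed)   # the entire input has now been read
```

Calling distinct only creates the generator; the input is read in full at the first next().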

If you also want to interleave iteration over iterable with fetching
distinct values, write it like this:

def distinct(iterable, keySelector = (lambda x: x)):
    seen = {}
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen[key] = item

I noticed that hashing is a lot different in Python than it is in .NET
languages. .NET supports custom "equality comparers" that can override
a type's Equals and GetHashCode functions. This is nice when you can't
change the class you are hashing. That is why I am using a key
selector in my code, here. Is there a better way of overriding the
default hashing of a type without actually modifying its definition? I
figured requesting a key was the easiest way.

You could provide a Key class that takes a value, a hash function and
an equality function:

class Key:
    def __init__(self, value, hash, eq):
        self.value, self.hash, self.eq = value, hash, eq
    def __hash__(self):
        return self.hash(self.value)
    def __eq__(self, other_key):
        return self.eq(self.value, other_key.value)

This class would then be used instead of your keySelector.
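
As an illustration only, here is how such a Key could give strings case-insensitive identity without touching str. The helpers ci_hash and ci_eq and the sample words are hypothetical, and the class derives from object so it behaves the same on 2.7 and 3.x.

```python
class Key(object):
    def __init__(self, value, hash, eq):
        self.value, self.hash, self.eq = value, hash, eq
    def __hash__(self):
        return self.hash(self.value)
    def __eq__(self, other_key):
        return self.eq(self.value, other_key.value)

def ci_hash(s):
    # Hypothetical hash function: ignore case when hashing.
    return hash(s.lower())

def ci_eq(a, b):
    # Hypothetical equality function: ignore case when comparing.
    return a.lower() == b.lower()

words = ["Spam", "spam", "Eggs", "SPAM"]
unique = set(Key(w, ci_hash, ci_eq) for w in words)
values = sorted(k.value.lower() for k in unique)
```

The hash and equality functions have to agree with each other (equal values must hash equally), otherwise lookups will miss.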

With that, you could change the dictionary to a set. Actually, you
could already do so in the second generator version:

def distinct(iterable, keySelector = (lambda x: x)):
    seen = set()
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen.add(key) # item is not needed anymore
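
A quick way to see the interleaving, sketched with itertools: the input below never ends, so this only works because each distinct item is yielded as soon as it is found. The names first_three and by_initial and the sample data are made up for the example.

```python
from itertools import count, islice

def distinct(iterable, keySelector=(lambda x: x)):
    seen = set()
    for item in iterable:
        key = keySelector(item)
        if key not in seen:
            yield item
            seen.add(key)

# count() is infinite; islice stops after three distinct keys are found.
first_three = list(islice(distinct(count(), lambda n: n % 3), 3))

# The first item seen for each key is the one that is kept.
by_initial = list(distinct(["apple", "avocado", "banana", "blueberry"],
                           lambda w: w[0]))
```

Asking for a fourth value from the first call would never return, since n % 3 only has three possible keys.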

HTH,
Martin
 
Ian Kelly

if sys.version_info < (3,):
    def getDictValues(dict):
        return dict.itervalues()
else:
    def getDictValues(dict):
        return dict.values()

The extra level of function call indirection is unnecessary here.
Better to write it as:

if sys.version_info < (3,):
    getDictValues = dict.itervalues
else:
    getDictValues = dict.values

(which is basically what the OP was doing in the first place).

You could provide a Key class that takes a value, a hash function and
an equality function:

class Key:
    def __init__(self, value, hash, eq):
        self.value, self.hash, self.eq = value, hash, eq
    def __hash__(self):
        return self.hash(self.value)
    def __eq__(self, other_key):
        return self.eq(self.value, other_key.value)

This class would then be used instead of your keySelector.

For added value, you can make it a class factory so you don't have to
specify hash and eq over and over:

def Key(keyfunc):
    class Key:
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return hash(keyfunc(self.value))
        def __eq__(self, other):
            # compare the wrapped values, not the Key wrappers themselves
            return keyfunc(self.value) == keyfunc(other.value)
    return Key

KeyTypeAlpha = Key(lambda x: x % 7)

items = set(KeyTypeAlpha(value) for value in sourceIterable)

Cheers,
Ian
 
Gregory Ewing

Ian said:
if sys.version_info < (3,):
    getDictValues = dict.itervalues
else:
    getDictValues = dict.values

(which is basically what the OP was doing in the first place).

And which he seemed to think didn't work for some
reason, but it seems fine as far as I can tell:

Python 2.7 (r27:82500, Oct 15 2010, 21:14:33)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> gv = dict.itervalues
>>> d = {1:'a', 2:'b'}
>>> gv(d)
<dictionary-valueiterator object at 0x2aa210>

% python3.1
Python 3.1.2 (r312:79147, Mar 2 2011, 17:43:12)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> gv = dict.values
>>> d = {1:'a', 2:'b'}
>>> gv(d)
dict_values(['a', 'b'])
 
Travis Parks

Ian said:
if sys.version_info < (3,):
    getDictValues = dict.itervalues
else:
    getDictValues = dict.values
(which is basically what the OP was doing in the first place).

And which he seemed to think didn't work for some
reason, but it seems fine as far as I can tell:

Python 2.7 (r27:82500, Oct 15 2010, 21:14:33)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> gv = dict.itervalues
 >>> d = {1:'a', 2:'b'}
 >>> gv(d)
<dictionary-valueiterator object at 0x2aa210>

% python3.1
Python 3.1.2 (r312:79147, Mar  2 2011, 17:43:12)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> gv = dict.values
 >>> d = {1:'a', 2:'b'}
 >>> gv(d)
dict_values(['a', 'b'])

My problem was that I didn't understand the scoping rules. It is still
strange to me that the getValues variable is still in scope outside
the if/else branches.
 
Gabriel Genellina

My problem was that I didn't understand the scoping rules. It is still
strange to me that the getValues variable is still in scope outside
the if/else branches.

Those if/else are at global scope. An 'if' statement does not introduce a
new scope; so getDictValues, despite being "indented", is defined at
global scope, and may be used anywhere in the module.
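
A short sketch of that rule, with made-up names (pick, inner, leaked) for illustration. A name bound inside an 'if' belongs to whatever scope encloses the 'if': the module at top level, or the function when the 'if' sits inside one.

```python
import sys

# Bound inside an if/else, but at module level: visible module-wide.
if sys.version_info < (3,):
    getDictValues = dict.itervalues
else:
    getDictValues = dict.values

values = sorted(getDictValues({1: 'a', 2: 'b'}))

def pick():
    if True:
        inner = "bound inside the if"
    # Still visible here: the scope is the function, not the if block.
    return inner

picked = pick()

# The function's local name does not exist outside the function.
try:
    inner
    leaked = True
except NameError:
    leaked = False
```

Only def, class, lambda and (in 3.x) comprehensions create new scopes; if, for, while, try and with do not.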
 
Travis Parks

Those if/else are at global scope. An 'if' statement does not introduce a 
new scope; so getDictValues, despite being "indented", is defined at  
global scope, and may be used anywhere in the module.

Does that mean the rules would be different inside a function?
 
