Pickling dictionaries containing dictionaries: failing, recursion-style!


lysdexia

I'm having great fun playing with Markov chains. I am making a
dictionary of all the words in a given string, getting a count of how
many appearances word1 makes in the string, getting a list of all the
word2s that follow each appearance of word1 and a count of how many
times word2 appears in the string as well. (I know I should probably
be only counting how many times word2 actually follows word1, but as I
said, I'm having great fun playing ...)
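For reference, here is a minimal Python 3 sketch of the structure described above, in which each word maps to [total occurrences, {following word: count}]. The function name build_word_db and the whitespace tokenization are assumptions for illustration, not the OP's code. Here the inner counts record how many times each word2 immediately follows word1.

```python
from collections import Counter


def build_word_db(text):
    """Hypothetical helper: map each word to [occurrences, {follower: times it follows}]."""
    words = text.split()  # assumption: plain whitespace tokenization, no punctuation handling
    totals = Counter(words)
    followers = {word: Counter() for word in totals}
    # zip pairs each word with the next one; the final word has no successor,
    # so no index ever runs past the end of the list.
    for first, second in zip(words, words[1:]):
        followers[first][second] += 1
    return {word: [totals[word], dict(followers[word])] for word in totals}


db = build_word_db("it is a sin to think and it is a sin to write this")
print(db["sin"])  # [2, {'to': 2}]
```

Because zip stops at the shorter sequence, the final word simply gets an empty follower dict.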


The printed output of the dictionary looks like so:

{'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down': 1}],
 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a': 2}],
 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and': 1, 'words': 1}],
 'write': [1, {'this': 1}], 'to': [3, {'write': 1, 'put': 1, 'think': 1}],
 'words': [1, {'no': 1}], 'others': [1, {'think': 1}], 'put': [1, {'them': 1}],
 'sin': [2, {'to': 2}]}

Here's the actual function.

def assembleVocab(self):
    self.wordDB = {}
    for word in self.words:
        try:
            if not word in self.wordDB.keys():
                wordsWeights = {}
                afterwords = [self.words[i + 1] for i, e in
                              enumerate(self.words) if e == word]
                for aw in afterwords:
                    if not aw in wordsWeights.keys():
                        wordsWeights[aw] = afterwords.count(aw)
                self.wordDB[word] = [self.words.count(word), wordsWeights]
        except:
            pass
    out = open("mchain.pkl",'wb')
    pickle.dump(self.wordDB, out, -1)
    out.close()
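Separate from the unpickling problem, the bare except: pass in this function hides an error. When word is the last item in self.words, self.words[i + 1] raises IndexError for that occurrence, so the word that ends the text never receives an entry in wordDB. A stand-alone sketch, with a plain list standing in for self.words and hypothetical helper names:

```python
def afterwords(words, word):
    # Same comprehension as in the post: the word after each occurrence of `word`.
    return [words[i + 1] for i, e in enumerate(words) if e == word]


def afterwords_safe(words, word):
    # Pair each word with its successor instead of indexing past the end.
    return [nxt for cur, nxt in zip(words, words[1:]) if cur == word]


words = ["it", "is", "a", "sin"]  # hypothetical stand-in for self.words
print(afterwords(words, "is"))  # ['a']

try:
    afterwords(words, "sin")  # "sin" is the last word, so words[i + 1] is out of range
except IndexError as exc:
    print("IndexError:", exc)  # this is what the bare except was silently discarding

print(afterwords_safe(words, "sin"))  # []
```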

My problem is, I can't seem to get it to unpickle. When I attempt to
load the saved data, I get:

AttributeError: 'tuple' object has no attribute 'readline'

with pickle, and

TypeError: argument must have 'read' and 'readline' attributes

Looking at the pickle pages on docs.python.org, I see that I am indeed
supposed to be able to pickle "tuples, lists, sets, and dictionaries
containing only picklable objects".

I'm sure I'm missing something obvious. Clues?
 

Paul Rubin

lysdexia said:
self.wordDB[word] = [self.words.count(word), wordsWeights]

what is self.words.count? Could it be an iterator? I don't think you
can pickle those.
 

David Tweet

Are you opening the file in binary mode ("rb") before doing pickle.load on it?


John Machin

lysdexia said:
self.wordDB[word] = [self.words.count(word), wordsWeights]

what is self.words.count? Could it be an iterator? I don't think you
can pickle those.

Whaaaat??
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.
self.words.count looks like a standard sequence method to me.
self.words.count(word) will return an int -- can you see all those
"[1,", "[2," etc in his printed dict output?
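John's point is easy to check in Python 3 (the word list below is hypothetical): list.count returns a plain int, and a dict whose values are [int, dict] lists survives a pickle round trip. Paul's worry does hold for some objects; a generator, for instance, cannot be pickled.

```python
import pickle

words = ["sin", "to", "write", "sin", "to", "think"]  # hypothetical word list
print(type(words.count("sin")), words.count("sin"))  # <class 'int'> 2

# A few entries copied from the printed dictionary in the original post.
word_db = {"sin": [2, {"to": 2}], "to": [3, {"write": 1, "put": 1, "think": 1}]}
restored = pickle.loads(pickle.dumps(word_db, protocol=pickle.HIGHEST_PROTOCOL))
print(restored == word_db)  # True

# By contrast, a generator cannot be pickled at all.
try:
    pickle.dumps(w for w in words)
except TypeError as exc:
    print("TypeError:", exc)
```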

Paul Rubin

John Machin said:
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.

It could be a file, in which case its iterator method would read lines
from the file and cause that error message. But I think the answer is
that the pickle itself needs to be opened in binary mode, as someone
else posted.
 

John Machin

lysdexia said:
[...]
My problem is, I can't seem to get it to unpickle. When I attempt to
load the saved data, I get:

AttributeError: 'tuple' object has no attribute 'readline'

with pickle, and

TypeError: argument must have 'read' and 'readline' attributes

The code that created the dictionary is interesting, but not very
relevant. Please consider posting the code that is actually giving the
error!
Looking at the pickle pages on docs.python.org, I see that I am indeed
supposed to be able to pickle "tuples, lists, sets, and dictionaries
containing only picklable objects".

I'm sure I'm missing something obvious. Clues?

The docs for pickle.load(file) say """
Read a string from the open file object file and interpret it as a
pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer
argument, and a readline() method that requires no arguments. Both
methods should return a string. Thus file can be a file object opened
for reading, a StringIO object, or any other custom object that meets
this interface.
"""

The error message(s) [plural??] that you are getting suggest(s) that
the argument that you supplied was *not* an open file object nor
anything else with both a read and readline method. Open the file in
binary mode ('rb') and pass the result to pickle.load.
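The loading code was never posted, but the error messages fit an argument such as a tuple being passed to pickle.load in place of an open file. Below is a minimal Python 3 sketch of that guessed mistake next to the fix described above. The tuple is hypothetical, mchain.pkl is written to a temporary directory, and io.BytesIO stands in for the docs' StringIO because Python 3 pickle data is bytes.

```python
import io
import os
import pickle
import tempfile

word_db = {"sin": [2, {"to": 2}], "it": [2, {"is": 2}]}  # entries from the printed output
path = os.path.join(tempfile.mkdtemp(), "mchain.pkl")

with open(path, "wb") as out:  # write in binary mode, as the original function does
    pickle.dump(word_db, out, -1)  # -1 selects pickle.HIGHEST_PROTOCOL

# Hypothetical mistake: passing something without read()/readline(), e.g. a tuple.
try:
    pickle.load((path, "rb"))
except (TypeError, AttributeError) as exc:
    # In CPython the C unpickler raises TypeError; the pure-Python one raises AttributeError.
    print(type(exc).__name__, exc)

# The fix: open the file in binary mode and pass the file object itself.
with open(path, "rb") as f:
    from_file = pickle.load(f)
print(from_file == word_db)  # True

# Any object with read() and readline() will do, such as an in-memory byte stream.
with open(path, "rb") as f:
    from_memory = pickle.load(io.BytesIO(f.read()))
print(from_memory == word_db)  # True
```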
 

John Machin

It could be a file, in which case its iterator method would read lines
from the file and cause that error message.

Impossible:
(1) in "for word in words:" each word would end in "\n" and he'd have
to strip those and there's no evidence of that.
(2) Look at the line """afterwords = [self.words[i + 1] for i, e in
enumerate(self.words) if e == word]"""
and tell me how that works if self.words is a file!
(3) "self.words.count(word)" -- AttributeError: 'file' object has no
attribute 'count'
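Points (1) to (3) are easy to confirm with an in-memory text file standing in for a real one (io.StringIO and the sample lines are illustrative only):

```python
import io

f = io.StringIO("it is a sin\nto write this\n")

# Point (1): iterating a file yields whole lines, each still ending in "\n".
lines = list(f)
print(lines)  # ['it is a sin\n', 'to write this\n']

# Point (3): file objects have no count() method, unlike lists.
print(hasattr(f, "count"))  # False

# Point (2): nor can they be indexed the way self.words[i + 1] requires.
try:
    f[0]
except TypeError as exc:
    print("TypeError:", exc)
```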

But I think the answer is that the pickle itself needs to be opened in
binary mode, as someone else posted.

The answer is:
(1) he needs to supply a file of any kind, for a start [read the error
messages that he got!!];
(2) despite the silence of the docs, it is necessary to have opened the
file in binary mode on systems where it makes a difference (notably
Windows).

[If the OP is still reading this thread, here's an example of how to
show a problem, with minimal code that reproduces the problem, and all
the output including the stack trace]

C:\junk>type dpkl.py
import pickle

d = {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down': 1}],
     'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a': 2}],
     'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and': 1, 'words': 1}],
     'write': [1, {'this': 1}], 'to': [3, {'write': 1, 'put': 1, 'think': 1}],
     'words': [1, {'no': 1}], 'others': [1, {'think': 1}], 'put': [1, {'them': 1}],
     'sin': [2, {'to': 2}]}

s = pickle.dumps(d, -1)
dnews = pickle.loads(s)
print "string", dnews == d

out = open("mchain.pkl",'wb')
pickle.dump(d, out, -1)
out.close()

f = open("mchain.pkl", "rb")
dnewb = pickle.load(f)
f.close()
print "load binary", dnewb == d

f = open("mchain.pkl", "r")
dnewa = pickle.load(f)
f.close()
print "load text", dnewa == d

C:\junk>python dpkl.py
string True
load binary True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
    dnewa = pickle.load(f)
  File "c:\python25\lib\pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "c:\python25\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\python25\lib\pickle.py", line 1169, in load_binput
    i = ord(self.read(1))
TypeError: ord() expected a character, but string of length 0 found

Changing the first line to
import cPickle as pickle
gives this:

C:\junk>python dpkl.py
string True
load binary True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
    dnewa = pickle.load(f)
EOFError

Each of the two different errors indicates that reading was terminated
prematurely by the presence of the good ol' ^Z (aka the CP/M EOF marker) in the file:
363
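The ^Z effect can be simulated on any platform. The sketch below assumes a simplified model of Windows text-mode reading in which input stops at the first b'\x1a' byte (real text mode also translates "\r\n"). The integer 26 is chosen only because its protocol-2 pickle happens to contain that byte.

```python
import pickle

CTRL_Z = b"\x1a"  # ^Z, chr(26), the CP/M end-of-file marker

data = pickle.dumps(26, protocol=2)  # 26 is stored as the single byte 0x1a
print(data)  # b'\x80\x02K\x1a.'
print(pickle.loads(data))  # 26, when the bytes are read back in full (binary mode)

# Simplified model of text-mode reading: everything from the first ^Z on is lost.
truncated = data.split(CTRL_Z, 1)[0]
try:
    pickle.loads(truncated)
except (pickle.UnpicklingError, EOFError) as exc:
    print(type(exc).__name__, exc)
```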

HTH,
John
 
