Pickling dictionaries containing dictionaries: failing,recursion-style!

 
 
lysdexia
      12-01-2007
I'm having great fun playing with Markov chains. I am making a
dictionary of all the words in a given string, getting a count of how
many appearances word1 makes in the string, getting a list of all the
word2s that follow each appearance of word1 and a count of how many
times word2 appears in the string as well. (I know I should probably
be only counting how many times word2 actually follows word1, but as I
said, I'm having great fun playing ...)


printed output of the dictionary looks like so:

{'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

Here's the actual function.

def assembleVocab(self):
    self.wordDB = {}
    for word in self.words:
        try:
            if not word in self.wordDB.keys():
                wordsWeights = {}
                afterwords = [self.words[i + 1] for i, e in
                              enumerate(self.words) if e == word]
                for aw in afterwords:
                    if not aw in wordsWeights.keys():
                        wordsWeights[aw] = afterwords.count(aw)
                self.wordDB[word] = [self.words.count(word), wordsWeights]
        except:
            pass
    out = open("mchain.pkl",'wb')
    pickle.dump(self.wordDB, out, -1)
    out.close()
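
For comparison, here is a minimal sketch of the same kind of table built in one pass over adjacent word pairs. It is not the poster's code: the function name `assemble_vocab`, the sample sentence, and the use of `collections.Counter` are assumptions made for illustration, and it is written for Python 3. Pairing each word with its successor through `zip` means the last word never needs an `i + 1` lookup, so no exception handling is involved.

```python
from collections import Counter, defaultdict


def assemble_vocab(words):
    """Map each word to [occurrence count, {follower: times it follows}]."""
    totals = Counter(words)
    followers = defaultdict(Counter)
    # zip pairs every word with the word after it; the final word has no
    # successor, so nothing ever indexes past the end of the list.
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return {word: [totals[word], dict(followers[word])] for word in totals}


# Hypothetical sample text, not the string used in the thread.
vocab = assemble_vocab("it is a sin to think and a sin to write".split())
print(vocab["sin"])
```

The result is made only of dicts, lists, strings, and ints, so it has the same picklable shape as the dictionary printed above.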

My problem is, I can't seem to get it to unpickle. When I attempt to
load the saved data, I get:

AttributeError: 'tuple' object has no attribute 'readline'

with pickle, and

TypeError: argument must have 'read' and 'readline' attributes

Looking at the pickle pages on docs.python.org, I see that I am indeed
supposed to be able to pickle ``tuples, lists, sets, and dictionaries
containing only picklable objects''.

I'm sure I'm missing something obvious. Clues?
 
Paul Rubin
      12-01-2007
lysdexia writes:
> self.wordDB[word] = [self.words.count(word), wordsWeights]


what is self.words.count? Could it be an iterator? I don't think you
can pickle those.
 
David Tweet
      12-01-2007
Are you opening the file in binary mode ("rb") before doing pickle.load on it?

On 01 Dec 2007 14:13:33 -0800, Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:
> lysdexia writes:
> > self.wordDB[word] = [self.words.count(word), wordsWeights]
>
> what is self.words.count? Could it be an iterator? I don't think you
> can pickle those.




--
-David
 
John Machin
      12-01-2007
On Dec 2, 9:13 am, Paul Rubin wrote:
> lysdexia writes:
> > self.wordDB[word] = [self.words.count(word), wordsWeights]
>
> what is self.words.count? Could it be an iterator? I don't think you
> can pickle those.


Whaaaat??
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.
self.words.count looks like a standard sequence method to me.
self.words.count(word) will return an int -- can you see all those
"[1,", "[2," etc in his printed dict output?
 
Paul Rubin
      12-01-2007
John Machin writes:
> self.words is obviously an iterable (can you see "for word in
> self.words" in his code?), probably just a list.


It could be a file, in which case its iterator method would read lines
from the file and cause that error message. But I think the answer is
that the pickle itself needs to be opened in binary mode, as someone
else posted.
 
John Machin
      12-01-2007
On Dec 2, 8:59 am, lysdexia wrote:
> I'm having great fun playing with Markov chains. I am making a
> dictionary of all the words in a given string, getting a count of how
> many appearances word1 makes in the string, getting a list of all the
> word2s that follow each appearance of word1 and a count of how many
> times word2 appears in the string as well. (I know I should probably
> be only counting how many times word2 actually follows word1, but as I
> said, I'm having great fun playing ...)
>
> printed output of the dictionary looks like so:
>
> {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
> 1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
> 2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
> 1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
> 'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
> {'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}
>
> Here's the actual function.
>
> def assembleVocab(self):
>     self.wordDB = {}
>     for word in self.words:
>         try:
>             if not word in self.wordDB.keys():
>                 wordsWeights = {}
>                 afterwords = [self.words[i + 1] for i, e in
>                               enumerate(self.words) if e == word]
>                 for aw in afterwords:
>                     if not aw in wordsWeights.keys():
>                         wordsWeights[aw] = afterwords.count(aw)
>                 self.wordDB[word] = [self.words.count(word), wordsWeights]
>         except:
>             pass
>     out = open("mchain.pkl",'wb')
>     pickle.dump(self.wordDB, out, -1)
>     out.close()
>
> My problem is, I can't seem to get it to unpickle. When I attempt to
> load the saved data, I get:
>
> AttributeError: 'tuple' object has no attribute 'readline'
>
> with pickle, and
>
> TypeError: argument must have 'read' and 'readline' attributes


The code that created the dictionary is interesting, but not very
relevant. Please consider posting the code that is actually giving the
error!
>
> Looking at the pickle pages on docs.python.org, I see that I am
> indeed
> supposed to be able to pickle ``tuples, lists, sets, and dictionaries
> containing only picklable objects''.
>
> I'm sure I'm missing something obvious. Clues?


The docs for pickle.load(file) say """
Read a string from the open file object file and interpret it as a
pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer
argument, and a readline() method that requires no arguments. Both
methods should return a string. Thus file can be a file object opened
for reading, a StringIO object, or any other custom object that meets
this interface.
"""

The error message(s) [plural??] that you are getting suggest(s) that
the argument that you supplied was *not* an open file object nor
anything else with both a read and readline method. Open the file in
binary mode ('rb') and pass the result to pickle.load.
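
A minimal Python 3 sketch of the round trip described above: dump to a file opened with 'wb', then load from a file opened with 'rb'. The temporary directory, the sample dictionary, and the tuple passed in the failing call are all assumptions. The thread never shows the poster's loading code, so the tuple is only one hypothetical way to hand `pickle.load` something that is not a file object.

```python
import os
import pickle
import tempfile

# Hypothetical sample data in the same shape as the poster's dictionary.
word_db = {"sin": [2, {"to": 2}], "to": [3, {"write": 1, "put": 1, "think": 1}]}
path = os.path.join(tempfile.mkdtemp(), "mchain.pkl")

# Write in binary mode; -1 and pickle.HIGHEST_PROTOCOL mean the same thing.
with open(path, "wb") as out:
    pickle.dump(word_db, out, pickle.HIGHEST_PROTOCOL)

# Read back from an open binary file object, not from a name or a tuple.
with open(path, "rb") as source:
    restored = pickle.load(source)

# Passing anything without read() and readline() fails before any data is read.
try:
    pickle.load((path, "rb"))
    non_file_error = None
except (TypeError, AttributeError) as exc:
    non_file_error = exc

print(restored == word_db, type(non_file_error).__name__)
```

Using `with` closes each file even if dumping or loading raises, which the explicit `close()` calls in the original function do not guarantee.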
 
John Machin
      12-02-2007
On Dec 2, 9:49 am, Paul Rubin wrote:
> John Machin writes:
> > self.words is obviously an iterable (can you see "for word in
> > self.words" in his code?), probably just a list.
>
> It could be a file, in which case its iterator method would read lines
> from the file and cause that error message.


Impossible:
(1) in "for word in words:" each word would end in "\n" and he'd have
to strip those and there's no evidence of that.
(2) Look at the line """afterwords = [self.words[i + 1] for i, e in
enumerate(self.words) if e == word]"""
and tell me how that works if self.words is a file!
(3) "self.words.count(word)" -- AttributeError: 'file' object has no
attribute 'count'
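
The three points can be checked with a short Python 3 sketch. `io.StringIO` stands in for a real file here, and the two sample lines are made up; both are assumptions for illustration rather than anything taken from the poster's program.

```python
import io

# A file-like object holding two hypothetical lines of text.
handle = io.StringIO("to think\nand to write\n")

# (1) Iterating over a file yields whole lines, each ending in a newline.
lines = list(handle)

# (2) File objects cannot be indexed, so words[i + 1] would fail.
try:
    handle[0]
    indexable = True
except TypeError:
    indexable = False

# (3) File objects have no count() method, unlike lists.
has_count = hasattr(handle, "count")

print(lines, indexable, has_count)
```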


> But I think the answer is
> that the pickle itself needs to be opened in binary mode, as someone
> else posted.


The answer is (1) he needs to supply a file of any kind for a start
[read the error messages that he got!!]
(2) despite the silence of the docs, it is necessary to have opened
the file in binary mode on systems where it makes a difference
(notably Windows)

[If the OP is still reading this thread, here's an example of how to
show a problem, with minimal code that reproduces the problem, and all
the output including the stack trace]

C:\junk>type dpkl.py
import pickle

d = {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

s = pickle.dumps(d, -1)
dnews = pickle.loads(s)
print "string", dnews == d

out = open("mchain.pkl",'wb')
pickle.dump(d, out, -1)
out.close()

f = open("mchain.pkl", "rb")
dnewb = pickle.load(f)
f.close()
print "load binary", dnewb == d

f = open("mchain.pkl", "r")
dnewa = pickle.load(f)
f.close()
print "load text", dnewa == d

C:\junk>python dpkl.py
string True
load binary True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
    dnewa = pickle.load(f)
  File "c:\python25\lib\pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "c:\python25\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\python25\lib\pickle.py", line 1169, in load_binput
    i = ord(self.read(1))
TypeError: ord() expected a character, but string of length 0 found

Changing the first line to
import cPickle as pickle
gives this:

C:\junk>python dpkl.py
string True
load binary True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
    dnewa = pickle.load(f)
EOFError

Each of the two different errors indicates that reading was terminated
prematurely by the presence of the good ol' ^Z aka CPMEOF in the file:

>>> s = open('mchain.pkl', 'rb').read()
>>> s.find(chr(26))
179
>>> len(s)
363
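
The same effect can be imitated on any platform with a small Python 3 sketch. It does not reopen a file in text mode. Instead it cuts the pickled bytes at the first 0x1A byte, which is an assumption about how to simulate a reader that treats Ctrl-Z as end of file. The sample dictionary and the value 26 are also assumptions, chosen so that a 0x1A byte is certain to appear.

```python
import pickle

# Protocol 2 stores small ints as the opcode "K" plus one raw byte,
# so the value 26 guarantees a 0x1A (Ctrl-Z) byte in the stream.
data = pickle.dumps({"count": 26}, 2)
cut = data.find(b"\x1a")

# Simulate a reader that stops at Ctrl-Z: everything from there on is lost.
truncated = data[:cut]

try:
    pickle.loads(truncated)
    load_error = None
except (EOFError, pickle.UnpicklingError) as exc:
    load_error = exc

print(cut, len(data), type(load_error).__name__)
```

The full byte string still loads correctly, so the failure comes from the truncation and not from the data itself.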

HTH,
John
 