Serious problem with Shelve

Rami A. Kishek

Hi - this mysterious behavior with shelve is just about to kill me. I
hope someone here can shed some light. First of all, I have this piece
of code which uses shelve to save instances of some class I define. It
works perfectly on an old machine (PII-400) running Python 2.2.1 under
RedHat Linux 8.0. When I try to run it under Python for Windows ME on a
P-4 1.4 GHz, however, it keeps crashing on reading from the shelved file
the second time I try to access it. The Windows machine was originally
running Python 1.5.2, so I upgraded to 2.2.3, thinking that would solve
the problem, but it didn't!

This is what the error looks like:
    tmprec = myrecs[key]
  File "D:\PROGRAMS\PYTHON22\lib\shelve.py", line 70, in __getitem__
    f = StringIO(self.dict[key])
KeyError: A_G_08631616188
                        ^

Notes:
Here's what my program does (it is too much code to include here).
I have 4 related modules: one containing the class definitions (in all
other modules I use from classfile import ___); the second module builds
the shelve file by parsing a large text file containing the data,
building classes; the third re-opens the file later to do reading and
writing operations; and the 4th module is a GUI controller that simply
calls the appropriate functions from the other 2 modules.

The main breakdown occurs in module 3. Significantly, I initially had
this module set up as a script in which everything was done on the
module level, and it was working fine (apparently). The problems
started appearing when I wrapped code inside functions (I need to do
that since I want to call it from other modules, and I have about 4000
lines of code altogether!). I spent painstaking hours trying to isolate
the problem - I pass the open shelve file as a parameter to all the
functions that need it, and I close it properly using try/finally
statements after every use. I also make sure all the keys that go in
there are unique.

What module 3 does is a series of short reads and writes to the shelve
file. First I test if a particular key is in there - if it is not, I
add an item, if it is, I read the existing item, update it, then write
it back like this:

tmprec = myrecs[key]     # read a particular instance from the shelve file
tmprec.field = 1         # update one field
#del myrecs[key]         # commented-out lines are things I tried while debugging
#myrecs.sync()
myrecs[key] = tmprec     # then write it back to the shelve file
#myrecs.sync()

This one function appears to be the guilty party. When I comment it
out the crash stops. However it is a vital function for my program and
I need to do it. Note that deleting the original item before rewriting
it helped reduce the frequency of crashes, but didn't eliminate it
completely. The other possibility (which is why I unsuccessfully tried
the .sync() lines) is that it has to do with the timing of writing to
disk. The library reference is vague about this, saying that shelve is
incapable of simultaneous reads and writes, so the file shouldn't be
opened twice for write. However it does not say whether this implies we
cannot read and write like this in quick succession.
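
The read-update-write cycle described above can be sketched as follows. This is a minimal illustration written for a current Python 3, not the 2.2/2.3 code from this thread; the function name update_record, the file name demo.shelf, and the plain dict standing in for the class instance are all hypothetical. The point is only that the value is read, changed, reassigned under the same key, and the shelf is closed in a finally clause.

```python
import os
import shelve
import tempfile


def update_record(path, key, field_value):
    """Read one record, change a field, and store it back under the same key."""
    shelf = shelve.open(path)          # default flag 'c': create if missing
    try:
        if key in shelf:
            rec = shelf[key]           # unpickles a copy of the stored object
            rec["field"] = field_value
        else:
            rec = {"field": field_value}
        shelf[key] = rec               # reassign so the change is pickled again
    finally:
        shelf.close()                  # flush and release the file


# Hypothetical demo in a throwaway directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.shelf")

update_record(path, "A_G_0863161618", 0)   # first pass adds the key
update_record(path, "A_G_0863161618", 1)   # second pass updates it

shelf = shelve.open(path, flag="r")
try:
    stored_keys = list(shelf.keys())
    stored_rec = shelf["A_G_0863161618"]
finally:
    shelf.close()
```

Reads and writes in quick succession like this are ordinary use of a single open shelf; what the library reference warns against is two handles open for writing on the same file at once.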

More details:
* The first run of module 3 after creating the shelve file doesn't
crash, although I suspect it is doing something funny.
* The second time I get that error above, keeping in mind I am supposed
to have a key in there called "A_G_0863161618" (without the extra '8' at
the end), so the database is already corrupted. So the key
'A_G_08631616188' is in myshelvefile.keys(), the original is no more,
yet NEITHER can be accessed using myshelvefile[key]!
* After creation, the shelve file size is only 71 kB. After running
module 3 - which is supposed to mostly read and not really change the
file much - the size jumps to 110 kB!
* If I open the file in a text editor, I notice all sorts of things that
are not supposed to be there (like directory paths, etc), indicating it
is corrupted. I do not see those things when I open the file on the
good (Linux) machine.
* I did a scandisk to ensure the disk is OK and it is.
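
One way to confirm the symptom in the list above (a key listed by keys() that cannot be fetched) is to walk every key and try to read it. The sketch below is written for a current Python 3; check_shelf and the demo file name are hypothetical, and catching Exception broadly is a deliberate choice for a diagnostic, since a damaged file may fail in the lookup or in the unpickling.

```python
import os
import shelve
import tempfile


def check_shelf(path):
    """Return (readable, unreadable) key lists for the shelf at `path`."""
    readable, unreadable = [], []
    shelf = shelve.open(path, flag="r")
    try:
        for key in list(shelf.keys()):
            try:
                shelf[key]                 # forces a lookup and an unpickle
            except Exception as exc:       # KeyError, unpickling errors, ...
                unreadable.append((key, repr(exc)))
            else:
                readable.append(key)
    finally:
        shelf.close()
    return readable, unreadable


# Hypothetical demo: build a small healthy shelf and check it.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "check.shelf")

shelf = shelve.open(path)
try:
    shelf["A_G_0863161618"] = {"field": 1}
    shelf["A_G_0000000001"] = {"field": 0}
finally:
    shelf.close()

readable, unreadable = check_shelf(path)
```

On a healthy file the second list comes back empty; any entry in it names a key that is listed but cannot be read.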
 
Tim Churches

Rami> Hi - this mysterious behavior with shelve is just about to kill me. I
Rami> hope someone here can shed some light. First of all, I have this piece
Rami> of code which uses shelve to save instances of some class I define. It
Rami> works perfectly on an old machine (PII-400) running Python 2.2.1 under
Rami> RedHat Linux 8.0. When I try to run it under Python for windows ME on a
Rami> P-4 1.4 GHz, however, it keeps crashing on reading from the shelved file
Rami> the second time I try to access it. The Windows machine was originally
Rami> running python 1.5.2, so I upgraded to 2.2.3, thinking that would solve
Rami> the problem, but it didn't!

In Python 2.2 or earlier, by default, shelve uses the Berkeley database
1.8 libraries, which we have found to be seriously broken on all
platforms we have tried them on. Upgrading to a later version of the
Berkeley libraries and using the pybsddb module fixed the mysterious,
inconsistent crashes and segfaults we were seeing with shelve (and which
were also driving us crazy). The easiest way to upgrade is to move to
Python 2.3, which includes these later versions, but you can also
easily install them under earlier version of Python (at least under
2.2).
--

Tim C

 
Rami A. Kishek

Well - I installed Python 2.3, but it still doesn't work. My program now
crashes on the first pass. After deleting the old databases and
creating new ones, I opened them for read and this is what I get:

    self.revs = shelve.open(os.path.join(tgtdir, dbfn))
  File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 231, in open
    return DbfilenameShelf(filename, flag, protocol, writeback, binary)
  File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 212, in __init__
    Shelf.__init__(self, anydbm.open(filename, flag), protocol, writeback, binary)
  File "D:\PROGRAMS\PYTHON23\lib\anydbm.py", line 82, in open
    mod = __import__(result)
ImportError: No module named bsddb185


I will try enclosing that import bsddb185 in anydbm.py in try: except:,
though I hate messing around with source files, and there may be many
more such problems. Python developers, be aware of this glitch.
 
Andrew MacIntyre

Rami>   File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 231, in open
Rami>     return DbfilenameShelf(filename, flag, protocol, writeback, binary)
Rami>   File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 212, in __init__
Rami>     Shelf.__init__(self, anydbm.open(filename, flag), protocol, writeback, binary)
Rami>   File "D:\PROGRAMS\PYTHON23\lib\anydbm.py", line 80, in open
Rami>     raise error, "db type could not be determined"
Rami> error: db type could not be determined
Rami>
Rami> Incidentally, on the other machine I mentioned (the one on which
Rami> shelve worked perfectly with 2.2.3) shelve still works perfectly
Rami> after upgrading to 2.3. Since that is a Linux 2 machine, I figure
Rami> perhaps it is using a different db like gdbm or something ...

Your shelve file is in DB v1.85 format. Commenting out the lines in
whichdb.py didn't do anything except deny the shelve module information
about what the format actually _is_.

You'll need to find/build a v1.85 compatible module to read the shelve
then write it out in a later format.
 
Skip Montanaro

Rami> Well - I installed Python 2.3, but it still doesn't. My program
Rami> now crashes on the first pass. After deleting the old databases
Rami> and creating new ones, I opened them for read and this is what I
Rami> get:

How did you create those new databases, using an older version of Python
perhaps? What's happening is that whichdb.whichdb() determined that the
file you passed into anydbm.open() was an old hash-style database, which can
only be opened in Python 2.3 by the old v1.85 library, which is only
exposed through the bsddb185 module.

Rami> I will try enclosing that import bsddb185 in anydbm.py in try:
Rami> except:, though I hate messing around with source files, and there
Rami> may be many more such problems. Python developers, be aware of
Rami> this glitch.

That won't work. What's anydbm.open() going to use to open the file?

Can you explain how the files were created? (Sorry if you explained
already. I'm just coming to this thread.)

If you have Python 2.1 or 2.2 lying around with a bsddb module which can
read the file in question, use Tools/scripts/db2pickle.py to convert the
file to a pickle, then with Python 2.3, run Tools/scripts/pickle2db.py to
convert the pickle back to a db file, using the new bsddb. Those two
scripts are in the Python 2.3 distribution, but not the Python 2.2
distribution. They should work with Python 2.1 or 2.2, however. This
problem is exactly why I wrote them.

Synopsis:

python2.2 db2pickle.py olddbfile pickle.pck
python2.3 pickle2db.py pickle.pck newdbfile
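
The dump-and-reload idea behind those two scripts can be sketched in a few lines. This is not the source of db2pickle.py or pickle2db.py; it is an illustration written for a current Python 3, where the helper names dump_db and load_db, the file names, and the use of dbm.dumb for both the "old" and "new" database are assumptions chosen so the example runs anywhere. In the real situation the first half runs under the interpreter that can still read the old file and the second half under the new one.

```python
import dbm.dumb
import os
import pickle
import tempfile


def dump_db(db, fileobj):
    """Write every key/value pair in `db` to `fileobj`, one pickle per pair."""
    for key in db.keys():
        pickle.dump((key, db[key]), fileobj)


def load_db(fileobj, db):
    """Read pickled (key, value) pairs from `fileobj` and store them in `db`."""
    while True:
        try:
            key, value = pickle.load(fileobj)
        except EOFError:               # no more pickles in the file
            break
        db[key] = value


# Hypothetical demo: copy an "old" database into a "new" one via a pickle file.
workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "olddbfile")
new_path = os.path.join(workdir, "newdbfile")
pck_path = os.path.join(workdir, "pickle.pck")

old = dbm.dumb.open(old_path, "c")
old[b"A_G_0863161618"] = b"first record"
old[b"A_G_0000000001"] = b"second record"
with open(pck_path, "wb") as out:
    dump_db(old, out)
old.close()

new = dbm.dumb.open(new_path, "c")
with open(pck_path, "rb") as src:
    load_db(src, new)
copied = dict((k, new[k]) for k in new.keys())
new.close()
```

Because the intermediate file holds only pickled key/value pairs, it does not depend on which database library wrote the original.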

Skip
 
Skip Montanaro

Rami> Incidentally, on the other machine I mentioned (the one on which
Rami> shelve worked perfectly with 2.2.3) shelve still works perfectly
Rami> after upgrading to 2.3. Since that is a Linux 2 machine, I figure
Rami> perhaps it is using a different db like gdbm or something ...

Try this using python 2.2.3 and python 2.3:

import os, whichdb
print whichdb.whichdb(os.path.join(tgtdir, dbfn))

and see what it prints. That will keep you from guessing about the nature
of the file.
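
On a current Python 3 the same check lives in the dbm package as dbm.whichdb(). The sketch below is only an illustration under that assumption: the file names are hypothetical, and dbm.dumb is used to create the sample database because it is available on every platform. whichdb returns the name of the module that should open the file, None if the file cannot be read or does not exist, and an empty string if the format cannot be guessed.

```python
import dbm
import dbm.dumb
import os
import tempfile

# Hypothetical demo database in a throwaway directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo")

db = dbm.dumb.open(path, "c")
db[b"A_G_0863161618"] = b"payload"
db.close()

# Ask which dbm flavour wrote the file instead of guessing.
detected = dbm.whichdb(path)

# A path with nothing behind it gives None.
missing = dbm.whichdb(os.path.join(workdir, "no_such_file"))
```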

Skip
 
Rami A. Kishek

Thanks. With your help, I figured out one of the databases accessed WAS
created with an older Python, so I simply cleaned up that one and now
everything works!
 
