persistent composites


Aaron Brady

Hi, please forgive the multi-posting on this general topic.

Some time ago, I recommended a pursuit of keeping 'persistent
composite' types on disk, to be read and updated at other times by
other processes. Databases provide this functionality, with the
exception that field types in any given table are required to be
uniform. Python has no such restriction.

I tried out an implementation of composite collections, specifically
lists, sets, and dicts, using 'sqlite3' as a persistence back-end.
It's significantly slower, but we might argue that attempting to do it
by hand classifies as a premature optimization; it is easy to optimize
debugged code.

The essentials of the implementation are:
- each 'object' gets its own table.
= this includes immutable types
- a reference count table
= when an object's ref. count reaches zero, its table is dropped
- a type-map table
= maps object's table ID to a string of its type
- a single 'entry point' table, with the table ID of the entry-point
object
= the entry point is the only data structure available to new
connections. (I imagine it will usually be a list or dict.)
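
Concretely, the bookkeeping might be set up along these lines; this is only
an illustrative sketch (the 'entry' table name and the per-object (idx, val)
layout are guesses, while 'refcounts', 'types', 'tableid', and 'count' match
the snippets below):

import sqlite3

conn = sqlite3.connect('store.db')
cur = conn.cursor()

# One row per live object: its table ID and current reference count.
cur.execute('CREATE TABLE IF NOT EXISTS refcounts (tableid INTEGER, count INTEGER)')

# Maps a table ID to the name of the object's type, e.g. 'list' or 'dict'.
cur.execute('CREATE TABLE IF NOT EXISTS types (tableid INTEGER, type TEXT)')

# A single row holding the table ID of the entry-point object.
cur.execute('CREATE TABLE IF NOT EXISTS entry (tableid INTEGER)')

# Each object would additionally get its own table, created on demand, e.g.
# 'CREATE TABLE obj_7 (idx, val)' -- SQLite does not force a uniform type on
# a column, which is what lets the fields stay dynamically typed.
conn.commit()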

I will be sure to kill any interest you might have by now, by
"revealing" a snippet of code.

The object creation procedure:

def new_table( self, type ):
  ''' 'type' is a string, the name of the class the object is an
  instance of '''
  cur= self.conn.cursor( )
  # Allocate the next table ID: one past the current maximum, or 0 if none.
  cur.execute( '''SELECT max( tableid ) FROM refcounts''' )
  rec= cur.fetchone( )
  if rec[ 0 ] is None:
    obid= 0
  else:
    obid= rec[ 0 ]+ 1
  # Record the new object's type name and start its reference count at 1.
  cur.execute( '''INSERT INTO types VALUES( ?, ? )''', ( obid, type ) )
  cur.execute( '''INSERT INTO refcounts VALUES( ?, ? )''', ( obid, 1 ) )
  return obid

The increment ref. count procedure:

def incref( self, obid ):
  cur= self.conn.cursor( )
  # Read the current reference count for this object's table...
  recs= cur.execute( '''SELECT count FROM refcounts WHERE tableid = ?''',
                     ( obid, ) )
  rec= recs.fetchone( )
  newct= rec[ 0 ]+ 1
  # ...and write back the incremented value.
  cur.execute( '''UPDATE refcounts SET count = ? WHERE tableid = ?''',
               ( newct, obid ) )
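
The essentials above say an object's table is dropped when its count reaches
zero; a matching 'decref' is not shown, but under the same assumptions (plus
an assumed obj_<id> naming scheme for per-object tables) it might look like:

def decref(self, obid):
    cur = self.conn.cursor()
    row = cur.execute('SELECT count FROM refcounts WHERE tableid = ?',
                      (obid,)).fetchone()
    newct = row[0] - 1
    if newct > 0:
        cur.execute('UPDATE refcounts SET count = ? WHERE tableid = ?',
                    (newct, obid))
    else:
        # No references remain: drop the object's own table (the obj_<id>
        # naming is an assumption) and remove its bookkeeping rows.
        cur.execute('DROP TABLE IF EXISTS obj_%d' % obid)
        cur.execute('DELETE FROM refcounts WHERE tableid = ?', (obid,))
        cur.execute('DELETE FROM types WHERE tableid = ?', (obid,))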

The top-level structure contains these two procedures, as well as
'setentry', 'getentry', and 'revive' procedures.

Most notably, user-defined types are possible. The dict is merely a
persistent dict. 'revive' looks the original type up by name in the
global namespace, subject to the same restrictions that we all know
and love from 'pickle'.
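
A sketch of the kind of lookup 'revive' might do; only the by-name lookup and
the pickle-style restriction come from the post, the rest is assumption:

def revive(self, obid):
    cur = self.conn.cursor()
    typename = cur.execute('SELECT type FROM types WHERE tableid = ?',
                           (obid,)).fetchone()[0]
    # Like pickle, the original class must be findable by name at load time.
    cls = globals().get(typename)
    if cls is None:
        raise LookupError('cannot revive %r: type not found' % typename)
    obj = cls.__new__(cls)   # bypass __init__, as pickle does
    # ...remaining state would be bound to the persistent storage for obid.
    return obj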

Deadlocks and cyclic garbage pose the usual problems. The approach I
used last time was to maintain a graph of acquired locks in a separate
table, and simply check it for cycles to avert deadlocks. For
garbage, I can't find a better solution than the one Python already
uses.
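
For the deadlock side, the cycle check over a waits-for graph kept in a table
could look roughly like this (the 'locks' table and its columns are
assumptions):

def would_deadlock(conn, waiter, holder):
    '''Return True if letting 'waiter' block on 'holder' would close a cycle
    in a waits-for graph stored in an assumed table locks(waiter, holder).'''
    edges = conn.execute('SELECT waiter, holder FROM locks').fetchall()
    graph = {}
    for w, h in edges + [(waiter, holder)]:   # include the proposed new edge
        graph.setdefault(w, set()).add(h)
    # Depth-first search from 'holder': reaching 'waiter' again means a cycle.
    seen, stack = set(), [holder]
    while stack:
        node = stack.pop()
        if node == waiter:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False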

From the 3.0 docs:
gc.garbage

A list of objects which the collector found to be unreachable but
could not be freed (uncollectable objects).
....
Python doesn’t collect such [garbage] cycles automatically because, in
general, it isn’t possible for Python to guess a safe order in which
to run the __del__() methods. If you know a safe order, you can force
the issue by examining the garbage list, and explicitly breaking
cycles due to your objects within the list.

Before I go and flesh out the entire interfaces for the provided
types, does anyone have a use for it?
 

kindly

<snip/>

Before I go and flesh out the entire interfaces for the provided
types, does anyone have a use for it?

I like it as a concept, have not got any use for it this minute, but I
am sure it will be useful someday.
 

Steven D'Aprano

Aaron said:
Some time ago, I recommended a pursuit of keeping 'persistent
composite' types on disk, to be read and updated at other times by
other processes. Databases provide this functionality, with the
exception that field types in any given table are required to be
uniform. Python has no such restriction. ....
I will be sure to kill any interest you might have by now, by
"revealing" a snippet of code.

How about telling us what problem you're trying to solve? Giving us your
implementation isn't useful if we don't even know what it's for. Persistent
data on disk, okay, I get that -- but how is it different from (say) XML,
or INI files, or pickles, or similar? Is the idea to have *live* data
stored on disk, updated by multiple processes simultaneously? Don't assume
that people will remember your recommendation from "some time ago".
 

Aaron Brady

How about telling us what problem you're trying to solve? Giving us your
implementation isn't useful if we don't even know what it's for. Persistent
data on disk, okay, I get that -- but how is it different from (say) XML,
or INI files, or pickles, or similar? Is the idea to have *live* data
stored on disk, updated by multiple processes simultaneously? Don't assume
that people will remember your recommendation from "some time ago".

It was no time at all in the Usenet world, I suppose. There are lots
of strategies for storing data on a disk, and lots of strategies for
storing data in memory. I was trying on disk what Python uses on
memory.

If you want to argue that Python isn't useful, you're in the wrong
place. If you want to argue that Python's strategy is useful on
disks, you're talking to the right person.

I'll draw an analogy. static data structures : sql & co. :: Python
data structures : this undertaking. It has all the advantages over
SQL & XML structures that Python has over C structures: sort of a
'worst of neither' combination. I haven't written a textbook on DSs,
though, so I don't quite know where to begin. In my eyes, you're
practically asking, "What's your use case for Python?" Either that,
or I don't know enough other languages.

I have nothing to use it for, but someone might, and it might be just
plumb interesting to fellow Pythoneers. If I did, I'd have somewhere
to start.
 

Diez B. Roggisch

Aaron said:
Hi, please forgive the multi-posting on this general topic.

Some time ago, I recommended a pursuit of keeping 'persistent
composite' types on disk, to be read and updated at other times by
other processes. Databases provide this functionality, with the
exception that field types in any given table are required to be
uniform. Python has no such restriction.

I tried out an implementation of composite collections, specifically
lists, sets, and dicts, using 'sqlite3' as a persistence back-end.
It's significantly slower, but we might argue that attempting to do it
by hand classifies as a premature optimization; it is easy to optimize
debugged code.

<snip/>

Sounds like you are re-inventing the ZODB.

Diez
 

Aaron Brady

<snip/>

Sounds like you are re-inventing the ZODB.

Diez

Alright, Diez. Here is some private consulting for free.

''Section 2.6.1:
The most common idiom that isn't caught by the ZODB is mutating a list
or dictionary''

My approach performs this for free.
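
For context, the idiom in question looks roughly like this in ZODB; a sketch
(the example class is made up, but the 'Persistent' base class and the
'_p_changed' flag are ZODB's):

from persistent import Persistent

class Account(Persistent):          # made-up example class
    def __init__(self):
        self.history = []           # a plain Python list

    def record(self, item):
        self.history.append(item)   # in-place mutation ZODB does not detect
        self._p_changed = True      # so the object must be flagged by hand

(ZODB's PersistentList and PersistentMapping exist precisely to avoid the
manual flag.)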

The docs also don't mention interprocess communication, which is one
of the two primary functions that I satisfy in my approach.

The syntax is different, non-trivially so, and mine is more Pythonic.
 

Aaron Brady

...or SQLAlchemy or pickles in a SQL BLOB or...

I recognize your aversion to my claim of having a new approach to
something. I suppose an austere review of the literature would
compare it with the existing alternatives.

My approach does not define tables, so it is not SQLAlchemy; it is
not mere sugar for SQL. It defines a 'set' class and a 'tuple' class,
so some of the functionality may (and probably should be expected to)
overlap.

http://www.sqlalchemy.org/docs/05/ormtutorial.html#define-and-create-a-table

My approach does not pickle objects, so it is not a mere BLOB pickle.
If a user writes, listA[ 50 ].append( "abcde" ), one addition is made
to an existing structure, and the remaining contents of 'listA' are
unread and unchanged.
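
A sketch of what that single addition might amount to on the SQL side (the
obj_<id> naming and (idx, val) columns are assumptions, not the posted code):

def list_append(conn, list_obid, value_obid):
    '''Append one element reference to the table backing a persistent list,
    touching only that table; the rest of the list is never read.'''
    top = conn.execute('SELECT max(idx) FROM obj_%d' % list_obid).fetchone()[0]
    nextidx = 0 if top is None else top + 1
    conn.execute('INSERT INTO obj_%d VALUES (?, ?)' % list_obid,
                 (nextidx, value_obid))
    conn.commit()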
 

Aaron Brady

I'm not much for the interface.  But the back end might match what I'm
doing.

Making the charitable interpretation that this was the extent of c-l-
py's support and enthusiasm for my idea, I will now go into mourning.
Death occurred at oh-eight-hundred. Rest in peace, support &
enthusiasm.
 

Mike Kazantsev

Making the charitable interpretation that this was the extent of c-l-
py's support and enthusiasm for my idea, I will now go into mourning.
Death occurred at oh-eight-hundred. Rest in peace, support &
enthusiasm.

I've read this thread from the beginning, being tempted to insert
remarks about shelve module or ORMs like SQLAlchemy, but that'd be
meaningless without the problem description, which I haven't seen
anywhere. Is it some trick idea like "let's walk on our heads"?

--
Mike Kazantsev // fraggod.net

 

Aaron Brady

I've read this thread from the beginning, being tempted to insert
remarks about shelve module or ORMs like SQLAlchemy, but that'd be
meaningless without the problem description, which I haven't seen
anywhere. Is it some trick idea like "let's walk on our heads"?

More like bronze them, or hang them on a tackboard. You haven't
convinced me that it's not a problem, or that it's an easy one.
 

Rhodri James

More like bronze them, or hang them on a tackboard. You haven't
convinced me that it's not a problem, or that it's an easy one.

Unfortunately it's up to you to demonstrate that it is a problem,
whichever of the many possible 'it's you're talking about. So far,
the question "Why would I want to use this? What's the use case?"
has gone unanswered, and I'm sure I'm not the only one baffled by it.
 

Aaron Brady

Unfortunately it's up to you to demonstrate that it is a problem,
whichever of the many possible 'it's you're talking about.  So far,
the question "Why would I want to use this?  What's the use case?"
has gone unanswered, and I'm sure I'm not the only one baffled by it.

I can demonstrate it's a problem; many things are.

The subject line says it all. SQL will persist (and share for IPC,
after a fashion) sets of statically typed tuples. But we have need
for sets of dynamically typed tuples, in volatile storage at the very
least, or no one would like Python.

Whether we have need for them in persistent storage remains to be
seen.

POSH (Python Object SHaring) has been at least one student's graduate
thesis. It couldn't hurt to pursue it if one has time, as with many
things. It's pretty likely my estimates of end user usefulness of
'KeepDB', say, derive sturdily and steadily from our estimated
popularity of Python. So much for my motives, though not for my
opportunities.

My implementation has followed two separate paths: one using SQL for a
back-end host; the other using hand-coded bytes on disk, complete with
large file support using views, and on-disk (or in-buffer) memory
management, which, incidentally, I spoke up about around a year ago. It
was the topic of an honors lecture in the Data Structures course in my
undergrad. For the record, I still can't beat a log-N balanced tree
for finding free nodes, but I am operating on the assumption that the
smallest sufficient region is the best to return from 'alloc'.
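
The 'smallest sufficient region' policy is best-fit allocation; a sketch of
the idea, using a size-sorted free list so the search is logarithmic (the
class and its layout are illustrative, not the actual implementation):

import bisect

class FreeList:
    '''Best-fit sketch: free regions kept sorted by size, so the smallest
    sufficient region is found with a logarithmic search.'''
    def __init__(self):
        self.regions = []                    # sorted list of (size, offset)

    def free(self, offset, size):
        bisect.insort(self.regions, (size, offset))

    def alloc(self, size):
        i = bisect.bisect_left(self.regions, (size, -1))
        if i == len(self.regions):
            raise MemoryError('no free region large enough')
        rsize, roffset = self.regions.pop(i)
        if rsize > size:                     # give the unused tail back
            self.free(roffset + size, rsize - size)
        return roffset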

You are not being any help, Rhodri, in your question. The approach's
chief virtue is fidelity of programming, that is, what you write is
most true to what you mean. If you have an object on disk and want
the third element of its 'coords' list attribute, it's a waste to do
anything other than find the object's disk address, then the disk
address of its 'coords' attribute, then the disk address of the third
element of that; loading, or worse, processing, the entire structure
is exactly that waste. That holds even if, and by analogy with
economic transparency and efficiency maybe /especially/ if, you are
having a lowly machine make do. You don't want to open every box in
your basement to find your tennis racket; you want to open one box,
find the racket in the index, then open all and only the boxes that
'nestedly' contain the racket. (Yes, that's /all/ and /only/, or 'if
and only if' -- not to be confused with 'one and only'.)

It's more economic.
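
In code, that read path amounts to chasing table IDs rather than loading
whole structures; a sketch, with the obj_<id> naming and (idx, val) columns
again assumed:

def get_nested(conn, root_obid, path):
    '''Follow a path of keys/indices from the root object, touching only the
    tables along the way, e.g. get_nested(conn, root, ['coords', 2]).'''
    obid = root_obid
    for key in path:
        row = conn.execute('SELECT val FROM obj_%d WHERE idx = ?' % obid,
                           (key,)).fetchone()
        obid = row[0]      # a child table ID, not the child's contents
    return obid            # a final value would be loaded/'revive'd from here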
 

Rhodri James

You are not being any help, Rhodri, in your question.

To you, perhaps not. To me, it has at least had the effect of making
what you're trying to do (write a pythonic object database) clearer.
 

Aahz

You are not being any help, Rhodri, in your question.

Maybe not, but honestly, you're getting pretty close to going back in my
killfile. Although you're no longer trolling this group, I find your
writing difficult to read at best; answering questions from people like
Rhodri is about your only chance to reach people like me. Even without
the killfile, I'm skipping about 60-80% of your posts. It's a free
Usenet, of course, so you're under no obligation to pay any attention to
me, but I think you ought to consider the merits of changing your
opinion of Rhodri's questions.

Side note: the rhetorical trick of inserting a person's name in a
statement is often intended and received as a mildly patronizing insult.
It's not at all clear to me whether that was your intent; if not, you
might want to change your writing style a bit.
 

Aaron Brady

Maybe not, but honestly, you're getting pretty close to going back in my
killfile.  Although you're no longer trolling this group, I find your
writing difficult to read at best; answering questions from people like
Rhodri is about your only chance to reach people like me.  Even without
the killfile, I'm skipping about 60-80% of your posts.  It's a free
Usenet, of course, so you're under no obligation to pay any attention to
me, but I think you ought to consider the merits of changing your
opinion of Rhodri's questions.

Side note: the rhetorical trick of inserting a person's name in a
statement is often intended and received as a mildly patronizing insult.
It's not at all clear to me whether that was your intent; if not, you
might want to change your writing style a bit.

This is a welcome digression for me. I wasn't sure that my idea was
being taken seriously. On a temperature scale from freezing to
boiling, I took Rhodri's message to be somewhere in the single
digits-- quite chilly. Maybe if it hadn't made me feel so defensive,
I could've just said so. But, judging from his reply, he got the
message.

Aahz said:
I find your writing difficult to read at best; ....
I'm skipping about 60-80% of your posts.

My loss.
 
