Efficient Object Reconstruction?

A

Austin Ziegler

I have a problem, and it's been bugging me for a while, so I thought
that I'd ask people that know a lot more about the internals of Ruby
than I do.

In PDF::Writer, I have a complex object graph. Its complexity is
mostly caused by the fact that all objects must know their parent
object. Maybe something like:

foo#35
bar#36 (@parent =3D foo#35, ...)
baz#37 (@parent =3D foo#35, ...)

When this graph gets roundtripped through Marshal, I might have:

foo#38
bar#39 (@parent =3D foo#35, ...)
baz#40 (@parent =3D foo#35, ...)

This means that there are now "odd" copies of at least the parent
(foo#35) around. This is something that *does* happen when
Transaction::Simple is called. Now, if I then add something else:

foo#38
bar#39 (@parent =3D foo#35, ...)
baz#40 (@parent =3D foo#35, ...)
bom#41 (@parent =3D foo#38, ...)

Obviously, this not only gets silly after a while, but it gets pretty
inefficient.

I want to fix Transaction::Simple so that the object graph is
*properly* reconstructed. That is, if I Marshal.dump the foo with
object ID 35, I don't mind if it's reconstituted as #38, but I want
"old" references to #35 reconstructed as references to #38.

How can I do that? Preferably, how can I do that without crawling the
object graph -- which would be extremely inefficient?

-austin
--=20
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 
R

Robert Klemme

Austin said:
I have a problem, and it's been bugging me for a while, so I thought
that I'd ask people that know a lot more about the internals of Ruby
than I do.

In PDF::Writer, I have a complex object graph. Its complexity is
mostly caused by the fact that all objects must know their parent
object. Maybe something like:

foo#35
bar#36 (@parent = foo#35, ...)
baz#37 (@parent = foo#35, ...)

When this graph gets roundtripped through Marshal, I might have:

foo#38
bar#39 (@parent = foo#35, ...)
baz#40 (@parent = foo#35, ...)

I don't think so. Since the whole graph is copied foo#35 is copied and
references are adjusted so you end up with parent = foo#38 - regardless
whether you copy the root object or any other object (Marshal cannot know
what the root / parent object is anyway):

09:33:39 [ruby]: ruby mar-ref.rb
134688280 134688268
134688148 134688100
false false
134687860 134687896
false false
09:34:41 [ruby]: cat mar-ref.rb
T = Struct.new:)ref)
a = []
a << T.new(a)
print a.object_id, " ", a[0].object_id, "\n"
b = Marshal.load(Marshal.dump(a))
print b.object_id, " ", b[0].object_id, "\n"
print b.object_id == a.object_id, " ", b[0].object_id == a[0].object_id,
"\n"
c = Marshal.load(Marshal.dump(a[0]))
print c.ref.object_id, " ", c.object_id, "\n"
print c.ref.object_id == a.object_id, " ", c.object_id == a[0].object_id,
"\n"
This means that there are now "odd" copies of at least the parent
(foo#35) around. This is something that *does* happen when
Transaction::Simple is called. Now, if I then add something else:

foo#38
bar#39 (@parent = foo#35, ...)
baz#40 (@parent = foo#35, ...)
bom#41 (@parent = foo#38, ...)

Obviously, this not only gets silly after a while, but it gets pretty
inefficient.

This might be a problem of Transaction::Simple (disclaimer, I never used
it). You probably have to copy the complete graph every time and discard
the old one.
I want to fix Transaction::Simple so that the object graph is
*properly* reconstructed. That is, if I Marshal.dump the foo with
object ID 35, I don't mind if it's reconstituted as #38, but I want
"old" references to #35 reconstructed as references to #38.

How can I do that? Preferably, how can I do that without crawling the
object graph -- which would be extremely inefficient?

I don't think there is an easy solution that allows to write an instance
twice to the stream and adjusting references on checkout other than
writing it twice in one Marshal.dump step. The easiest might be to create
a mapping on your own, store this mapping and adjust it after
deserializing.

Kind regards

robert
 
P

Pit Capitain

Austin said:
The attached program demonstrates this behaviour using Transaction::Simple.

Any ideas, anyone?

This seems to be a problem with the implementation of Transaction::Simple.

The problem occurs when you have two objects that hold references to
each other, say o1 and o2. When you start a transaction on o1, you store
the current state as a Marshal dump of the object graph:

o1 <-----> o2

When you rewind the transaction, you restore the Marshal dump. The
resulting object graph contains new objects o1' and o2', holding
references to each other.

o1' <-----> o2'

Since you don't want to return the new object o1' but reuse the original
object o1, you replace all the instance variables of o1 by those of o1'.
The resulting object graph then looks like this:

o1' <-----> o2' <----- o1 <----- o2

I fear that a fix for this problem wouldn't fit into
Transaction::*Simple* :-(

Regards,
Pit
 
A

Austin Ziegler

This seems to be a problem with the implementation of
Transaction::Simple.

Well, yes. This is part of the reason that I'm trying to figure out how
to fix it. It's not a *huge* problem, but it *is* an annoying one.
The problem occurs when you have two objects that hold references to
each other, say o1 and o2. When you start a transaction on o1, you
store the current state as a Marshal dump of the object graph:
=20
o1 <-----> o2
=20
When you rewind the transaction, you restore the Marshal dump. The
resulting object graph contains new objects o1' and o2', holding
references to each other.
=20
o1' <-----> o2'
=20
Since you don't want to return the new object o1' but reuse the
original object o1, you replace all the instance variables of o1 by
those of o1'. The resulting object graph then looks like this:
=20
o1' <-----> o2' <----- o1 <----- o2
=20
I fear that a fix for this problem wouldn't fit into
Transaction::*Simple* :-(

That's, um, more or less what's going on. What's interesting is that it
only matters when attributes *aside* from containment change during the
transaction.

Got any recommendations on how to fix it ... preferably without walking
the object graph?

The long term goal is to do something like this:

pdf.start_transaction # stores pdf as pdf'
# make modifications
pdf.rewind_transaction # replace pdf with pdf'
# make modifications
pdf.commit_transaction # keep pdf

It's the *replace* that I'm having a problem with, and I suspect that
this isn't something I can do without, say, evil.rb. Does anyone have
any suggestions on how might this might be done? I noticed this a while
back (and implemented an API change to gloss over it) when I was trying
to make it so that one could choose to compress a PDF document on save
rather than when it's created.

I suspect (strongly) that I could also save on PDF::Writer memory use if
I could get this working (and working quickly).

It might be acceptable to require a migration to Transaction::Full or somet=
hing
as yet unwritten, but I *want* it to be pure Ruby, if possible.
(Otherwise, it'll
be something that I make a run-time option.)

-austin
--=20
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 
A

Austin Ziegler

Maybe try storing the parent<->child relationships in an external table?

I don't think that I meaningfully can, because:

(1) we're talking *hundreds* of objects,
(2) we're not talking about a database

That said, it *might* theoretically be possible, but I think that we'd
be adding so much overhead that it wouldn't be funny. (It is possible
for a transaction to be started for any number of reasons, and those
reasons will only be increasing as the layout engine improves.)

-austin
--=20
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 
R

Robert Klemme

Austin said:
Got any recommendations on how to fix it ... preferably without
walking
the object graph?

As I said, use another level of indirection and store references in a hash
and use that for resolution. You might be able to do that behind the
scenes with a changed attr_accessor.

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,818
Latest member
Brigette36

Latest Threads

Top