Using id2ref for anything?

  • Thread starter Charles Oliver Nutter

Charles Oliver Nutter

ObjectSpace._id2ref is another of those peculiar methods, an artifact
of a particular implementation which, due to its lack of a
copying/compacting garbage collector, can always locate in memory an
object given its "id". This is typically *not* easily possible on
other VMs, where objects move around and it may even be difficult to
get a unique "id" for a given object since memory locations keep
moving and adding a numeric ID would increase object or object handle
sizes.

On JRuby, _id2ref is implemented as a pair with Object#object_id/id.
The latter, when called on an object, atomically constructs a numeric
ID for the object in question. It then asks our ObjectSpace
implementation to insert a weak reference to the object into a table
keyed on numeric ID. This allows the resulting ID to be used later for
_id2ref to retrieve the object.
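
As a rough illustration of the scheme described above, here is a minimal Ruby sketch. It is not JRuby's actual code (which lives in Java); the class name IdTable, the @sketch_id instance variable, and the method names are made up for this example, and a Mutex-guarded counter stands in for the atomic ID construction.

```ruby
require 'weakref'

# Hypothetical illustration only; not JRuby's real implementation.
class IdTable
  def initialize
    @lock = Mutex.new
    @next_id = 0
    @refs = {}   # numeric ID => WeakRef pointing at the object
  end

  # Plays the role of object_id: assign an ID once, then register a
  # weak reference so the ID can be resolved later.
  def id_for(obj)
    @lock.synchronize do
      id = obj.instance_variable_get(:@sketch_id)
      return id if id

      id = (@next_id += 1)
      obj.instance_variable_set(:@sketch_id, id)  # fails on frozen objects
      @refs[id] = WeakRef.new(obj)
      id
    end
  end

  # Plays the role of _id2ref: resolve an ID back to a live object.
  def id2ref(id)
    @lock.synchronize do
      ref = @refs[id]
      raise RangeError, "#{id} is not a known id" unless ref

      begin
        ref.__getobj__
      rescue WeakRef::RefError
        @refs.delete(id)
        raise RangeError, "#{id} refers to a collected object"
      end
    end
  end
end
```

Note that the first id_for call on each object allocates a WeakRef and touches the shared table, even if nobody ever calls id2ref.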

Unfortunately object_id, in its #id form, is often used to get a
unique non-#hash key for an object for purposes entirely unrelated to
_id2ref. As a result, any code using object_id or id on JRuby pays a
significantly higher cost than you might expect.

If we no longer supported _id2ref, the only cost would be in producing
an ID, probably with a strictly-increasing atomic 64-bit value. There
would be no weakref map and no cost of constructing and managing the
weakrefs within that map.

So I am asking you Rubyists...does this sound like a problem? In the
1.8/1.9 stdlib, the only reference to _id2ref is one in drb.rb, which
could be replaced with a "better way". None of the gems I have
installed use _id2ref. Originally, weakref.rb used _id2ref, but we
have a native impl of weakref that uses Java's built-in weakrefs.
Google code search only brings up about 353 hits for "lang:ruby
_id2ref", most of them the already-mentioned cases.

One last demonstration of the perf difference between the current
Object#object_id and one that does not use the ObjectSpace weak map:

Current:

                                       user     system      total        real
1M calls to obj.object_id          0.658000   0.000000   0.658000 (  0.658000)
1M calls to Object.new.object_id   6.636000   0.000000   6.636000 (  6.636000)

Using object's "identity hash":

                                       user     system      total        real
1M calls to obj.object_id          0.356000   0.000000   0.356000 (  0.356000)
1M calls to Object.new.object_id   0.636000   0.000000   0.636000 (  0.636000)
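
The script behind these numbers is not shown. A sketch along the following lines, using Ruby's standard benchmark library, would print timings in the same layout; the label width and loop structure are assumptions.

```ruby
require 'benchmark'

N = 1_000_000
obj = Object.new

# Benchmark.bm prints the user/system/total/real columns and
# returns one Benchmark::Tms per report.
results = Benchmark.bm(34) do |bm|
  # Repeated calls on one object: the ID is built once, then reused.
  bm.report("1M calls to obj.object_id") do
    N.times { obj.object_id }
  end

  # A fresh object each time: every call has to build a new ID.
  bm.report("1M calls to Object.new.object_id") do
    N.times { Object.new.object_id }
  end
end
```

The second case is the one that exercises ID construction, which is where the weak map cost would show up.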

It's also interesting to note that even maintaining the contract of
object_id being unique is hard. On the JVM, for example, it is not
possible to get a unique numeric id or pointer for a given object
unless you manage a weak map of objects on your own...

- Charlie
 

Robert Klemme

> So I am asking you Rubyists...does this sound like a problem? In the
> 1.8/1.9 stdlib, the only reference to _id2ref is one in drb.rb, which
> could be replaced with a "better way". None of the gems I have
> installed use _id2ref. Originally, weakref.rb used _id2ref, but we
> have a native impl of weakref that uses Java's built-in weakrefs.
> Google code search only brings up about 353 hits for "lang:ruby
> _id2ref", most of them the already-mentioned cases.

Charles, thanks for the elaborate report and request! I for my part do
not see an issue with removing _id2ref if a better solution for DRb can
be devised.

> It's also interesting to note that even maintaining the contract of
> object_id being unique is hard. On the JVM, for example, it is not
> possible to get a unique numeric id or pointer for a given object
> unless you manage a weak map of objects on your own...

I believe there is an alternative solution which comes at the cost of
the memory overhead for every object: place the id in the instance and
use a central AtomicLong for "generating" ids. You also save the
overhead of map maintenance which would be a central synchronization point.
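
A minimal Ruby sketch of that idea, for illustration only: the module name SequentialId and the method sequential_id are hypothetical, and a Mutex-guarded Integer stands in for Java's AtomicLong.

```ruby
# Hypothetical sketch: the ID lives in the instance, and one central
# counter hands out new values. There is no table for reverse lookup.
module SequentialId
  LOCK = Mutex.new
  @last_id = 0

  class << self
    attr_accessor :last_id
  end

  # Assigns an ID on first use and stores it in the object itself.
  # This costs one extra slot per object that asks for an ID, and it
  # does not work on frozen objects.
  def sequential_id
    LOCK.synchronize do
      @sequential_id ||= (SequentialId.last_id += 1)
    end
  end
end

# Hypothetical class used only to demonstrate the mixin.
class Widget
  include SequentialId
end
```

With this layout, IDs are unique for the life of the process, but nothing can map an ID back to its object.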

Kind regards

robert
 

Caleb Clausen

> If we no longer supported _id2ref, the only cost would be in producing
> an ID, probably with a strictly-increasing atomic 64-bit value. There
> would be no weakref map and no cost of constructing and managing the
> weakrefs within that map.
>
> So I am asking you Rubyists...does this sound like a problem?

In my own projects, I use _id2ref/__id__ in a couple places that I can
recall. So, I was about to object to it going away... but on
reflection, it seems like _id2ref can always be replaced by a WeakRef,
(at least when running in JRuby). So, removing it shouldn't really be
a problem.
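
For illustration, a sketch of that replacement using the stdlib weakref library: hold a WeakRef where the code used to hold an ID, and dereference it where the code used to call _id2ref. The Listener class and the deref helper are hypothetical names.

```ruby
require 'weakref'

# Hypothetical class standing in for whatever object was tracked by ID.
class Listener; end

target = Listener.new

# Before: id = target.__id__
ref = WeakRef.new(target)

# Before: ObjectSpace._id2ref(id)
# Returns the object if it is still alive, or nil once it has been
# garbage collected.
def deref(ref)
  ref.weakref_alive? ? ref.__getobj__ : nil
rescue WeakRef::RefError
  # The object was collected between the check and the access.
  nil
end

live = deref(ref)
```

Like a stored ID, the WeakRef does not keep the target alive, so callers still need to handle the case where the object is gone.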
 

Charles Oliver Nutter

> Originally, _id2ref is an implementation dependent hack for weakref,
> so that you can remove it, if you can provide the better way.

I suspected as much, since that seemed to be the primary place for it
to be used. I guess the remaining question is about the uniqueness of
object_id. I believe on Sun's JVMs java.lang.System.identityHashCode
of an object will be unique for the lifetime of the object, but not
unique forever (which I'm sure is the case on MRI). However, I don't
think there's any guarantee that the identityHashCode will be unique
across JVMs, though the documentation says an implementation should
make a "best effort" to keep it unique.

We will look at removing _id2ref in 1.5 (or making it do nothing, with
a warning) as well as modifying the one stdlib that uses it (DRb, for
reasons I have not yet explored). And I will explore whether
identityHashCode will be "unique enough" as I suspect it should be.

- Charlie
 

Charles Oliver Nutter

> In my own projects, I use _id2ref/__id__ in a couple places that I can
> recall. So, I was about to object to it going away... but on
> reflection, it seems like _id2ref can always be replaced by a WeakRef,
> (at least when running in JRuby). So, removing it shouldn't really be
> a problem.

Yes, all cases of _id2ref could be implemented yourself by building a
weak map from your own user-generated IDs (or from object_id) to objects.
So I think there's probably no good reason we need to have _id2ref
support if we have our own weakref implementation.

- Charlie
 

Charles Oliver Nutter

> I believe there is an alternative solution which comes at the cost of the
> memory overhead for every object: place the id in the instance and use a
> central AtomicLong for "generating" ids. You also save the overhead of map
> maintenance which would be a central synchronization point.

Yes, that may be too high a cost for us to pay. On 32/64-bit JVMs,
adding another field would cost 4 or 8 bytes per object. Stuffing it
into the instance variable table would force ivar tables to be created
when object_id is called, which comes with a base cost of 2 words (4
or 8 bytes each) plus the word for the reference to a Fixnum object (or 4-8
bytes for a reference to an int or long). It's too high to put on all
Objects, especially if identityHashCode is unique enough.

- Charlie
 