Is it considered Harmful?

Lennon Day-Reynolds

My point was that DL was already a part of the Ruby standard library,
and as such, had opened the door to implement exactly what you're
asking for without changes to the core Ruby interpreter. The cost of
that flexibility, however, is that you must trust the programmer not
to go and cause a segfault.

I think that it's fine to have that access available for clever hacks,
but only if you trust the author of that hack to really, really know
what they're doing. It also could tie your code to a certain version
of Ruby, as the internal data structures the interpreter uses could
easily change in representation between versions.

Making #class= or #become a part of the core library seems unnecessary,
exactly because it is potentially so dangerous. For those rare cases
where people need that functionality, DL and evil.rb are available;
for us mere mortals, the current (already wide-open) object semantics
seem appropriate.
 
Sean O'Dell

D> Modules (such as Enumerable) only add methods.

It's worse than that: Enumerable relies on the method #each, which is
implemented differently by each class; that is why it works.
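
To make the point about #each concrete, here is a minimal sketch. The Countdown class is a made-up example (nothing in the thread defines it); it only supplies #each, and Enumerable derives the rest from that.

```ruby
# Hypothetical class for illustration: it defines only #each.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # Enumerable's methods (map, select, sort, ...) are all built on this.
  def each
    @from.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(3)
doubled = countdown.map { |n| n * 2 }   # uses Countdown#each internally
evens   = countdown.select(&:even?)
```

Swap in a different #each and the same Enumerable methods behave differently, which is why the module cannot be treated as "only adding methods".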

I have to admit when I'm wrong.

At least doing it brute-force, changing an object's class DOES crash Ruby.
But I've only done a rudimentary test where I simply replaced the klass VALUE
(in C) with something else. I can't understand why Ruby would crash, except
that internally there must be a whole lot of code that doesn't do any kind of
checking of internal values.

I think I can see why Guy says this can't be done. I haven't seen much of the
internal Ruby implementations, but my initial guess is that Ruby makes an
assumption about the presence, and state, of internal data kept with each
object/class.

If that's the case, it would probably be a mountain of work to go back through it
all and place checks on that internal data.

Sean O'Dell
 
Lennon Day-Reynolds

[snip]
No code should "expect" data to be there that isn't. If a C extension wraps a
native C struct as a Ruby object with Make_Struct, then when it unwraps it,
and gets nothing or a NULL pointer, it shouldn't continue to try and use it.
That's sloppy coding.

Sean O'Dell

I agree with you about it being sloppy programming, but then again,
part of the reason that I write code in Ruby (and Python, Java, C#,
Perl, etc.) is so that I can be a little sloppy, without getting
segfaults or buffer overruns. Programming in C seems to consist mostly
of wrapping unsafe calls in error checks and handlers; I would hate to
see Ruby become similarly burdened due to low-level hacks like #class=
or #become entering the language core.

Lennon
 
Mauricio Fernández

gabriele renzi said:

Are there limitations on evil.rb's implementation of become? How do you
get around Guy's concerns about crashing?

evil.rb's Object#become is restricted in several ways, to try to prevent
crashes. Some of the checks include:

* in general, refuse operations that would break implicit assumptions
in Ruby (memory layout & internal structs):
* refuse to swap "core classes"
* refuse to swap modules with non-modules (cause they might be in a
klass chain)
* refuse to swap objects whose classes use incompatible "internal types"
* refuse to operate on immediate values (Fixnum, symbols, true, false, nil...)
* prevent cycles in the klass chains
* check EXIVARs
...

Additionally, the method cache is invalidated after two objects are
swapped.
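
A rough Ruby-level sketch of what a few of these guards might look like. This is not evil.rb's code: swap_allowed? and both constant lists are hypothetical, and the real checks work on C-level internals that plain Ruby cannot see.

```ruby
# Hypothetical guard, loosely modelled on some of the checks listed above.
# Integer is rejected wholesale, which is more conservative than "immediates only".
IMMEDIATE_CLASSES = [Integer, Symbol, TrueClass, FalseClass, NilClass].freeze
CORE_CLASSES = [Object, Module, Class, Kernel].freeze # illustrative subset

def swap_allowed?(a, b)
  objs = [a, b]
  # Refuse to operate on immediate values.
  return false if objs.any? { |o| IMMEDIATE_CLASSES.any? { |k| k === o } }
  # Refuse to swap "core classes" themselves.
  return false if objs.any? { |o| CORE_CLASSES.any? { |k| k.equal?(o) } }
  # Refuse to swap modules with non-modules.
  return false if a.is_a?(Module) != b.is_a?(Module)
  true
end
```

The layout, klass-chain and EXIVAR checks have no pure-Ruby equivalent, so they are left out of the sketch.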

--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

i dont even know if it makes sense at all :) This is an experimental patch
for an experimental kernel :))
-- Ingo Molnar on linux-kernel
 
Sean O'Dell

How does it know there's no open file? Go have a look at the code.

My guess is here:

#define GetOpenFile(obj,fp) rb_io_check_closed((fp) = RFILE(rb_io_taint_check(obj))->fptr)

...because io_read, the C implementation of the File.read method, passes the
"self" VALUE to this macro, which assumes the object is an RFILE, and
immediately uses the fptr member, without checking that the "self" object
contains a valid fptr value. If you change the class of an object to
something else, it still does this, even when fptr is not a valid member
(which, when used, causes the crash).

Now, if Ruby does this in a lot of places, then yeah, changing an object's
class is not going to be possible.

What I think would settle this simply, is if R_CAST was changed to actually
perform a type check and raise an exception when an attempt is made to cast a
VALUE to something it's not.

Sean O'Dell
 
Sean O'Dell

[snip]
No code should "expect" data to be there that isn't. If a C extension
wraps a native C struct as a Ruby object with Make_Struct, then when it
unwraps it, and gets nothing or a NULL pointer, it shouldn't continue to
try and use it. That's sloppy coding.

Sean O'Dell

I agree with you about it being sloppy programming, but then again,
part of the reason that I write code in Ruby (and Python, Java, C#,
Perl, etc.) is so that I can be a little sloppy, without getting
segfaults or buffer overruns. Programming in C seems to consist mostly
of wrapping unsafe calls in error checks and handlers; I would hate to
see Ruby become similarly burdened due to low-level hacks like #class=
or #become entering the language core.

I was actually speaking of Ruby C code, not so much Ruby script code. Ruby
should be stable even in exceptional circumstances, although you might not
get the results you're after. It still shouldn't crash. C code shouldn't
use internal data without performing some checks, especially if the data
makes the rounds to other libraries/extensions and such.

Sean O'Dell
 
Lennon Day-Reynolds

The point of this whole thread, though, is that adding #become and
#class= to the language would effectively make Ruby code as
potentially unsafe as C. Look at the .NET CLR -- they have "unsafe"
blocks, in which you can do raw pointer-based operations, but which
mark the entire assembly (a compiled module) as unsafe, and therefore
platform (and perhaps even OS version) specific.

I agree 100% that "pure Ruby" code should be safe from hard crashes
and as platform-independent as possible. It's pretty good in that
arena now, and I'd like to see it continue to improve, not get worse.

Question: does the current implementation for $SAFE block imports of
native modules at some level?

Lennon

P.S.: Okay, I just RTFM, and think I found the answer to my question
above. Just in case anyone's interested:

Answer: According to Pickaxe, at $SAFE=2, no loads are allowed from
tainted path strings, and at $SAFE=3, all objects are created tainted.
That should mean that any code eval'd at $SAFE=3 or higher won't be
able to import at all, but modules that were already loaded should
still be available via 'require' (since 'load', not 'require' is the
checked method), right?

 
Sean O'Dell

[snip]

No code should "expect" data to be there that isn't. If a C
extension wraps a native C struct as a Ruby object with Make_Struct,
then when it unwraps it, and gets nothing or a NULL pointer, it
shouldn't continue to try and use it. That's sloppy coding.

Sean O'Dell

I agree with you about it being sloppy programming, but then again,
part of the reason that I write code in Ruby (and Python, Java, C#,
Perl, etc.) is so that I can be a little sloppy, without getting
segfaults or buffer overruns. Programming in C seems to consist mostly
of wrapping unsafe calls in error checks and handlers; I would hate to
see Ruby become similarly burdened due to low-level hacks like #class=
or #become entering the language core.

I was actually speaking of Ruby C code, not so much Ruby script code.
Ruby should be stable even in exceptional circumstances, although you
might not get the results you're after. It still shouldn't crash. C
code shouldn't use internal data without performing some checks,
especially if the data makes the rounds to other libraries/extensions and
such.

The point of this whole thread, though, is that adding #become and
#class= to the language would effectively make Ruby code as
potentially unsafe as C. Look at the .NET CLR -- they have "unsafe"
blocks, in which you can do raw pointer-based operations, but which
mark the entire assembly (a compiled module) as unsafe, and therefore
platform (and perhaps even OS version) specific.

No, changing an object's class wouldn't cause Ruby code to be as prone to
problems as C code. Not in theory, anyway. In theory, it would be as
problematic as redefining methods, or including modules that conflict with
existing object instance variables and methods.
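
For comparison, here is the kind of conflict meant by "including modules that conflict": a small sketch with made-up names (Counter and Inventory are hypothetical). The mixin and the class silently share one instance variable, which gives a wrong answer but no crash.

```ruby
# Hypothetical mixin that keeps its state in @count.
module Counter
  def increment
    @count = (@count || 0) + 1
  end
end

# Hypothetical class that already uses @count for something else.
class Inventory
  include Counter

  def initialize(items)
    @count = items.size
  end

  def item_count
    @count
  end
end

inv = Inventory.new(%w[apple pear])
inv.increment                 # Counter bumps the very same @count
clobbered = inv.item_count    # no longer the number of items
```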

However, after looking a bit at the actual Ruby implementation, it's clear
that theory and practice are two different worlds. Ruby uses internal data
on assumption in a lot of places, and changing an object's class causes it to
crash quite easily.

But, I think simply placing checks in certain appropriate places would
alleviate the problem. Sometime today I think I'll try putting type checks
in the R_CAST macro and see how that works.

Sean O'Dell
 
Florian Gross

Jim said:
The "standard" application for the self.class= method (aka become(class))
is to use it for load-on-demand style proxies. Suppose you have a largish
object in external storage (database, file, whatever). You create a
small, efficient proxy item for the large object until you really need the
true data in the object. Then you load the object and let the proxy
"become" the newly loaded object.

Without class=/become methods,
(a) the proxy remains and forwards all messages to the real object, or
(b) you replace every reference in the program to the proxy with
a reference to the real object.

(a) is a small but annoying constant cost, and (b) may be difficult,
depending on how widely the proxy was used.
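
Option (a) can be sketched in a few lines of Ruby. LazyProxy is a hypothetical name and the loader block stands in for the database or file read; this is only one way to write such a proxy.

```ruby
# Hypothetical load-on-demand proxy: loads the real object on first use
# and forwards every message to it from then on.
class LazyProxy < BasicObject
  def initialize(&loader)
    @loader = loader
    @target = nil
    @loaded = false
  end

  # Every message the proxy does not understand goes to the real object.
  def method_missing(name, *args, &block)
    __target__.__send__(name, *args, &block)
  end

  private

  # Run the loader at most once and remember the result.
  def __target__
    unless @loaded
      @target = @loader.call
      @loaded = true
    end
    @target
  end
end

loads = 0
record = LazyProxy.new { loads += 1; "a large record" }
shouted = record.upcase   # first use triggers the load
length  = record.length   # later calls reuse the loaded object
```

Every call still pays for the method_missing hop, which is the constant cost mentioned in (a).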

There's another small, but annoying problem with the proxy-approach:
There are operations which can't be forwarded. I can only think of the
truth value of an Object right now.

It's a problem in this case:

irb(main):001:0> [obj = false, ref = WeakRef.new(obj)]
=> [false, false]
# We should now be able to treat obj and ref as
# if they were actually the same thing
irb(main):002:0> [obj.to_s, ref.to_s]
=> ["false", "false"] # This is okay, methods get forwarded
irb(main):003:0> [if not obj then "foo" end, if not ref then "foo" end]
=> ["foo", nil] # truth value decisions can't be forwarded
irb(main):004:0> exit

I'm not sure how other languages handle this. As far as I know, Perl has a
way of overloading how objects are converted into Boolean values.

Python also seems to have this via __nonzero__ (and __len__ returning 0).

I have no idea, however, whether such a method (let's call it #to_bool) could
be done in a way that doesn't disturb the performance of all Ruby
scripts, but it looks more possible every day.
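
As a sketch of what a #to_bool convention could look like at the library level: Ruby itself never calls #to_bool, so truthy? and Ref below are hypothetical helpers that only work where code opts in explicitly.

```ruby
# Hypothetical helper: honours #to_bool when an object defines it,
# otherwise falls back to Ruby's normal truthiness.
def truthy?(obj)
  obj.respond_to?(:to_bool) ? obj.to_bool : !!obj
end

# Hypothetical reference object that reports its target's truth value.
class Ref
  def initialize(target)
    @target = target
  end

  def to_bool
    @target ? true : false
  end
end

ref = Ref.new(false)
plain      = ref ? "truthy" : "falsy"            # the interpreter only sees the Ref itself
via_helper = truthy?(ref) ? "truthy" : "falsy"   # the helper asks the Ref
```

A plain `if ref` still takes the true branch, so this does not remove the problem; it only shows where a hook would have to sit.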

Regards,
Florian Gross
 
Jim Weirich

Florian Gross said:
There's another small, but annoying problem with the proxy-approach:
There are operations which can't be forwarded. I can only think of the
truth value of an Object right now.

Comparisons are problematic because the first line of a comparison is
often "Are you the same class as me?". Proxies aren't the right class, so
the comparisons fail.
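
A small sketch of that failure mode. Point and PointProxy are made-up classes; the proxy forwards #x and #y correctly, but the class check in #== looks at the proxy itself.

```ruby
# Hypothetical value class whose #== starts with a class check.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.class == self.class && other.x == x && other.y == y
  end
end

# Hypothetical forwarding proxy around a Point.
class PointProxy
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args, &block)
    @target.respond_to?(name) ? @target.public_send(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

real  = Point.new(1, 2)
proxy = PointProxy.new(Point.new(1, 2))

same_as_copy  = (real == Point.new(1, 2))   # same class, same data
same_as_proxy = (real == proxy)             # same data, but proxy.class is PointProxy
```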

Actually, rather than a become or class= method, I would be interested in
a swap_identities method. For example, to make a proxy object real ...

  def make_real
    obj = read_real_object_from_the_database
    swap_identities(self, obj)
  end

When identities are swapped, every reference in the system to the first
object will magically become a reference to the second object, and
vice-versa.

There are no semantic problems with an object suddenly changing classes
and finding inappropriate member variables. They just switch identities.
This would be perfect for the proxy problem.

Although it solves semantic problems of become, I doubt it would still fly
with Ruby as it is today. All objects would have to occupy the same
amount of memory for this to work (among other constraints). I don't
think that's true of many of the built in classes.
 
Sean O'Dell

How would you handle
a = /cat/
a.class = File
a.read

It's a brave man that calls Guy 'wrong,' so I'm looking forward to
seeing your implementation of this.

With some small changes to object.c and ruby.h, this code:

a = /cat/
a.class = File
a.read

...results in this error:

testclass.rb:5:in `read': wrong argument type File (expected File) (TypeError)
from testclass.rb:5

Which is pretty solid, and now I can change the class of an object. A patch
against the Ruby CVS follows, for reference.

Sean O'Dell



Index: object.c
===================================================================
RCS file: /src/ruby/object.c,v
retrieving revision 1.153
diff -r1.153 object.c
201a202,209
> VALUE
> rb_obj_class_set(obj, new_class)
> VALUE obj, new_class;
> {
> RBASIC(obj)->klass = new_class;
> return new_class;
> }
2546a2555
> rb_define_method(rb_mKernel, "class=", rb_obj_class_set, 1);
Index: ruby.h
===================================================================
RCS file: /src/ruby/ruby.h,v
retrieving revision 1.105
diff -r1.105 ruby.h
415,420c415,420
< #define RMODULE(obj) RCLASS(obj)
< #define RFLOAT(obj) (R_CAST(RFloat)(obj))
< #define RSTRING(obj) (R_CAST(RString)(obj))
< #define RREGEXP(obj) (R_CAST(RRegexp)(obj))
< #define RARRAY(obj) (R_CAST(RArray)(obj))
< #define RHASH(obj) (R_CAST(RHash)(obj))
---
> #define RMODULE(obj) (Check_Type(obj, T_MODULE), RCLASS(obj))
> #define RFLOAT(obj) (Check_Type(obj, T_FLOAT), R_CAST(RFloat)(obj))
> #define RSTRING(obj) (Check_Type(obj, T_STRING), R_CAST(RString)(obj))
> #define RREGEXP(obj) (Check_Type(obj, T_REGEXP), R_CAST(RRegexp)(obj))
> #define RARRAY(obj) (Check_Type(obj, T_ARRAY), R_CAST(RArray)(obj))
> #define RHASH(obj) (Check_Type(obj, T_HASH), R_CAST(RHash)(obj))
422,424c422,424
< #define RSTRUCT(obj) (R_CAST(RStruct)(obj))
< #define RBIGNUM(obj) (R_CAST(RBignum)(obj))
< #define RFILE(obj) (R_CAST(RFile)(obj))
---
 
Charles Comstock

Sean said:
You can do all sorts of things with Ruby, I see no reason why changing an
object's class would crash Ruby anymore than, say, redefining a method to
something different than an object expects, or including modules in a class
that don't belong there. Just be careful of what you're doing. I see no
FUNDAMENTAL reason why you couldn't change an object's class.

Sean O'Dell

All of those places that you can assume the underlying datatype hasn't
changed result in increased performance for us because there is no need
to double check the type. If you have #class= then you would have added
overhead on a significant portion of the code for a small use case.
While this isn't a fundamental reason it is a logical reason why it may
have been avoided in the first place. Why don't you try and make a
patch that allows this on your own copy and compare performance and the
like?
Charles Comstock
 
Sean O'Dell

All of those places that you can assume the underlying datatype hasn't
changed result in increased performance for us because there is no need
to double check the type. If you have #class= then you would have added
overhead on a significant portion of the code for a small use case.
While this isn't a fundamental reason it is a logical reason why it may
have been avoided in the first place. Why don't you try and make a
patch that allows this on your own copy and compare performance and the
like?

Actually, the comparison is a function call that mostly just checks a bit
mask. That's overhead, but fairly minimal I think.

Sean O'Dell
 
Florian Gross

Jim said:
Florian Gross said:
[problems with proxies]
Comparisons are problematic because the first line of a comparison is
often "Are you the same class as me?". Proxies aren't the right class, so
the comparisons fail.

It depends -- the answer to that question can be overloaded in most
cases. (Though Object.instance_method(:class).bind(obj).call will always
give you the real class and I think that some of the internal methods
actually do something like that.)
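
A quick sketch of both halves of that remark. Impostor is a hypothetical proxy that overrides #class; binding the original Kernel#class to it still reveals what it really is.

```ruby
# Hypothetical proxy that answers #class on behalf of its target.
class Impostor
  def initialize(target)
    @target = target
  end

  def class
    @target.class
  end
end

imp = Impostor.new("hello")

claimed = imp.class                                       # the overridden answer
actual  = Object.instance_method(:class).bind(imp).call   # bypasses the override
```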
Actually, rather than a become or class= method, I would be interested in
a swap_identities method. For example, to make a proxy object real ...

  def make_real
    obj = read_real_object_from_the_database
    swap_identities(self, obj)
  end

When identities are swapped, every reference in the system to the first
object will magically become a reference to the second object, and
vice-versa.

This is how Squeak implements this AFAIK -- it just iterates through all
Object references and changes them.

I'm not sure, but this could be a bit slow, maybe. (And I'm not sure if
it would be a general solution for the problems with Proxy objects -- it
would work in the lazy database example, but there might be other
problematic use cases.)

Regards,
Florian Gross
 
Sean O'Dell

All of those places that you can assume the underlying datatype hasn't
changed result in increased performance for us because there is no need
to double check the type. If you have #class= then you would have added
overhead on a significant portion of the code for a small use case.
While this isn't a fundamental reason it is a logical reason why it may
have been avoided in the first place. Why don't you try and make a
patch that allows this on your own copy and compare performance and the
like?
Charles Comstock

Ran some tests. Ran this Ruby code, which causes the check to occur 100,000
times:

f = File.new("testfile")

time_start = Time.now
(0..10000).each do |index|
  f.read
  f.seek(0, File::SEEK_SET)  # IO#seek takes (offset, whence): rewind to the start
end
time_end = Time.now

p time_end - time_start


Most of the times were around 0.753561 seconds.

Then I ran the same tests 100,000 times without the check and got times around
0.763882 seconds. It really takes up no extra time at all, that I can
figure. Totally negligible.

Sean O'Dell
 
Mikael Brockman

Florian Gross said:
Jim said:
Florian Gross said:
[problems with proxies]
Comparisons are problematic because the first line of a comparison is
often "Are you the same class as me?". Proxies aren't the right class, so
the comparisons fail.

It depends -- the answer to that question can be overloaded in most
cases. (Though Object.instance_method(:class).bind(obj).call will
always give you the real class and I think that some of the internal
methods actually do something like that.)

Actually, rather than a become or class= method, I would be interested in
a swap_identities method. For example, to make a proxy object real ...

  def make_real
    obj = read_real_object_from_the_database
    swap_identities(self, obj)
  end

When identities are swapped, every reference in the system to the first
object will magically become a reference to the second object, and
vice-versa.

This is how Squeak implements this AFAIK -- it just iterates through
all Object references and changes them.

I'm not sure, but this could be a bit slow, maybe. (And I'm not sure
if it would be a general solution for the problems with Proxy objects
-- it would work in the lazy database example, but there might be
other problematic use cases.)

How about changing the object to a redirection pointer, and also
updating references in the garbage collector's root set traversal?
You'd get some overhead at first, but after one garbage collection, all
references will be updated and the redirection object can be recycled.
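
A toy model of that idea, to show the two phases (redirect first, fix up references at collection time). Everything here is hypothetical: ToyHeap, Forward and the integer ids are stand-ins, only root references are rewritten, and none of it reflects how Ruby's real collector works.

```ruby
# Redirection marker left behind in the old slot.
Forward = Struct.new(:to)

# Toy heap: slots addressed by integer ids.
class ToyHeap
  def initialize
    @slots = {}
    @next_id = 0
  end

  def alloc(value)
    id = @next_id
    @next_id += 1
    @slots[id] = value
    id
  end

  # Follow redirections until a slot holding a real value is reached.
  def resolve(id)
    id = @slots[id].to while @slots[id].is_a?(Forward)
    id
  end

  def fetch(id)
    @slots[resolve(id)]
  end

  # "become": the old slot turns into a redirection to the new one.
  def become(old_id, new_id)
    @slots[old_id] = Forward.new(new_id)
  end

  # One collection pass: rewrite each root through its redirection,
  # then recycle the redirection slots.
  def collect(roots)
    roots.map! { |id| resolve(id) }
    @slots.delete_if { |_, value| value.is_a?(Forward) }
    roots
  end

  def size
    @slots.size
  end
end

heap  = ToyHeap.new
proxy = heap.alloc(:proxy)
real  = heap.alloc(:real_object)
roots = [proxy, proxy]

heap.become(proxy, real)
before_gc = heap.fetch(roots[0])   # reached through the redirection
heap.collect(roots)
after_gc  = heap.fetch(roots[0])   # reached directly
```

Until the first collection, every access through an old reference pays for the extra hop in resolve; afterwards the roots point straight at the real slot.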

mikael
 
Bill Kelly

Sean O'Dell said:
No code should "expect" data to be there that isn't.

:)

No ambulator should "expect" the rug to remain steady
under their feet when it might be pulled out from under them.

And yet - we live by many such expectations just to make
it through every day life. One such expectation, for me,
is that when my constructor or initializer (whether it
be C or C++ or Ruby or Java or Perl or Python or.....)
sets up some private member variables in my object - it
is reasonable for me to expect that they won't be tampered
with by unexpected external tomfoolery.

If a C extension wraps a
native C struct as a Ruby object with Make_Struct, then when it unwraps it,
and gets nothing or a NULL pointer, it shouldn't continue to try and use it.
That's sloppy coding.

I'm just getting started writing ruby extensions, so I have a
lot to learn.


Regards,

Bill
 
Sean O'Dell

:)

No ambulator should "expect" the rug to remain steady
under their feet when it might be pulled out from under them.

You can if it's nailed down. You don't if it's a throw rug with a rope tied
to one end which runs out the front door and into the street.

And yet - we live by many such expectations just to make
it through every day life. One such expectation, for me,
is that when my constructor or initializer (whether it
be C or C++ or Ruby or Java or Perl or Python or.....)
sets up some private member variables in my object - it
is reasonable for me to expect that they won't be tampered
with by unexpected external tomfoolery.

This is my practice as well in Ruby. Ruby handles such cases very well, and
gives you good warnings when something your code needs isn't there. It's
very simple to handle the exceptional cases by adding some checks later on if
you find there's a real problem. Usually there isn't.

C is quite different, though. You make assumptions to speed development and
reduce code bloat, but sometimes you just can't get away with it.

Sean O'Dell
 
Jean-Hugues ROBERT

Ran some tests. Ran this Ruby code, which causes the check to occur 100,000
times:

f = File.new("testfile")

time_start = Time.now
(0..10000).each do |index|
  f.read
  f.seek(0, File::SEEK_SET)  # IO#seek takes (offset, whence): rewind to the start
end
time_end = Time.now

p time_end - time_start


Most of the times were around 0.753561 seconds.

Then I ran the same tests 100,000 times without the check and got times around
0.763882 seconds. It really takes up no extra time at all, that I can
figure. Totally negligible.

Sean O'Dell

There is a theory that says that the speed of a language
is the speed at which it manipulates integers. I don't
necessarily agree with that theory regarding higher level
languages like Ruby. Yet I feel like it is relevant in
the case of a low level change with pervasive impacts.

As a result I feel like a benchmark with integers or
some other low level objects would show the impact
of your change in a more interesting way than the
test that you ran, where I suspect a lot of the time is spent
in the OS rather than in Ruby.
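
Something along these lines might do as a starting point: a sketch using the standard Benchmark library, with no file I/O in the timed loops. The iteration count and the choice of operations are arbitrary assumptions; the script would need to be run against both the patched and the unpatched interpreter and the timings compared.

```ruby
require "benchmark"

# Arbitrary iteration count, chosen only to make the loops measurable.
ITERATIONS = 1_000_000

# Pure Integer arithmetic.
sum = 0
int_time = Benchmark.realtime do
  ITERATIONS.times { |i| sum += i }
end

# A cheap String operation, since the patch touches macros such as RSTRING.
str = "abc"
len_total = 0
str_time = Benchmark.realtime do
  ITERATIONS.times { len_total += str.length }
end

puts format("integers: %.6f s, strings: %.6f s", int_time, str_time)
```

A single run is noisy, so several runs per interpreter would be needed before reading anything into small differences.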

Yours,

JeanHuguesRobert
 
