Steve [RubyTalk]
Imagine comparing C++ and Ruby for manipulating some strings. I would
like to confirm some assumptions I am making about in-memory efficiency
with Ruby. (This explains why the example seems contrived; I don't
want to be told to tackle the problem a different way, such as splitting
up the input as it is read from files, sockets, etc.)
Assume I've got a large in-memory sequence of bytes (represented in
C/C++ as a malloc'd block and in Ruby as a String). I have a pre-defined
(non-trivial, potentially computationally expensive) function that
calculates a sequence of offsets into the String according to some
arbitrary criteria. I then wish to reference the sub-strings (i.e. the
strings between two successive offsets) as if they were independent of
the original string (though, of course, each having a fixed length).
N.B. In C/C++ this could be done 'cheaply' using pointers into the
original string.
Given that I only want to compute the offsets once, an obvious solution
would be to construct an Array of Strings, each element representing a
sub-string of the original... but this would double memory use. What
would be the best way to avoid duplicating the character sequences and
causing run-time bloat?
As a corollary, if I had a large number of Strings, what would be the
most memory-efficient way to represent their concatenation? If I had n
strings of m KB each, would I need another contiguous block of n*m KB
to represent their concatenation?
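For the concatenation question, here is a similar sketch. It assumes the parts only need to be read (indexed byte by byte or streamed out), never mutated. The parts stay in an Array alongside their cumulative start offsets, so no n*m KB contiguous block is allocated. `StringChain` is a hypothetical helper, not a Ruby built-in. If a real String is eventually required, something like `parts.join` will still allocate the full contiguous copy.

```ruby
require "stringio"

# Hypothetical helper (not part of Ruby's core library): represents the
# concatenation of many Strings without allocating one contiguous buffer.
class StringChain
  attr_reader :bytesize

  def initialize(parts)
    @parts  = parts   # references to the original Strings, not copies
    @starts = []      # logical byte offset at which each part begins
    total = 0
    parts.each do |s|
      @starts << total
      total += s.bytesize
    end
    @bytesize = total
  end

  # Byte at logical position pos, or nil if out of range.
  # Binary search over @starts finds the owning part in O(log n).
  def getbyte(pos)
    return nil unless pos >= 0 && pos < @bytesize
    after = @starts.bsearch_index { |st| st > pos } || @parts.size
    idx = after - 1   # last part starting at or before pos
    @parts[idx].getbyte(pos - @starts[idx])
  end

  # Streams the parts to an IO in order, still without joining them.
  def write_to(io)
    @parts.each { |s| io.write(s) }
  end
end

chain = StringChain.new(["Hello, ", "Ruby", "!"])
chain.bytesize     # => 12
chain.getbyte(7)   # => 82 ("R")
out = StringIO.new # StringIO is used here only to show the streamed output
chain.write_to(out)
out.string         # => "Hello, Ruby!"
```

The extra memory is one Integer per part. The trade-off is that random access costs a binary search instead of a direct index. Writing the parts straight to a file or socket avoids ever building the joined String.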