Strings vs arrays

E

Eric Mahurin

--- Daniel Brockman said:
=20
I would like to see this implementation. Particularly the
Ruby
implementation of an O(1) String#shift.

Below is what I have now. You make one of these objects by
just passing an array or a string to Indexed2.new. It only
uses methods in common to Array and String so it doesn't care
which you use (a nice example of taking advantage of duck
typing). From then on you can treat one of these Indexed2
objects like an Array or String (using only the common
methods).

This class also has algorithmic speedup for all inserts/deletes
in the first half of the array. Unfortunately, since Array and
String don't have an efficient way to copy data within the
object, you only see an advantage for about the first quarter
of the sequence. If all of this was written C, inserts/deletes
on both ends of the Array/String would have the same speed
(O(1)) and the average insert/delete time at any position would
be cut in half. Because it is not written in C, you don't
start seeing an advantage until you get to 50K+ elements.

class Indexed2
Growth =3D 0.5
Max =3D 1
Shrink =3D 0.25
BreakPoint =3D 1.0/4 # should be 0.5 with a good copy method
def initialize(data=3D@@data_class.new)
@data =3D data
@offset =3D 0
end
def class
@@data_class =3D @data.class
super
end
def size
@data.size-@offset
end
alias length size
def empty?
(@data.size-@offset).zero?
end
def << (v)
@data << v
end
def slice(index,*len)
index =3D size+index.to_i if
(index.nonzero?||1.0/index)<0
@data.slice(index+@offset,*len)
end
alias [] slice
def slice!(index,*len)
slice(index,*len)
ensure
self[index,len[0]||1] =3D @data.class.new
end
def dup
self.class.new(@data.dup)
end
def []=3D(index,*len_value)
value =3D len_value.pop
return(@data[index+@offset] =3D value) if
len_value.empty?
len =3D len_value[0]
size =3D @data.size-@offset
index =3D size+index.to_i if
(index.nonzero?||1.0/index)<0
delta =3D value.size-len
if delta.nonzero? and index<(size*BreakPoint).to_i
size +=3D delta
len +=3D delta
offset0 =3D (offset =3D @offset-delta)
if delta<0
if offset>Max*size
offset =3D (Shrink*size).to_i
@data[offset,index] =3D @data[@offset,index]
@data[index+offset,offset0-offset+len] =3D
value
else
@data[offset,index] =3D @data[@offset,index]
@data[index+offset,len] =3D value
end
else
if offset<0
offset =3D (Growth*size).to_i
@data[index+len+offset,0] =3D
@data[index+@offset,offset-offset0]
end
@data[offset,index] =3D @data[@offset,index]
@data[index+offset,len] =3D value
end
@offset =3D offset
else
@data[index+@offset,len] =3D value
end
end
def shift
slice!(0)
end
def pop
slice!(-1)
end
def push(*args)
args.each { |v| @data << v }
self
end
def unshift(*args)
value =3D @data.class.new
args.each { |v| value << v }
self[0,0] =3D value
self
end
end




=09
__________________________________=20
Yahoo! Mail for Mobile=20
Take Yahoo! Mail with you! Check email on your mobile phone.=20
http://mobile.yahoo.com/learn/mail=20
 
L

Luke Worth

Consider StringScanner.
Ah, thanks. This is much better than converting into an array.
 
D

David A. Black

Hi --

Black> Maybe a left chop/chomp operation would be a good idea, but I
Black> think it should be called lchop, which would be consistent
Black> with other string method naming (rather than with array
Black> method naming).

I fail to see the point in that. String#chop is meant to chop off
end-of-line characters; String#lchop wouldn't be. So using that name
would be _inconsistent_ with other string method naming.

String#chop chops off the rightmost character:

irb(main):001:0> "abc".chop
=> "ab"

You may be thinking of "chomp", which is a specialized "chop"
operating only on newline characters.

So the idea of lchop would be to serve as a left-hand equivalent of
chop.


David
 
D

Daniel Brockman

David A. Black said:
String#chop chops off the rightmost character:

irb(main):001:0> "abc".chop
=> "ab"

Except if the string ends with a CRLF pair:

"abc\r\n".chop #=> "abc"
You may be thinking of "chomp", which is a specialized "chop"
operating only on newline characters.

If you read the docstrings, you get the impression that String#chop
is more-or-less deprecated in favor of the ``safer'' String#chomp:

+String#chomp+ is ofter a safer alternative, as it leaves
the string unchanged if it doesn't end in a record separator.
So the idea of lchop would be to serve as a left-hand equivalent
of chop.

So I suppose if the string starts with a CRLF pair, String#lchop would
chop off two characters from the left?

Why not go all the way and let all string methods treat CRLF pairs as
single characters?

I think it's a problem that strings are the only way to go for raw
byte arrays in Ruby, yet

* strings lack a few random useful array methods

* the string methods are not binary safe.
 
D

David A. Black

Hi --

Except if the string ends with a CRLF pair:

"abc\r\n".chop #=> "abc"


If you read the docstrings, you get the impression that String#chop
is more-or-less deprecated in favor of the ``safer'' String#chomp:

+String#chomp+ is ofter a safer alternative, as it leaves
the string unchanged if it doesn't end in a record separator.

I believe that means safer in the sense that if you're going through,
say, lines in a file, and for some reason there's no \n at the end of
the last line, you won't accidentally cut off a non-\n character.

In the general case, #chop can't be deprecated in favor of #chomp,
because #chomp doesn't offer the same functionality (chopping off the
last character).
So I suppose if the string starts with a CRLF pair, String#lchop would
chop off two characters from the left?

That's a good question. One could argue that the only reason they are
treated together in the first place is that they represent the more
abstract concept "newline" -- and that where they aren't representing
that concept, they should be treated separately. Or one could go for
the complete symmetry approach. I guess I'd tend to favor the former
notion, since the idea of left-end/right-end is already irreducibly
asymmetrical in a left-to-right writing system. (Though then there's
the matter of what would happen given a right-to-left writing system,
etc.)
Why not go all the way and let all string methods treat CRLF pairs as
single characters?

See above -- there's no magic association between those two
characters, just the historical fact of their serving the newline
role, and the practical need to acknowledge that role. I don't think
there would be any advantage to, say, having String#count combine
them, etc. (though of course there would have been an advantage to
global agreement several decades ago on how to represent newline on
various platforms :)
I think it's a problem that strings are the only way to go for raw
byte arrays in Ruby, yet

* strings lack a few random useful array methods

* the string methods are not binary safe.

String, like Hash, raises interesting questions about the relation
between itself, Array, and Enumerable. It's interesting that
String#to_a breaks the string into lines as opposed to characters or
bytes. That's certainly a behavior one would not expect if the
"arrayness" of strings resided strictly in their status as ordered
collections of bytes. On the other hand, they are ordered collections
of characters of bytes :) I still find myself expecting String#each
to go bytewise. But the fact that these different objects don't map
exactly on to each other is, I think, one of the points of having a
higher and separate abstraction like Enumerable. It decouples them,
while still not making it impossible to assimilate them to each other
when necessary.


David
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,175
Messages
2,570,944
Members
47,491
Latest member
mohitk

Latest Threads

Top