no clue

Joe Van Dyk · Jul 1, 2005

I thought for all of five seconds for a good subject line for this
question, but failed. Sorry!

I have a string like:

"some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah"

I want to build up a hash like

{ :some_key => "blah", :some_other_key => "more_blah", :yet_other_key
=> "yet_more_blah" }

And I don't really want to have to know what the possible keys are in advance.

So, the message format looks like:
<key>: <value>, <key>: <value>

How can I properly extract it out?

Here's my initial attempt, which works, but seems hackish:

attributes = message.split(",")
attributes.each do |attribute|
  key, value = attribute.scan(/(\w+): (.+)/)[0]
  result_hash[key.to_sym] = value.strip
end

Also, this will be run potentially thousands of times per second, so
execution speed is of some concern.

Timothy Hunter · Jul 1, 2005

It doesn't look particularly hackish to me, but maybe my sensibilities
aren't fine enough. The only thing I'd say is that if performance is
important then we ought to ask the regular expression to strip
whitespace around the key and value so we can avoid the #strip method.

Here's my version:

require 'pp'

hash = Hash.new
DATA.each do |line|
  attrs = line.split(/,/)
  attrs.each do |attr|
    m = /\s*(\w+)\s*:\s*(\w+)\s*/.match(attr)
    raise "#{attr.chomp} doesn't look like key:value" unless m
    hash[m[1].intern] = m[2]
  end
end

pp hash

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1
key2: value2
key3 : value 3 , key4 :value4 , key555555555:value55555

The output is:
{:some_other_key=>"more_blah",
:key555555555=>"value55555",
:yet_other_key=>"yet_more_blah",
:key1=>"value1",
:key2=>"value2",
:key3=>"value",
:some_key=>"blah",
:key4=>"value4"}

Devin Mullins · Jul 2, 2005

Joe said:
Here's my initial attempt, which works, but seems hackish:

attributes = message.split(",")
attributes.each do |attribute|
key, value = attribute.scan(/(\w+): (.+)/)[0]
result_hash[key.to_sym] = value.strip
end

Slightly more readable:

result_hash = {}
attributes = message.split(",")
attributes.each do |attribute|
  key, value = *attribute.match(/(\w+): (.+)/).captures
  result_hash[key.to_sym] = value.strip
end

Yet more readable:

result_hash = {}
attributes = message.split ","
attributes.each do |attribute|
  key, value = *attribute.split(": ",2)
  result_hash[key.strip.to_sym] = value.strip
end

Not sure:

attributes = message.split ","
result_hash = attributes.inject({}) do |hash,attribute|
  key, value = *attribute.split(": ",2)
  hash[key.strip.to_sym] = value.strip
  hash
end

Also, this will get ran potentially thousands of times per second, so
executation speed is of some concern.

No clue.

Devin

Ara.T.Howard · Jul 2, 2005

Joe said:
Also, this will be run potentially thousands of times per second, so
execution speed is of some concern.

you'll have a hard time getting much faster than strscan:

harp:~ > cat a.rb
require 'strscan'

class HashString < ::Hash
  class SyntaxError < StandardError; end
  def initialize s, dup = false
    load_from s, dup
  end
  def load_from s, dup = false
    @ss = StringScanner::new s, dup
    loop do
      key, value = scan_key, scan_value
      self[key] = value
      break if eos?
    end
    @ss = nil
  end
  def scan_key
    @ss.scan(%r/[\n\s]*([^:\n]+)[\n\s]*(?=:)/o) or syntax_error
    key = @ss[1]
    @ss.scan(%r/[\n\s]*:[\n\s]*/o) or syntax_error
    key
  end
  def scan_value
    scan(%r/[\n\s]*([^,\n]+)[\n\s]*/o) or syntax_error
    value = @ss[1]
    scan(%r/[\n\s]*,?[\n\s]*/o)
    value
  end
  def eos?
    @ss.eos?
  end
  def scan pat
    @ss.scan pat
  end
  def syntax_error
    raise SyntaxError, @ss.peek(16) + '...'
  end
  def to_yaml
    {}.merge(self).to_yaml
  end
end

s = <<-txt
some_key: blah,
some_other_key: more_blah, yet_other_key:
yet_more_blah
txt

hs = HashString::new s

require 'yaml'
y hs

harp:~ > ruby a.rb
---
some_key: blah
yet_other_key: yet_more_blah
some_other_key: more_blah

strscan is pure c and extremely fast. it doesn't end up creating any new
strings like splitting or regex based solutions. it keeps a pointer into the
string and moves through it. it takes some getting used to but is really good
and part of the standard dist.
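
For readers new to strscan, here is a much smaller sketch of the same pointer-style scanning. It is not the HashString class above: the method name `parse_pairs` and its patterns are assumptions made for this example, and it assumes keys and values contain no whitespace.

```ruby
require 'strscan'

# Hypothetical helper for illustration: walks the string once with a
# StringScanner and collects key/value pairs into a plain Hash.
def parse_pairs(str)
  ss = StringScanner.new(str)
  result = {}
  until ss.eos?
    ss.skip(/\s*/)                      # leading whitespace
    break if ss.eos?
    key = ss.scan(/[^:,\s]+/) or raise ArgumentError, "expected key at #{ss.pos}"
    ss.skip(/\s*:\s*/) or raise ArgumentError, "expected ':' at #{ss.pos}"
    value = ss.scan(/[^,\s]+/) or raise ArgumentError, "expected value at #{ss.pos}"
    ss.skip(/\s*,?\s*/)                 # optional separator
    result[key.to_sym] = value
  end
  result
end

p parse_pairs("some_key: blah, some_other_key: more_blah")
```

Each `scan` or `skip` call only matches at the scanner's current position and then advances it, which is what lets malformed input be reported at the exact offset where it goes wrong.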

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================

Daniel Brockman · Jul 2, 2005

Joe Van Dyk said:
attributes = message.split(",")
attributes.each do |attribute|
key, value = attribute.scan(/(\w+): (.+)/)[0]
result_hash[key.to_sym] = value.strip
end

How about this?

message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
  result_hash[k.to_sym] = v
end

Also, this will get ran potentially thousands of times per second,
so executation speed is of some concern.

I don't know if the above is the best you can do, but I do believe it
is a bit faster than your original version.
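
A self-contained sketch of the scan-based approach, for anyone who wants to try it directly. The wrapper name `message_to_hash` is an assumption for this example; the regular expression is the one posted above.

```ruby
# Hypothetical wrapper around the scan-based snippet above.
def message_to_hash(message)
  result = {}
  # Each match yields the two capture groups: key, then value.
  message.scan(/(\w+)\s*:\s*([^, ]*)/) do |key, value|
    result[key.to_sym] = value
  end
  result
end

h = message_to_hash("some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah")
p h[:some_other_key]  # prints "more_blah"
```

Because the value pattern is `[^, ]*`, a value containing a space or a comma is cut at that character, so this only suits messages where values are single tokens.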

Daniel Brockman · Jul 2, 2005

Daniel Brockman said:
I don't know if the above is the best you can do, but I do believe
it is a bit faster than your original version.

According to my tests, it is also more than twice as fast as that
enormous strscan implementation. (Can anyone confirm?)

Ara.T.Howard · Jul 2, 2005

Daniel Brockman said:
According to my tests, it is also more than twice as fast as that
enormous strscan implementation. (Can anyone confirm?)

sure it is. but with no error checking and it accepts invalid strings. it
will also fail for things like

42.0 : value

since '.' is not a \w (tricky). anyhow i didn't know the standard scan was so
fast! a simple/similar version of the strscan method runs about the same for
small strings, but scales a bit better:

jib:~ > ruby a.rb
HashString @ 16.7303600311279
HashStringSimple @ 21.1355850696564

jib:~ > cat a.rb
require 'strscan'

class HashString < ::Hash
  def initialize s
    ss = StringScanner::new s, false
    loop do
      ss.scan(%r/\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/o) or break
      self[ss[1]] = ss[2]
    end
  end
end

class HashStringSimple < ::Hash
  def initialize s
    s.scan(%r/\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/o){|k,v| self[k] = v}
  end
end

def time label
  fork do
    a = Time::now.to_f
    yield
    b = Time::now.to_f
    t = b - a
    puts "#{ label } @ #{ t }"
  end
  Process::wait
end

n = 2 ** 20
huge = ''

n.times do |i|
  huge << "#{ rand } : #{ rand }"
  huge << ", " if i != n - 1
end

time('HashString'){ hs = HashString::new huge }

time('HashStringSimple'){ hs = HashStringSimple::new huge }

cheers.

-a

Daniel Brockman · Jul 2, 2005

Ara.T.Howard said:
sure it is. but with no error checking and it accepts
invalid strings.

Perhaps there are no invalid strings?

it will also fail for things like

42.0 : value

I didn't see that in the original post. The key should be a symbol,
which I took to mean it had to be a valid Ruby identifier.

since '.' is not a \w (tricky).

But the characters permitted in Ruby identifiers are. (Though I
forgot `!' and `?'.)
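
To make the `\w` point concrete, here is a small sketch that runs both patterns from this thread over the disputed input. The variable names are made up for the example.

```ruby
input = "42.0 : value"

# Pattern from the scan-based version: \w+ cannot cross the '.',
# so the match starts at the "0" and the key is silently truncated.
word_keys = input.scan(/(\w+)\s*:\s*([^, ]*)/)

# Pattern from the HashStringSimple class: everything up to the ':'
# (minus trailing whitespace) is taken as the key.
loose_keys = input.scan(/\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/)

p word_keys   # prints [["0", "value"]]
p loose_keys  # prints [["42.0", "value"]]
```

So the `\w+` version does not raise on such input; it quietly produces a different key, which only matters if keys are not restricted to identifier characters.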

anyhow i didn't know the standard scan was so fast!

Regular expressions are pretty fast, because you compile them.
I think of them as OpenGL display lists.

[...] %r/\s*( [...]

I've never seen `%r/.../' used before --- interesting.

Joe Van Dyk · Aug 12, 2005

Daniel Brockman said:
How about this?

message.scan /(\w+)\s*:\s*([^, ]*)/ do |k, v|
result_hash[k.to_sym] = v end

I don't know if the above is the best you can do, but I do believe it
is a bit faster than your original version.

Bringing up an old thread.... I have the following code.

# Converts an array like
# [[0, "x_position: 20, y_position: 40, z_position: 30"],
#  [1, "x_position: 20, y_position: 40, z_position: 30"]
# ]
#
# into a hash like
# { 0 => { :x_position => "20", :y_position => "40", :z_position => "30" },
#   1 => { :x_position => "20", :y_position => "40", :z_position => "30" }
# }
def self.convert_message_to_hash players_array
  raise "Can't do anything with empty message!" if original_message.nil?
  result_hash = {}
  original_message.each do |id, message|
    message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
      result_hash[id][k.to_sym] = v
    end
  end
  result_hash
end
end

That code in my application leads to the following profiling:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 37.84   261.23    261.23     5569    46.91    69.61  String#scan
  8.12   317.28     56.05   201791     0.28     0.28  Hash#[]
  6.42   409.72     44.31   150783     0.29     0.29  Hash#[]=
  6.36   453.61     43.89   144782     0.30     0.30  String#to_sym

I believe this is probably the critical part of my code.

Ideas on how to improve this would be appreciated!

Joe Van Dyk · Aug 12, 2005

Whoops!
'original_message' should be 'players_array' in the code.
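
With that rename applied, a runnable sketch of the method might look like the following. Two things here are assumptions rather than part of the posted code: it is written as a plain top-level method instead of `def self.` inside a class, and the inner hash for each id is created explicitly, since the posted snippet never initializes `result_hash[id]`.

```ruby
# Sketch only: rename applied, inner hash created per player id (assumption).
def convert_message_to_hash(players_array)
  raise "Can't do anything with empty message!" if players_array.nil?
  result_hash = {}
  players_array.each do |id, message|
    inner = (result_hash[id] ||= {})
    message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
      inner[k.to_sym] = v
    end
  end
  result_hash
end

players = [[0, "x_position: 20, y_position: 40, z_position: 30"],
           [1, "x_position: 20, y_position: 40, z_position: 30"]]
p convert_message_to_hash(players)[1][:y_position]  # prints "40"
```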


William James · Aug 12, 2005

Joe said:
I have a string like:

"some_key: blah, some_other_key: more_blah,
yet_other_key: yet_more_blah"

I want to build up a hash like

{ :some_key => "blah", :some_other_key => "more_blah",
:yet_other_key => "yet_more_blah" }

h={}
DATA.read.split(/\s*[:,\n]\s*/).inject(false){|a,b|
if a then h[a.to_sym]=b; a=false else a=b end }

p h
__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1
key2: value2
key3 : value 3 , key4 :value4 , key555555555:value55555

----
Output:

{:key3=>"value 3", :some_key=>"blah", :key4=>"value4",
:some_other_key=>"more_blah", :key555555555=>"value55555",
:yet_other_key=>"yet_more_blah", :key1=>"value1",
:key2=>"value2"}

Joe Van Dyk · Aug 12, 2005

Why would this approach be faster?

William James · Aug 12, 2005

Joe said:
Why would this approach be faster?

data = []; DATA.each{|x| data << x.chomp}
iter = 100_000

h={}
start = Time.now

iter.times {
data.each{|line| line.split(/\s*[:,]\s*/).inject(false){|a,b|
if a then h[a.to_sym]=b; a=false else a=b end }}
}
t1 = Time.now - start

result_hash = {}
start = Time.now

iter.times {
data.each{|line| line.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
result_hash[k.to_sym] = v
end
} }
t2 = Time.now - start

p result_hash == h
p t1,t2

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1,key2: value2,key3 :value_3 ,
key4:value4,key555555555:value55555
x_position: 20, y_position: 40, z_position: 30,zz_position:85

Output:
true
22.859
28.094

Simon Kröger · Aug 12, 2005

This may not be exactly an improvement but at least ..hmm..
funny?

--------------------------------------------------------------

a = [[0, "x_position: 20, y_position: 40, z_position: 30"],
[1, "x_position: 20, y_position: 40, z_position: 30"]]

h = eval('{' + ("~" + a.join("~")).
gsub(/\s*([^:\~|]+):\s*([^,~]+),?/, ':\1=>\'\2\',').
gsub(/~(\d+)~/, '},\1=>{')[2..-2]+ '}}')

puts h[1][:y_position]

--------------------------------------------------------------

cheers

Simon


William James · Aug 13, 2005

Faster yet:

iter = 20_000

data = DATA.inject([]){|a,x| a << x.chomp}
times = []

h1={}
times << Time.now

iter.times {
data.each{ |line| tmp = line.split(/\s*[:,]\s*/)
  (0...tmp.size).step(2){ |i| h1[tmp[i].to_sym]=tmp[i+1] }
}
}

h2={}
times << Time.now

iter.times {
data.each{ |line| line.split(/\s*[:,]\s*/).inject(false){|a,b|
if a then h2[a.to_sym]=b; a=false else a=b end }
}
}

result_hash = {}
times << Time.now

iter.times {
data.each{|line| line.scan(/(\w+)\s*:\s*([^, ]*)/) { |k, v|
result_hash[k.to_sym] = v }
}
}

times << Time.now

p result_hash == h2 && h2 == h1
(1...times.size).each{|i| p times[i]-times[i-1]}

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1,key2: value2,key3 :value_3 ,
key4:value4,key555555555:value55555
x_position: 20, y_position: 40, z_position: 30,zz_position:85

Output (on a slower computer):
true
6.82
7.27
8.413

Simon Kröger · Aug 13, 2005


Hi Joe,

as a more serious contribution to your problem:
this is 10 times faster:

data.each{ |j, line|
  k, v = -2, 0
  while (v = line.index(58, k))
    h5[j][line[(k+2)...v].intern] =
      line[(v+2)...(k = line.index(44, v) || line.length)]
  end
}

i attached the whole test script, the output is:

                          user     system      total        real
inject                4.640000   0.046000   4.686000 (  4.687000)
scan                  5.204000   0.063000   5.267000 (  5.875000)
eval                  6.078000   0.016000   6.094000 (  6.094000)
tmp                   4.375000   0.000000   4.375000 (  4.391000)
index                 0.312000   0.000000   0.312000 (  0.312000)
true

cheers

Simon

hack.rb:

require 'benchmark'

a = []
100.times {|i| a << [i, "x_position: #{200+i}, y_position: #{400+i}, z_position: #{300+i}"]}

data = a
iter = 1000
h1 = h2 = h3 = h4 = h5 = {}

Benchmark.bm 20 do |bm|
bm.report("inject") do
iter.times {
data.each{|i, line| h1={}; line.split(/\s*[:,]\s*/).inject(false){|a,b|
if a then h1[a.to_sym]=b; a=false else a=b end }}
}
end

bm.report "scan" do
iter.times {
data.each{|i, line|h2={}; line.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
h2[k.to_sym] = v
end
} }
end

bm.report "eval" do
iter.times {
h3 = eval('{' << ('~' << data.join('~')).
gsub!(/\s*([^:~]+):\s*([^,~]+),?/, ':\1=>\'\2\',').
gsub!(/~(\d+)~/, '},\1=>{')[2..-2] << '}}')
}
end

bm.report "tmp" do
iter.times {
data.each{ |j, line| h4[j]={};tmp = line.split(/\s*[:,]\s*/)
  (0...tmp.size).step(2){ |i| h4[j][tmp[i].to_sym]=tmp[i+1] }
}}
end

bm.report "index" do
iter.times {
data.each{ |j, line|
k, v = -2, 0
while (v = line.index(58, k))
h5[j][line[(k+2)...v].intern] = line[(v+2)...(k = line.index(44, v) || line.length)]
end
}
}
end

end

p((h1 == h2) && (h1 == h3) && (h1 == h4) && (h1 == h5))


Joe Van Dyk · Aug 13, 2005

William James said:
Faster yet:

iter = 20_000

data = DATA.inject([]){|a,x| a << x.chomp}

What is DATA? Does it have anything to do with __END__ at the bottom?

Joe


Joe Van Dyk · Aug 13, 2005

Thank you! I must say that the logic in the 'index' one confuses me, though.

Simon Kröger · Aug 13, 2005

data.each{ |j, line|
  k, v = -2, 0
  while (v = line.index(58, k))
    h5[j][line[(k+2)...v].intern] =
      line[(v+2)...(k = line.index(44, v) || line.length)]
  end
}

OK, let's walk through this:

j is just the key in the outer hash.

line looks like:
"x_position: 200, y_position: 400, z_position: 300"

k is the index of the ',' before the current key (so the key, like
'x_position', starts at k+2)
v is the index of the ':' after the key (so the value, like '200',
starts at v+2)

58 is the ascii number of the char ':'
44 is the ascii number of the char ','

#index returns the index of the char in the string or nil
if no such char exists (after the index given as second
parameter)

while there is another ':' in the string
add key from last ',' to ':' => value from ':' to next ','
end

the +2 is there to skip the (',' or ':') and the space.

the initial k = -2 is there because no ',' is there to skip
at the beginning.
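
Here is a sketch of the same index-and-slice idea written for current Ruby, as an illustration rather than a drop-in replacement for the snippet above. Two details differ on purpose. Current Ruby versions expect a string or regexp as the first argument to String#index, so ':' and ',' are used instead of 58 and 44. Also, a negative offset passed to String#index counts back from the end of the string, so this version tracks the start of the key directly instead of beginning from -2. If the original loop ran on a Ruby where that also held, it may be worth re-checking that it did real work before trusting its timing. The method name `parse_with_index` is hypothetical, and exactly one space after each ':' and ',' is assumed.

```ruby
# Hypothetical helper: parses "key: value, key: value" using only
# String#index and slicing, assuming one space after ':' and ','.
def parse_with_index(line)
  result = {}
  key_start = 0
  while (colon = line.index(':', key_start))
    # End of the value is the next ',' or the end of the line.
    comma = line.index(',', colon) || line.length
    result[line[key_start...colon].to_sym] = line[(colon + 2)...comma]
    key_start = comma + 2   # skip the ',' and the space after it
  end
  result
end

p parse_with_index("x_position: 200, y_position: 400, z_position: 300")
```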

One loses readability of code when optimizing for speed
has top priority - even in ruby.

data.each{|j, line|
  line.split(',').each{|kv|
    k, v = kv.split(':')
    h6[j][k.strip.intern] = v.strip
  }
}

is much nicer, but look at the numbers:

                          user     system      total        real
inject                4.672000   0.015000   4.687000 (  5.281000)
scan                  5.250000   0.063000   5.313000 (  5.312000)
eval                  6.140000   0.047000   6.187000 (  6.219000)
tmp                   4.407000   0.062000   4.469000 (  4.469000)
index                 0.375000   0.000000   0.375000 (  0.375000)
split                10.625000   0.141000  10.766000 ( 10.781000)
true

cheers

Simon

Joe Van Dyk · Aug 14, 2005

Ah, ok. In my application, there's a bunch more than 3 possible keys
and they are of differing length. I am in control of the format of
the incoming strings though, and so could modify their format to make
them easier/faster to parse. Any ideas on what would be a more
efficient format for transporting the data?

(for reference, the original string format was "id: 3, x_position: 39,
y_position: 209, z_position: 39" and in my real application, there's
about twenty different attributes that are in the string.)

Perhaps it would be more efficient to not convert the string into a hash?

All I really need to be able to do is access/display a player's data
via some mechanism, and a player's data should be updated once a
second, and there's up to 400 players. The above was the best way I
could come up with transporting and accessing the data, but perhaps
there's a better way of doing it.
