no clue


Joe Van Dyk

I thought for all of five seconds for a good subject line for this
question, but failed. Sorry!

I have a string like:

"some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah"

I want to build up a hash like

{ :some_key => "blah", :some_other_key => "more_blah", :yet_other_key => "yet_more_blah" }

And I don't really want to have to know what the possible keys are in advance.

So, the message format looks like:
<key>: <value>, <key>: <value>

How can I properly extract it out?

Here's my initial attempt, which works, but seems hackish:

attributes = message.split(",")
attributes.each do |attribute|
  key, value = attribute.scan(/(\w+): (.+)/)[0]
  result_hash[key.to_sym] = value.strip
end

Also, this will get run potentially thousands of times per second, so
execution speed is of some concern.
 

Timothy Hunter


It doesn't look particularly hackish to me, but maybe my sensibilities
aren't fine enough. The only thing I'd say is that if performance is
important then we ought to ask the regular expression to strip
whitespace around the key and value so we can avoid the #strip method.

Here's my version:

require 'pp'

hash = Hash.new
DATA.each do |line|
  attrs = line.split(/,/)
  attrs.each do |attr|
    m = /\s*(\w+)\s*:\s*(\w+)\s*/.match(attr)
    raise "#{attr.chomp} doesn't look like key:value" unless m
    hash[m[1].intern] = m[2]
  end
end

pp hash

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1
key2: value2
key3 : value 3 , key4 :value4 , key555555555:value55555



The output is:
{:some_other_key=>"more_blah",
:key555555555=>"value55555",
:yet_other_key=>"yet_more_blah",
:key1=>"value1",
:key2=>"value2",
:key3=>"value",
:some_key=>"blah",
:key4=>"value4"}
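A side note on the output above: the value is captured with (\w+), so "value 3" comes out as "value". A minimal sketch of the same idea (letting the regexp trim the whitespace so #strip is not needed) that keeps spaces inside values. The names PAIR and parse_pairs are made up for illustration, and the anchored pattern is an assumption, not Timothy's code:

```ruby
# Hypothetical helper, not from the thread: parse "key: value, key: value"
# and let the regexp trim surrounding whitespace instead of String#strip.
PAIR = /\A\s*(\w+)\s*:\s*(.*?)\s*\z/

def parse_pairs(message)
  result = {}
  message.split(",").each do |attr|
    m = PAIR.match(attr)
    raise ArgumentError, "#{attr.strip.inspect} doesn't look like key: value" unless m
    # m[2] is captured lazily, so trailing whitespace is left to \s*\z
    result[m[1].to_sym] = m[2]
  end
  result
end

p parse_pairs("key3 : value 3 , key4 :value4")
```

The lazy (.*?) followed by \s*\z is what drops trailing blanks while still allowing blanks inside the value.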
 

Devin Mullins

Joe said:
Here's my initial attempt, which works, but seems hackish:

attributes = message.split(",")
attributes.each do |attribute|
  key, value = attribute.scan(/(\w+): (.+)/)[0]
  result_hash[key.to_sym] = value.strip
end

Slightly more readable:

result_hash = {}
attributes = message.split(",")
attributes.each do |attribute|
  key, value = *attribute.match(/(\w+): (.+)/).captures
  result_hash[key.to_sym] = value.strip
end

Yet more readable:

result_hash = {}
attributes = message.split(",")
attributes.each do |attribute|
  key, value = attribute.split(": ", 2)
  # split(",") leaves a leading space on every key but the first
  result_hash[key.strip.to_sym] = value.strip
end

Not sure:

attributes = message.split(",")
# inject needs ({}) in parentheses; a bare {} would be parsed as a block
result_hash = attributes.inject({}) do |hash, attribute|
  key, value = attribute.split(": ", 2)
  hash[key.strip.to_sym] = value.strip
  hash
end

Joe said:
Also, this will get run potentially thousands of times per second, so
execution speed is of some concern.

No clue.

Devin
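For the inject variant, the block has to return the accumulator on every pass, which is easy to forget. A possible alternative is each_with_object, which is newer than this thread (Ruby 1.9 and later) and passes the same hash to every iteration. The method name to_hash_from_message is hypothetical, and the sketch assumes well-formed "key: value" pairs:

```ruby
# Hypothetical helper; assumes every attribute contains a ':'.
def to_hash_from_message(message)
  message.split(",").each_with_object({}) do |attribute, hash|
    key, value = attribute.split(":", 2)
    # strip both sides, since split(",") keeps the blank after each comma
    hash[key.strip.to_sym] = value.strip
  end
end

p to_hash_from_message("some_key: blah, some_other_key: more_blah")
```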
 

Ara.T.Howard


you'll have a hard time getting much faster than strscan:

harp:~ > cat a.rb
require 'strscan'

class HashString < ::Hash
  class SyntaxError < StandardError; end
  def initialize s, dup = false
    load_from s, dup
  end
  def load_from s, dup = false
    @ss = StringScanner::new s, dup
    loop do
      key, value = scan_key, scan_value
      self[key] = value
      break if eos?
    end
    @ss = nil
  end
  def scan_key
    @ss.scan(%r/[\n\s]*([^:\n]+)[\n\s]*(?=:)/o) or syntax_error
    key = @ss[1]
    @ss.scan(%r/[\n\s]*:[\n\s]*/o) or syntax_error
    key
  end
  def scan_value
    scan(%r/[\n\s]*([^,\n]+)[\n\s]*/o) or syntax_error
    value = @ss[1]
    scan(%r/[\n\s]*,?[\n\s]*/o)
    value
  end
  def eos?
    @ss.eos?
  end
  def scan pat
    @ss.scan pat
  end
  def syntax_error
    raise SyntaxError, @ss.peek(16) + '...'
  end
  def to_yaml
    {}.merge(self).to_yaml
  end
end

s = <<-txt
some_key: blah,
some_other_key: more_blah, yet_other_key:
yet_more_blah
txt

hs = HashString::new s

require 'yaml'
y hs

harp:~ > ruby a.rb
---
some_key: blah
yet_other_key: yet_more_blah
some_other_key: more_blah


strscan is pure c and extremely fast. it doesn't end up creating any new
strings like splitting or regex based solutions. it keeps a pointer into the
string and moves through it. it takes some getting used to but is really good
and part of the standard dist.
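A stripped-down sketch of the "pointer that moves through the string" idea, for readers who have not used strscan. This is not the HashString class above; scan_pairs and its error messages are made up for illustration, and it turns keys into symbols as the original question asked:

```ruby
require 'strscan'

# Hypothetical helper: walk the string once with a StringScanner.
def scan_pairs(text)
  ss = StringScanner.new(text)
  result = {}
  until ss.eos?
    ss.skip(/\s*/)                                   # leading blanks/newlines
    key = ss.scan(/\w+/) or raise ArgumentError, "expected key at #{ss.rest[0, 16].inspect}"
    ss.skip(/\s*:\s*/) or raise ArgumentError, "expected ':' after #{key}"
    value = ss.scan(/[^,\n]*/).strip                 # everything up to ',' or newline
    result[key.to_sym] = value
    ss.skip(/\s*,?\s*/)                              # optional separator
  end
  result
end

p scan_pairs("some_key: blah,\nsome_other_key: more_blah, yet_other_key:\nyet_more_blah\n")
```

Each scan or skip only succeeds at the current position and advances it, so malformed input is noticed where it occurs rather than being skipped over.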

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================
 

Daniel Brockman

Joe Van Dyk said:
attributes = message.split(",")
attributes.each do |attribute|
  key, value = attribute.scan(/(\w+): (.+)/)[0]
  result_hash[key.to_sym] = value.strip
end

How about this?

message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
  result_hash[k.to_sym] = v
end

Joe Van Dyk said:
Also, this will get run potentially thousands of times per second,
so execution speed is of some concern.

I don't know if the above is the best you can do, but I do believe it
is a bit faster than your original version.
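A runnable version of the scan idea, wrapped in a method so it can be tried in isolation. The pattern is the one from this post; the names KEY_VALUE and scan_to_hash are hypothetical. The second call shows one property of the pattern worth knowing: because the value class excludes blanks, a value stops at its first space.

```ruby
# KEY_VALUE is the pattern from the post; scan_to_hash is a made-up name.
KEY_VALUE = /(\w+)\s*:\s*([^, ]*)/

def scan_to_hash(message)
  result_hash = {}
  # With capture groups, scan yields one [key, value] pair per match.
  message.scan(KEY_VALUE) { |k, v| result_hash[k.to_sym] = v }
  result_hash
end

p scan_to_hash("some_key: blah, some_other_key: more_blah")
p scan_to_hash("key3 : value 3")   # the value is cut at the first space
```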
 

Daniel Brockman

Daniel Brockman said:
I don't know if the above is the best you can do, but I do believe
it is a bit faster than your original version.

According to my tests, it is also more than twice as fast as that
enormous strscan implementation. (Can anyone confirm?)
 

Ara.T.Howard

Daniel Brockman said:
According to my tests, it is also more than twice as fast as that
enormous strscan implementation. (Can anyone confirm?)

sure it is. but with no error checking and it accepts invalid strings. it
will also fail for things like

42.0 : value

since '.' is not a \w (tricky). anyhow i didn't know the standard scan was so
fast! a simple/similar version of the strscan method runs about the same for
small strings, but scales a bit better:

jib:~ > ruby a.rb
HashString @ 16.7303600311279
HashStringSimple @ 21.1355850696564

jib:~ > cat a.rb
require 'strscan'

class HashString < ::Hash
  def initialize s
    ss = StringScanner::new s, false
    loop do
      ss.scan(%r/\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/o) or break
      self[ss[1]] = ss[2]
    end
  end
end

class HashStringSimple < ::Hash
  def initialize s
    s.scan(%r/\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/o){|k,v| self[k] = v}
  end
end

def time label
  fork do
    a = Time::now.to_f
    yield
    b = Time::now.to_f
    t = b - a
    puts "#{ label } @ #{ t }"
  end
  Process::wait
end

n = 2 ** 20
huge = ''

n.times do |i|
  huge << "#{ rand } : #{ rand }"
  huge << ", " if i != n - 1
end

time('HashString'){ hs = HashString::new huge }

time('HashStringSimple'){ hs = HashStringSimple::new huge }
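To make the '42.0 : value' point concrete, here is a small comparison of the two patterns from this thread on that input. The constant names WORD_KEYS and ANY_KEYS are made up for the sketch; the patterns themselves are copied from the posts.

```ruby
# WORD_KEYS: the \w+ pattern from the earlier scan suggestion.
# ANY_KEYS:  the pattern used in HashString/HashStringSimple above.
WORD_KEYS = /(\w+)\s*:\s*([^, ]*)/
ANY_KEYS  = /\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/

word_result = "42.0 : value".scan(WORD_KEYS)
any_result  = "42.0 : value".scan(ANY_KEYS)

p word_result   # key is only the part after the '.'
p any_result    # key is the whole "42.0"
```

With \w+ the input is not rejected; the key is silently shortened to the word characters directly before the colon, which is the "accepts invalid strings" problem in a nutshell.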

cheers.

-a

Daniel Brockman

Ara.T.Howard said:
sure it is. but with no error checking and it accepts
invalid strings.


Perhaps there are no invalid strings?

Ara.T.Howard said:
it will also fail for things like

42.0 : value

I didn't see that in the original post. The key should be a symbol,
which I took to mean it had to be a valid Ruby identifier.

Ara.T.Howard said:
since '.' is not a \w (tricky).

But the characters permitted in Ruby identifiers are. (Though I
forgot `!' and `?'.)

Ara.T.Howard said:
anyhow i didn't know the standard scan was so fast!

Regular expressions are pretty fast, because you compile them.
I think of them as OpenGL display lists. :)

Ara.T.Howard said:
[...] %r/\s*( [...]

I've never seen `%r/.../' used before --- interesting.
 

Joe Van Dyk

Bringing up an old thread.... I have the following code.

  # Converts an array like
  #   [[0, "x_position: 20, y_position: 40, z_position: 30"],
  #    [1, "x_position: 20, y_position: 40, z_position: 30"]
  #   ]
  #
  # into a hash like
  #   { 0 => { :x_position => "20", :y_position => "40", :z_position => "30" },
  #     1 => { :x_position => "20", :y_position => "40", :z_position => "30" }
  #   }
  def self.convert_message_to_hash players_array
    raise "Can't do anything with empty message!" if original_message.nil?
    result_hash = {}
    original_message.each do |id, message|
      message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
        result_hash[id][k.to_sym] = v
      end
    end
    result_hash
  end
end

That code in my application leads to the following profiling:
  %   cumulative    self               self     total
 time    seconds   seconds    calls  ms/call  ms/call  name
 37.84    261.23    261.23     5569    46.91    69.61  String#scan
  8.12    317.28     56.05   201791     0.28     0.28  Hash#[]
  6.42    409.72     44.31   150783     0.29     0.29  Hash#[]=
  6.36    453.61     43.89   144782     0.30     0.30  String#to_sym

I believe this is probably the critical part of my code.

Ideas on how to improve this would be appreciated!
 

Joe Van Dyk

Whoops! 'original_message' should be 'players_array' in the code:

  def self.convert_message_to_hash players_array
    raise "Can't do anything with empty message!" if players_array.nil?
    result_hash = {}
    players_array.each do |id, message|
      message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
        result_hash[id][k.to_sym] = v
      end
    end
    result_hash
  end
end
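One more thing that may matter when trying the method outside the application: as posted, result_hash starts as a plain {} and result_hash[id] is indexed before anything is stored under id, so the snippet on its own would fail on nil. Presumably the real code sets up the inner hashes somewhere. A standalone sketch under that assumption, using a Hash.new default block (an addition for illustration, not part of the post) and written as a plain method rather than a class method:

```ruby
# Sketch only: the method name follows the post. The Hash.new default block
# is an assumption so that result_hash[id] exists before it is indexed.
def convert_message_to_hash(players_array)
  raise ArgumentError, "Can't do anything with empty message!" if players_array.nil?
  result_hash = Hash.new { |hash, id| hash[id] = {} }
  players_array.each do |id, message|
    message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
      result_hash[id][k.to_sym] = v
    end
  end
  result_hash
end

players = [[0, "x_position: 20, y_position: 40, z_position: 30"],
           [1, "x_position: 21, y_position: 41, z_position: 31"]]
p convert_message_to_hash(players)
```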
 
W

William James

Joe said:
I have a string like:

"some_key: blah, some_other_key: more_blah,
yet_other_key: yet_more_blah"

I want to build up a hash like

{ :some_key => "blah", :some_other_key => "more_blah",
:yet_other_key => "yet_more_blah" }

h={}
DATA.read.split(/\s*[:,\n]\s*/).inject(false){|a,b|
if a then h[a.to_sym]=b; a=false else a=b end }

p h
__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1
key2: value2
key3 : value 3 , key4 :value4 , key555555555:value55555


----
Output:

{:key3=>"value 3", :some_key=>"blah", :key4=>"value4",
:some_other_key=>"more_blah", :key555555555=>"value55555",
:yet_other_key=>"yet_more_blah", :key1=>"value1",
:key2=>"value2"}
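The inject(false) call above works as a toggle: the accumulator is either false (the next field is a key) or the pending key (the next field is its value). A possibly easier-to-read sketch of the same split-on-every-delimiter idea pairs the fields with each_slice(2) instead. The name pairs_to_hash is hypothetical, and like the original it assumes that neither keys nor values contain ':' or ','.

```ruby
# Hypothetical helper: split on ':' ',' or newline (with surrounding blanks),
# then take the resulting fields two at a time as key and value.
def pairs_to_hash(text)
  h = {}
  text.strip.split(/\s*[:,\n]\s*/).each_slice(2) { |k, v| h[k.to_sym] = v }
  h
end

sample = "key1:value1\nkey2: value2\nkey3 : value 3 , key4 :value4 , key555555555:value55555\n"
p pairs_to_hash(sample)
```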
 

Joe Van Dyk

Why would this approach be faster?
 

William James

Joe said:
Why would this approach be faster?

data = []; DATA.each{|x| data << x.chomp}
iter = 100_000

h={}
start = Time.now

iter.times {
  data.each{|line| line.split(/\s*[:,]\s*/).inject(false){|a,b|
    if a then h[a.to_sym]=b; a=false else a=b end }}
}
t1 = Time.now - start

result_hash = {}
start = Time.now

iter.times {
  data.each{|line| line.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
      result_hash[k.to_sym] = v
    end
  } }
t2 = Time.now - start

p result_hash == h
p t1,t2

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1,key2: value2,key3 :value_3 ,
key4:value4,key555555555:value55555
x_position: 20, y_position: 40, z_position: 30,zz_position:85


Output:
true
22.859
28.094
 

Simon Kröger

This may not be exactly an improvement but at least ..hmm..
funny?

--------------------------------------------------------------

a = [[0, "x_position: 20, y_position: 40, z_position: 30"],
[1, "x_position: 20, y_position: 40, z_position: 30"]]

h = eval('{' + ("~" + a.join("~")).
gsub(/\s*([^:\~|]+):\s*([^,~]+),?/, ':\1=>\'\2\',').
gsub(/~(\d+)~/, '},\1=>{')[2..-2]+ '}}')

puts h[1][:y_position]

--------------------------------------------------------------

cheers

Simon

William James

Faster yet:

iter = 20_000

data = DATA.inject([]){|a,x| a << x.chomp}
times = []

h1={}
times << Time.now

iter.times {
  data.each{ |line| tmp = line.split(/\s*[:,]\s*/)
    (0...tmp.size).step(2){ |i| h1[tmp[i].to_sym]=tmp[i+1] }
  }
}


h2={}
times << Time.now

iter.times {
  data.each{ |line| line.split(/\s*[:,]\s*/).inject(false){|a,b|
    if a then h2[a.to_sym]=b; a=false else a=b end }
  }
}

result_hash = {}
times << Time.now

iter.times {
  data.each{|line| line.scan(/(\w+)\s*:\s*([^, ]*)/) { |k, v|
    result_hash[k.to_sym] = v }
  }
}

times << Time.now

p result_hash == h2 && h2 == h1
(1...times.size).each{|i| p times[i]-times[i-1]}

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1,key2: value2,key3 :value_3 ,
key4:value4,key555555555:value55555
x_position: 20, y_position: 40, z_position: 30,zz_position:85


Output (on a slower computer):
true
6.82
7.27
8.413
 

Simon Kröger

Hi Joe,

as a more serious contribution to your problem:
this is 10 times faster:

data.each{ |j, line|
  k, v = -2, 0
  while (v = line.index(58, k))
    h5[j][line[(k+2)...v].intern] =
      line[(v+2)...(k = line.index(44, v) || line.length)]
  end
}

i attached the whole test script, the output is:

                          user     system      total        real
inject                4.640000   0.046000   4.686000 (  4.687000)
scan                  5.204000   0.063000   5.267000 (  5.875000)
eval                  6.078000   0.016000   6.094000 (  6.094000)
tmp                   4.375000   0.000000   4.375000 (  4.391000)
index                 0.312000   0.000000   0.312000 (  0.312000)
true
true

cheers

Simon

hack.rb:

require 'benchmark'

a = []
100.times {|i| a << [i, "x_position: #{200+i}, y_position: #{400+i}, z_position: #{300+i}"]}

data = a
iter = 1000
h1 = h2 = h3 = h4 = h5 = {}

Benchmark.bm 20 do |bm|
  bm.report("inject") do
    iter.times {
      data.each{|i, line| h1={}; line.split(/\s*[:,]\s*/).inject(false){|a,b|
        if a then h1[a.to_sym]=b; a=false else a=b end }}
    }
  end

  bm.report "scan" do
    iter.times {
      data.each{|i, line|h2={}; line.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
          h2[k.to_sym] = v
        end
      } }
  end

  bm.report "eval" do
    iter.times {
      h3 = eval('{' << ('~' << data.join('~')).
        gsub!(/\s*([^:~]+):\s*([^,~]+),?/, ':\1=>\'\2\',').
        gsub!(/~(\d+)~/, '},\1=>{')[2..-2] << '}}')
    }
  end

  bm.report "tmp" do
    iter.times {
      data.each{ |j, line| h4[j]={};tmp = line.split(/\s*[:,]\s*/)
        (0...tmp.size).step(2){ |i| h4[j][tmp[i].to_sym]=tmp[i+1] }
      }}
  end

  bm.report "index" do
    iter.times {
      data.each{ |j, line|
        k, v = -2, 0
        while (v = line.index(58, k))
          h5[j][line[(k+2)...v].intern] = line[(v+2)...(k = line.index(44, v) || line.length)]
        end
      }
    }
  end

end

p((h1 == h2) && (h1 == h3) && (h1 == h4) && (h1 == h5))

 

Joe Van Dyk

William James said:
Faster yet:

iter = 20_000

data = DATA.inject([]){|a,x| a << x.chomp}

What is DATA? Does it have anything to do with __END__ at the bottom?

Joe
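For readers with the same question: yes. When the main script file contains a line consisting only of __END__, Ruby stops parsing there and defines the constant DATA as a File opened on the script and positioned just after that line, so DATA.each or DATA.inject reads the trailing lines one by one. The sketch below uses a StringIO as a stand-in (the name data_io is made up) so that it runs outside such a script file; in a real script, DATA would take its place.

```ruby
require 'stringio'

# Stand-in for DATA so the sketch runs anywhere. In a script file that ends
# with an __END__ section, Ruby itself provides DATA with the same line-wise
# reading behaviour.
data_io = StringIO.new("key1:value1\nkey2: value2\n")

# Same expression as in the quoted benchmark, with data_io in place of DATA:
data = data_io.inject([]) { |a, x| a << x.chomp }

p data
```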
 

Joe Van Dyk


Thank you! I must say that the logic in the 'index' one confuses me though.
 

Simon Kröger

data.each{ |j, line|
  k, v = -2, 0
  while (v = line.index(58, k))
    h5[j][line[(k+2)...v].intern] =
      line[(v+2)...(k = line.index(44, v) || line.length)]
  end
}

Ok, let's walk through this:

j is just the key in the outer hash.

line looks like:
"x_position: 200, y_position: 400, z_position: 300"

k is the index of the ',' before the current key (like 'x_position'),
so the key starts at k+2
v is the index of the ':' after the key, so the value (like '200')
starts at v+2

58 is the ASCII code of the char ':'
44 is the ASCII code of the char ','

#index returns the index of the char in the string or nil
if no such char exists (at or after the index given as second
parameter)

while there is another ':' in the string
  add key from last ',' to ':' => value from ':' to next ','
end

the +2 is there to skip the (',' or ':') and the space.

the initial k = -2 is there because no ',' is there to skip
at the beginning.

One loses readability of code if optimizing for speed
has top priority - even in ruby.
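The walkthrough can also be written with named offsets, which may be easier to follow. This is a sketch for current Ruby, not the benchmarked code: index_parse and its variable names are made up; it passes ":" and "," as one-character strings, since String#index no longer accepts an integer character code in Ruby 1.9 and later; and it starts scanning at offset 0, because a negative offset to String#index is counted from the end of the string, so the first lookup would otherwise begin near the end of the line. It assumes exactly one space after every ':' and ','.

```ruby
# Hypothetical helper: find ':' and ',' by position and slice between them.
def index_parse(line)
  result = {}
  start = 0                                    # offset where the next key begins
  while (colon = line.index(":", start))
    comma = line.index(",", colon) || line.length
    # key runs from start up to ':'; value from 2 past ':' up to ','
    result[line[start...colon].to_sym] = line[(colon + 2)...comma]
    start = comma + 2                          # skip ',' and the following space
  end
  result
end

p index_parse("x_position: 200, y_position: 400, z_position: 300")
```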

data.each{|j, line|
  line.split(',').each{|kv|
    k, v = kv.split(':')
    h6[j][k.strip.intern] = v.strip
  }
}

is much nicer, but look at the numbers:

                          user     system      total        real
inject                4.672000   0.015000   4.687000 (  5.281000)
scan                  5.250000   0.063000   5.313000 (  5.312000)
eval                  6.140000   0.047000   6.187000 (  6.219000)
tmp                   4.407000   0.062000   4.469000 (  4.469000)
index                 0.375000   0.000000   0.375000 (  0.375000)
split                10.625000   0.141000  10.766000 ( 10.781000)
true

cheers

Simon
 

Joe Van Dyk


Ah, ok. In my application, there's a bunch more than 3 possible keys
and they are of differing length. I am in control of the format of
the incoming strings though, and so could modify their format to make
them easier/faster to parse. Any ideas on what would be a more
efficient format for transporting the data?

(for reference, the original string format was "id: 3, x_position: 39,
y_position: 209, z_position: 39" and in my real application, there's
about twenty different attributes that are in the string.)

Perhaps it would be more efficient to not convert the string into a hash?

All I really need to be able to do is access/display a player's data
via some mechanism, and a player's data should be updated once a
second, and there's up to 400 players. The above was the best way I
could come up with transporting and accessing the data, but perhaps
there's a better way of doing it.
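One possible direction, offered only as a sketch since the thread does not settle the question: if both ends agree on a fixed field order, the keys need not travel in every message at all, and parsing reduces to a single split. The field list and the name parse_positional below are assumptions for illustration; the real application would list its twenty or so attributes.

```ruby
# Assumed field order, agreed on by sender and receiver (hypothetical).
FIELDS = [:id, :x_position, :y_position, :z_position].freeze

# Hypothetical helper: "3,39,209,39" instead of
# "id: 3, x_position: 39, y_position: 209, z_position: 39".
def parse_positional(message)
  FIELDS.zip(message.split(",")).to_h
end

p parse_positional("3,39,209,39")
```

The trade-off is that the format is no longer self-describing, so adding or reordering attributes has to be coordinated on both sides.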
 
