Possibly improvable code

R

Ralf Müller

Hi,

I made a comparison:

ram@lilith:~/src/ruby$time ruby -e 'num=STDIN.gets; a=Array.new; (0..9).each
{|i| a = 0}; (0..num.length).each {|i| a[num[i..i].to_i] += 1 };
a.each_index {|i| print "#{i} #{a}\n"} ' < pi.out
0 16096
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m1.023s
user 0m0.970s
sys 0m0.010s


and


ram@lilith:~/src/ruby$time gawk '{i=0;j=length($0);while( i <= j) {i++;
z=substr($0,i,1); {count=count+ 1; a[z]=a[z]+1; }}}END{ for ( i in a ) {print
i ":" a;}print "Digits: " count;}' < pi.out | sort
0:16095
1:16107
2:15965
3:15915
4:15967
5:16169
6:15934
7:16091
8:15999
9:15790
:1
Digits: 160033

real 0m0.305s
user 0m0.310s
sys 0m0.000s


I'm really new to ruby. So I would like to know, if there is any possible
improvement of the ruby code.
pi.out contains about 160000 digits of PI and is made by "ruby sample/pi.rb >
pi.out' inside the src-dir of the ruby-*.tgz


Any hints???


thanks
ralf
 
R

Ruben

At Wed, 16 Jun 2004 23:13:26 +0900,
Ralf said:
ram@lilith:~/src/ruby$time ruby -e 'num=STDIN.gets; a=Array.new; (0..9).each
{|i| a = 0}; (0..num.length).each {|i| a[num[i..i].to_i] += 1 };
a.each_index {|i| print "#{i} #{a}\n"} ' < pi.out


# get the input
num = STDIN.gets;
# initialize array entries with 0
a = Array.new(255,0);
# count the occurence of each byte in the input
num.each_byte {|i| a += 1};
# only interested in the byte/characters for 0-9
(?0..?9).each {|i| print "#{.pack("c")} #{a}\n"};

I suspect the num[i..i].to_i is pretty expensive... and it's not
really needed in this case.

Cheers,
Ruben
 
R

Ralf Müller

Am Mittwoch, 16. Juni 2004 16:13 schrieb Ralf Müller:
Hi,

I made a comparison:

ram@lilith:~/src/ruby$time ruby -e 'num=STDIN.gets; a=Array.new;
(0..9).each {|i| a = 0}; (0..num.length).each {|i| a[num[i..i].to_i] +=
1 }; a.each_index {|i| print "#{i} #{a}\n"} ' < pi.out
0 16096
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m1.023s
user 0m0.970s
sys 0m0.010s


and


ram@lilith:~/src/ruby$time gawk '{i=0;j=length($0);while( i <= j) {i++;
z=substr($0,i,1); {count=count+ 1; a[z]=a[z]+1; }}}END{ for ( i in a )
{print i ":" a;}print "Digits: " count;}' < pi.out | sort
0:16095
1:16107
2:15965
3:15915
4:15967
5:16169
6:15934
7:16091
8:15999
9:15790

:1

Digits: 160033

real 0m0.305s
user 0m0.310s
sys 0m0.000s


I'm really new to ruby. So I would like to know, if there is any possible
improvement of the ruby code.
pi.out contains about 160000 digits of PI and is made by "ruby sample/pi.rb
pi.out' inside the src-dir of the ruby-*.tgz


Any hints???


thanks
ralf


tried with Hash instead of Array:
ram@lilith:~/src/ruby$time ruby -e 'str=STDIN.gets; h=Hash.new(); (0..9).each
{|i| h[i.to_s] = 0}; (0..str.length-1).each {|i| h[str[i..i]] += 1 }; h.each
{|k,v| print "#{k} #{v}\n"} ' < pi.out | sort
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.920s
user 0m0.910s
sys 0m0.010s

gains 10%.
 
W

why the lucky stiff

Ralf said:
tried with Hash instead of Array:
ram@lilith:~/src/ruby$time ruby -e 'str=STDIN.gets; h=Hash.new(); (0..9).each
{|i| h[i.to_s] = 0}; (0..str.length-1).each {|i| h[str[i..i]] += 1 }; h.each
{|k,v| print "#{k} #{v}\n"} ' < pi.out | sort
why@stungun$ time ruby -n -e 'c={};$_.each_byte{|b|c||=0; c+=1};
c.sort.each {|k,v| puts "#{k - ?0} #{v}"}' < pi.out
0 3454
1 3514
2 3358
3 3452
4 3483
5 3510
6 3488
7 3431
8 3482
9 3464

real 0m0.052s
user 0m0.051s
sys 0m0.000s

same idea as yours. just runs 3x swift.

_why
 
R

Relm

Ralf said:
tried with Hash instead of Array:
ram@lilith:~/src/ruby$time ruby -e 'str=STDIN.gets; h=Hash.new(); (0..9).each
{|i| h[i.to_s] = 0}; (0..str.length-1).each {|i| h[str[i..i]] += 1 }; h.each
{|k,v| print "#{k} #{v}\n"} ' < pi.out | sort
why@stungun$ time ruby -n -e 'c={};$_.each_byte{|b|c||=0; c+=1};
c.sort.each {|k,v| puts "#{k - ?0} #{v}"}' < pi.out

same idea as yours. just runs 3x swift.


I typo a lot, so no command line. Maybe 150-170x speed of OP.

n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"
 
R

Ralf Müller

11111111111111111111111111111111111111111111111111111111111111
ram@lilith:~/src/ruby$cat rec_pi.rb
n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"

ram@lilith:~/src/ruby$time ruby rec_pi.rb < pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.045s
user 0m0.040s
sys 0m0.000s

22222222222222222222222222222222222222222222222222222222222222222
ram@lilith:~/src/ruby$cat rec_pi_a.rb
c={};
$_.each_byte{|b| c||=0; c+=1 };
c.sort.each {|k,v| puts "#{k - ?0} #{v}"}

ram@lilith:~/src/ruby$time ruby -n rec_pi_a.rb < pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.260s
user 0m0.240s
sys 0m0.010s

3333333333333333333333333333333333333333333333333333333333333333333
ram@lilith:~/src/ruby$cat rec_pi_b.rb
num = STDIN.gets;
a = Array.new(255,0);
num.each_byte {|i| a += 1};
(?0..?9).each {|i| print "#{.pack("c")} #{a}\n"};

ram@lilith:~/src/ruby$time ruby rec_pi_b.rb < pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.194s
user 0m0.160s
sys 0m0.010s

####################################################################

The main difference seems to be using "count" instead of "each_byte". But why
is it 5-6x faster to "count" 9.times than "each_byte" only once?
Or did i ignore anything?

ralf
(-,-)Zzz
 
R

Ruben

At Thu, 17 Jun 2004 16:48:21 +0900,
Ralf said:
11111111111111111111111111111111111111111111111111111111111111
ram@lilith:~/src/ruby$cat rec_pi.rb
n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"

The main difference seems to be using "count" instead of "each_byte". But why
is it 5-6x faster to "count" 9.times than "each_byte" only once?
Or did i ignore anything?

The above code is pretty nice.. I didn't know about the
'count'-method. The reason this is faster, is because the 'big loop',
the scan over the string, is in this code done internally, in the C
implementation of the count method. The other code has the loop
explicitly in ruby-code, which is slower.

The difference however is that the above code does the scan over the
String several times (# of chars to check - 1) in C, while all the other
code scans the String only once in ruby.

Probably there is a trade-off... maybe if you would wanna count all
occurences of [a-zA-Z0-9], then you would have to do the
'count'-method 57 times (57 because the above code assumes that only
the characters you count appear in the String), effectively scanning
the String 57 times. This might be slower than scanning the String
only once in ruby-code. (didn't test this though...)

Ruben
 
R

Relm

The difference however is that the above code does the scan over the
String several times (# of chars to check - 1) in C, while all the other
code scans the String only once in ruby.

Probably there is a trade-off... maybe if you would wanna count all
occurences of [a-zA-Z0-9], then you would have to do the
'count'-method 57 times (57 because the above code assumes that only
the characters you count appear in the String), effectively scanning
the String 57 times. This might be slower than scanning the String
only once in ruby-code. (didn't test this though...)

There is a trade-off, but you may need to throw in some kanji or unicode
to get there. With just [a-zA-Z0-9], a single scan in Ruby is still
slower. Here are some timings:

user system total real
array: 0.880000 0.080000 0.960000 ( 0.958946)
hash: 1.210000 0.070000 1.280000 ( 1.330615)
count: 0.100000 0.000000 0.100000 ( 0.091805)
delete: 0.030000 0.020000 0.050000 ( 0.051760)

Both 'array' and 'hash' use #each_byte (M iterations in Ruby code),
'count' uses #count repeatedly (62*M iterations in C code), and 'delete'
uses #delete to partition the string into 5 substrings and #delete! to
count characters (5*M + 62*M/5 = about 17*M iterations in C code).

And the benchmark code:

#!/usr/bin/ruby -w

str = ['a'..'z','A'..'Z',0..9].map {|x| x.to_a}.flatten.join * 10000

require 'benchmark'

Benchmark.bm(8) do |bm|
bm.report('array:') do
a = Array.new(255,0);
str.each_byte {|i| a += 1};
[?0..?9,?a..?z,?A..?Z].map {|x| x.to_a}.flatten.each {|i|
"#{.pack("c")} #{a}"
};
end
bm.report('hash:') do
c={}
str.each_byte{|b|c||=0; c+=1}
c.sort.each {|k,v| "#{k.chr} #{v}"}
end
bm.report('count:') do
['0'..'9','a'..'z','A'..'Z'].map {|x| x.to_a}.flatten.each do |i|
"#{i} #{str.count(i)}"
end
end
bm.report('delete:') do
['0'..'9','a'..'m','n'..'z','A'..'M','N'..'Z'].each do |range|
s = str.delete("^#{range.first}-#{range.last}")
range.each do |i|
n = s.size
s.delete!(i)
"#{i} #{n - s.size}"
end
end
end
end

__END__
 
R

Ralf Müller

Am Donnerstag, 17. Juni 2004 00:00 schrieb Relm:
Ralf said:
tried with Hash instead of Array:
ram@lilith:~/src/ruby$time ruby -e 'str=STDIN.gets; h=Hash.new();
(0..9).each {|i| h[i.to_s] = 0}; (0..str.length-1).each {|i|
h[str[i..i]] += 1 }; h.each {|k,v| print "#{k} #{v}\n"} ' < pi.out |
sort

why@stungun$ time ruby -n -e 'c={};$_.each_byte{|b|c||=0; c+=1};
c.sort.each {|k,v| puts "#{k - ?0} #{v}"}' < pi.out

same idea as yours. just runs 3x swift.


I typo a lot, so no command line. Maybe 150-170x speed of OP.

n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"


I tried this

ram@lilith:~/src/ruby$cat trcount-simple.rb
#!/usr/local/bin/ruby
#num=STDIN.gets
num=STDIN.gets
# prepare an array
a = Array.new;
(0..9).each {|i| a = 0}
# for num in an single line
#(0..num.length).each {|n| a[num[n..n].to_i] += 1 }
(0..num.length-1).each {|n| a[num[n..n].to_i] += 1 }
# print the result
a.each_index {|index| print "#{index} #{a[index]}\n" }
ram@lilith:~/src/ruby$time ruby trcount-simple.rb < pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.964s
user 0m0.950s
sys 0m0.010s
ram@lilith:~/src/ruby$time ruby -e 'num=STDIN.gets; a=Array.new; (0..9).each
{|i| a = 0
}; (0..num.length).each {|i| a[num[i..i].to_i] += 1 }; a.each_index {|i| print
"#{i} #{a[i
]}\n"} ' < pi.out
0 16096
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.955s
user 0m0.940s
sys 0m0.000s

It does not seem to make any difference if the source i given at the command
line or in a separate file:
ram@lilith:~/src/ruby$cat rec_pi.rb
n = STDIN.gets
tot = 0
9.times do |i|
c = n.count(i.to_s)
tot += c
puts "#{i} #{c}"
end
puts "9 #{n.size - tot}"
ram@lilith:~/src/ruby$time ruby rec_pi.rb < pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.048s
user 0m0.040s
sys 0m0.010s
ram@lilith:~/src/ruby$time ruby -e 'n = STDIN.gets; tot = 0; 9.times do |i| c
= n.count(i.to_s); tot += c; puts "#{i} #{c}"; end; puts "9 #{n.size - tot}"'
<pi.out
0 16095
1 16107
2 15965
3 15915
4 15967
5 16169
6 15934
7 16091
8 15999
9 15790

real 0m0.043s
user 0m0.010s
sys 0m0.030s

But maybe, you're right and these conditions do just not lead to that
difference. Under which circumstances does the difference occur?

regards
ralf
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,146
Messages
2,570,832
Members
47,375
Latest member
FelishaCma

Latest Threads

Top