1.9 significantly slower than 1.8 on Mac

D

Derek Chesterfield

I know 1.9.0 is still a work in progress, but I was expecting to see
some benefit from YARV already. However, this simple script showing a
typical bit of logfile processing takes over 70% LONGER under Ruby 1.9
$ cat logreport2.rb
require 'benchmark'
include Benchmark

logfile=<<END_OF_STRING
23:59:16 drop 10.14.241.252 >eth2c1 rule: 1015; rule_uid:
{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.20.79; dst:
10.14.65.137; proto: tcp; product: VPN-1 & FireWall-1; service:
lotus; s_port: 57150;
23:59:15 drop 10.14.241.251 >eth2c1 rule: 1015; rule_uid:
{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.40.88; dst:
10.14.65.37; proto: tcp; product: VPN-1 & FireWall-1; service:
TCP1801; s_port: 24511;
23:59:15 drop 10.14.241.251 >eth1c0 rule: 1015; rule_uid:
{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 10.14.163.59; dst:
10.24.0.2; proto: udp; product: VPN-1 & FireWall-1; service: domain-
udp; s_port: udp1025-1027;
END_OF_STRING

bm(12) do |test|
test.report('Simple:') do
500000.times do
src=Hash.new(0)
logfile.each_line do |line|
if (m=line.match /src: (.*?);/)
src[m[1]]= src[m[1]] +1
end
end
end
end
end

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]

$ ruby logreport2.rb
user system total real
Simple: 7.960000 0.000000 7.960000 ( 7.965567)

$ ruby19 -v
ruby 1.9.0 (2007-12-25 revision 14709) [i686-darwin9.1.0]

$ ruby19 logreport2.rb
 
M

Marcin Mielżyński

Derek Chesterfield pisze:
$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]

$ ruby logreport2.rb
user system total real
Simple: 7.960000 0.000000 7.960000 ( 7.965567)

$ ruby19 -v
ruby 1.9.0 (2007-12-25 revision 14709) [i686-darwin9.1.0]

$ ruby19 logreport2.rb
user system total real
Simple: 13.750000 0.010000 13.760000 ( 13.754742)

Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.
e.g: /src: (.*?);/ compiles down to:

[exact5:src: ]
[mem-start:1]
[jump:(1)]
[anychar]
[push:(-6)]
[mem-end:1]
[exact1:;]
[end]

seemingly simple match:

/src: (.*?);/ =~ "src: 172.25.20.79;"

turns out to:

0> "src: 17..." [exact5:src: ]
5> "172.25...." [mem-start:1]
5> "172.25...." [jump:(1)]
5> "172.25...." [push:(-6)]
5> "172.25...." [mem-end:1]
5> "172.25...." [exact1:;]
5> "172.25...." [anychar]
6> "72.25.2..." [push:(-6)]
6> "72.25.2..." [mem-end:1]
6> "72.25.2..." [exact1:;]
6> "72.25.2..." [anychar]
7> "2.25.20..." [push:(-6)]
7> "2.25.20..." [mem-end:1]
7> "2.25.20..." [exact1:;]
7> "2.25.20..." [anychar]
8> ".25.20...." [push:(-6)]
8> ".25.20...." [mem-end:1]
8> ".25.20...." [exact1:;]
8> ".25.20...." [anychar]
9> "25.20.7..." [push:(-6)]
9> "25.20.7..." [mem-end:1]
9> "25.20.7..." [exact1:;]
9> "25.20.7..." [anychar]
10> "5.20.79..." [push:(-6)]
10> "5.20.79..." [mem-end:1]
10> "5.20.79..." [exact1:;]
10> "5.20.79..." [anychar]
11> ".20.79;" [push:(-6)]
11> ".20.79;" [mem-end:1]
11> ".20.79;" [exact1:;]
11> ".20.79;" [anychar]
12> "20.79;" [push:(-6)]
12> "20.79;" [mem-end:1]
12> "20.79;" [exact1:;]
12> "20.79;" [anychar]
13> "0.79;" [push:(-6)]
13> "0.79;" [mem-end:1]
13> "0.79;" [exact1:;]
13> "0.79;" [anychar]
14> ".79;" [push:(-6)]
14> ".79;" [mem-end:1]
14> ".79;" [exact1:;]
14> ".79;" [anychar]
15> "79;" [push:(-6)]
15> "79;" [mem-end:1]
15> "79;" [exact1:;]
15> "79;" [anychar]
16> "9;" [push:(-6)]
16> "9;" [mem-end:1]
16> "9;" [exact1:;]
16> "9;" [anychar]
17> ";" [push:(-6)]
17> ";" [mem-end:1]
17> ";" [exact1:;]
18> "" [end]
match at 0


The bottleneck here is the anychar (OP_ANYCHAR in regexec.c) instruction
which uses 'enclen' and 'is_mbc_newline' (which are function pointers,
so impossible to be inlined).

Joni goes a bit further and uses a bunch of singlebyte specialized
opcodes when it detects singlebyte encoding being used.



lopex
 
C

Charles Oliver Nutter

Marcin said:
The bottleneck here is the anychar (OP_ANYCHAR in regexec.c) instruction
which uses 'enclen' and 'is_mbc_newline' (which are function pointers,
so impossible to be inlined).

Joni goes a bit further and uses a bunch of singlebyte specialized
opcodes when it detects singlebyte encoding being used.

FYI, for those not following it, Joni is the new regular expression
engine for JRuby. It is a port of Oniguruma, the regexp engine for Ruby 1.9.

- Charlie
 
D

Derek Chesterfield

Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.

So Ruby 1.9.0's Regexp is slower than 1.8's?
Damn - All my Ruby scripts make extensive use of Regexp.
Will this always be the case, or might 1.9.1 be worth me upgrading to?=
 
T

Thomas Hurst

* Derek Chesterfield ([email protected]) said:
So Ruby 1.9.0's Regexp is slower than 1.8's?

In some cases, probably; it does quite a bit more, not just in encoding
support, but more regexp features. e.g, from a quick glance through
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt :

Zero-width lookbehind assertions (/(?<=foo)bar/ =~ 'foobar' # $& => "bar")

Named captures (/(?<foo>bar))/ =~ 'foobar' # $~[:foo] => "bar")

Fancier backreferences (/(?<foo>foo|bar)\k<foo>/ =~ "foobar" # => nil,
.. =~ 'foofoo' # $& => "foofoo")

The ability to call subexpressions
(/(?<foo>foo|bar)\g<foo>/ =~ "foobar" # $& => "foobar", $2 => bar)

You might consider installing a separate Ruby 1.8 with Oniguruma patched
in, or installing the Oniguruma gem, so you can test the performance
without bringing YARV into the equation.
Damn - All my Ruby scripts make extensive use of Regexp. Will this
always be the case, or might 1.9.1 be worth me upgrading to?

Oniguruma's pretty mature by now, I wouldn't expect the general
performance profile to change that significantly, although the recent
port to Java might trigger some changes.

I wouldn't be too worried; I've used it for well over a year and not
really noticed any significant performance issues. In the worst case,
alternative engines could always be supported through extensions.
 
M

Marcin Mielżyński

Thomas Hurst pisze:
In some cases, probably; it does quite a bit more, not just in encoding
support, but more regexp features. e.g, from a quick glance through
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt :

When those features are not used they don't affect performance at all
(for matching). The main reasons of bottlenecks are the encoding
agnostic functions and main switch execution instead of inner loops in
some cases.

Oniguruma's pretty mature by now, I wouldn't expect the general
performance profile to change that significantly, although the recent
port to Java might trigger some changes.

Joni jvm bytecode compiler is on the way (now it uses int[] array for
the compiled code), it should greatly surpass Oniguruma performance once
it's finished (it already does for some patterns even without the asm
compiler).

lopex
 
D

Derek Chesterfield

I wouldn't be too worried; I've used it for well over a year and not
really noticed any significant performance issues. In the worst case,
alternative engines could always be supported through extensions.

In your usage maybe. But in my experiments with 1.9, ALL my scripts
[all of them using Regexp extensively to parse files] are a *minimum*
of 40% slower, and the simple script in my original post takes 70%
more time to run.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,999
Messages
2,570,243
Members
46,836
Latest member
login dogas

Latest Threads

Top