when /^([A-Za-z0-9,]+), '([^']+)', '([^']+)', '([^']+)'/
* Use .+? instead of [^']+
.+? does a non-greedy match, which is what you're really trying to say
with the [^']+
Actually, no, the way he had it is better. Check this out:
http://perlmonks.org/?node=Death to Dot Star!
Using /.+?/ (which is really equivalent to /..*?/ and thus in the same
camp as the article) can be incorrect and also slower. In this case --
I think -- the '.+?' would yield correct results since there's a
one-character terminator, but the speed is still an issue. For a simple
string and pair of regexes, it's not much:
$ cat regex-bm.rb
require 'benchmark'
TIMES = 10_000_000
REGEX1 = /'([^']+)'/
REGEX2 = /'(.+?)'/
STRING = "'Woah,' John said, 'there're multiple quotes!'"
Benchmark.bmbm do |x|
  x.report("[^']+") { TIMES.times { STRING =~ REGEX1 } }
  x.report(".+?")   { TIMES.times { STRING =~ REGEX2 } }
end
$ ruby -v regex-bm.rb
ruby 1.8.4 (2005-12-24) [powerpc-darwin7.9.0]
Rehearsal -----------------------------------------
[^']+  24.130000   0.050000  24.180000 ( 24.261712)
.+?    25.490000   0.040000  25.530000 ( 25.799146)
------------------------------- total: 49.710000sec

            user     system      total        real
[^']+  24.160000   0.070000  24.230000 ( 24.729463)
.+?    25.410000   0.060000  25.470000 ( 25.572987)
The difference is small, but it is present. And it gets worse as both
the regex being used and the string being matched get more complex. For
a simple case like Giles's, I wouldn't worry too much about the performance difference
between .+? and [^']+, and the correctness is fine. So you can use .+?
-- it *is* more readable. But it also hides subtleties, which raises
flags for me.
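
To make the "hides subtleties" point concrete, here is a rough sketch of
how the two forms can diverge once something follows the closing quote in
the pattern, as the ', ' separators do in the regex at the top. The input
line is made up for illustration (it is not Giles's data), and the
variable names are mine:

```ruby
# Hypothetical input: the first quoted chunk is followed by stray text,
# so the "', '" separator does not appear right after it.
line = "'a' b, 'c', 'd'"

negated = /'([^']+)', '([^']+)'/
lazy    = /'(.+?)', '(.+?)'/

negated_caps = line.match(negated).captures
lazy_caps    = line.match(lazy).captures

# [^']+ can never cross a quote, so the match starts at 'c'.
p negated_caps  # => ["c", "d"]

# .+? is lazy but still allowed to cross quotes: it keeps expanding from
# the first quote until the rest of the pattern fits.
p lazy_caps     # => ["a' b, 'c", "d"]
```

With a lone /'(.+?)'/ and nothing after the closing quote, the two give
the same captures on input like this; the difference shows up when the
regex has more to match after the terminator.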
Jacob Fugal