Michal said:
Hello
I tried scanning for multiple occurrences of a group in a string, but
match/scan returns only one.
"ajabcabck".match /^a*j(?:b*(a+)b+c*)+k$/
=> #<MatchData "ajabcabck" 1:"a">
"ajabcabck".scan /^a*j(?:b*(a+)b+c*)+k$/
=> [["a"]]
Clearly the a+ group must match twice for the string to match from ^ to $,
but only a single match is returned.
It is possible to use split instead but using a single match would be
much nicer.
I would only use #split if you really want to split the string.
Otherwise please see below.
Any workaround?
ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
As far as I know, nested groups are not allowed. Regular expressions
do not form a language.
Nested groups *are* allowed. However, one must understand how group
matching works: for each group, at most *one* capture is recorded,
no matter how often the group is repeated:
irb(main):001:0> s="abaab"
=> "abaab"
irb(main):002:0> /(?:(a+)b)+/.match s
=> #<MatchData "abaab" 1:"aa">
irb(main):003:0> md = /(?:(a+)b)+/.match s
=> #<MatchData "abaab" 1:"aa">
irb(main):004:0> md.to_a
=> ["abaab", "aa"]
irb(main):005:0> md[1]
=> "aa"
irb(main):006:0>
As you can see from this 1.9.1 test, it is the *last* match that is
kept. I cannot provide an official rationale for this, but one likely
reason is that the memory overhead of storing an arbitrary number of
matches per group can be significant. Also, the number of groups is
known when a regular expression is compiled, while the number of
matches of each group is known only at match time. This makes it
easier to allocate the memory needed for storing a single capture per
group, because its size is fixed once the regular expression is
compiled. Please also note that all regular expression engines I know
handle it that way, i.e. you get at most one capture per group.
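To tie this back to the original question, here is a minimal sketch that
runs the pattern from the question on a slightly longer string (the
string "ajabcaabck" is chosen here so that the two repetitions capture
different text). It only illustrates the observable behaviour; it says
nothing about how the engine allocates memory internally.

```ruby
# Pattern from the original question; the input is picked so that the
# (a+) group captures "a" on the first repetition and "aa" on the second.
pattern = /^a*j(?:b*(a+)b+c*)+k$/
md = pattern.match("ajabcaabck")

# MatchData holds one slot for the whole match plus one per group,
# regardless of how many times the group repeated while matching.
p md.size      # => 2
p md.captures  # => ["aa"]  (only the final repetition survives)
```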
In those cases I usually employ a two-level approach: first match the
whole repeated section as a single group, then scan that section:
irb(main):015:0> s = "ajabcaabck"
=> "ajabcaabck"
irb(main):016:0> if /^a*j((?:b*a+b+c*)+)k$/ =~ s
irb(main):017:1> $1.scan(/b*(a+)b+c*/){|m| p m, $1}
irb(main):018:1> end
["a"]
"a"
["aa"]
"aa"
=> "abcaabc"
irb(main):019:0>
Because #scan yields the captured groups to the block, we can do without $1:
irb(main):022:0> if /^a*j((?:b*a+b+c*)+)k$/ =~ s
irb(main):023:1> $1.scan(/b*(a+)b+c*/){|m| p m}
irb(main):024:1> end
["a"]
["aa"]
=> "abcaabc"
irb(main):025:0>
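The two-level approach can also be wrapped in a small method. This is
only a sketch: the name repeated_captures is made up for illustration
(it is not part of Ruby or of this thread), and the two patterns are
hard-coded to the example from the question. It relies on the inner
pattern splitting the section the same way the outer one did, which
holds for these inputs but should be checked for other patterns.

```ruby
# Hypothetical helper: returns every capture of the repeated (a+) group.
def repeated_captures(str)
  # Level 1: validate the whole string and grab the repeated section
  # as a single capture.
  outer = /^a*j((?:b*a+b+c*)+)k$/.match(str)
  return [] unless outer

  # Level 2: scan the section. With one group in the pattern, scan
  # returns one-element arrays, so flatten gives a plain list of strings.
  outer[1].scan(/b*(a+)b+c*/).flatten
end

p repeated_captures("ajabcaabck")  # => ["a", "aa"]
p repeated_captures("ajabcabck")   # => ["a", "a"]
p repeated_captures("no match")    # => []
```

Returning an empty array for a non-matching string is a design choice
made here; returning nil would work just as well if the caller needs to
tell "no match" apart from "no captures".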