ako... said:
thank you. yes, it seems to be the only way. just that it is a shame
that we have to match the same expression again! the information was
available already, it was just discarded during the first match in
your sample.
I still didn't get what exactly you want. Does this help?
=> ["a", "b", "c"]
Now that I've read the responses in this thread a few times, I think
I understand what he wants to do. And I don't think it can be done
via scan.
First: He wants a single regex which will verify the syntax of an
entire line. So, first he wants a true/false value, saying "The line
is valid, or it is not valid". Never mind any values in the line, just
"is the line *completely valid*?".
Then, if the line is valid, he wants to break out individual pieces
of what was scanned, and he wants to do that without re-doing
any of the scans he did in the first regex. The trick is that some
of those pieces are a repeating group, such as /(\s\w)*/.
What is confusing us is that he describes this using a simple
example, and when we solve the simple example he then says
"you don't get the bigger picture!". Ugh.
Let me give an example, and see if someone can solve it. My
example might still be something other than what he's thinking
of, but maybe it will help.
Let's say I'm expecting command lines of the form:
first word is either 'copy' or 'duplicate'
followed by one or more words
followed by the word 'before' or 'after'
followed by one or more words
So I could do the first step with the regexp:
/^(copy|duplicate) \s+ (\w+\s+)+ (before|after) \s+ (\w+\s*)+ $/x
(hopefully I've done that right!). *IF* that matches, then I know
the entire line is valid. Then, after I know the line is valid, I want
the array of source-words, and the array of destination-words
which were matched. I want to do that by picking out information
in MatchData, not by doing a new scan. The thing is, I don't think
I have a way of knowing how many times the first '(\w+\s+)+' was
matched. So I can't just do a slice of $~.captures because I don't
know what the starting and ending indexes of that slice would be.
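To make that concrete, here is a small sketch of my own (not from the
earlier posts) that runs the validation pattern against the two sample
lines I use further down, plus one invalid line. The names LINE_RE,
lines and results, and the invalid line itself, are made up for
illustration. The number of captures stays at four no matter how many
words each repeating group matched:

```ruby
# Validation pattern from the post. The /x flag makes the regexp
# engine ignore the literal spaces inside the pattern.
LINE_RE = /^(copy|duplicate) \s+ (\w+\s+)+ (before|after) \s+ (\w+\s*)+ $/x

lines = [
  "copy apple pear plum peach after bill bob",
  "duplicate tomato before joe alice alfred tommy jane",
  "copy after bill"   # invalid: no source words before 'after'
]

# nil for an invalid line, otherwise the captures of the match.
results = lines.map do |line|
  m = LINE_RE.match(line)
  m ? m.captures : nil
end

results.each { |r| p r }
# ["copy", "peach ", "after", "bob"]
# ["duplicate", "tomato ", "before", "jane"]
# nil
```

Each repeating group contributes exactly one slot to captures, and that
slot holds only the last repetition.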
I could put another set of parentheses around the two repeating
groups:
/^(copy|duplicate) \s+ ((\w+\s+)+) (before|after) \s+ ((\w+\s*)+) $/x
But that doesn't really give me two separate arrays of the
individual values that made up each group. It just matches
each group as a whole.
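A quick sketch of what the wrapped pattern gives back, using the second
sample line from further down. WRAPPED_RE, src_run and dest_run are
names I made up for this illustration:

```ruby
# Same pattern, with an extra pair of parentheses around each
# repeating group so the whole run is captured as one string.
WRAPPED_RE = /^(copy|duplicate) \s+ ((\w+\s+)+) (before|after) \s+ ((\w+\s*)+) $/x

m = WRAPPED_RE.match("duplicate tomato before joe alice alfred tommy jane")

src_run  = m[2]   # the whole source run as a single string
dest_run = m[5]   # the whole destination run as a single string

p src_run    # "tomato "
p dest_run   # "joe alice alfred tommy jane"
p m[6]       # "jane" (inner group: last repetition only)
```

So each outer group comes back as one String, not as an Array of the
words inside it.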
Given two data lines of:
copy apple pear plum peach after bill bob
duplicate tomato before joe alice alfred tommy jane
in the first case I want a way to set two arrays:
srcfood = ["apple ", "pear ", "plum ", "peach "]
destword = ["bill ", "bob"]
from the first line, and
srcfood = ["tomato "]
destword = ["joe ", "alice ", "alfred ", "tommy ", "jane"]
from the second line.
I'll agree this is a weird example, but I think it shows the issue.
If I apply the second pattern above (the one with the extra
parentheses) to the first line, I'll see a MatchData result where:
$~.captures ==
["copy", "apple pear plum peach ", "peach ", "after", "bill bob", "bob"]
Notice: There isn't *any* element which contains a value of just "apple ",
or just "pear ", or just "plum ", even though the regex obviously had to
match each one of those.
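For comparison, here is the fallback I would expect most people to
reach for. Note that it is *not* what he is asking for: it takes the
two outer captures and makes a second pass over them with String#split,
which is exactly the repeated work he wants to avoid. The method name
parse_command and the hash keys are made up for this sketch, and split
drops the trailing whitespace, so the words come back as "apple"
rather than "apple ":

```ruby
WRAPPED_RE = /^(copy|duplicate) \s+ ((\w+\s+)+) (before|after) \s+ ((\w+\s*)+) $/x

# Hypothetical helper: validate the line with one match, then split
# the two captured runs into words (a second pass over that text).
def parse_command(line)
  m = WRAPPED_RE.match(line)
  return nil unless m            # the line is not valid

  {
    verb:     m[1],
    srcfood:  m[2].split,        # split on whitespace
    position: m[4],
    destword: m[5].split
  }
end

cmd = parse_command("copy apple pear plum peach after bill bob")
p cmd[:srcfood]    # ["apple", "pear", "plum", "peach"]
p cmd[:destword]   # ["bill", "bob"]
```

It works, but the individual words are recovered by rescanning text
that the regexp engine had already walked through once.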