Simon Mullis
Hi All,
Let's get down to it...
I have a long string of the form:
string = <<-EOVAR
XD 1 * 100000436 3441863 1550663 1161254 951982
XD 1 479903531056 47988002622 21360568539 18276299303 15476234490
XD 1 66934 5552 321640438 40297830 0
XD 1 0 3235 2197 10907 1631621
XD 1 15488078 210564267 574075997 2405132745 7805716381
XD 1 0 4949 0 58361 0
(goes for about 17 lines, all separated by \n)
EOVAR
I'm building a regex for this string and it's pretty straightforward.
Only prerequisite is to capture all numbers for later Ruby fun:
regex = %r{XD\s1\s\*\s(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\n ...etc... }mx
I would like to pare it down a bit, using term binding:
regex = %r{XD\s1\s\*\s(\d+\s+){5} ...etc...}mx
If I do this, only the last repetition of the group is captured:
pp string.scan(regex)
[["951982\n"]]
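To make the behaviour above concrete, here is a minimal sketch on just the first line of the data. The variable names (`line`, `repeated`, `explicit`, `last_only`, `all_five`) are only for illustration. It compares the quantified group with the spelled-out version: a group followed by `{5}` is still one capture group, so it ends up holding only the text of its final repetition.

```ruby
line = "XD 1 * 100000436 3441863 1550663 1161254 951982\n"

# One capture group repeated five times: each repetition overwrites the
# previous capture, so only the last number (plus trailing whitespace) remains.
repeated = /XD\s1\s\*\s(\d+\s+){5}/
last_only = line.scan(repeated)

# Five separate capture groups: each number gets its own slot.
explicit = /XD\s1\s\*\s(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/
all_five = line.scan(explicit)

p last_only  # [["951982\n"]]
p all_five   # [["100000436", "3441863", "1550663", "1161254", "951982"]]
```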
If this worked, I could shorten it much, much more: all of the lines
after the first one have exactly the same format and I need to capture
all of the variables.
mother_of_all_regexen = %r{XD\s1\s\*\s((\d+\s+){5})
(XD\s1\s((\d+\s+){5})){17} }mx
or something
So,
- Can I use capture groups and term binding?
- Why am I only capturing the last term?
- Should I just stop trying to be clever and explicitly match against
all parts of the string?
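For comparison, one possible middle ground is sketched below. It assumes the framework could post-process a capture after the regex has run, which may not fit the YAML-driven design. A single regex captures each line's whole run of five numbers as one group, and the run is split afterwards. The names `row_regex` and `rows` are hypothetical, and the sample is trimmed to three lines.

```ruby
string = <<~EOVAR
  XD 1 * 100000436 3441863 1550663 1161254 951982
  XD 1 479903531056 47988002622 21360568539 18276299303 15476234490
  XD 1 66934 5552 321640438 40297830 0
EOVAR

# Step 1: match each "XD 1" line (the "*" is optional) and capture the
# whole run of five numbers as a single group.
row_regex = /^XD\s1\s(?:\*\s)?(\d+(?:[ \t]+\d+){4})$/

# Step 2: split each captured run on whitespace and convert to integers.
rows = string.scan(row_regex).map { |(numbers)| numbers.split.map(&:to_i) }

rows.each { |row| p row }
```

This keeps the pattern short while still recovering every number, at the cost of one extra step after the match.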
The reason I want to do this as a single regex is that I've written a
framework that grabs files, monkeys around with them and then applies
a ruleset from a YAML file to create output. For each "signature" in
the YAML file one can choose a defined action (match, count, compare,
etc.), each of which maps to a method in the main code. This allows the
editor of the YAML to add signatures etc. to their heart's desire... And
more importantly, it means that I won't have to maintain the ruleset.
(woohoo!)
Thanks in advance for any suggestions.
SM