Regexp Parsing final group of string

dkmd_nielsen

I'm attempting to parse parameter controls that are in the following
format:

keyword = char(lg) {| | block}

where (lg) and {| | block} are optional parameters. It doesn't seem
difficult, but I'm having problems capturing the last block. Some
patterns catch the block and some do not. The ones that do not are
exactly the ones that mark the block as optional.

The following are my test runs. The first three capture the block
(the matched text is shown between << and >>). Note: I'm using a
derivative of the show_regexp function provided in Dave Thomas' book.
The 1/2/3/4 are $1, $2, $3, and $4. The original expression identified
four groups.

=+= The following work fine =+=
======================= Must be there
'priority = h(5) {}', /(\{.*\}[\s,]*)/
priority = h(5) <<{}>>
1:({}) 2:() 3:() 4:()
=======================
======================= Must be at end of string
'priority = h(5) {}', /(\{.*\}[\s,]*)$/
priority = h(5) <<{}>>
1:({}) 2:() 3:() 4:()
=======================
======================= There must be one
'priority = h(5) {}', /(\{.*\}[\s,]*){1}/
priority = h(5) <<{}>>
1:({}) 2:() 3:() 4:()
=======================


These last three fail. These are the ones that say (or I think they
say) "the block may or may not be there." But they fail to identify
the existing block.


=+= The following fail. =+=
======================= Can be zero or more
'priority = h(5) {}', /(\{.*\}[\s,]*)*/
<<>>priority = h(5) {}
1:() 2:() 3:() 4:()
=======================
======================= At least zero, but not more than one
'priority = h(5) {}', /(\{.*\}[\s,]*){0,1}/
<<>>priority = h(5) {}
1:() 2:() 3:() 4:()
=======================
======================= Zero or one occurrence (one I would like to use)
'priority = h(5) {}', /(\{.*\}[\s,]*)?/
<<>>priority = h(5) {}
1:() 2:() 3:() 4:()
=======================


The entire expression that I thought would work is this:

Regexp.new('(\w+) *= *([A-Za-z]{1,2})(\(\d{1,2}\))*(\{.+\}[\s,]*)*,*')

Thanks for your time and consideration.
dvn
 
Robert Klemme

> The entire expression that I thought would work is this:
>
> Regexp.new('(\w+) *= *([A-Za-z]{1,2})(\(\d{1,2}\))*(\{.+\}[\s,]*)*,*')

First hint: don't use the string constructor in this case; you make
your life harder than necessary. Use a literal regexp instead.
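
To illustrate the hint, here is a small sketch of the difference. The patterns are shortened stand-ins for the one in the question, not the full expression. A single-quoted source string happens to behave like the literal, but a double-quoted one silently loses its backslashes, which is the kind of trap the literal form avoids.

```ruby
# Single-quoted source: backslashes survive, so the pattern text is the
# same as the one produced by a literal.
from_single = Regexp.new('(\w+) *= *(\w+)')
literal     = /(\w+) *= *(\w+)/
same = (from_single.source == literal.source)

# Double-quoted source: "\w" is not a recognised string escape, so Ruby
# drops the backslash before Regexp.new ever sees it.
from_double   = Regexp.new("\w+")
double_source = from_double.source

p same            # true
p double_source   # "w+"
```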

Here's what I'd do:

irb(main):001:0> s='priority = h(5) {}'
=> "priority = h(5) {}"

irb(main):008:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match(s).to_a
=> ["priority = h(5) {}", "priority", "h", "(5)", "{}"]
irb(main):009:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match("a=b").to_a
=> ["a=b", "a", "b", nil, nil]
irb(main):010:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match("a=b(c)").to_a
=> ["a=b(c)", "a", "b", "(c)", nil]
irb(main):011:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match("a=b {}").to_a
=> ["a=b {}", "a", "b", nil, "{}"]
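
For what it's worth, the failing test runs in the question can be explained by two things, sketched below against the same test string. First, a pattern that consists only of an optional group can match the empty string, and the leftmost match wins, so it succeeds at index 0 before the engine ever reaches the braces. Second, in the full expression the block group allows no whitespace after the closing paren and `.+` needs at least one character between the braces, so `{}` cannot match there. The variable names are only for illustration.

```ruby
s = 'priority = h(5) {}'

# Required group: the engine scans forward until it finds "{}".
required = /(\{.*\}[\s,]*)/.match(s)

# Optional group: an empty match already succeeds at index 0, and the
# leftmost match wins, so the braces are never examined.
optional = /(\{.*\}[\s,]*)?/.match(s)

# Giving the optional group fixed surroundings makes it apply at the
# right place.
anchored = /\)\s*(\{.*\}[\s,]*)?$/.match(s)

# The full expression from the question: it matches, but stops after
# "(5)" because of the space and because "{}" has nothing inside.
full = Regexp.new('(\w+) *= *([A-Za-z]{1,2})(\(\d{1,2}\))*(\{.+\}[\s,]*)*,*').match(s)

p [required.begin(0), required[1]]   # [16, "{}"]
p [optional.begin(0), optional[0], optional[1]]   # [0, "", nil]
p anchored[1]                        # "{}"
p [full[0], full[4]]                 # ["priority = h(5)", nil]
```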

You can also use the extended form (the x flag), which allows
whitespace and comments, to make it clearer:

%r<
(\w+) # first token
\s*
=
\s*
(\w+) # second token
\s*
(?:(\([^)]+\))\s*)? # optional parens with trailing WS
(\{[^}]*\})? # optional block
>x
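
As a usage sketch, the same pattern could be wrapped in a small helper. The constant CONTROL, the method parse_control, the group names and the leading \A anchor are all assumptions made for this example; they are not part of the original format description.

```ruby
# Same structure as the pattern above, written with named groups.
CONTROL = /
  \A\s*
  (?<keyword>\w+) \s* = \s*          # first token
  (?<type>\w+) \s*                   # second token
  (?:(?<length>\([^)]+\))\s*)?       # optional parens with trailing WS
  (?<block>\{[^}]*\})?               # optional block
/x

# Returns a hash of the captured parts, or nil if the line does not
# look like a control at all. Missing optional parts come back as nil.
def parse_control(line)
  m = CONTROL.match(line)
  return nil unless m
  { keyword: m[:keyword], type: m[:type], length: m[:length], block: m[:block] }
end

p parse_control('priority = h(5) {}')
p parse_control('a=b {}')
p parse_control('a=b')
```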

Kind regards

robert
 
