regex dynamic count modifier {min, max} ?

jOhn

[Note: parts of this message were removed to make it a legal post.]

Here is an idea and tell me if it could be accomplished by some other means.


To parse a logic statement like this:

fn:function3(fn:function2(fn:function1(xargs)))

Using a regex sorta like so to grep the function-start-pattern(s) :

/\A((fn\:[\w\-]+)[ ]*\([ ]*)+\z/i

It would be nice to ensure the proper count of ')', without confusion if say
the xargs had ')' literal or escaped string value(s) in there.

One way is to provide a count ref for function-start-pattern, so I could
then group a pattern for match on post-xargs ')' and force the {min,max}
count by some backref to count-of-(function-start-pattern) =X and put that
in there for the (function-end-pattern)+{X,X}.

Then it might be something like this (ignore the lack of a match on possible
xargs for now) :

/\A((fn\:[\w\-]+)[ ]*\([ ]*)+[ ]*(\)){$#1,$#1}\z/i

Where $#1 would be the count ref of the first group etc. Then there would be
matching count-left-side-( and count-right-side-).

Or I don't understand enuf about the internals of regex to know that this is
impossible.

-ntcm
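
For what it's worth, the count back-reference can be approximated in two passes: count the function-start patterns first, then interpolate that number into a second regex as a fixed quantifier. A minimal Ruby sketch, assuming (as in the example above) that xargs is left out entirely; `balanced_call?` is a hypothetical helper name, not something from the post:

```ruby
# Two-pass check: count the "fn:name(" openers, then require exactly that
# many closing parentheses. Assumes there are no xargs between the openers
# and the closers, as in the simplified pattern above.
def balanced_call?(str)
  opens = str.scan(/fn:[\w\-]+[ ]*\([ ]*/i).size
  return false if opens.zero?

  # Interpolating the count yields a fixed quantifier such as {3}.
  pattern = /\A(?:fn:[\w\-]+[ ]*\([ ]*){#{opens}}[ ]*\){#{opens}}\z/i
  pattern.match?(str)
end

balanced_call?("fn:function3(fn:function2(fn:function1()))")  # => true
balanced_call?("fn:function3(fn:function2(fn:function1())")   # => false
```

This only gets the counts to agree. It does nothing about a ')' inside a quoted xargs value, which is the harder part of the question.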
 
t3chn0n3rd

Could you use the awk statement to further parse here?
 
Robert Klemme

2008/2/8 said:
Or I don't understand enuf about the internals of regex to know that this is
impossible.

Parsing nested structures is not possible with standard regular
expressions. IIRC they added something to Perl regexps to do that, and
it may be possible with Ruby 1.9, but I do not know the 1.9 regexp
engine well enough to answer that off the top of my head.

So the usual approach is to use a context free grammar and parser.
You can find parser generators in the RAA.
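
For input this small, a hand-written recursive-descent parser may be enough before reaching for a generator. The sketch below uses StringScanner from Ruby's standard library. The grammar in the comment is an assumption about the input format, and `parse` and `parse_call` are hypothetical names:

```ruby
require "strscan"

# Assumed grammar (not taken from the thread):
#   call := "fn:" NAME "(" ( call | ARGS ) ")"
#   ARGS := any run of characters other than parentheses
def parse_call(scanner)
  name = scanner.scan(/fn:[\w\-]+/i) or
    raise ArgumentError, "expected fn:name at offset #{scanner.pos}"
  scanner.skip(/\s*\(\s*/) or
    raise ArgumentError, "expected '(' at offset #{scanner.pos}"

  # Recurse when another call follows; otherwise take the plain argument text.
  arg = scanner.check(/fn:/i) ? parse_call(scanner) : scanner.scan(/[^()]*/)

  scanner.skip(/\s*\)/) or
    raise ArgumentError, "expected ')' at offset #{scanner.pos}"
  { name: name, arg: arg }
end

def parse(str)
  scanner = StringScanner.new(str.strip)
  tree = parse_call(scanner)
  raise ArgumentError, "trailing input at offset #{scanner.pos}" unless scanner.eos?
  tree
end

tree = parse("fn:function3(fn:function2(fn:function1(xargs)))")
tree[:name]              # => "fn:function3"
tree[:arg][:arg][:arg]   # => "xargs"
```

A missing or extra ')' raises ArgumentError with the offset where parsing stopped, so nesting and counts are checked in one pass.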

If you just want to ensure counts match you could do something like this:

raise "brackets do not match!" if
  str.scan(/\(/).size != str.scan(/\)/).size

However, this does not ensure proper nesting. A bit more sophisticated:

c = 0
str.scan(/[()]/) do |m|
  case m
  when "("
    c += 1
  when ")"
    c -= 1
    raise "Mismatch at '#{$`}'" if c < 0
  else
    raise "Programming error"
  end
end
raise "Mismatch" unless c == 0

But now you get pretty close to a decent parser. :)
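
The original question also asked about a ')' inside quoted xargs. One way to extend the counter is to have the scan consume whole quoted strings as single tokens, so any parentheses inside them are never counted. This is a sketch only: `TOKEN` and `balanced_parens?` are hypothetical names, and backslash-escaped quotes inside strings are an assumption about the input format.

```ruby
# A token is a double-quoted string, a single-quoted string, or one parenthesis.
# Quoted strings may contain backslash escapes (assumed, not from the thread).
TOKEN = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[()]/

def balanced_parens?(str)
  depth = 0
  str.scan(TOKEN) do |tok|
    case tok
    when "(" then depth += 1
    when ")"
      depth -= 1
      return false if depth < 0   # closer with no matching opener
    end
    # Quoted strings fall through here and are ignored.
  end
  depth.zero?
end

balanced_parens?('fn:a(fn:b(")"))')  # => true
balanced_parens?('fn:a(fn:b(x)')     # => false
```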

Kind regards

robert
 
tho_mica_l

To parse a logic statement like this:
fn:function3(fn:function2(fn:function1(xargs)))

Are you looking for something like this (ruby19):

def get_fns(string, count=0)
  m = /(?<fn>
    fn:[\w\-]+
    \s*\(\s*
    (\g<fn>|[^)]*)
    \s*\)
  )\s*/xi.match(string)
  if m
    n = /^(fn:[\w\-]+)(\s*)\((.*?)\)$/.match(m['fn'])
    puts "FN #{count}: #{n[1]} with args #{n[3]}"
    get_fns(n[3], count + 1)
  end
end

a = "foo fn:function3(fn:function2(fn:function1(xargs))) foo(bar) bar"
get_fns(a)
FN 0: fn:function3 with args fn:function2(fn:function1(xargs))
FN 1: fn:function2 with args fn:function1(xargs)
FN 2: fn:function1 with args xargs
=> nil

Regards,
Thomas.
 
jOhn

wow good job thomas.
 
jOhn

I modified it slightly to ignore parentheses inside single or double quotes:

def get_fns(string, count=0)
  m = /(?<fn>
    fn:[\w\-]+
    \s*\(\s*
    (\g<fn>|(".+")*('.+')*[^)]*)
    \s*\)
  )\s*/xi.match(string)
  if m
    n = /^(fn:[\w\-]+)(\s*)\((.*?)\)$/.match(m['fn'])
    puts "FN #{count}: #{n[1]} with args #{n[3]}"
    get_fns(n[3], count + 1)
  end
end
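
As far as I can tell, this version only recognizes quoted strings at the start of the argument list, and the greedy `.+` can run past a closing quote when the input contains more than one quoted string. A possible tightening, as a sketch: treat each quoted string (with backslash escapes, which is an assumption about the input) as one unit anywhere in the arguments. `CALL` and `nested_calls` are hypothetical names, and the helper returns pairs instead of printing.

```ruby
# Recursive pattern: a call contains either a nested call or a mix of
# quoted strings and characters that are neither parentheses nor quotes.
CALL = /(?<fn>
  fn:[\w\-]+ \s* \( \s*
  (?: \g<fn> | (?: "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' | [^()"'] )* )
  \s* \)
)/xi

# Returns [name, args] pairs from the outermost call inward.
def nested_calls(str)
  calls = []
  while (m = CALL.match(str))
    # m[0] is the whole matched call; split it into name and argument text.
    name, args = m[0].match(/\A(fn:[\w\-]+)\s*\(\s*(.*)\)\z/mi).captures
    calls << [name, args.strip]
    str = args
  end
  calls
end

nested_calls('foo fn:f2(fn:f1("a)b")) bar')
# => [["fn:f2", "fn:f1(\"a)b\")"], ["fn:f1", "\"a)b\""]]
```

Text that looks like a call inside a quoted string would still be picked up on a later pass, so this is not a full parser either.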


 
